Machine Learning is a collection of algorithms that learn from data in order to yield insights and make predictions. These algorithms can detect relationships that were previously unknown to users. The field spans many different algorithms, older and more recent, simple and complex.
Brief history lesson: Many of these algorithms were previously grouped under the term Data Mining. “Data Mining” was a metaphor for digging through a “huge pile of data” to find insights and “bring them to light”. Today these algorithms are part of data science and, more specifically, of Machine Learning (ML).
Statistica offers a wide variety of Machine Learning algorithms that are easy to apply to data, require minimal configuration, and are simple to deploy in the field.
Various Model Types
In Statistica you can find many different model types, covering various applications:
- Traditional and modern regression models with different levels of complexity and strong diagnostic functions.
- Decision trees and ensembles like CART, CHAID, Random Forest and Boosting Trees with high accuracy, robustness, and insightful visualization of results.
- Artificial neural networks and support vector machines, which are suited to highly complex problems.
- Additional model types such as MARSplines, k-Nearest Neighbors, and Naive Bayes.
- Almost all model types are suited for solving classification and regression tasks alike. All can use continuous as well as categorical inputs directly.
- Algorithms for unsupervised learning (cluster analysis, association rules) are available as well.
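As a small illustration of how one model type can serve both task types, the sketch below implements k-Nearest Neighbors in plain Python: the same neighbor search feeds a majority vote for classification and an average for regression. It is not Statistica's implementation; the function names and the toy data are assumptions made for this example.

```python
from collections import Counter
from math import dist


def k_nearest(train, query, k):
    """Return the targets of the k training rows closest to `query`."""
    ranked = sorted(train, key=lambda row: dist(row[0], query))
    return [target for _, target in ranked[:k]]


def knn_classify(train, query, k=3):
    """Classification: majority vote among the k nearest targets."""
    return Counter(k_nearest(train, query, k)).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Regression: mean of the k nearest targets."""
    neighbors = k_nearest(train, query, k)
    return sum(neighbors) / len(neighbors)


# Toy data (hypothetical): each row is (features, target).
labeled = [((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
           ((5.0, 5.0), "high"), ((5.2, 4.9), "high")]
numeric = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

predicted_class = knn_classify(labeled, (1.1, 1.0))
predicted_value = knn_regress(numeric, (2.1,))
```

Only the final aggregation step differs between the two tasks, which is why many model families can cover classification and regression alike.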
Simple Configuration and Use
Using machine learning models in Statistica is surprisingly easy:
- The models can be connected to the data sources and be configured via point-and-click interfaces.
- The default settings are sensible starting points and help you reach first insights quickly.
- Statistica offers comprehensive documentation for the algorithms and a thoughtful selection of parameters to control them.
- Different model types and configurations can be compared easily in the workspace, which makes identifying the champion a breeze.
- Pre- and postprocessing of data and results can be fully automated.
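The idea behind comparing candidates and identifying a champion can be sketched in a few lines of Python. The candidate models, the accuracy metric, and all names here are hypothetical and chosen only for illustration; in Statistica this comparison is done in the workspace rather than in code.

```python
def accuracy(model, rows):
    """Share of validation rows for which the model predicts the true label."""
    hits = sum(1 for features, label in rows if model(features) == label)
    return hits / len(rows)


def pick_champion(candidates, validation_rows):
    """Score every candidate on the same held-out rows and return the best one."""
    scores = {name: accuracy(model, validation_rows)
              for name, model in candidates.items()}
    champion = max(scores, key=scores.get)
    return champion, scores


# Hypothetical held-out data: (features, true label).
validation_rows = [((0.2,), 0), ((0.4,), 0), ((0.6,), 1), ((0.9,), 1)]

# Stand-ins for fitted models; any callable mapping features to a label works.
candidates = {
    "threshold_rule": lambda x: int(x[0] > 0.5),
    "always_one": lambda x: 1,
}

champion, scores = pick_champion(candidates, validation_rows)
```

Scoring all candidates on the same held-out rows keeps the comparison fair regardless of model type.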
Quick Deployment
Once you have fitted a model with your chosen configuration, it can easily be moved to production in Statistica:
- The Rapid Deployment Engine can execute models directly on new data.
- Models can be selected via business rules (Rules Builder) for specific cases configured by the organization.
- Predictions can be calculated by the Statistica Server.
- It is possible to export models as code in various programming languages (Java, C#, C++, etc.) so that they can be embedded in IoT devices.
Model Management
Management of predictive models can be accomplished via Statistica Server. It allows for automation and a robust environment to store, provide and execute models.
- Models have access rights that allow some users to change them and other users to consume them.
- Models are versioned and can be governed through an approval process to ensure validity.
- Models can be moved from stage to stage, for example in a DTAP scenario (Development, Test, Acceptance, Production).
- Models can be updated automatically when a challenger model proves to reach higher accuracy.
- Models are managed independently of the model type (Regression, Tree, etc.), so a model can be replaced by a model of a different type without changing the surrounding process.
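The automatic update and the type-independent handling described above might look roughly like the following Python sketch. It assumes that every model, whatever its type, is wrapped behind the same `predict` callable, and that a challenger replaces the champion only when it beats it by a margin on fresh labeled data. `RegisteredModel`, `promote_if_better`, and `min_gain` are hypothetical names, not part of Statistica.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Row = Tuple[Sequence[float], int]  # (features, true label)


@dataclass(frozen=True)
class RegisteredModel:
    name: str      # logical model name used by the surrounding process
    kind: str      # "regression", "tree", ... (informational only)
    version: int
    predict: Callable[[Sequence[float]], int]


def accuracy(model: RegisteredModel, rows: Sequence[Row]) -> float:
    """Share of rows for which the model predicts the true label."""
    hits = sum(1 for features, label in rows if model.predict(features) == label)
    return hits / len(rows)


def promote_if_better(champion, challenger, rows, min_gain=0.01):
    """Return the model that should serve predictions from now on."""
    if accuracy(challenger, rows) >= accuracy(champion, rows) + min_gain:
        return challenger
    return champion


# Hypothetical evaluation data and two models of different types.
rows = [((0.1,), 0), ((0.4,), 0), ((0.6,), 1), ((0.8,), 1)]
champion = RegisteredModel("churn", "regression", 1, lambda x: int(x[0] > 0.7))
challenger = RegisteredModel("churn", "tree", 2, lambda x: int(x[0] > 0.5))

active = promote_if_better(champion, challenger, rows)
```

Because callers only ever use `active.predict`, swapping a regression for a tree does not change the surrounding process.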
Model Execution
Predictions can be calculated in different ways in the Statistica platform:
- The data scientist can use the model in the analytical environment, either while creating it or by locally running a model stored in the central repository.
- Models can be executed batch-wise on the server, for example to write predictions back to a database on a regular schedule.
- Predictions of a model can be made available via a SOAP web service API. A device or service can then send data to the Statistica Server and receive model-based predictions in return.
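The batch scenario, scoring new rows and writing the predictions back to a database, can be sketched with Python's built-in `sqlite3` module. The table layout, the column names, and the stand-in `model` function are assumptions for this example and do not reflect how Statistica Server stores or executes models.

```python
import sqlite3


def score_batch(conn, model):
    """Score every row that has no prediction yet and write the result back."""
    pending = conn.execute(
        "SELECT id, x1, x2 FROM measurements WHERE prediction IS NULL"
    ).fetchall()
    updates = [(model(x1, x2), row_id) for row_id, x1, x2 in pending]
    conn.executemany(
        "UPDATE measurements SET prediction = ? WHERE id = ?", updates
    )
    conn.commit()
    return len(updates)


def model(x1, x2):
    """Stand-in for a deployed model (hypothetical linear formula)."""
    return 0.5 * x1 + 0.25 * x2


# In-memory database with a hypothetical table of unscored rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements "
    "(id INTEGER PRIMARY KEY, x1 REAL, x2 REAL, prediction REAL)"
)
conn.executemany(
    "INSERT INTO measurements (x1, x2) VALUES (?, ?)",
    [(2.0, 4.0), (1.0, 0.0), (0.0, 8.0)],
)

scored = score_batch(conn, model)
```

Selecting only rows without a prediction makes the job safe to repeat: a scheduled run that finds nothing new simply updates zero rows.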
StatSoft as Your Partner
StatSoft is your reliable partner when it comes to the software Statistica. We assist you with selection, configuration, and installation, and we provide the software and methodological know-how you need. In analytical projects we can support you with consulting and execution. Together we can generate more insights from your data and deliver sustainable value.