This article was written by our valued partner Luigi Roggia of Apply Science (https://www.applyscience.it/). Luigi is an expert in the field of applied statistics and the corresponding software tools. It is a great honor to have him share his insights on our blog.
Discover why Statistica is a valid Minitab replacement
I’ve been a Minitab user since 2006 and, over the years, I have used Minitab to teach statistics to literally hundreds of people and to do a great deal of consulting. I mention this only to say that I believe I have solid expertise in both Minitab and statistics. What follows in this article is based strictly on my direct personal experience and on the enthusiastic feedback we are collecting from our customers.
Minitab is a renowned tool in the domain of statistical software. Yet, in 2020 I started to ask myself if a valid alternative might be available, because my duties as a senior consultant also include this type of activity: looking for the best technology for my customers.
So, I started to explore alternatives, covering both free and commercial solutions. I came across many different tools, which I evaluated against some general requirements that I consider of prime importance. Considering the data-related needs that analysts have today, an optimal statistical software package should:
- Have a complete set of functionalities that cover the steps of Six Sigma methodology
- Be mature and stable to be adopted by a big company
- Be able to create and run automated pipelines of analyses
- Be able to import data from a wide range of sources, including all the popular ones
- Be able to export results in a variety of common file formats
- Be compliant with FDA guidelines, as described in 21 CFR Part 11
- Be able to run Python and/or R code if needed
- Include machine learning functionalities, because nowadays machine learning is a natural extension, or part, of statistical learning
- Have a friendly user experience, especially when it comes to project and report management
- Support and facilitate the steps from exploration and model creation to production environment
In the end I found a solution that fulfills all the above requirements and represents, for me, a huge step forward: not just an alternative but a significant improvement.
The software I’m talking about is Statistica.
Moving to Statistica was easier than expected. Once I understood the general logic, it was very easy to learn and use.
Statistics and Six Sigma
From a practical point of view, Statistica has all the functionalities needed, so when you move to Statistica there’s nothing that you have to leave behind. If you are looking for any of the following functionalities, you will be more than satisfied:
- Descriptive statistics
- Statistical tests
- Analysis of variance
- Regression modelling
- Design of experiments
- Control charts
- Capability analysis
- Measurement systems analysis
- Reliability
Going deeper into statistics, if you are an advanced user and want to run a Six Sigma project, Statistica is very helpful, because it comes with a dedicated menu that leads you through the DMAIC (Define, Measure, Analyze, Improve, Control) stages.
As soon as you become familiar with the new environment, you realize that there’s much more, and that many details are designed to make the user’s life easier.
Easy and quick exploration of results
For example, when you run an analysis, all the available results and their applications are collected in a single dialog, structured in tabs that you can explore to get every possible insight. Or you can just look at the “Quick” tab and get the essential results. In either case, you do not need to look for other tools in the menus or elsewhere.
Design of Experiments at a PRO level
In Statistica you can create any type of design of experiments (DoE), and the improvement is especially striking in the analysis of DoE results. You feel it as soon as you dive into the result tabs discussed in the previous section, and you get solid confirmation when, for example, you create a contour plot and discover that it can be shown as an interactive 3D plot that includes both the fitted surface and your experimental data. This type of visualization is extremely useful for those who are learning how a regression model fits reality.
Obviously, there’s much more. In particular, Statistica includes a powerful simulator called Model Profiler. Once you have used the included model optimizer to find your optimal configuration, Model Profiler uses Monte Carlo simulations to predict how that configuration will perform in production. The functionalities included in Statistica offer a lot of control and capabilities. This is even more evident if we consider the integration with R and Python, which opens endless opportunities for simulation, optimization and multi-objective optimization.
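To make the idea of Monte Carlo prediction concrete, here is a minimal Python sketch. It is not how Model Profiler is implemented; the fitted model, its coefficients, the factor names, the setpoints, the noise levels and the specification limits are all invented for illustration. The sketch lets the factors scatter around their optimal setpoints, as they would in production, and looks at the resulting distribution of the response.

```python
import random
import statistics


def predicted_yield(temperature, pressure):
    """Hypothetical fitted DoE model; the coefficients are invented."""
    return 50.0 + 0.8 * temperature + 1.5 * pressure - 0.02 * temperature * pressure


def simulate(n_runs=10_000, seed=42):
    """Draw factor settings around the (assumed) optimum and predict the response."""
    rng = random.Random(seed)
    responses = []
    for _ in range(n_runs):
        # In production the factors cannot be held exactly at their setpoints.
        temperature = rng.gauss(30.0, 0.5)
        pressure = rng.gauss(4.0, 0.1)
        # Residual variation not explained by the model.
        noise = rng.gauss(0.0, 0.3)
        responses.append(predicted_yield(temperature, pressure) + noise)
    return responses


LSL, USL = 76.0, 79.0  # hypothetical specification limits

responses = simulate()
mean = statistics.fmean(responses)
stdev = statistics.stdev(responses)
out_of_spec = sum(1 for y in responses if y < LSL or y > USL) / len(responses)

print(f"mean response:        {mean:.2f}")
print(f"standard deviation:   {stdev:.2f}")
print(f"fraction out of spec: {out_of_spec:.4f}")
```

The useful output is the whole simulated distribution rather than a single predicted value: it shows how much of the production variation in the factors propagates to the response.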
Stop struggling with non-normal data
In Statistica you can handle non-normal data without doing it manually, because Statistica automatically creates control charts for non-normal data and automatically fits non-Gaussian distributions. So, when you create a control chart or run a capability analysis, your output will include the analysis for both the normal and the non-normal case. Yes, that’s completely automatic and yes, control charts for non-normal data do exist.
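For readers who wonder what a non-normal capability analysis looks like in practice, the following Python sketch shows one common approach, the percentile method: fit a non-Gaussian distribution, then replace the usual ±3 sigma spread with the 0.135% and 99.865% quantiles of the fitted distribution. This is an illustration under stated assumptions, not a description of Statistica’s internal algorithm: the data are simulated, the lognormal family is chosen by hand, and the specification limits and the function name `lognormal_capability` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def lognormal_capability(data, lsl, usl):
    """Percentile-based Pp and Ppk, assuming the data follow a lognormal distribution."""
    # A lognormal fit is a normal fit on the logarithms of the data.
    logs = [math.log(x) for x in data]
    fitted = NormalDist(fmean(logs), stdev(logs))

    # Quantiles that play the role of mean - 3 sigma, mean, and mean + 3 sigma.
    q_low = math.exp(fitted.inv_cdf(0.00135))
    median = math.exp(fitted.inv_cdf(0.5))
    q_high = math.exp(fitted.inv_cdf(0.99865))

    pp = (usl - lsl) / (q_high - q_low)
    ppk = min((usl - median) / (q_high - median),
              (median - lsl) / (median - q_low))
    return {"pp": pp, "ppk": ppk, "q_low": q_low, "median": median, "q_high": q_high}


# Simulated, right-skewed measurements (not real process data).
rng = random.Random(7)
data = [rng.lognormvariate(0.0, 0.25) for _ in range(500)]

result = lognormal_capability(data, lsl=0.3, usl=2.5)
print({name: round(value, 3) for name, value in result.items()})
```

Because the quantiles come from the fitted skewed distribution, the two sides of the process are judged against their own tails instead of a symmetric ±3 sigma band that the data do not follow.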
Everything clear and under control with Workbooks
Statistica uses so-called Workbooks. They are logical structures that collect, organize and group data, analyses and reports. Everything is extremely clear and perfectly manageable. In other words, when you work on a complex project in Statistica, you can be sure that you won’t go crazy trying to understand what was done and with which data: you can have as many Workbooks as you need inside your project, so everything has its own place and documentation according to your preferred logical schema.
Drag and drop in the Workspace
One of the features I love most in Statistica is the Workspace: an amazingly simple drag-and-drop canvas where you can create and run your analytical pipelines. You literally drag and drop the functions from the menu, use your mouse to connect the nodes and then press “Run”. Your pipeline is run, and a detailed and structured report is created automatically.
Our customers really appreciate Workspaces, because we can create the analysis for them, the schema and logic can be understood and documented visually, and they can reuse it every time the data is updated, just by clicking “Run”.
Also worth mentioning: you can use Workspaces as statistical computation engines in the background of Spotfire dashboards.
Exporting models and results to production environments
Speaking of production environments and business-oriented applications: when you create a model in Statistica, you can export it and make it usable in production in a variety of ways. For example, you can export it as code in several programming languages or as PMML (Predictive Model Markup Language).
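If you have never seen PMML, it is an XML format that describes a model in a tool-independent way. The Python sketch below builds a simplified PMML document for a linear regression by hand, only to show what such a file contains. It is not Statistica’s exporter and its output has not been validated against the official PMML schema; the function name `linear_model_to_pmml`, the field names and the coefficients are invented for illustration.

```python
import xml.etree.ElementTree as ET


def linear_model_to_pmml(intercept, coefficients, target="yield"):
    """Describe a linear regression as a simplified PMML document (illustrative only)."""
    pmml = ET.Element("PMML", {"version": "4.4",
                               "xmlns": "http://www.dmg.org/PMML-4_4"})
    ET.SubElement(pmml, "Header", {"description": "Illustrative linear regression"})

    # The data dictionary declares every field the model reads or predicts.
    fields = [*coefficients, target]
    dictionary = ET.SubElement(pmml, "DataDictionary",
                               {"numberOfFields": str(len(fields))})
    for name in fields:
        ET.SubElement(dictionary, "DataField",
                      {"name": name, "optype": "continuous", "dataType": "double"})

    model = ET.SubElement(pmml, "RegressionModel", {"functionName": "regression"})
    schema = ET.SubElement(model, "MiningSchema")
    for name in coefficients:
        ET.SubElement(schema, "MiningField", {"name": name})
    ET.SubElement(schema, "MiningField", {"name": target, "usageType": "target"})

    # The regression table holds the intercept and one coefficient per predictor.
    table = ET.SubElement(model, "RegressionTable", {"intercept": repr(intercept)})
    for name, coefficient in coefficients.items():
        ET.SubElement(table, "NumericPredictor",
                      {"name": name, "coefficient": repr(coefficient)})

    return ET.tostring(pmml, encoding="unicode")


# Hypothetical model: yield = 50.0 + 0.8 * temperature + 1.5 * pressure
pmml_text = linear_model_to_pmml(50.0, {"temperature": 0.8, "pressure": 1.5})
print(pmml_text)
```

The point of the format is that a scoring engine in production only needs this description of the model, not the tool that originally fitted it.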
You can also write results directly to a database, or use any combination of R, Python, Scala, Visual Basic and C# to export results.
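As a rough idea of what exporting results from a Python step can look like, here is a minimal sketch that writes a few summary metrics to a database table. Everything in it is an assumption made for illustration: the table name `analysis_results`, the metric names and their values are invented, and an in-memory SQLite database stands in for whatever database your production environment uses.

```python
import sqlite3

# Invented summary metrics standing in for the output of an analysis.
results = [("Cp", 1.41), ("Cpk", 1.28), ("mean", 77.6)]

# An in-memory database keeps the sketch self-contained; a real setup would
# connect to the production database instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE analysis_results (metric TEXT PRIMARY KEY, value REAL)")

# Parameterized inserts avoid building SQL strings by hand.
conn.executemany(
    "INSERT INTO analysis_results (metric, value) VALUES (?, ?)", results
)
conn.commit()

# Read the rows back to confirm what was stored.
stored = conn.execute(
    "SELECT metric, value FROM analysis_results ORDER BY metric"
).fetchall()
conn.close()

print(stored)
```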
Data science and machine learning
For data scientists, Statistica delivers a wealth of features in one shared environment, not only for classical and applied statistics but also for handling big datasets and more complex scenarios. Statistica has an extremely rich menu for machine learning.
It also has a dedicated menu for big data. Functionalities for both statistics and machine learning can be extended at any time via R and Python, including inside Workspaces, which thus become a tool to make R and Python work together (in parallel or in a pipeline).
Compliant with FDA guidelines
Statistica can run as a stand-alone desktop application that includes all the features discussed above, or it can use a client/server architecture. The latter is called Statistica Server, and it enormously enhances Statistica’s capabilities. We will not discuss all the features available with the Server version, but it is worth highlighting that it forms the foundation for building a system compliant with FDA 21 CFR Part 11. Based on my experience, Statistica is the most complete and compliant statistical software addressing the needs of the pharmaceutical industry, including data integrity.
Key takeaways
After many years as a Minitab professional user, moving to Statistica resulted in a significant technological improvement that is facilitating my job and making it faster. The transition was easy and every functionality I was using is available in Statistica.
Summing up in a list, my personal evaluation includes the following key points:
- Many more functionalities, including a complete set of machine learning capabilities
- Workspaces are a great tool to visually design analyses and run them effortlessly
- Projects can be managed in a much cleaner and more powerful way
- Analyzing design of experiments is a whole new experience, definitely richer
- Control charts and capability analysis no longer suffer from the “curse of normality”
- The ecosystem provided by TIBCO is incomparable: the opportunities for integration and expansion are infinite, and enterprise-grade quality and capabilities are guaranteed
- This tool is for both statisticians and data scientists: finally, software that embraces the new professionals of the data-driven era