AutoML: Hyperparameter Optimization

Hyperparameter Optimization is an AutoML component that provides an environment for optimizing the hyperparameters of data mining models in TIBCO Data Science / Statistica.

It helps to identify optimal:

  • Model parameters
  • Misclassification costs
  • Stratification strategies
  • Feature selection

It provides deep insight into:

  • Sensible ranges for the parameters
  • Interdependencies of parameters
  • Validation of the current configuration
  • Influence of sampling
  • Influence and predictability of individual cases (observations)
  • Configuration of the models
  • Relationships between different error and accuracy measures

It additionally offers:

  • Easy setup of experiments
  • Automated visualization and summarization of the results
  • Simple extensibility
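
As a rough illustration of what such an experiment does, the following Python sketch runs an exhaustive grid search over a model parameter and a misclassification cost and keeps every result, so that parameter ranges and interdependencies can be inspected afterwards. It is not the component's implementation: the function `grid_search`, the parameter names `max_depth` and `false_negative_cost`, and the toy error function are all assumptions made for this example.

```python
from itertools import product

def grid_search(evaluate, grid):
    """Evaluate every parameter combination in `grid`.

    Returns (best_params, best_score, results), where a lower score is better
    and `results` holds every (params, score) pair that was tried.
    """
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, evaluate(params)))
    best_params, best_score = min(results, key=lambda r: r[1])
    return best_params, best_score, results

# Hypothetical stand-in for a cross-validated error measure of a real model.
def toy_error(params):
    return (params["max_depth"] - 4) ** 2 + abs(params["false_negative_cost"] - 2.0)

# Hypothetical search space: one model parameter and one misclassification cost.
grid = {
    "max_depth": [2, 4, 6, 8],
    "false_negative_cost": [1.0, 2.0, 5.0],
}

best_params, best_score, results = grid_search(toy_error, grid)
```

Keeping the full `results` list, rather than only the best combination, is what makes it possible to look at sensible ranges and at how the parameters interact.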

Optimal Binning

The Optimal Binning node combines the levels of individual variables into groups (bins) that behave similarly with respect to a target variable.

Why use (supervised) binning?

  • Easier interpretation of the data
  • Helpful for data selection (based on a target variable)
  • Reduction of data complexity
  • Handling of missing data
  • Preparation for using methods with linearity assumption
  • Preparation for use of methods that only support categorical data

The bin variables that are created are always categorical in nature; the input variables can be categorical or continuous (metric).

To create the bins, a CHAID tree is fitted to the data. The tree is then converted into Statistica formulas, which are used to create the bins.
The CHAID tree is controlled by a single parameter: the p-value for merging.
Cases with missing data are excluded while the tree is fitted and the bin formulas are created, but each formula is extended by a rule that assigns cases with missing data to a separate bin, “Missing”. New levels (not previously observed in the data) are assigned to the bin “Unknown”.
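
The handling of missing values and new levels can be sketched in Python as follows. This only illustrates the logic described above; it is not Statistica formula syntax, and the names `level_to_bin`, `cut_points`, and `labels` as well as the example bins are hypothetical. In the node, the corresponding rules come from the fitted CHAID tree.

```python
import math
from bisect import bisect_right

def is_missing(value):
    """Treat None and NaN as missing data."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def bin_categorical(value, level_to_bin):
    """Map a categorical level to its bin, with the two extra rules."""
    if is_missing(value):
        return "Missing"
    # Levels that were not observed when the bins were built go to "Unknown".
    return level_to_bin.get(value, "Unknown")

def bin_continuous(value, cut_points, labels):
    """Map a continuous value to the interval it falls into.

    `cut_points` must be sorted; `labels` needs one more entry than `cut_points`.
    A value equal to a cut point is placed in the upper interval.
    """
    if is_missing(value):
        return "Missing"
    return labels[bisect_right(cut_points, value)]

# Hypothetical bins, standing in for the result of a fitted tree.
region_bins = {"North": "Bin 1", "East": "Bin 1", "South": "Bin 2"}
age_cuts = [30, 50]
age_labels = ["<30", "30 to <50", ">=50"]

print(bin_categorical("East", region_bins))
print(bin_categorical(None, region_bins))
print(bin_categorical("West", region_bins))
print(bin_continuous(42, age_cuts, age_labels))
```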


Further advantages:

  • Simple deployment of the binning solution
  • Works for classification and regression tasks, with metric and categorical inputs
  • Can be automated

The primary output of the node is a set of Statistica formulas, which calculate the bin variables when applied to data. The formulas can easily be copied and used wherever they are needed. It is also possible to output the transformed data directly.

Report Node

The StatSoft Report Node turns the creation of large, well-formatted reports from analytical results into a simple, automatic procedure.
The node extracts the workbook "Reporting Documents" from the workspace and adds the workbook items (spreadsheets, graphs, reports) to a Word report.

This process requires a Word template (conveniently stored in Statistica Enterprise or the file system) as a starting point. At minimum, this template can be an empty Word document (.docx).
Interaction with Microsoft Word is handled via Microsoft's own OpenXML library, which results in standards-compliant Word documents.

The most important properties of the items and their placement can be configured via a configuration table. Definitions can be provided for individual items or, via wildcards, for groups of items.
The node and its interface are simple, but do not let that fool you: it is a powerful tool that brings the automation of your reporting to a new level of convenience and effectiveness.
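
The idea of configuring items individually or in groups via wildcards can be pictured with a small Python sketch. The table layout, the setting names (`include`, `heading_level`, `width_cm`), and the rule that later rows override earlier ones are assumptions made for this illustration; they are not the node's actual configuration columns or matching rules.

```python
from fnmatch import fnmatchcase

# Hypothetical configuration table: (item name pattern, settings).
# "*" matches any run of characters, so one row can cover a group of items.
CONFIG = [
    ("*", {"include": True, "heading_level": 2}),
    ("Histogram*", {"width_cm": 12}),
    ("Summary Table", {"heading_level": 1}),
]

def settings_for(item_name, config=CONFIG):
    """Collect the settings of every row whose pattern matches the item name.

    Rows are applied top to bottom, so later rows override earlier ones
    (an assumption of this sketch).
    """
    settings = {}
    for pattern, values in config:
        if fnmatchcase(item_name, pattern):
            settings.update(values)
    return settings

print(settings_for("Histogram of Age"))
print(settings_for("Summary Table"))
```

Here the general `*` row supplies defaults, the `Histogram*` row adjusts a whole group of graphs, and the exact name `Summary Table` addresses a single item.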


Shelf-Life Estimation / Stability

With Shelf-Life Estimation, you can evaluate stability studies at the touch of a button and generate archivable result documents.

It automates the calculation of shelf life and retest period in accordance with the ICH Q1E guideline:

  • Determination of the shelf life for one or more batches
  • Covariance analysis for pooling of batch data (poolability)
  • Automatic model selection by statistical tests for a common slope and a common intercept of the stability functions of several batches
  • Switching between German and English for input forms and result output
  • Automatic generation of a result report in PDF format
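
For a single batch and an attribute that decreases over time, ICH Q1E describes the shelf life as the earliest time at which the one-sided 95% confidence limit for the mean regression line intersects the acceptance criterion. The following Python sketch illustrates that calculation under simplifying assumptions: it is not the add-on's implementation, it does not cover pooling of batches or the guideline's limits on extrapolation, the data are invented, and the critical t-value is passed in by the caller (2.132 is the one-sided 95% value for 4 degrees of freedom).

```python
import math

def shelf_life(times, values, lower_limit, t_crit, step=0.01, horizon=120.0):
    """Last time (on a grid of width `step`) at which the one-sided lower
    confidence limit of the fitted line is still at or above `lower_limit`.

    Returns None if the limit is already violated at time 0.
    """
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # Residual standard deviation with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(times, values))
    s = math.sqrt(sse / (n - 2))

    def lower_bound(t):
        se = s * math.sqrt(1.0 / n + (t - t_mean) ** 2 / sxx)
        return intercept + slope * t - t_crit * se

    last_ok = None
    for k in range(int(horizon / step) + 1):
        t = k * step
        if lower_bound(t) < lower_limit:
            break
        last_ok = t
    return last_ok

# Invented example data: assay in % of label claim, measured at given months.
months = [0, 3, 6, 9, 12, 18]
assay = [100.1, 99.4, 99.0, 98.3, 97.9, 96.8]

# Six points leave 4 degrees of freedom; one-sided 95% t-value is 2.132.
estimate = shelf_life(months, assay, lower_limit=95.0, t_crit=2.132)
print(estimate)
```

Because the confidence limit lies below the fitted line, the estimate is shorter than the time at which the line itself reaches the acceptance criterion.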

A numerical validation package for this add-on is available.

Shelf-Life Estimation is delivered as an add-on for TIBCO® Data Science / Statistica™.

Method Comparison

Method Comparison brings together, in a clear form, the current calculation and visualization standards for comparing measurement methods.

This allows devices and methods to be efficiently tested and validated with regard to their measurement behaviour, or compared with external or more cost-effective solutions. With the Method Comparison add-on, you can produce sophisticated statistics with graphical and numerical results at the touch of a button. The available methods include Passing-Bablok regression, Deming regression (single, weighted, IRGDR), and Bland-Altman plots, as well as special evaluations such as error grids and bias calculations.
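
To give an idea of the numbers behind a Bland-Altman plot, the following Python sketch computes the bias (mean difference between two methods) and the conventional 95% limits of agreement (bias plus or minus 1.96 standard deviations of the differences). It is a minimal illustration with invented measurements, not the add-on's implementation; the function name `bland_altman` and its return structure are assumptions.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Bias and 95% limits of agreement for paired measurements."""
    if len(method_a) != len(method_b):
        raise ValueError("both methods need the same number of measurements")
    differences = [a - b for a, b in zip(method_a, method_b)]
    averages = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = mean(differences)
    sd = stdev(differences)  # sample standard deviation
    return {
        "bias": bias,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
        # Plot `differences` against `averages` to draw the chart itself.
        "averages": averages,
        "differences": differences,
    }

# Invented paired measurements from two methods on the same samples.
reference = [10.0, 12.0, 10.1, 12.0, 11.0]
candidate = [10.2, 11.8, 9.9, 12.4, 10.9]

result = bland_altman(candidate, reference)
print(result["bias"], result["lower_loa"], result["upper_loa"])
```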

The add-on is FDA-compliant, and a numerical validation package can be supplied.

Method Comparison is delivered as an add-on for TIBCO® Data Science / Statistica™.