Machine Learning in Statistica 

Machine Learning describes a collection of algorithms that learn from data, which is useful for gaining insights and making predictions. These algorithms can detect relationships that were previously unknown to the users. Machine Learning comprises many different algorithms, older and more recent ones, complex and simple ones.
A brief history lesson: many of these algorithms were previously collected under the term Data Mining. “Data Mining” was a metaphor describing the process of digging in a “huge pile of data” to find insights and “bring them to light”. Today these algorithms are part of data science and, more specifically, of Machine Learning (ML).
Statistica offers a wide variety of Machine Learning algorithms that are easy to apply to data, with minimal configuration and simple deployment in the field.

Various Model Types 
In Statistica you can find many different model types, covering various applications:

  • Traditional and modern regression models with different levels of complexity and strong diagnostic functions.
  • Decision trees and ensembles such as CART, CHAID, Random Forest and Boosting Trees, with high accuracy, robustness, and insightful visualization of results.
  • Artificial neural networks and support vector machines as algorithms suited to solving highly complex problems.
  • Additional model types such as MARSplines, k-Nearest Neighbors, Naive Bayes and more.
  • Almost all model types are suited to classification and regression tasks alike. All can use continuous as well as categorical inputs directly.
  • Algorithms for unsupervised learning (cluster analysis, association rules) are available as well.

Simple Configuration and Use
Using machine learning models in Statistica is surprisingly easy:

  • The models can be connected to the data sources and configured via point-and-click interfaces.
  • The default settings are sensible starting points and help you reach first insights immediately.
  • Statistica offers thorough documentation for the algorithms and a thoughtful selection of parameters to control them.
  • Different model types and configurations can be compared easily in the workspace, which makes identifying the champion a breeze.
  • Pre- and postprocessing of data and results can be fully automated.

Quick Deployment
Once you have fitted a model type according to your configuration, the resulting model can easily be moved to production in Statistica:

  • The Rapid Deployment Engine can execute models directly on new data.
  • Models can be selected via business rules (Rules Builder) for specific cases configured by the organization.
  • Predictions can be calculated by the Statistica Server.
  • Models can be exported as code in various programming languages (Java, C#, C++, etc.) so that they can be embedded in IoT devices.
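
To illustrate what a model exported as code amounts to, the following sketch hard-codes a small logistic regression as a dependency-free scoring function. It is written in Python for brevity rather than in one of the export languages listed above. The function name, input names and coefficients are invented for illustration and are not output generated by Statistica.

```python
import math

# Hypothetical coefficients of a fitted logistic regression.
# In real exported code these would be generated from the trained model.
INTERCEPT = -1.5
COEFFICIENTS = {"temperature": 0.04, "vibration": 1.2}


def score(inputs):
    """Return the predicted probability of the positive class."""
    # Linear predictor: intercept plus the weighted sum of the inputs.
    z = INTERCEPT + sum(
        weight * inputs[name] for name, weight in COEFFICIENTS.items()
    )
    # The logistic function maps the linear predictor to a probability.
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    print(score({"temperature": 25.0, "vibration": 0.5}))
```

Because the function needs nothing beyond the standard math library, code of this kind can run on a device that has no connection to the modeling environment.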

Model Management
Predictive models can be managed via Statistica Server. It enables automation and provides a robust environment to store, provide and execute models.

  • Models have access rights that allow changes by some users and consumption by others.
  • Models are versioned and can be governed through an approval process to ensure validity.
  • Models can be moved from stage to stage to implement, for example, a DTAP scenario (Design, Test, Acceptance, Production).
  • Models can be updated automatically when a challenger model proves to reach high accuracy.
  • Models are managed independently of the model type (Regression, Tree, etc.), making it possible to replace a model with a different type of model without changing the surrounding process.
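
The automatic update mentioned in the list above can be pictured as a simple champion/challenger rule. The sketch below assumes one plausible reading: both models are scored on the same held-out data, and the challenger replaces the champion only if its accuracy is better by a margin. The function names and the `min_gain` parameter are hypothetical and do not describe how Statistica Server implements this.

```python
def accuracy(predictions, actuals):
    """Share of predictions that match the observed outcomes."""
    if len(predictions) != len(actuals) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == a for p, a in zip(predictions, actuals))
    return hits / len(actuals)


def select_model(champion_preds, challenger_preds, actuals, min_gain=0.01):
    """Return "challenger" if it beats the champion by at least min_gain.

    min_gain is a hypothetical safety margin that avoids swapping models
    because of negligible differences in accuracy.
    """
    champion_acc = accuracy(champion_preds, actuals)
    challenger_acc = accuracy(challenger_preds, actuals)
    if challenger_acc - champion_acc >= min_gain:
        return "challenger"
    return "champion"


if __name__ == "__main__":
    actuals = [1, 0, 1, 1, 0]
    print(select_model([1, 0, 0, 1, 1], [1, 0, 1, 1, 1], actuals))
```

Since the rule only looks at predictions and outcomes, it works the same way whether the two models are of the same type or not.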

Model Execution
Predictions can be calculated in different ways in the Statistica platform:

  • The data scientist can use the model in the analytical environment, either while creating it or by locally using a model stored in the centralized repository.
  • The server can execute models in batches and, for example, write predictions back to the database on a regular schedule.
  • Predictions of a model can be made available via a SOAP web service API. Through this API, a device or service can send data to the Statistica Server and receive model-based predictions in return.
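
As a rough picture of the web service route, the sketch below builds a SOAP 1.1 request envelope that a device or service might send. Only the SOAP envelope namespace is standard. The service namespace, the `ScoreRequest`, `Model` and `Input` element names and the function name are assumptions made for illustration; the real message schema would come from the service description (WSDL) of the Statistica Server.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
# Hypothetical service namespace, for illustration only.
SERVICE_NS = "http://example.com/scoring"

ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("svc", SERVICE_NS)


def _tag(namespace, name):
    """Build a namespace-qualified tag in ElementTree's {ns}name form."""
    return "{%s}%s" % (namespace, name)


def build_score_request(model_name, inputs):
    """Return a SOAP 1.1 envelope (as a string) asking for one prediction."""
    envelope = ET.Element(_tag(SOAP_NS, "Envelope"))
    body = ET.SubElement(envelope, _tag(SOAP_NS, "Body"))
    request = ET.SubElement(body, _tag(SERVICE_NS, "ScoreRequest"))
    ET.SubElement(request, _tag(SERVICE_NS, "Model")).text = model_name
    # One Input element per predictor value sent to the model.
    for name, value in inputs.items():
        field = ET.SubElement(request, _tag(SERVICE_NS, "Input"))
        field.set("name", name)
        field.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_score_request("churn_model", {"age": 42, "plan": "basic"}))
```

The resulting string would be sent in an HTTP POST request to the service endpoint, and the prediction would be read from the SOAP response body.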

StatSoft as Your Partner
StatSoft is your reliable partner when it comes to the software Statistica. We assist you with selection, configuration and installation, and also provide the necessary software and methodological know-how. In analytical projects we can help with consulting and execution. Together we can generate more insights from your data and deliver sustainable value.

Your contact

If you have any questions about our products or need advice, please do not hesitate to contact us directly.

Tel.: +49 40 22 85 900-0
E-mail: info@statsoft.de

Sasha Shirangi (Head of Sales)