Predicting Risks and Seizing Opportunities

Why Analyse Risks and Opportunities? 

A company’s economic success relies on the decisions of its customers. When customers cancel a subscription (churn) or decide to make a purchase, this has a direct influence on the company’s revenue. With predictive methods based on data, companies can anticipate such decisions early on and understand their root causes.

When predicting events with a negative impact, we speak of risk prediction. With the same methodological approach, we can also predict events with a positive effect on the company’s success.

Specific scenarios for risk prediction are:

  • Churn prediction, where we want to estimate which customers are likely not to come back or to cancel a subscription
  • Credit risk scoring, where we want to estimate the probability of a credit default

Specific scenarios with positive impact are:

  • Win-back analysis, where the likelihood of a customer returning after a churn event is estimated
  • Campaign optimization, where the conversion of campaign participants is to be estimated

There are many different scenarios like the ones above. They all share similar aspects:

  • The estimation aims to predict human decisions
  • The event of interest is usually rare and valuable (meaning beneficial or costly)

Data Acquisition and Preparation 

At the foundation of any predictive project is data acquisition and the subsequent data preparation. For the project to succeed, the customers’ data and their transactions have to be combined and brought into a shape that is suitable for analysis. Especially in early analytical projects, this step can make up a substantial part of the project.

When scanning through the available data, the most relevant features need to be selected. Interesting customer properties…

  • …make customers comparable.
  • …reflect the interests of customers by quantifying accepted offers, summarizing customer complaints, or capturing reactions to earlier campaigns.
  • …describe the responsiveness of customers via the duration of the relationship, the time since the last contact, or the number of contacts (see the sketch after this list).
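
Such properties can often be derived directly from transactional data. The following Python sketch shows one way to do this; the table layout and the column names (customer_id, timestamp, amount) as well as the reference date are illustrative assumptions, not a prescribed schema.

```python
import pandas as pd

# Hypothetical transaction data; columns are assumptions for illustration.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "timestamp": pd.to_datetime([
        "2023-01-05", "2023-06-20", "2023-02-11",
        "2023-03-01", "2023-07-15", "2023-05-30",
    ]),
    "amount": [120.0, 80.0, 35.0, 60.0, 45.0, 200.0],
})

reference_date = pd.Timestamp("2023-08-01")

# Per customer: number of contacts, first/last contact, total spend.
features = transactions.groupby("customer_id").agg(
    n_contacts=("timestamp", "size"),
    first_contact=("timestamp", "min"),
    last_contact=("timestamp", "max"),
    total_amount=("amount", "sum"),
)

# Responsiveness features: time since last contact, relationship duration.
features["days_since_last_contact"] = (reference_date - features["last_contact"]).dt.days
features["relationship_days"] = (reference_date - features["first_contact"]).dt.days
print(features.drop(columns=["first_contact", "last_contact"]))
```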

During data acquisition it is also necessary to evaluate the data and exclude some of it from the analysis:

  • Potentially discriminating properties
  • Analytically irrelevant properties (like unique identifiers)
  • Properties whose consistent availability is not guaranteed
  • Other properties might be anonymized to reduce unnecessary risks

In the following data preparation step, the data is corrected so that it is better suited for analysis. Here it is often necessary to reduce the complexity of the data (for example, by combining product codes into fewer product groups) and to make the data comparable (for example, by converting time points into intervals); a brief sketch of both steps follows.
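
The following pandas sketch illustrates both steps; the product codes, the grouping, and the signup_date column are hypothetical examples, not a recommendation for a specific mapping.

```python
import pandas as pd

# Hypothetical customer table; codes and dates are illustrative only.
customers = pd.DataFrame({
    "product_code": ["A13", "A27", "B05", "C11", "B09"],
    "signup_date": pd.to_datetime([
        "2019-04-01", "2021-11-15", "2020-07-30", "2022-02-14", "2018-09-09",
    ]),
})

# Reduce complexity: collapse detailed product codes into few product groups.
code_to_group = {
    "A13": "savings", "A27": "savings",
    "B05": "loans", "B09": "loans",
    "C11": "insurance",
}
customers["product_group"] = customers["product_code"].map(code_to_group)

# Make data comparable: convert absolute time points into intervals (tenure).
reference_date = pd.Timestamp("2023-08-01")
customers["tenure_days"] = (reference_date - customers["signup_date"]).dt.days
print(customers)
```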

Two further aspects must be considered:

  1. All these acquisition and preparation steps need to be implemented as a process so that they can be repeated. This is necessary because new data will be collected in the future and must be processed accordingly.
  2. There will be many modifications to these steps. This is because the subsequent analytical steps will uncover new issues that need to be fixed here, and newly collected data will reveal new issues that need to be tackled.

The event of interest (purchase decision, churn, etc.) is most likely rare compared to the non-event. This analytical issue can to some extent be tackled during data preparation (for example, by using stratified samples), and also during the following modelling phase (by tuning the right parameters or using case weights), as sketched below.
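
A minimal scikit-learn sketch of both countermeasures on synthetic data: a stratified split preserves the rare-event ratio, and balanced class weights act as case weights during model fitting. The data and all parameters here are purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a rare event (roughly 5% positives).
X, y = make_classification(n_samples=10_000, n_features=10,
                           weights=[0.95, 0.05], random_state=42)

# Stratified split: the rare-event share stays the same in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Case weights: class_weight="balanced" up-weights the rare event.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)
print(f"Share of positives in training data: {y_train.mean():.3f}")
```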

Modelling

After a viable analytical data set has been created, a first predictive model can be built. This step needs to respect all regulatory and organizational requirements. Therefore, regression models are a popular model type:

  • They are available in every data science toolkit
  • They can be interpreted easily and thus avoid the creation of black-box models (see the sketch after this list)
  • A fitted model can be calculated very efficiently and is easy to deploy
  • They can be converted into a so-called scorecard (relevant for credit risk modelling)
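
To illustrate the interpretability point, the following sketch fits a logistic regression on synthetic data and converts its coefficients into odds ratios, which is also the usual starting point for scorecard construction. The feature names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data; the feature names are illustrative assumptions.
X, y = make_classification(n_samples=5_000, n_features=4,
                           n_informative=3, n_redundant=1, random_state=0)
feature_names = ["tenure_days", "n_complaints", "n_contacts", "total_amount"]

model = LogisticRegression(max_iter=1000).fit(X, y)

# Each coefficient maps to an odds ratio: how the odds of the event change
# per unit increase of the feature. This is what keeps the model readable.
odds_ratios = pd.Series(np.exp(model.coef_[0]), index=feature_names)
print(odds_ratios.sort_values(ascending=False))
```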

Despite these advantages, regression models are limited in terms of the complexity in the data that they can handle. This makes more complex model types viable candidates as well. Ranked by increasing complexity, those are decision trees (like CART and CHAID), ensemble models (like Random Forests or Boosting), support vector machines, and artificial neural networks.

These models come with fewer restrictions regarding data quality and complexity, but they require deeper understanding and more computing power, and they are in general more difficult to analyse. The sketch below contrasts the two ends of this trade-off.
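
As a rough illustration of the trade-off, the following sketch benchmarks a logistic regression against a random forest on the same synthetic, imbalanced data. The outcome is illustrative only and says nothing general about which model type wins on real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data as a stand-in for a real customer data set.
X, y = make_classification(n_samples=10_000, n_features=20,
                           weights=[0.9, 0.1], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

# Fit both model types and compare their ranking quality (AUC) on held-out data.
for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("random forest", RandomForestClassifier(n_estimators=200, random_state=1)),
]:
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```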

Evaluation of Model Performance 

When the model types have been selected and model candidates have been trained, these can be evaluated in two ways:

  1. Comparison of model candidates
  2. Assessment regarding the scenario

The comparison of multiple model candidates enables us to

  • Get a benchmark for the expected model performance
  • Select the most appropriate model type for the given scenario
  • Estimate the performance penalty when a different model must be selected due to external requirements

When evaluating a single model regarding its capability to perform in the given scenario, the initial results might not be satisfactory. The hit rate (the proportion of correctly predicted observations, especially for the event of interest) of a given model will usually be low.

Keep in mind, though: even the best predictive model will not be able to “look inside people’s minds”. Human decisions are influenced by a multitude of factors, most of which will not be represented in the company’s data sources. (Looking at it from another perspective: if you do achieve a high hit rate, you should check your data. It might very well be that the model has processed inputs that were a result of the outcome, and that should hence not be part of the dataset.)

Instead of evaluating the hit rate, other evaluation techniques (depending on the scenario) should be applied. For example, in win-back analysis and in campaign optimization it is more common to evaluate the capability of the model to sort the customers by probability. This can be evaluated with gains or lift charts, as sketched below. The sorted data can then be used to allocate a given budget (for incentives or campaign activities) in the most efficient way.
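
A minimal sketch of the underlying gains/lift computation by decile; the scores and outcomes are randomly generated stand-ins for real model output, so the resulting numbers carry no meaning beyond the mechanics.

```python
import numpy as np
import pandas as pd

# Hypothetical scored customers: predicted probability and observed outcome.
rng = np.random.default_rng(7)
scored = pd.DataFrame({"probability": rng.random(1_000)})
scored["event"] = rng.random(1_000) < scored["probability"] * 0.2

# Sort by predicted probability and split the ranking into deciles.
scored = scored.sort_values("probability", ascending=False).reset_index(drop=True)
scored["decile"] = pd.qcut(scored.index, 10, labels=list(range(1, 11)))

overall_rate = scored["event"].mean()
gains = scored.groupby("decile", observed=True).agg(
    events=("event", "sum"), customers=("event", "size"))

# Cumulative gain: share of all events captured down to this decile.
gains["cum_gain"] = gains["events"].cumsum() / scored["event"].sum()
# Lift: event rate in the decile relative to the overall event rate.
gains["lift"] = (gains["events"] / gains["customers"]) / overall_rate
print(gains)
```

The first deciles of such a table show how many events a campaign would capture if only the top-ranked customers were contacted, which is exactly the information needed for budget allocation.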

Model Deployment and Monitoring 

After a successful development phase, the models are moved into production. This requires that the models are integrated with the company’s systems to process data and create predictions. The models can be executed on a scheduled or an on-demand basis, and the calculated predictions need to be stored accordingly to process them further and/or make them accessible to the experts.

The integration of the models should ensure that they can be updated easily so that regular model improvements are possible. This can include approval mechanisms (like the four-eyes principle) and version management.

In use, the models must be monitored continuously, and with each execution the predictions should be stored with a timestamp and the model version to allow traceability (a minimal sketch follows). Through this monitoring it is possible to estimate when a model update is necessary, guaranteeing reliable operation.
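
One way to realize such traceable scoring is sketched below; the function name, the version label, and the CSV log are illustrative assumptions, and a database table would serve the same purpose.

```python
from datetime import datetime, timezone

import pandas as pd

MODEL_VERSION = "churn-model-1.3.0"  # hypothetical version label

def score_and_log(model, customers: pd.DataFrame, log_path: str) -> pd.DataFrame:
    """Score customers and persist predictions with timestamp and model version."""
    features = customers.drop(columns="customer_id")
    predictions = pd.DataFrame({
        "customer_id": customers["customer_id"],
        "probability": model.predict_proba(features)[:, 1],
        "model_version": MODEL_VERSION,
        "scored_at": datetime.now(timezone.utc).isoformat(),
    })
    # Append to a simple CSV log so every run stays traceable over time.
    predictions.to_csv(log_path, mode="a", header=False, index=False)
    return predictions
```

Comparing the logged score distributions across runs (for example, the average predicted probability per month) then gives a first indication of drift and of when retraining is due.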

Next Steps

Risk prediction is, as described initially, an important topic for many companies. Luckily, there exist many different solutions, be it with commercial tools like Spotfire, Alteryx Designer, or Statistica, or with custom solutions developed in R or Python.

To decide on the best “how”, companies should take into consideration how far the automation and integration should go, what expertise level the data science team and the consumers have, and how the insights should be propagated throughout the organization.

Our team will gladly assist you with these decisions!

Your contact

If you have any questions about our products or need advice, please do not hesitate to contact us directly.

Tel.: +49 40 22 85 900-0
E-mail: info@statsoft.de

Sasha Shirangi (Head of Sales)