Big Data Analytics - Methodological Training in Data Science

November 2nd - 4th, 2020 in Hamburg

Prof. Dr. Diego Kuonen, CStat PStat CSci, Statoo Consulting

Training course objectives:
There is no question that “big data” (i.e. the simple yet seemingly revolutionary belief that data are valuable) and “machine learning” (i.e. simply put, a field of advanced statistics designed for a world of “big data”) have hit business and industry, academia, engineering and government. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. Nowadays, this is amplified by the digital transformation and the related data revolution.

Data science technology and methodology have been applied to understand and to optimise various processes within business and industry, academia, engineering and government. It is widely believed that data science will have a profound impact on our society and that data science can bring real value. But how can data science contribute to achieving operational excellence? Is data science worth the trouble or is it “statistical déjà vu”?

This three-day training course will provide you with an overview of the potential and limitations of data science and with a thorough methodological, practical and, most importantly, software-vendor independent coverage of state-of-the-art data science techniques (e.g. from advanced statistics, machine learning and artificial intelligence). It highlights the applicability of these techniques to accumulated data, and it will enable you to apply the presented methodology and its underlying philosophy to benchmark data or to your own data.

Course contents:
This training will provide you with a thorough methodological and practical coverage of state-of-the-art data science techniques (e.g. from advanced statistics, machine learning and artificial intelligence) that identify unexpected patterns, structures, models or trends in data to support crucial decisions. The course will give you practical data science experience, and the concepts and methods will be illustrated throughout. Moreover, you will be able to apply what you have learnt within a state-of-the-art data science workbench using benchmark data or your own data.

Course goals:
The naïve and blind “black-box” use of data science software packages has obvious pitfalls and can, and probably often does, lead to practically worthless results and misleading conclusions. Data science is easy to do badly. It is therefore important to understand enough of the characteristics of the underlying data science methodologies (both their advantages and their pitfalls) to make an informed choice about which data science methods to use and to critically appraise your own results and those of others. In this course we will apply a “white-box” methodology, which emphasises an understanding of the algorithmic and statistical model structures underlying the “black-box” software.

Overview of Data Mining Methodology:

  • Introduction
  • Demystifying the “big data” hype
  • Demystifying the “Internet of things” hype
  • Applicability of data science
  • What is data science?
    • Is data science “statistical déjà vu”?
    • What distinguishes data science from statistics?
  • Demystifying the “data science” hype
  • Demystifying the “machine learning” hype
  • A process model for data science
  • Data and data preprocessing
    • Data sources
    • Why data preprocessing?
    • Major tasks in data preprocessing (e.g. data integration, data cleaning, data transformation, data reduction, data discretisation)
  • Data science techniques and tasks
  • Description and visualisation
  • Characterising multivariate data
  • Dissimilarity and distance measures
  • Unsupervised methods (“class discovery”)
    • Principal component analysis
    • Multidimensional scaling
    • Correspondence analysis
    • Cluster analysis (e.g. hierarchical algorithms, partitioning algorithms, using clustering in practice)
    • Kohonen's self-organising maps
    • Affinity grouping or association rules
    • A look forward
  • Supervised methods (“class prediction”)
    • Introduction (e.g. inductive bias and model complexity, score functions, internal validation, external validation)
    • Classification modelling (e.g. discriminant analysis, support vector machines, nearest neighbour classification, naïve Bayes classifier)
    • Regression modelling (e.g. multiple linear models, generalised linear models, nonparametric regression models, generalised additive models, multivariate adaptive regression splines)
    • Neural networks
    • Tree-based methods (e.g. CART, C4.5 and C5.0, CHAID)
    • Ensemble learning (e.g. bagging, subagging, random forests, arcing, boosting, stochastic gradient tree boosting)
    • The curse of dimensionality (e.g. feature extraction, feature subset selection: filters, wrappers, embedded methods)
    • Evaluating and comparing classifiers
    • Comparing regression models
    • A look forward
    • Comparison of chosen supervised learning methods
    • Recent lessons – what has been learnt?
  • Criteria for potential data science success
  • Conclusion
  • References and resources

The lecture will be given in German. During the course, questions may be asked in English, French or German. All training documents will be in English. All participants will receive a printed version of the documentation for personal use only.

November 2nd - 4th, 2020 in Hamburg
(Subject to change.)

Participants should be familiar with basic statistics, including multiple linear regression. A laptop with a preinstalled TIBCO Statistica course license will be provided. We will send you the details before the course begins.

Course fees and discounts:
Public course fee              EUR 2.500 (in Hamburg)

Academic discount           30% discount on the public course fee. No further discounts apply.

Group discounts               Group discounts are possible if two or more people from the same organisation register together at the same time. For further information please do not hesitate to contact us. No further discounts apply.

Early bird discount           10% discount on the public course fee if you register at least 6 weeks before the course date. No further discounts apply.

The prices include printed documentation for personal use, coffee breaks and lunch; they exclude VAT. All participants will receive a confirmation of participation.

Duration: 3 days
Time: 9:00 - 17:00 h
Price: EUR 2.500 (in Hamburg) per participant, VAT excluded

