Statistical Learning Theory, Spring Semester 2018


Prof. Dr. J.M. Buhmann


Luca Corinzia
Viktor Wegmayr
Alina Dubatovka
Carlos Cotrini Jimenez
Djordje Miladinovic

General Information

The ETHZ Course Catalogue information can be found here.

The course covers advanced methods of statistical learning. The fundamentals of Machine Learning as presented in the course "Introduction to Machine Learning" are expanded and, in particular, the following topics are discussed:

  • Theory of estimators: How can we measure the quality of a statistical estimator? We already discussed bias and variance of estimators very briefly, but the interesting part is yet to come.
  • Variational methods and optimization: We consider optimization approaches for problems where the optimizer is a probability distribution. Concepts we will discuss in this context include:
    • Maximum Entropy
    • Information Bottleneck
    • Deterministic Annealing
  • Clustering: The problem of sorting data into groups without using training samples. This requires a definition of "similarity" between data points and adequate optimization procedures.
  • Model Selection: We have already discussed how to fit a model to a data set in ML I, which usually involved adjusting model parameters for a given type of model. Model selection refers to the question of how complex the chosen model should be. As we already know, both simple and complex models have advantages and drawbacks.
  • Statistical physics models: Approaches to approximate optimization in large systems that originate in statistical physics (free energy minimization applied to spin glasses and other models), and sampling methods based on these models.

Time and Place

Type       Time       Place
Lectures   Mon 14-16  ML H 44
Exercises  Mon 16-18  ML H 44

Piazza website


Lecture Recordings



Date Lecture/Tutorial Slides Exercise Series, Hometasks Reading for Tutorial class
Feb 19 Lecture [pdf] Exercise [pdf] Solution [pdf] Calculus recap [pdf] LLE [pdf]
Feb 26 Lecture [pdf] Exercise [pdf] Solution [pdf] Discrete Prob. Theory [pdf] General Prob. Theory [pdf] Exercises on Gaussians [pdf]
Mar 5 Lecture [pdf] Extra [pdf] Exercise [pdf] Solution [pdf] Info Theory I [pdf] Info Theory II [pdf] Info Theory III [pdf] Solutions [pdf] MaxEntropy inference [pdf]
Mar 12 Lecture [pdf] Exercise [pdf] Solution [pdf] Solution part 2 [pdf] 4.5 MCMC
Mar 19 Lecture [pdf] Exercise [pdf] Notebook kl divergence [ipynb] Notes kl divergence [pdf] Notebook kmeans complx costs [ipynb] Solution [pdf] Kmeans with complexity cost
Mar 26 Exercise [pdf] Solution [pdf] EM algorithm [pdf] Exercises EM [pdf] Solutions [pdf] Probabilistic PCA [pdf] Mixture models and EM, Bishop [pdf]
Apr 9 Lecture [pdf] Exercise [pdf] Solution [pdf] Tutorial Notes [pdf] PDC IBM
Apr 23 Lecture [pdf] Exercise [pdf] Solution [pdf] CSE tutorial notes (2017) [pdf]
Apr 30 Exercise [pdf] PC with DA missing steps [pdf] Ncut Pairwise-data clustering with DA
May 7 Lecture [pdf] Exercise [pdf] Solution [pdf] Superparamagnetic clustering of data [pdf]
May 14 Exercise [pdf] Solution [pdf] Model validation for 1D ising model

Past written Exams:

Exam 2016
Exam 2017


Projects are small coding exercises that concern the implementation of an algorithm taught in the lecture/exercise class. The final grade for the lecture is max(exam, 0.7 exam + 0.3 project).

Students who wish to benefit from the project bonus need to submit the completed notebooks. There will be eight coding exercises, and each one of them will be graded as good, normal, or not accepted/not submitted.

With no submitted or accepted notebooks, the project grade is 4.0. Each normal notebook increases the project grade by 0.25, while a good one increases it by 0.5. This means that eight normal notebooks result in a project grade of 6.0, and that good notebooks can compensate for notebooks that were not submitted or not accepted. The maximum project grade is 6.0.
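The grading rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as stated on this page: the function names `project_grade` and `final_grade` are hypothetical, and no rounding is applied because the page does not specify a rounding rule.

```python
def project_grade(n_good: int, n_normal: int, n_total: int = 8) -> float:
    """Project grade from the number of 'good' and 'normal' notebooks.

    Assumption: notebooks that are not accepted or not submitted
    contribute nothing, and the grade is capped at 6.0.
    """
    if n_good < 0 or n_normal < 0 or n_good + n_normal > n_total:
        raise ValueError("notebook counts must be non-negative and sum to at most n_total")
    # Base grade 4.0, +0.5 per good notebook, +0.25 per normal notebook.
    return min(6.0, 4.0 + 0.5 * n_good + 0.25 * n_normal)


def final_grade(exam: float, project: float) -> float:
    """Final grade: max(exam, 0.7 * exam + 0.3 * project)."""
    # The outer max means the project can raise the final grade
    # but never lower it below the exam grade.
    return max(exam, 0.7 * exam + 0.3 * project)


if __name__ == "__main__":
    p = project_grade(n_good=2, n_normal=3)
    print("project grade:", p)
    print("final grade:", final_grade(exam=5.0, project=p))
```

Because of the outer `max`, a project grade below the exam grade leaves the final grade equal to the exam grade.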

Note that even if you are not interested in doing the projects, the written exam might include problems addressed in the coding exercises.
Project repository


  • Preliminary course script (ver. May 2, 2017). This script has not been fully checked and thus comes without any guarantees; however, it is useful for getting oriented in the material.
  • Duda, Hart, Stork: Pattern Classification, Wiley Interscience, 2000.
  • Hastie, Tibshirani, Friedman: The Elements of Statistical Learning, Springer, 2001.
  • Devroye, Györfi, Lugosi: A Probabilistic Theory of Pattern Recognition, Springer, 1996.

Web Acknowledgements

The web-page code is based, with modifications, on that of the Machine Learning course (Fall Semester 2013; Prof. A. Krause).