Statistical Learning Theory, Spring Semester 2019


Prof. Dr. J.M. Buhmann


Luca Corinzia
Viktor Wegmayr
Alina Dubatovka
Carlos Cotrini Jimenez
Paolo Penna
Fabian Laumer
Ivan Ovinnikov

General Information

The ETHZ Course Catalogue information can be found here.

The course covers advanced methods of statistical learning. The fundamentals of machine learning, as presented in the courses "Introduction to Machine Learning" and "Advanced Machine Learning", are expanded; in particular, the following topics are discussed:

  • Theory of estimators: How can we measure the quality of a statistical estimator? We have already discussed the bias and variance of estimators briefly, but the interesting part is yet to come.
  • Variational methods and optimization: We consider optimization approaches for problems where the optimizer is a probability distribution. Concepts we will discuss in this context include:
    • Maximum Entropy
    • Information Bottleneck
    • Deterministic Annealing
  • Clustering: The problem of sorting data into groups without labeled training samples. This requires a definition of "similarity" between data points and adequate optimization procedures;
  • Model Selection: We have already discussed how to fit a model to a data set in ML I, which usually involved adjusting the parameters of a given type of model. Model selection addresses how complex the chosen model should be. As we already know, both simple and complex models have advantages and drawbacks;
  • Statistical physics models: approximate optimization approaches for large systems that originate in statistical physics (free-energy minimization applied to spin glasses and other models), and sampling methods based on these models.
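The list above names maximum entropy, deterministic annealing, and clustering side by side. As a rough illustration of how they fit together, here is a minimal sketch in Python with NumPy (our choice, not the course's). It assumes squared Euclidean distance as the clustering cost, uses maximum-entropy (Gibbs) soft assignments p(k | x) proportional to exp(-||x - y_k||^2 / T), updates centroids as probability-weighted means, and lowers T with a geometric schedule. The function names and all parameter values (T0, T_min, cooling, inner_iters) are hypothetical choices for illustration, not the formulation used in the lectures.

```python
import numpy as np


def gibbs_assignments(X, Y, T):
    """Soft assignments p(k | x_i) proportional to exp(-||x_i - y_k||^2 / T)."""
    # Squared Euclidean distances between every point and every centroid, shape (n, k).
    d = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    # Subtracting the row-wise minimum keeps at least one entry per row at exp(0) = 1;
    # the shift cancels out after normalization.
    P = np.exp(-(d - d.min(axis=1, keepdims=True)) / T)
    return P / P.sum(axis=1, keepdims=True)


def deterministic_annealing(X, k, T0=10.0, T_min=1e-3, cooling=0.9,
                            inner_iters=20, seed=0):
    """Hypothetical sketch: squared-distance cost and geometric cooling are assumptions."""
    rng = np.random.default_rng(seed)
    # All centroids start near the data mean; a tiny random perturbation lets
    # them separate at temperatures below a critical value.
    Y = X.mean(axis=0) + 1e-3 * rng.standard_normal((k, X.shape[1]))
    T = T0
    while T > T_min:
        for _ in range(inner_iters):
            P = gibbs_assignments(X, Y, T)
            w = P.sum(axis=0)  # total soft mass per centroid
            # Centroid update: probability-weighted mean of the data;
            # a centroid with (numerically) zero mass is left where it is.
            safe_w = np.where(w > 0, w, 1.0)[:, None]
            Y = np.where(w[:, None] > 0, (P.T @ X) / safe_w, Y)
        T *= cooling  # geometric cooling schedule
    return Y, gibbs_assignments(X, Y, T)
```

For example, on the six points (0,0), (0,1), (1,0), (10,10), (10,11), (11,10) with k=2, the two centroids end up at the group means (1/3, 1/3) and (31/3, 31/3). Starting every centroid at the data mean and cooling gradually, rather than picking random initial centroids, is what distinguishes this scheme from plain k-means.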

Time and Place

Type        Time                  Place
Lectures    Mon 14-16, Tue 9-10   HG E 5
Exercises   Mon 16-18             HG E 5

Piazza website



Date | Lecture/Tutorial Slides | Exercise Series, Hometasks | Reading for Tutorial Class
Feb 18 | Lecture [pdf] | Exercise [pdf]; Solution [pdf] | Calculus recap [pdf]; LLE [pdf]
Feb 25 | Lecture [pdf]; Lecture annotated [pdf] | Exercise [pdf]; Solution [pdf] | Probability theory recap [pdf]
Mar 4 | Lecture [pdf]; Lecture notes 5/3 [pdf] | Exercise [pdf]; Solution [pdf] | Info. theory recap [pdf]; Info. theory recap, solutions [pdf]
Mar 11 | Lecture [pdf] | Exercise [pdf]; Solution [pdf]; Solution part 2 [pdf] | 4.5 MCMC; 4.6 simulated annealing
Mar 18 | Lecture [pdf]; Lecture notes 19/03 [pdf] | Exercise [pdf]; Solution [pdf] | Tutorial slides [pdf]; Tutorial exercises [pdf]; Solutions [pdf]
Mar 25 | Lecture notes [pdf] | Exercise [pdf]; Solution [pdf] | Slides [pdf]; Notes [pdf]; GIF [gif]; Exercises EM [pdf]; Solutions EM [pdf]
Apr 1 | Lecture [pdf]; Lecture notes [pdf] | Exercise [pdf] | Tutorial Notes [pdf]; PDC IBM
Apr 9 | Lecture [pdf]; Lecture notes [pdf] | - | -
Apr 15 | Lecture [pdf] | Exercise [pdf]; Solution part 1 [pdf]; Solution part 2 [pdf] | Tutorial Notes [pdf]
Apr 29 | Lecture [pdf] | Exercise [pdf]; Solution [pdf] | Tutorial slides [pdf]
May 6 | Model validation for 1D Ising model | Exercise [pdf]; Solution [pdf] | Tutorial Notes [pdf]
May 13 | Lecture [pdf]; Notes [pdf]; BIC | - | An extensive comparative study of cluster validity indices; Package clusterCrit for R [pdf]
May 20 | Approximate sorting | - | -

Past written exams:

Exam 2016
Exam 2017
Exam 2018


Projects are small coding exercises, each concerning the implementation of an algorithm taught in the lectures or exercise classes.

There will be eight coding exercises, each with a time span of two weeks. Each one is graded either as not passed or with a passing grade between 4 and 6. The project part is passed if the student receives a passing grade in at least five coding exercises; in that case, the grade of the project part is the average of the five best coding exercise grades.

To be admitted to the exam, the student has to pass the project part. The final grade for the whole course is the weighted average 0.7 × exam + 0.3 × project. More details are available in the project repository.
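The grading rules above can be sketched in a few lines of Python. This is a hypothetical helper, not an official grade calculator: it assumes each coding exercise result is given either as None (not passed) or as a number between 4 and 6, and it applies no rounding, since the rounding rules are not stated here.

```python
def project_grade(grades):
    """Return the project grade, or None if the project part is not passed.

    `grades` holds one entry per coding exercise (hypothetical encoding):
    None for "not passed", otherwise a passing grade between 4 and 6.
    """
    passed = sorted((g for g in grades if g is not None), reverse=True)
    if len(passed) < 5:
        return None  # fewer than five passes: project part failed, no exam admission
    best_five = passed[:5]
    return sum(best_five) / 5


def final_grade(exam, project):
    """Weighted average 0.7 * exam + 0.3 * project (no rounding applied)."""
    return 0.7 * exam + 0.3 * project


grades = [5.0, 4.5, None, 6.0, 4.0, None, 5.5, 4.0]
p = project_grade(grades)  # best five: 6.0, 5.5, 5.0, 4.5, 4.0 -> 5.0
print(p, final_grade(4.5, p))  # final grade is approximately 4.65
```

Note that six exercises are passed in this example, so the lowest passing grade (one of the two 4.0 grades) is dropped before averaging.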


  • Preliminary course script (ver. Mar 2019). This script has not been fully checked and therefore comes without any guarantees; however, it is useful for getting oriented in the material.
  • Duda, Hart, Stork: Pattern Classification, Wiley Interscience, 2000.
  • Hastie, Tibshirani, Friedman: The Elements of Statistical Learning, Springer, 2001.
  • Devroye, Györfi, Lugosi: A Probabilistic Theory of Pattern Recognition, Springer, New York, 1996.

Projects from the ISE group


Web Acknowledgements

The web-page code is based (with modifications) on that of the Machine Learning course (Fall Semester 2013; Prof. A. Krause).