Statistical Learning Theory, Spring Semester 2017


Prof. Dr. J.M. Buhmann


Luca Corinzia
Viktor Wegmayr

General Information

Course information is available in the ETHZ Course Catalogue.

The course covers advanced methods of statistical learning. The fundamentals of Machine Learning as presented in the course "Introduction to Machine Learning" are expanded and, in particular, the following topics are discussed:

  • Statistical Learning Theory: How can we measure the quality of a classifier? Can we give any guarantees for the prediction error?
  • Variational Methods and Optimization: We consider optimization approaches for problems where the optimizer is a probability distribution. Concepts we will discuss in this context include:
    • Maximum Entropy
    • Information Bottleneck
    • Deterministic Annealing
  • Clustering: The problem of sorting data into groups without using labeled training samples. This requires a definition of "similarity" between data points and adequate optimization procedures.
  • Model Selection: In ML I we discussed how to fit a model to a data set, which usually involved adjusting the parameters of a given type of model. Model selection addresses the question of how complex the chosen model should be. As we already know, simple and complex models both have advantages and drawbacks.
  • Reinforcement Learning: The problem of learning through interaction with a changing environment. To achieve optimal behavior, we have to base decisions not only on the current state of the environment, but also on how we expect it to develop in the future.

Time and Place

Type        Time        Place
Lectures    Mon 14-16   ML H 44
Exercises   Mon 16-18   ML H 44

Student Forum

Link to Forum. Please feel free to use it for questions, comments to the TAs, sharing ideas, and discussing assignments, projects, and anything else related to SLT with other students.


Date   | Lecture/Tutorial Slides  | Exercise Series, Hometasks      | Reading for Tutorial Class
Feb 20 | Lecture, Motivation      | Exercise, Solution              | LLE
Feb 27 | Lecture                  | Exercise, Solution              | MaxEntropy inference
Mar 6  |                          | Exercise, Solution              | 4.5 MCMC
Mar 13 | Lecture, Tutorial Notes  | Exercise, Solution              | K-means with complexity cost
Mar 20 |                          | Exercise, Solution              | Probabilistic PCA; Mixture models and EM (Bishop)
Mar 27 | Lecture, Tutorial Notes  | Exercise, Solution, SolutionBis | PDC; IBM
Apr 3  | Lecture                  | Exercise, Solution              | CSE
Apr 10 | Lecture                  | Exercise, Solution, SolutionBis | Ncut; Pairwise-data clustering with DA
May 8  | Lecture                  |                                 | BIC
May 15 | Lecture                  | Exercise, Solution              | DAEM; ASC for GMM; Hamming distance problem
May 29 | Lecture                  |                                 | Information content of sorting algorithms; EM for linear regression
Written exam of 2016: Exam 2016


Projects are small coding exercises in which you implement an algorithm taught in the lecture or exercise class. The final grade for the course is max(exam, 0.7 · exam + 0.3 · project).

Students who wish to benefit from the project bonus need to submit reports on their coding exercises. There will be eight coding exercises, and each report will be graded as good, normal, or not accepted/not submitted.

With no submitted or accepted reports, the project grade is 4.0. Each normal report increases the project grade by 0.25, while each good report increases it by 0.5. Thus eight normal reports yield a project grade of 6.0, and good reports can compensate for reports that were not submitted or not accepted. The maximum project grade is 6.0.
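The grading rules above can be checked with a small calculator. This is only an illustrative sketch: the function names `project_grade` and `final_grade` are hypothetical, and it assumes no rounding of intermediate or final grades, since the rules above do not mention any.

```python
def project_grade(normal: int, good: int) -> float:
    """Project grade: base 4.0, +0.25 per normal report, +0.5 per good report, capped at 6.0.

    Hypothetical helper; assumes no rounding.
    """
    # There are eight coding exercises, so at most eight reports can count.
    if normal < 0 or good < 0 or normal + good > 8:
        raise ValueError("expected between 0 and 8 reports in total")
    return min(6.0, 4.0 + 0.25 * normal + 0.5 * good)


def final_grade(exam: float, project: float) -> float:
    """Final grade = max(exam, 0.7 * exam + 0.3 * project). Hypothetical helper; no rounding."""
    return max(exam, 0.7 * exam + 0.3 * project)


if __name__ == "__main__":
    # Eight normal reports and four good reports both reach the 6.0 cap.
    print(project_grade(8, 0), project_grade(0, 4))
    # An exam grade of 5.0 combined with a project grade of 6.0.
    print(final_grade(5.0, project_grade(8, 0)))
```

Because of the max(...), the project component can only raise the final grade, never lower it below the exam grade.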

Note that the written exam may include problems addressed in the coding exercises, even if you choose not to do the projects.
Project repository. The coding exercise grades are listed in the Grades document.


  • Preliminary course script (ver. 04 Aug 2015) (draft TeX by Sergio Solorzano). This script has not been fully checked and comes without any guarantees; however, it is useful for getting oriented in the material.
  • Duda, Hart, Stork: Pattern Classification, Wiley Interscience, 2000.
  • Hastie, Tibshirani, Friedman: The Elements of Statistical Learning, Springer, 2001.
  • Devroye, Györfi, Lugosi: A Probabilistic Theory of Pattern Recognition, Springer, 1996.

Web Acknowledgements

The code for this web page is based, with modifications, on that of the Machine Learning course (Fall Semester 2013; Prof. A. Krause).