Statistical Learning Theory, Spring Semester 2021
Instructors
Prof. Dr. Joachim M. Buhmann
Dr. Carlos Cotrini
Assistants
Dr. Paolo Penna
Ami Beuret
Evgenii Bykovetc
Joao Carvalho
Alina Dubatovka
Mikhail Karasikov
Ivan Ovinnikov
General Information
The ETHZ Course Catalogue information can be found here.
The course covers advanced methods of statistical learning. The fundamentals of machine learning as presented in the courses "Introduction to Machine Learning" and "Advanced Machine Learning" are expanded and, in particular, the following topics are discussed:
 Variational methods and optimization. We consider optimization problems where the variable being optimized is a probability distribution. We will discuss concepts like maximum entropy, the information bottleneck, and deterministic annealing.
 Clustering. This is the problem of sorting data into groups without using training samples. We discuss alternative notions of "similarity" between data points and adequate optimization procedures.
 Model selection and validation. This refers to the question of how complex the chosen model should be. In particular, we present an information theoretic approach for model validation.
 Statistical physics models. We discuss approaches for approximately optimizing large systems, which originate in statistical physics (free energy minimization applied to spin glasses and other models). We also study sampling methods based on these models.
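One of the topics above, deterministic annealing, can be sketched in a few lines: soft cluster assignments follow a Gibbs distribution at temperature T, centroids are updated as responsibility-weighted means, and T is gradually lowered so the assignments harden. The following is a minimal illustrative sketch, not the course's reference implementation; all function and parameter names are our own.

```python
# Illustrative sketch of deterministic annealing for centroid-based clustering.
# Names and default parameters are illustrative, not taken from the course material.
import numpy as np

def deterministic_annealing(X, k, T_start=10.0, T_min=0.01, cooling=0.9,
                            n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    # Start all centroids near the data mean, slightly perturbed to break symmetry.
    mu = X.mean(axis=0) + 0.01 * rng.standard_normal((k, X.shape[1]))
    T = T_start
    while T > T_min:
        for _ in range(n_iter):
            # E-step: soft assignments p(k|i) form a Gibbs distribution at temperature T.
            d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # squared distances
            logits = -d / T
            logits -= logits.max(axis=1, keepdims=True)              # numerical stability
            p = np.exp(logits)
            p /= p.sum(axis=1, keepdims=True)
            # M-step: centroids become responsibility-weighted means.
            w = p.sum(axis=0)
            mu = (p.T @ X) / w[:, None]
        T *= cooling  # cool down: assignments gradually harden toward hard clustering
    return mu, p

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-3, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
    mu, p = deterministic_annealing(X, k=2)
    print(np.round(np.sort(mu[:, 0]), 1))
```

At high temperature all centroids coincide at the data mean; as T drops below a critical value the centroids split, which is the phase-transition behavior discussed in the lecture.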
Time and Place
Type | Time | Place
Lectures | Mon 14-16, Tue 17-18 | Zoom link
Exercises | Mon 16-18 | Zoom link
Lectures and tutorials
All lectures and tutorials take place via Zoom. To access the Zoom link, use your NETHZ credentials. A recording will be made available on this webpage within 24 hours after each lecture or tutorial.
Piazza website
This term we will be using Piazza for class discussion. The system is designed to get you help quickly and efficiently from classmates and the teaching team. Rather than emailing questions to the teaching team, we encourage you to post your questions on Piazza. If you have any problems or feedback for the developers, email team@piazza.com.
Find our class page here. Use your NETHZ credentials to access this link.
Material
Date | Lecture | Tutorial | Exercises | Reference
Feb 22 | Intro: Lecture 1, Notes 1, Video 1, Video 2 | Tutorial 1, Video 1 | Exercise 1, Solution 1 |
Mar 1 | Max. entropy: Lecture 2, Notes 2, Video 1, Video 2 | Tutorial 2, Video 2 | Exercise 2, Solution 2 | [Scr20] Ch 2.1-2.2
Mar 8 | Max. entropy, Sampling: Notes, Lecture 3.1, Lecture 3.2, Video 1, Video 2 | Tutorial 3, Video | Exercise 3, Solution 3 | [Scr21] Ch 1, 2
Mar 15 | Deterministic annealing: Lecture 4, Video 1, Video 2 | Tutorial 4, Video | Exercise 4, Solution 4 | [Scr21] Ch 3
Mar 22 | Laplace method, Histogram clustering: Lecture 5.1, Lecture 5.2, Notes 5, Video 1, Video 2 | Tutorial 5, Video | Exercise 5, Solution 5 | [Scr21] Ch 4, 5; [Scr20] Sec 2.7
Mar 29 | Param. distr. clustering, Info. bottleneck: See slides from Lecture 5.2, Notes 6, Video Monday 1, Video Monday 2, Video Tuesday | Tutorial 6, Video | Exercise 6, Solution 6 | PDC paper, IBM paper, [Scr20] Sec 2.7-2.8
Apr 12 | Pairwise clustering: Lecture 7, Video 1, Video 2 | Tutorial 7 (annotated + older notes), Video | Exercise 7, Solution 7 (annotated) | CSE paper, PCDA paper, [Scr20] Ch 3
Apr 20 | No lecture on Monday; Slides from Lecture 7, Video | No tutorial | No exercise | [Scr20] Ch 4
Apr 26 | Mean field: Lecture 8, Video 1, Video 2 | Tutorial 8, Video | Exercise 8, Solution 8 |
May 3 | Model selection: Lecture 8 recap, Lecture 9, Notes, Video 1, Video 2 | | Exercise 9, Solution 9 |
May 10 | Posterior agreement: Lecture 10, Video 1, Video 2 | | | [Scr21] Ch 10
May 17 | Lecture 11.1, Lecture 11.2 | | |
May 24 | No lecture on Monday | No tutorial | No exercise |
May 31 | Conclusion | | No exercise |
Past written Exams:
Exam 2019
Draft Solution 2019
Exam 2020 (with solution)
Projects
Projects are coding exercises that concern the implementation of an algorithm taught in the lecture or exercise class. There will be seven coding exercises, each spanning approximately two weeks. Each will be graded either as not passed or with a passing grade ranging from 4 to 6. The project part is passed if the student receives a passing grade in at least four coding exercises; in that case, the grade of the project part is the average of the four best coding exercises.
In order to be admitted to the exam, the student has to pass the project part. The final grade is 0.7 exam + 0.3 project, rounded to the nearest quarter of a unit. More details and information can be found in the project repository (including the dates of the various projects and instructions on how to submit solutions).
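As a hypothetical illustration of the grading rules described above (all function and variable names are our own, not part of the official course materials):

```python
# Hypothetical sketch of the grading rules stated above; not an official tool.

def project_grade(coding_grades):
    """coding_grades: 7 entries, each None (not passed) or a grade in [4, 6]."""
    passed = sorted(g for g in coding_grades if g is not None)
    if len(passed) < 4:
        return None  # project part not passed -> not admitted to the exam
    return sum(passed[-4:]) / 4  # average of the four best coding exercises

def final_grade(exam, project):
    raw = 0.7 * exam + 0.3 * project
    return round(raw * 4) / 4  # round to the nearest quarter of a unit

if __name__ == "__main__":
    proj = project_grade([4.5, None, 5.0, 5.5, None, 6.0, 4.0])
    print(proj)                    # average of the four best: 6.0, 5.5, 5.0, 4.5
    print(final_grade(5.0, proj))
```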
Reading
 [Scr21] Course script. To be completed over the next two years.
 [Scr20] Previous script. It is no longer maintained, but it contains useful notes for some chapters not yet covered in [Scr21].
 Duda, Hart, Stork: Pattern Classification, Wiley Interscience, 2000.
 Hastie, Tibshirani, Friedman: The Elements of Statistical Learning, Springer, 2001.
 L. Devroye, L. Gyorfi, and G. Lugosi: A Probabilistic Theory of Pattern Recognition, Springer, New York, 1996.
Web Acknowledgements
The webpage code is based (with modifications) on that of the Machine Learning course (Fall Semester 2013; Prof. A. Krause).