Statistical Learning Theory, Spring Semester 2019

Instructor

Prof. Dr. J.M. Buhmann

Assistants

Luca Corinzia
Viktor Wegmayr
Alina Dubatovka
Carlos Cotrini Jimenez
Paolo Penna
Fabian Laumer
Ivan Ovinnikov

General Information

The ETHZ Course Catalogue information can be found here.

The course covers advanced methods of statistical learning. The fundamentals of machine learning as presented in the courses "Introduction to Machine Learning" and "Advanced Machine Learning" are expanded on and, in particular, the following topics are discussed:

  • Theory of estimators: How can we measure the quality of a statistical estimator? We have already discussed the bias and variance of estimators briefly, but the interesting part is yet to come.
  • Variational methods and optimization: We consider optimization approaches for problems where the optimizer is a probability distribution. Concepts we will discuss in this context include:
    • Maximum Entropy
    • Information Bottleneck
    • Deterministic Annealing
  • Clustering: The problem of sorting data into groups without using training samples. This requires a definition of "similarity" between data points and adequate optimization procedures (a small deterministic-annealing sketch for this task follows this list);
  • Model Selection: We have already discussed how to fit a model to a data set in ML I, which usually involved adjusting the parameters of a given type of model. Model selection refers to the question of how complex the chosen model should be. As we already know, both simple and complex models have their advantages and drawbacks;
  • Statistical physics models: approaches to the approximate optimization of large systems that originate in statistical physics (free-energy minimization applied to spin glasses and other models), and sampling methods based on these models.
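
To make the deterministic-annealing idea concrete, here is a minimal Python sketch of annealed centroid-based clustering: maximum-entropy (Gibbs) assignments at temperature T alternate with centroid updates while T is gradually lowered. The squared-Euclidean cost, the geometric cooling schedule, the synthetic data, and all names and parameters are illustrative assumptions, not material taken from the lecture.

    import numpy as np

    def deterministic_annealing(X, k=3, T0=10.0, T_min=1e-3, cooling=0.9, n_iter=20):
        """Anneal temperature T, alternating Gibbs assignments and centroid updates."""
        rng = np.random.default_rng(0)
        d = X.shape[1]
        # Start all centroids near the data mean; they split as T decreases.
        Y = X.mean(axis=0) + 1e-3 * rng.standard_normal((k, d))
        T = T0
        while T > T_min:
            for _ in range(n_iter):
                # Maximum-entropy (Gibbs) assignments: p(c|x) ~ exp(-||x - y_c||^2 / T)
                cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # shape (n, k)
                logits = -cost / T
                logits -= logits.max(axis=1, keepdims=True)            # numerical stability
                P = np.exp(logits)
                P /= P.sum(axis=1, keepdims=True)
                # Centroid update: weighted means under the soft assignments.
                Y = (P.T @ X) / P.sum(axis=0)[:, None]
            T *= cooling  # cool down
        return Y, P

    # Illustrative data: three well-separated 2D Gaussian blobs (an assumption).
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (-2.0, 0.0, 2.0)])
    Y, P = deterministic_annealing(X, k=3)
    print("centroids:\n", Y)

At high temperature all centroids coincide near the data mean; as T decreases they progressively split into distinct clusters, which is the characteristic annealing behavior.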

Time and Place

Type      | Time                | Place
Lectures  | Mon 14-16, Tue 9-10 | HG E 5
Exercises | Mon 16-18           | HG E 5

Piazza website

link

Material

Date   | Lecture/Tutorial Slides                   | Exercise Series, Hometasks                                    | Reading for Tutorial Class
Feb 18 | Lecture [pdf]                             | Exercise [pdf], Solution [pdf]                                | Calculus recap [pdf], LLE [pdf]
Feb 25 | Lecture [pdf], Lecture annotated [pdf]    | Exercise [pdf], Solution [pdf]                                | Probability theory recap [pdf]
Mar 4  | Lecture [pdf], Lecture notes 5/3 [pdf]    | Exercise [pdf], Solution [pdf]                                | Info. theory recap [pdf], Info. theory recap solutions [pdf]
Mar 11 | Lecture [pdf]                             | Exercise [pdf], Solution [pdf], Solution part 2 [pdf]         | 4.5 MCMC, 4.6 Simulated annealing
Mar 18 | Lecture [pdf], Lecture notes 19/03 [pdf]  | Exercise [pdf], Solution [pdf]                                | Tutorial slides [pdf], Tutorial exercises [pdf], Solutions [pdf]
Mar 25 | Lecture notes [pdf]                       | Exercise [pdf], Solution [pdf]                                | Slides [pdf], Notes [pdf], GIF [gif], Exercises EM [pdf], Solutions EM [pdf]
Apr 1  | Lecture [pdf], Lecture notes [pdf]        | Exercise [pdf], Solution [pdf]                                | Tutorial Notes [pdf], PDC, IBM
Apr 9  | Lecture [pdf], Lecture notes [pdf]        |                                                               |
Apr 15 | Lecture [pdf]                             | Exercise [pdf], Solution part 1 [pdf], Solution part 2 [pdf]  | Tutorial Notes [pdf]
Apr 29 | Lecture [pdf]                             | Exercise [pdf], Solution [pdf]                                | Tutorial slides [pdf]
May 6  | Model validation for the 1D Ising model   | Exercise [pdf], Solution [pdf]                                | Tutorial Notes [pdf]
May 13 | Lecture [pdf], Notes [pdf]                |                                                               | BIC, An extensive comparative study of cluster validity indices, Package clusterCrit for R [pdf]
May 20 | Approximate sorting                       |                                                               |

Past written Exams:

  • Exam 2016
  • Exam 2017
  • Exam 2018

Projects

Projects are small coding exercises that concern the implementation of an algorithm taught in the lecture/exercise class.

There will be eight coding exercises, with two weeks allotted to each. Each exercise is graded either as failed or with a passing grade ranging from 4 to 6. The project part is passed if the student receives a passing grade in at least five coding exercises; in that case, the grade of the project part is the average of the five best coding exercises.

In order to be admitted to the exam, the student has to pass the project part; the final grade for the whole class is the weighted average 0.7 · exam + 0.3 · project (see the sketch below). More details at the project repository.
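
For concreteness, the grading rule above can be written as a short Python sketch. The function names and the list-of-grades input format are illustrative assumptions, not an official grading tool.

    def project_grade(coding_grades):
        """Project grade, or None if the project part is not passed.

        coding_grades: one entry per coding exercise; a passing grade is a
        number in [4, 6], a failed exercise is recorded as None.
        """
        passed = sorted((g for g in coding_grades if g is not None), reverse=True)
        if len(passed) < 5:           # fewer than five passes: project part failed
            return None
        return sum(passed[:5]) / 5    # average of the five best coding exercises

    def final_grade(exam, project):
        """Weighted average stated above: 0.7 * exam + 0.3 * project."""
        return 0.7 * exam + 0.3 * project

    # Example: five passes out of eight coding exercises, exam grade 5.0.
    grades = [4.5, None, 5.0, 5.5, None, 4.0, 6.0, None]
    print(final_grade(5.0, project_grade(grades)))   # (6+5.5+5+4.5+4)/5 = 5.0 -> 5.0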

Reading

  • Preliminary course script (ver. Mar 2019). This script has not been fully checked and thus comes without any guarantees; however, it is useful for getting oriented in the material.
  • Duda, Hart, Stork: Pattern Classification, Wiley Interscience, 2000.
  • Hastie, Tibshirani, Friedman: The Elements of Statistical Learning, Springer, 2001.
  • Devroye, Gyorfi, Lugosi: A Probabilistic Theory of Pattern Recognition, Springer, 1996.

Projects from the ISE group


Proposals

Web Acknowledgements

The web-page code is based, with modifications, on that of the Machine Learning course (Fall Semester 2013; Prof. A. Krause).