The course is structured so that foundational topics are taught first, followed by detailed/advanced topics via student presentations. The benefit is twofold: we can ensure the core topics are covered without being rushed toward the end of the semester, and students learn the landscape of online learning early on. Moreover, the student presentations force us to revisit previously learned topics, which may yield new perspectives now that we understand the landscape.
# | Topics | Readings | Homework
---|---|---|---
1: 01/15 | Introduction to online learning and multi-armed bandits 1 | slides |
2: 01/22 | Introduction to online learning and multi-armed bandits 2 | FO_01 |
3: 01/27 | Online gradient descent | FO_02 |
4: 01/29 | Subgradients and online-to-batch conversion | FO_03 |
5: 02/03 | Strong convexity, adaptivity to gradients and small loss | FO_04, FO_05 |
6: 02/05 | Lower bounds for online linear optimization | FO_06 |
7: 02/10 | Online mirror descent 1 | FO_07 |
8: 02/12 | Online mirror descent 2 | FO_08, FO_09 |
9: 02/17 | Follow-The-Regularized-Leader 1 | FO_10 |
10: 02/19 | Follow-The-Regularized-Leader 2 | FO_11, FO_12 | HW1
11: 02/24 | Adversarial multi-armed bandits (EXP3) | FO_18, FO_19 |
12: 02/26 | Stochastic multi-armed bandits 1 (ETC, elimination) | LS Sec. 6 |
13: 03/02 | Stochastic multi-armed bandits 2 (UCB; see the sketch below the schedule) | LS Sec. 7 |
14: 03/04 | Stochastic multi-armed bandits 3 (asymptotically optimal UCB) | LS Sec. 8 |
15: 03/16 | Lower bound on multi-armed bandits | LS Sec. 16 |
16: 03/18 | Contextual bandits | LS Sec. 18 |
17: 03/23 | Linear bandits | LS Sec. 19 |
18: 03/25 | Pure exploration | Even-Dar+06, LS Sec. 33.3 |
19: 03/30 | Off-policy evaluation | Dudík+11 | HW2
20: 04/01 | (Xiaolan Gu) Optimistic Follow-The-Regularized-Leader | Chiang+12 |
21: 04/06 | (Pratik Bhandari) Algorithms for Portfolio Management based on the Newton Method | Agarwal+06 |
22: 04/08 | (Liyun Zeng) Imitation Learning | |
23: 04/13 | (Tianchi Zhao) Thompson sampling | Agrawal+13 |
24: 04/15 | (Zhiwu Guo) Combinatorial bandits | Kveton+14 |
25: 04/20 | (Reyan Ahmed) Gaussian process UCB (GP-UCB) | Srinivas+10 |
26: 04/22 | (Paulo da Silva Soares) Randomized Exploration in Generalized Linear Bandits | Kveton+19 |
27: 04/27 | (Dharma KC) Monte-Carlo Tree Search by Best Arm Identification | Kaufmann+17 |
28: 04/29 | (Guoxin Huang) Linear pure exploration | Soare+14 |
29: 05/04 | (Jesse Friedbaum) More contextual bandits | Langford+08 |
30: 05/06 | (Yanjun Pan) Bayesian multi-armed bandits | Russo+14 |
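To give a concrete flavor of the algorithmic content, below is a minimal Python sketch of the UCB1 strategy for stochastic multi-armed bandits, the topic of sessions 13-14 (LS Sec. 7). This is an illustrative sketch only, not course-provided code; the Bernoulli arms and the horizon in the usage example are made-up placeholders.

```python
import math
import random

def ucb1(arms, horizon):
    """Minimal UCB1 sketch. `arms` is a list of callables, each returning a
    stochastic reward in [0, 1]; `horizon` is the number of rounds.
    Returns the empirical means and pull counts per arm."""
    n = len(arms)
    counts = [0] * n    # number of pulls per arm
    means = [0.0] * n   # empirical mean reward per arm
    for t in range(1, horizon + 1):
        if t <= n:
            a = t - 1   # initialization: pull each arm once
        else:
            # optimism: pick the arm maximizing mean + confidence bonus
            a = max(range(n),
                    key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = arms[a]()   # observe a reward from the chosen arm
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # incremental mean update
    return means, counts

# Usage (hypothetical example): two Bernoulli arms with means 0.4 and 0.6.
arms = [lambda: float(random.random() < 0.4),
        lambda: float(random.random() < 0.6)]
means, counts = ucb1(arms, horizon=2000)
print(means, counts)  # the better arm should accumulate most of the pulls
```

The bonus term shrinks as an arm is pulled more often, so under-explored arms keep getting tried while clearly suboptimal ones are pulled only logarithmically often, which is the optimism-in-the-face-of-uncertainty principle behind the UCB analyses in LS Sec. 7-8.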
Student presentations can be replaced with other papers to accommodate students' interests. For example:
- Off-policy optimization Swaminathan+15
- Tracking the best expert (Fixed Share algorithm) Herbster+98
- (2 people) Coin-betting and parameter-free online convex optimization Francesco+16 (main paper), Francesco+17 (implementation)
- Online Optimization: Competing with Dynamic Comparators Jadbabaie+15
- Boosting (AdaBoost algorithm) Freund+97
- Almost optimal pure exploration Karnin+13
- Design of experiments via regret minimization
- Online Control with Adversarial Disturbances
- Pure exploration in adversarial worlds
- (2 people) Pure exploration for hyperparameter optimization (HyperBand) Li+17
- Online transductive classification with relaxation framework
- On Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities
- Doubly robust off-policy evaluation with shrinkage
- The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits
- Smoothed Online Convex Optimization in High Dimensions via Online Balanced Descent
- Strongly Adaptive Online Learning
(more to be added)