CSC 665: Online Learning and Multi-armed Bandits - Spring 2020

The course covers foundational topics first and then moves on to detailed/advanced topics through student presentations. This structure ensures that the core topics are covered without distraction toward the end of the semester, and that students learn the landscape of online learning early on. The student presentations also force us to revisit previously learned topics, which may offer new perspectives now that we understand the landscape.

| #  | Date  | Topic | Readings | Homework |
|----|-------|-------|----------|----------|
| 1  | 01/15 | Introduction to online learning and multi-armed bandits 1 | slides | |
| 2  | 01/22 | Introduction to online learning and multi-armed bandits 2 | FO_01 | |
| 3  | 01/27 | Online gradient descent | FO_02 | |
| 4  | 01/29 | Subgradients and online-to-batch conversion | FO_03 | |
| 5  | 02/03 | Strong convexity, adaptivity to gradients and small loss | FO_04, FO_05 | |
| 6  | 02/05 | Lower bounds for online linear optimization | FO_06 | |
| 7  | 02/10 | Online mirror descent 1 | FO_07 | |
| 8  | 02/12 | Online mirror descent 2 | FO_08, FO_09 | |
| 9  | 02/17 | Follow-The-Regularized-Leader 1 | FO_10 | |
| 10 | 02/19 | Follow-The-Regularized-Leader 2 | FO_11, FO_12 | HW1 |
| 11 | 02/24 | Adversarial multi-armed bandits (EXP3) | FO_18, FO_19 | |
| 12 | 02/26 | Stochastic multi-armed bandits 1 (ETC, elimination) | LS Sec. 6 | |
| 13 | 03/02 | Stochastic multi-armed bandits 2 (UCB) | LS Sec. 7 | |
| 14 | 03/04 | Stochastic multi-armed bandits 3 (asymptotically optimal UCB) | LS Sec. 8 | |
| 15 | 03/16 | Lower bound on multi-armed bandits | LS Sec. 16 | |
| 16 | 03/18 | Contextual bandits | LS Sec. 18 | |
| 17 | 03/23 | Linear bandits | LS Sec. 19 | |
| 18 | 03/25 | Pure exploration | Even-Dar+06, LS Sec. 33.3 | |
| 19 | 03/30 | Off-policy evaluation | Dudik+11 | HW2 |
| 20 | 04/01 | (Xiaolan Gu) Optimistic Follow-The-Regularized-Leader | Chiang+12 | |
| 21 | 04/06 | (Pratik Bhandari) Algorithms for Portfolio Management based on the Newton Method | Agarwal+06 | |
| 22 | 04/08 | (Liyun Zeng) Imitation Learning | | |
| 23 | 04/13 | (Tianchi Zhao) Thompson sampling | Agrawal+13 | |
| 24 | 04/15 | (Zhiwu Guo) Combinatorial bandits | Kveton+14 | |
| 25 | 04/20 | (Reyan Ahmed) Gaussian process UCB (GP-UCB) | Srinivas+10 | |
| 26 | 04/22 | (Paulo da Silva Soares) Randomized Exploration in Generalized Linear Bandits | Kveton+19 | |
| 27 | 04/27 | (Dharma KC) Monte-Carlo Tree Search by Best Arm Identification | Kaufmann+17 | |
| 28 | 04/29 | (Guoxin Huang) Linear pure exploration | Soare+14 | |
| 29 | 05/04 | (Jesse Friedbaum) More contextual bandits | Langford+08 | |
| 30 | 05/06 | (Yanjun Pan) Bayesian multi-armed bandits | Russo+14 | |
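
The first half of the schedule builds up online convex optimization. As a quick taste of the material in lectures 3 through 10, here is a minimal sketch of projected online gradient descent on linear losses; the function name, the toy loss sequence, and the step size η_t = R/√t are illustrative assumptions, not course code:

```python
import numpy as np

def ogd(grad, T, d, radius=1.0):
    """Minimal projected online gradient descent (illustrative sketch).

    grad(t, x) -- returns a (sub)gradient of the round-t loss at the point x.
    The decision set is the Euclidean ball of the given radius; the step size
    eta_t = radius / sqrt(t) gives O(sqrt(T)) regret for convex Lipschitz losses.
    """
    x = np.zeros(d)
    played = []
    for t in range(1, T + 1):
        played.append(x.copy())                # commit to x_t, then observe the loss
        x = x - (radius / np.sqrt(t)) * grad(t, x)
        norm = np.linalg.norm(x)
        if norm > radius:                      # project back onto the ball
            x *= radius / norm
    return played

# Toy run: linear losses f_t(x) = <g_t, x> with random sign vectors g_t.
rng = np.random.default_rng(0)
gs = rng.choice([-1.0, 1.0], size=(1000, 2))
xs = ogd(lambda t, x: gs[t - 1], T=1000, d=2)
alg_loss = sum(g @ x for g, x in zip(gs, xs))
best_loss = -np.linalg.norm(gs.sum(axis=0))    # best fixed point in the unit ball
print(alg_loss - best_loss)                    # regret; grows like O(sqrt(T))
```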
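
Likewise, for the stochastic-bandit half (lectures 12 through 14), the following is a minimal sketch of UCB1 on a Bernoulli bandit in the standard textbook form (cf. LS Sec. 7); the environment and all names here are hypothetical illustrations:

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Minimal UCB1 on a Bernoulli bandit (illustrative sketch).

    means -- true success probability of each arm (unknown to the learner).
    Returns the cumulative pseudo-regret against the best arm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k                  # number of pulls of each arm
    estimates = [0.0] * k             # empirical mean reward of each arm
    regret, best = 0.0, max(means)
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1               # pull each arm once to initialize
        else:                         # UCB index: empirical mean + exploration bonus
            arm = max(range(k),
                      key=lambda i: estimates[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        regret += best - means[arm]
    return regret

print(ucb1([0.3, 0.5, 0.7], horizon=10000))   # regret grows only logarithmically in T
```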

Student presentations can be replaced with other papers to accommodate students' interests. For example:

  • Off-policy optimization Swaminathan+15

  • Tracking the best expert (Fixed Share algorithm) Herbster+98

  • (2 people) Coin-betting and parameter-free online convex optimization. Orabona+16 (main paper), Orabona+17 (implementation)

  • Online Optimization: Competing with Dynamic Comparators Jadbabaie+15

  • Boosting (AdaBoost algorithm) Freund+97

  • Almost optimal pure exploration Karnin+13

  • Design of experiments via regret minimization.

  • Online Control with Adversarial Disturbances

  • Pure exploration in adversarial worlds.

  • (2 people) Pure exploration for hyperparameter optimization (HyperBand) Li+17

  • Online transductive classification with relaxation framework.

  • On Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities

  • Doubly robust off-policy evaluation with shrinkage

  • The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits

  • Smoothed Online Convex Optimization in High Dimensions via Online Balanced Descent

  • Strongly Adaptive Online Learning

  • (more to be added)