CSC 665: Online Learning and Multi-armed Bandits - Spring 2020

The course covers the foundational topics first and then moves on to detailed/advanced topics through student presentations. This structure ensures that the core topics are covered without distraction toward the end of the semester, and that students learn the landscape of online learning early on. The student presentations also make us revisit previously learned topics, which may offer new perspectives now that we understand the landscape.

#   Date   Topic   Readings   Homework
1: 01/15 Introduction to online learning and multi-armed bandits 1 slides
2: 01/22 Introduction to online learning and multi-armed bandits 2 FO_01
3: 01/27 Online gradient descent FO_02
4: 01/29 Subgradients and online-to-batch conversion FO_03
5: 02/03 Strong convexity, adaptivity to gradients and small loss FO_04,FO_05
6: 02/05 Lower bounds for online linear optimization FO_06
7: 02/10 Online mirror descent 1 FO_07
8: 02/12 Online mirror descent 2 FO_08, FO_09
9: 02/17 Follow-The-Regularized-Leader 1 FO_10
10: 02/19 Follow-The-Regularized-Leader 2 FO_11, FO_12 HW1
11: 02/24 Adversarial multi-armed bandits (EXP3) FO_18, FO_19
12: 02/26 Stochastic multi-armed bandits 1 (ETC, elimination) LS Sec. 6
13: 03/02 Stochastic multi-armed bandits 2 (UCB) LS Sec. 7
14: 03/04 Stochastic multi-armed bandits 3 (asymptotically optimal UCB) LS Sec. 8
15: 03/16 Lower bound on multi-armed bandits LS Sec. 16
16: 03/18 Contextual bandits LS Sec. 18
17: 03/23 Linear bandits LS Sec. 19
18: 03/25 Pure exploration Even-Dar+06, LS Sec. 33.3
19: 03/30 Off-policy evaluation Dudík+11 HW2
20: 04/01 (Xiaolan Gu) Optimistic Follow-The-Regularized-Leader Chiang+12
21: 04/06 (Pratik Bhandari) Algorithms for Portfolio Management based on the Newton Method Agarwal+06
22: 04/08 (Liyun Zeng) Imitation Learning
23: 04/13 (Tianchi Zhao) Thompson sampling Agrawal+13
24: 04/15 (Zhiwu Guo) Combinatorial bandits Kveton+14
25: 04/20 (Paulo da Silva Soares) Randomized Exploration in Generalized Linear Bandits Kveton+19
26: 04/22 (Reyan Ahmed) Gaussian process UCB (GP-UCB) Srinivas+10
27: 04/27 (Dharma KC) Monte-Carlo Tree Search by Best Arm Identification Kaufmann+17
28: 04/29 (Guoxin Huang) Linear pure exploration Soare+14
29: 05/04 (Chenghao Xiong) More contextual bandits Langford+08
30: 05/06 (Yanjun Pan) Bayesian multi-armed bandits Russo+14
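As a small taste of the bandit portion of the schedule, here is a minimal sketch of the UCB1 index policy (the topic of the 03/02–03/04 lectures, LS Sec. 7–8). The Bernoulli arms, the seed, and the constant 2 in the exploration bonus are illustrative choices for this sketch, not prescriptions from the course notes.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given means; return pull counts per arm."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of times each arm was pulled
    sums = [0.0] * k      # cumulative reward of each arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialization: pull each arm once
        else:
            # UCB index: empirical mean plus an exploration bonus
            # that shrinks as an arm is pulled more often
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

After enough rounds, the better arm dominates the pull counts, e.g. `ucb1([0.2, 0.8], 2000)` allocates most pulls to the second arm.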

Student presentations can be replaced with other papers to accommodate students' interests. For example:

  • Off-policy optimization Swaminathan+15

  • Tracking the best expert (Fixed Share algorithm) Herbster+98

  • (2 people) Coin-betting and parameter-free online convex optimization Orabona+16 (main paper), Orabona+17 (implementation)

  • Online Optimization: Competing with Dynamic Comparators Jadbabaie+15

  • Boosting (AdaBoost algorithm) Freund+97

  • Almost optimal pure exploration Karnin+13

  • Design of experiments via regret minimization.

  • Online Control with Adversarial Disturbances

  • Pure exploration in adversarial worlds.

  • (2 people) Pure exploration for hyperparameter optimization (HyperBand) Li+17

  • Online transductive classification with relaxation framework.

  • On Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities

  • Doubly robust off-policy evaluation with shrinkage

  • The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits

  • Smoothed Online Convex Optimization in High Dimensions via Online Balanced Descent

  • Strongly Adaptive Online Learning

  • (more to be added)