Kwang-Sung Jun
Assistant Professor
Computer Science
Statistics GIDP, Applied Math GIDP (affiliated)
University of Arizona

kjun å† cs ∂ø† arizona ∂ø† edu
Gould-Simpson Rm 746, 1040 E. 4th St., Tucson, AZ 85721
Google scholar


Broadly, I work on interactive machine learning. I spend most of my time developing and analyzing adaptive decision-making/sampling methods, including bandit algorithms and reinforcement learning. I tend to gravitate toward simple problems. Recently, I have also been looking into Monte Carlo tree search methods and various applications, including efficient matrix decomposition, geoscience (with some blackbox/Bayesian optimization involved), and materials science problems. I also had some fun in the past applying machine learning to psychology. I was previously a postdoc with Francesco Orabona (whom I call the 'master of coin') at Boston University. Before that, I spent 9 years at UW-Madison for a PhD degree with Xiaojin (Jerry) Zhu and a postdoc position at the Wisconsin Institute for Discovery with Robert Nowak, Rebecca Willett, and Stephen Wright.


  • Sep’22: 2 papers accepted at NeurIPS. Congrats to my postdoc Kyoungseok Jang!

  • May’22: I gave a talk at the RL theory virtual seminars on Maillard sampling.

  • Jan’22: 1 paper accepted at AAAI, 3 papers accepted at AISTATS.

  • May’21: 2 papers accepted at ICML.

  • May’21: 1 paper accepted at ISIT.

  • Jul’20: I gave a talk at the RL theory virtual seminars on structured bandits with asymptotic optimality.

  • Jul’20: Chicheng and I have new work on structured bandits accepted to the ICML’20 workshop on theoretical foundations of reinforcement learning!

  • Nov’19: In Spring 2020, I will be teaching CSC 665 Online Learning and Multi-armed Bandits.

  • Oct’19: Chicheng Zhang, Jason Pacheco, and I are organizing a machine learning reading group at UA.

  • Oct’19: Our paper on kernel regression is accepted at NeurIPS’19.

  • Oct’19: Our paper on parameter-free SGD with local differential privacy is accepted to PriML’19 (workshop at NeurIPS).

multi-armed bandits

The multi-armed bandit problem is a stateless version of reinforcement learning (RL). Informally speaking, bandit algorithms learn to make better decisions over time in a feedback loop. The decisions necessarily affect the feedback, so the data collected so far is no longer i.i.d., and most traditional learning guarantees do not apply. But why study an easier version of RL when RL seems to be solving all the problems these days?

  • Being a very simple problem, bandits allow you to develop algorithms with precise theoretical guarantees and superior performance compared to RL algorithms applied to the same problems. These include precise instance-dependent guarantees (as opposed to worst-case or minimax guarantees), where some algorithms even achieve optimal rates with exact numerical constants!

  • Convergence guarantees for Bayesian optimization are analyzed in the bandit setup.

  • Developments in bandits are being transferred to RL, yielding new algorithms with strong guarantees.

  • The Monte Carlo tree search (MCTS) algorithm used in AlphaGo originated from the paper “Bandit Based Monte-Carlo Planning,” and MCTS has revolutionized computer Go since its appearance. Its UCT algorithm was extended to PUCT and used in AlphaGo and numerous other successful RL applications from DeepMind, as well as other domains (e.g., chemical synthesis).

  • Bandit algorithms have been used to dramatically improve the computational complexity of the k-medoids problem (similar to k-means); e.g., this paper.

Bandits are actively studied in both theory and applications, including deployable web services and hyperparameter optimization (check the ray implementation). Also, The New Yorker's cartoon caption contest uses bandit algorithms to efficiently crowdsource caption evaluations (this article)!
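To make the feedback-loop idea above concrete, here is a minimal sketch of the classic UCB1 algorithm on Bernoulli arms. This is a toy illustration, not code from any of my papers; the arm means, horizon, and helper names are all made up for the example. Each round, the algorithm pulls the arm with the highest empirical mean plus an exploration bonus, so the data it collects depends on its own past choices.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given means.

    After pulling each arm once, repeatedly choose the arm maximizing
    empirical mean + sqrt(2 ln t / n_pulls). Returns the pull counts
    per arm and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k        # number of pulls per arm
    sums = [0.0] * k        # cumulative reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1     # initialization: pull each arm once
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Bernoulli reward with mean arm_means[arm]
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return counts, total_reward

counts, total = ucb1([0.9, 0.5, 0.1], horizon=2000)
```

With a large gap between the best and worst arms, the pull counts concentrate on the best arm, which is exactly the instance-dependent behavior the theory quantifies.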


  • Senior program committee / area chair: COLT, NeurIPS

  • Program committee: NeurIPS, ICML, AISTATS, AAAI