Home

Kwang-Sung Jun
Assistant Professsor
Computer Science
Statistics GIDP, Applied Math GIDP (affiliated)
University of Arizona

kjun å† cs ∂ø† arizona ∂ø† edu
Gould-Simpson Rm 746, 1040 E. 4th St., Tucson, AZ 85721
Google scholar
CV

Intro

Broadly, I am interested in interactive machine learning (IML), which includes reinforcement learning, bandits, Bayesian optimization, and active learning (see below for some introduction of these). I also develop novel confidence bounds that often becomes key tools for constructing efficient IML algorithms. Recently, I have been looking into IML problems arising from GenAIs including LLMs.

Before coming to UA, I did a postdoc with Francesco Orabona (who I call the 'master of coin’) at Boston University. Before then, I spent 9 years at UW-Madison for a PhD degree with Xiaojin (Jerry) Zhu and a postdoc position at Wisconsin Institute for Discovery with Robert Nowak, Rebecca Willett, and Stephen Wright.

News

Jan’25: 2 papers accepted to AISTATS! (i) Linear extension of the MED algorithm (or Maillard sampling) that is randomized, off-policy-friendly, and nearly optimal. (ii) Estimating the value of the maximum mean among K arms (rather than estimating who achieves it) that can surpass the performance of the oracle who knows the best arm and reports its sample mean!
Nov’24: I gave a talk at UW-Madison SILO on 'Confidence Sequences via Online Learning’ (video link).
Oct’24: 2 papers accepted to NeurIPS: (i) led by Yao, bandit experiments with instrumental variables, so you can experiment when you can't experiment! (ii) led by Junghyun, unified confidence sets for generalized linear models and finally removing the norm dependence in logistic bandits with novel regret analysis!
May’24: 2 papers accepted to ICML: (i) adapting to the unknown noise level in linear bandits (ii) low-rank matrix recovery with better guarantees than the nuclear-norm regularization, how we can perform design of experiments for better subspace recovery, and how to solve low-rank bandits with them!
May’24: 1 paper accepted to COLT on PAC-Bayes bounds with a different divergence that is better than KL!
Feb’24: I will serve as an action editor for the journal Machine Learning.
Jan’24: 1 paper accepted to AISTATS on converting a regret bound into a tight confidence set. Congrats Junghyun!
Dec’23: 1 paper accepted to NeurIPS. Congrats Hao Qin!

Interactive machine learning

IML is an umbrella term for ML problems where the agent's decision is an action that affects which data/feedback s/he gets to receive as opposed to prediction that does not. For example, recommending a product is an action since we get click feedback for the recommended product, not for the unrecommended products (and we don't know what would have happened if we recommended another item). Forecasting tomorrow's weather is a prediction since the weather tomorrow is not affected by which forecast we have made. Basically, if you are getting feedback on GenAI models, then the right approach is to view the generated outputs as 'action’ and leverage tools from interactive machine learning!

Bandit problems

The multi-armed bandit problem is a state-less version of reinforcement learning (RL). Informally speaking, bandit algorithms learn to make better decisions over time in a feedback-loop. The decisions necessarily affect the feedback information, and the feedback data collected so far is no longer i.i.d.; most traditional learning guarantees do not apply. But why study an easier version of RL while RL seems to be solving all the problems these day?

Being a very simple problem, you can develop algorithms with precise theoretical guarantees and superior performance compared to RL algorithms applied to bandit problems. These guarantees include precise instance-dependent guarantees (as opposed to the worst-case or minimax guarantees) where some algorithms even achieve optimal rates with exact numerical constants!
Bayesian optimization's convergence guarantees are analyzed in the bandit setup.
Developments in bandits are being transferred to propose new RL algorithms with strong guarantees.
Monte Carlo tree search (MCTS) algorithm used in AlphaGo was originated from the paper “bandit based Monte-Carlo planning”, and MCTS made a revolutionary performance improvement in solving Go since its appearance. UCT was extended to PUCT and used in AlphaGo and numerous other successful RL applications from DeepMind and other applications (e.g., chemical synthesis).
Bandit algorithms were used to improve the computational complexity of k-medoids problem (similar to k-Means) dramatically; e.g., this paper.

Bandits are actively being studied in both theory and applications including deployable web service and hyperparameter optimization (check ray implementation). Also, the cartoon caption contest of New Yorker is using bandit algorithms to efficiently crowdsource caption evaluations (this article)!

Service

Area chair: COLT, ALT, NeurIPS, AAAI, ICML
Program committee: AISTATS
Action editor for Machine Learning.