home

KwangSung Jun
Assistant Professsor
Computer Science
Statistics GIDP, Applied Math GIDP (affiliated)
University of Arizona
kjun å† cs ∂ø† arizona ∂ø† edu
GouldSimpson Rm 746, 1040 E. 4th St., Tucson, AZ 85721
Google scholar
CV

intro
Broadly, I work on interactive machine learning.
I spend most of my time on on developing and analyzing adaptive decisionmaking/sampling methods including bandit algorithms and reinforcement learning.
I tend to revolve around simple problems.
Recently, I am also looking into Monte Carlo tree search methods and various applications including efficient matrix decomposition, geoscience (some blackbox/bayesian optimization involved), and material science problems.
I also had some fun in the past with machine learning applied to psychology.
I was previously a postdoc with Francesco Orabona (who I call the 'master of coin’) at Boston University.
Before then, I spent 9 years at UWMadison for a PhD degree with Xiaojin (Jerry) Zhu and a postdoc position at Wisconsin Institute for Discovery with Robert Nowak, Rebecca Willett, and Stephen Wright.
news
Sep’22: 2 papers accepted at NeurIPS. Congrats to my postdoc Kyoungseok Jang!
May’22: I gave a talk at the RL theory virtual seminars on Maillard sampling.
Jan’22: 1 paper accpeted at AAAI, 3 papers accepted to AISTATS.
May’21: 2 papers accepted at ICML.
May’21: 1 paper accepted at ISIT.
Jul’20: I gave a talk at the RL theory virtual seminars on structured bandits with the asymptotic optimality.
Jul’20: Chicheng and I have a new work on structured bandits that is accepted to ICML’20 workshop on theoretical foundations of reinforcement learning!
Nov’19: In Spring 2020, I will be teaching CSC 665 Online Learning and Multiarmed Bandits.
Oct’19: Chicheng Zhang, Jason Pacheco, and I are organizing a machine learning reading group at UA.
Oct’19: Our paper on kernel regression paper is accepted at NeurIPS’19.
Oct’19: Our paper on parameterfree SGD with local differential privacy is accepted to PriML’19 (worshop at NeurIPS).
multiarmed bandits
The multiarmed bandit problem is a stateless version of reinforcement learning (RL).
Informally speaking, bandit algorithms learn to make better decisions over time in a feedbackloop.
The decisions necessarily affect the feedback information, and the feedback data collected so far is no longer i.i.d.; most traditional learning guarantees do not apply.
But why study an easier version of RL while RL seems to be solving all the problems these day?
Being a very simple problem, you can develop algorithms with precise theoretical guarantees and superior performance compared to RL algorithms applied to bandit problems. These guarantees include precise instancedependent guarantees (as opposed to the worstcase or minimax guarantees) where some algorithms even achieve optimal rates with exact numerical constants!
Bayesian optimization's convergence guarantees are analyzed in the bandit setup.
Developments in bandits are being transferred to propose new RL algorithms with strong guarantees.
Monte Carlo tree search (MCTS) algorithm used in AlphaGo was originated from the paper “bandit based MonteCarlo planning”, and MCTS made a revolutionary performance improvement in solving Go since its appearance. UCT was extended to PUCT and used in AlphaGo and numerous other successful RL applications from DeepMind and other applications (e.g., chemical synthesis).
Bandit algorithms were used to improve the computational complexity of kmedoids problem (similar to kMeans) dramatically; e.g., this paper.
Bandits are actively being studied in both theory and applications including deployable web service and hyperparameter optimization (check ray implementation).
Also, the cartoon caption contest of New Yorker is using bandit algorithms to efficiently crowdsource caption evaluations (this article)!
service
Senior program committee / area chair: COLT, NeurIPS
Program committee: NeurIPS, ICML, AISTATS, AAAI
