====== Multi-Armed Bandits ======

See [[https://en.wikipedia.org/wiki/Multi-armed_bandit|Wikipedia - Multi-armed Bandit]].

===== Surveys =====

  * [[http://homes.di.unimi.it/~cesabian/Pubblicazioni/banditSurvey.pdf|Bubeck & Cesa-Bianchi 2012 - Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems]] (very good survey)

===== Theory =====

  * [[https://proceedings.mlr.press/v202/mei23a/mei23a.pdf|Mei et al. 2023 - Stochastic Gradient Succeeds for Bandits]]

===== Related Pages =====

  * [[ml:Reinforcement Learning]]
  * [[ml:Online Learning]]
  * [[ml:theory:Regret Bounds]]
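As a minimal illustration of the stochastic bandit setting treated in the surveys above, here is a sketch of the classic UCB1 algorithm (Auer et al. 2002). The three-armed Gaussian reward model and all parameter values are hypothetical choices for the example, not taken from any of the linked papers.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on hypothetical unit-variance Gaussian arms.

    After pulling each arm once, at every step pick the arm
    maximizing  empirical mean + sqrt(2 ln t / n_i),
    balancing exploitation against exploration.
    Returns the pull count per arm.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k     # n_i: number of pulls of arm i
    sums = [0.0] * k     # cumulative reward of arm i

    def pull(i):
        # Hypothetical reward model: Gaussian with unit variance.
        return rng.gauss(arm_means[i], 1.0)

    # Initialization: play each arm once so every count is positive.
    for i in range(k):
        counts[i] += 1
        sums[i] += pull(i)

    for t in range(k + 1, horizon + 1):
        # Upper confidence bound for each arm at step t.
        ucb = [sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
               for i in range(k)]
        i = max(range(k), key=lambda j: ucb[j])
        counts[i] += 1
        sums[i] += pull(i)

    return counts

counts = ucb1([0.1, 0.5, 0.9], horizon=2000)
```

With well-separated means, the pull counts concentrate on the best arm while suboptimal arms are pulled only on the order of ln(horizon) times, matching the logarithmic regret bounds discussed in the Bubeck & Cesa-Bianchi survey.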