====== Theory: Online Learning and Regret Bounds ======

===== Online Learning =====

==== Surveys and Theses ====

  * [[http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=9AF556097D53F9170E8DC85C381F6971?doi=10.1.1.161.9973&rep=rep1&type=pdf|Shalev-Shwartz 2007 - Online Learning: Theory, Algorithms, and Applications]] See section 2.4 (page 27 in the PDF) for historical references
  * [[https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.419.9&rep=rep1&type=pdf|Bottou - Online Learning and Stochastic Approximations]]

==== Key Papers ====

  * [[https://www.aaai.org/Papers/ICML/2003/ICML03-120.pdf|Zinkevich 2003 - Online Convex Programming and Generalized Infinitesimal Gradient Ascent]] See also the [[https://www.cs.cmu.edu/~maz/publications/techconvex.pdf|CMU tech report]]

===== Regret Bounds =====

Regret bounds are widely used to prove generalization bounds for online learning algorithms and to prove convergence rates of optimization algorithms (for example, in the AdaGrad paper).

A quick technical explanation from [[https://arxiv.org/pdf/1606.04838.pdf|Bottou et al. 2016]], page 39:

{{media:regret-bounds.png}}

==== Key Papers ====

  * [[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.67.4767&rep=rep1&type=pdf|Gordon 1999 - Regret Bounds for Prediction Problems]]

===== Related Pages =====

  * [[ml:theory:Multi-Armed Bandit]]
  * [[ml:Online Learning]]
  * [[ml:Reinforcement Learning#Theory|Reinforcement Learning - Theory]]
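===== Example: Online Gradient Descent and Regret =====

As a concrete illustration of the regret framework, here is a minimal sketch of Zinkevich-style online gradient descent on a stream of simple convex losses. The losses ''f_t(x) = (x - z_t)^2'', the targets, and all variable names are illustrative assumptions, not taken from any of the papers above; the point is only that cumulative loss minus the loss of the best fixed decision in hindsight (the regret) grows sublinearly in ''T'', so average regret vanishes.

```python
import math
import random

random.seed(0)
T = 1000
# Illustrative loss sequence: f_t(x) = (x - z_t)^2 with random targets z_t.
targets = [random.uniform(-1.0, 1.0) for _ in range(T)]

x = 0.0
total_loss = 0.0
for t, z in enumerate(targets, start=1):
    total_loss += (x - z) ** 2   # suffer loss f_t(x_t) before seeing z_t's gradient
    grad = 2.0 * (x - z)         # gradient of f_t at the current iterate
    x -= grad / math.sqrt(t)     # OGD update with step size eta_t = 1/sqrt(t)

# For squared loss, the best fixed comparator in hindsight is the mean of the targets.
x_star = sum(targets) / T
best_loss = sum((x_star - z) ** 2 for z in targets)

regret = total_loss - best_loss
print(regret, regret / T)  # regret is O(sqrt(T)), so average regret tends to 0
```

With the ''1/sqrt(t)'' step size, Zinkevich's analysis gives regret on the order of ''sqrt(T)'' for bounded convex losses; dividing by ''T'' shows the average per-round penalty relative to the best fixed decision goes to zero, which is the sense in which these bounds yield convergence rates.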