--- ml:theory:regret_bounds [2022/05/09 08:12] – created jmflanig
+++ ml:theory:regret_bounds [2023/06/15 07:36] (current) – external edit 127.0.0.1
@@ Line 1: / Line 1: @@
 ====== Theory: Online Learning and Regret Bounds ======
+===== Online Learning =====
+==== Surveys and Theses ====
+  * [[http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=9AF556097D53F9170E8DC85C381F6971?doi=10.1.1.161.9973&rep=rep1&type=pdf|Shalev-Shwartz 2007 - Online Learning: Theory, Algorithms, and Applications]] See Section 2.4 (page 27 of the PDF) for historical references.
+  * [[https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.419.9&rep=rep1&type=pdf|Bottou - Online Learning and Stochastic Approximations]]
+==== Key Papers ====
+  * [[https://www.aaai.org/Papers/ICML/2003/ICML03-120.pdf|Zinkevich 2003 - Online Convex Programming and Generalized Infinitesimal Gradient Ascent]] See also the [[https://www.cs.cmu.edu/~maz/publications/techconvex.pdf|CMU tech report]]
+===== Regret Bounds =====
+Regret bounds are widely used to prove generalization bounds for online learning algorithms and convergence rates for optimization algorithms (for example, in the Adagrad paper).
+Quick technical explanation from [[https://arxiv.org/pdf/1606.04838.pdf|Bottou et al. 2016]], page 39:
 {{media:regret-bounds.png}}
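As a rough illustration of what a regret bound says, the sketch below runs projected online gradient descent in the style of Zinkevich 2003 on a toy problem and compares the learner's cumulative loss with that of the best fixed point in hindsight. The problem setup is an assumption made only for this example and is not taken from the page or the cited papers: squared losses f_t(x) = (x - z_t)^2, the feasible interval [-1, 1], random targets z_t, and the helper names ''project'', ''online_gradient_descent'' and ''regret'' are all hypothetical. With step size 1/sqrt(t), Zinkevich's Theorem 1 bounds the regret by D^2 sqrt(T)/2 + (sqrt(T) - 1/2) G^2, where D is the diameter of the feasible set (here 2) and G bounds the gradient norm (here 4).

```python
import math
import random


def project(x, lo=-1.0, hi=1.0):
    """Project x onto the feasible interval [lo, hi] (assumed for this toy)."""
    return max(lo, min(hi, x))


def online_gradient_descent(targets):
    """Play x_t, observe the loss (x - z_t)^2, then take a projected
    gradient step with step size 1/sqrt(t)."""
    x = 0.0
    plays = []
    for t, z in enumerate(targets, start=1):
        plays.append(x)              # commit to x_t before seeing z_t
        grad = 2.0 * (x - z)         # gradient of (x - z)^2 at x_t
        x = project(x - grad / math.sqrt(t))
    return plays


def regret(plays, targets):
    """Learner's cumulative loss minus that of the best fixed point in hindsight."""
    # For squared loss the best fixed point is the mean of the targets,
    # projected onto the feasible interval.
    best = project(sum(targets) / len(targets))
    learner_loss = sum((x - z) ** 2 for x, z in zip(plays, targets))
    comparator_loss = sum((best - z) ** 2 for z in targets)
    return learner_loss - comparator_loss


random.seed(0)
T = 1000
targets = [random.uniform(-1.0, 1.0) for _ in range(T)]
plays = online_gradient_descent(targets)
R = regret(plays, targets)

# Zinkevich-style bound with D = 2 and G = 4:
# D^2 * sqrt(T) / 2 + (sqrt(T) - 1/2) * G^2 = 18 * sqrt(T) - 8
bound = 18.0 * math.sqrt(T) - 8.0
print(f"regret = {R:.3f}, bound = {bound:.3f}, bound / T = {bound / T:.4f}")
```

Because the bound grows like sqrt(T), the average regret per round goes to zero as T grows, which is the property that convergence-rate arguments rely on. The bound is a worst-case guarantee, so the regret observed on a benign sequence like this one can be far below it.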
+==== Key Papers ====
+  * [[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.67.4767&rep=rep1&type=pdf|Gordon 1999 - Regret Bounds for Prediction Problems]]
 ===== Related Pages =====
+  * [[ml:theory:Multi-Armed Bandit]]
   * [[ml:Online Learning]]
+  * [[ml:Reinforcement Learning#Theory|Reinforcement Learning - Theory]]