====== ml:optimization ======

  * [[https://arxiv.org/pdf/1506.01186.pdf|2015 - Cyclical Learning Rates for Training Neural Networks]] (a minimal schedule sketch follows this list)
  * [[https://arxiv.org/pdf/2002.12414.pdf|2020 - On the Convergence of Nesterov's Accelerated Gradient Method in Stochastic Settings]]
  * [[https://proceedings.mlr.press/v202/mei23a/mei23a.pdf|Mei et al 2023 - Stochastic Gradient Succeeds for Bandits]]
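
A minimal sketch of the "triangular" policy from the Smith 2015 cyclical learning rate paper linked above: the rate rises linearly from a lower bound to an upper bound over a fixed number of iterations, then falls back, and repeats. The formula follows the paper; the default bounds and step size below are illustrative values, not recommendations.

<code python>
# Minimal sketch of the triangular cyclical learning rate policy (Smith 2015).
# Parameter defaults are illustrative, not taken from any particular library.

def triangular_clr(iteration, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    """Learning rate at a given iteration: rises linearly from base_lr to
    max_lr over step_size iterations, falls back over the next step_size,
    and repeats."""
    cycle = 1 + iteration // (2 * step_size)         # which cycle we are in
    x = abs(iteration / step_size - 2 * cycle + 1)   # position within the cycle, in [0, 1]
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

if __name__ == "__main__":
    # Print the schedule at a few points of the first full cycle.
    for it in (0, 1000, 2000, 3000, 4000):
        print(it, round(triangular_clr(it), 5))
</code>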
  
===== Courses =====
  * [[https://medium.com/intuitionmachine/the-peculiar-behavior-of-deep-learning-loss-surfaces-330cb741ec17|2017 - The Two Phases of Gradient Descent in Deep Learning]]
  * [[https://medium.com/inveterate-learner/deep-learning-book-chapter-8-optimization-for-training-deep-models-part-i-20ae75984cb2|Optimization For Training Deep Models Part I]] (a minimal momentum-update sketch follows this list)
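
As a companion to the momentum methods covered in the chapter notes above (and the Nesterov convergence paper in the papers list), a minimal NumPy sketch contrasting classical and Nesterov momentum on a toy quadratic. The objective, step size, and momentum coefficient are arbitrary illustrative choices, not values from any of the linked sources.

<code python>
import numpy as np

# Gradient descent with classical vs. Nesterov momentum on a toy quadratic
# f(w) = 0.5 * w^T A w. All constants here are illustrative.

A = np.diag([1.0, 10.0])          # ill-conditioned quadratic
grad = lambda w: A @ w            # gradient of 0.5 * w^T A w

def run(nesterov, lr=0.05, mu=0.9, steps=100):
    w = np.array([5.0, 5.0])
    v = np.zeros_like(w)
    for _ in range(steps):
        # Nesterov evaluates the gradient at the look-ahead point w + mu*v;
        # classical momentum evaluates it at the current iterate w.
        g = grad(w + mu * v) if nesterov else grad(w)
        v = mu * v - lr * g
        w = w + v
    return 0.5 * w @ A @ w        # final objective value

print("classical momentum:", run(nesterov=False))
print("nesterov momentum: ", run(nesterov=True))
</code>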

===== People =====
  * [[https://scholar.google.com/citations?user=xaQuPloAAAAJ&hl=en|Dale Schuurmans]]
  
===== Related Pages =====