====== Optimization ======

===== Lectures and Books =====

  * [[https://web.stanford.edu/~jduchi/PCMIConvex/Duchi16.pdf|Duchi - Introductory Lectures on Stochastic Optimization]]
  * **[[https://arxiv.org/pdf/1606.04838.pdf|Bottou et al 2016 - Optimization Methods for Large-Scale Machine Learning]]** Survey paper that includes theory results with proofs (even the convergence rate of SGD on non-convex objectives)
  * [[https://www.lix.polytechnique.fr/~dambrosio/blackbox_material/Cassioli_1.pdf|Black-box Optimization]]
  * [[https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf|Boyd & Vandenberghe - Convex Optimization]]
  * [[https://web.stanford.edu/class/msande310/310trialtext.pdf|Luenberger - Linear and Nonlinear Programming]] Great book

===== Theory =====

  * [[http://proceedings.mlr.press/v37/agarwal15.pdf|Agarwal & Bottou 2015 - A Lower Bound for the Optimization of Finite Sums]]
  * [[https://arxiv.org/pdf/1309.5549.pdf|Ghadimi & Lan 2013 - Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming]] Cited by Jin et al 2021 for Theorem 2.10
  * [[https://dl.acm.org/doi/pdf/10.1145/3418526|Jin et al 2021 - On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points]] Overview of results

===== Papers =====

  * [[https://arxiv.org/pdf/1708.07120.pdf|Smith & Topin 2018 - Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates]]
  * [[https://arxiv.org/pdf/1506.01186.pdf|Smith 2015 - Cyclical Learning Rates for Training Neural Networks]] A minimal sketch of the triangular schedule appears at the end of this page
  * [[https://arxiv.org/pdf/2002.12414.pdf|Assran & Rabbat 2020 - On the Convergence of Nesterov's Accelerated Gradient Method in Stochastic Settings]]
  * [[https://proceedings.mlr.press/v202/mei23a/mei23a.pdf|Mei et al 2023 - Stochastic Gradient Succeeds for Bandits]]

===== Courses =====

See also [[:Courses]] and [[:Seminars]].

  * [[http://www.princeton.edu/~yc5/ele522_optimization/|Large-Scale Optimization for Data Science @ Princeton]]
  * [[http://www.princeton.edu/~yc5/reading_group/index.html|Mathematical Data Science Reading Group @ Princeton]]
  * [[https://github.com/epfml/OptML_course|EPFL Course - Optimization for Machine Learning]]
  * [[https://ee227c.github.io/|Convex Optimization and Approximation @ Berkeley]]

===== Blog Posts =====

  * [[https://www.jeremyjordan.me/nn-learning-rate/|Jordan 2018 - Setting the learning rate of your neural network]]
  * [[https://medium.com/intuitionmachine/the-peculiar-behavior-of-deep-learning-loss-surfaces-330cb741ec17|2017 - The Two Phases of Gradient Descent in Deep Learning]]
  * [[https://medium.com/inveterate-learner/deep-learning-book-chapter-8-optimization-for-training-deep-models-part-i-20ae75984cb2|Deep Learning Book Chapter 8 - Optimization for Training Deep Models, Part I]]

===== People =====

  * [[https://scholar.google.com/citations?user=xaQuPloAAAAJ&hl=en|Dale Schuurmans]]

===== Related Pages =====

  * [[application_optimization|Application: Optimization]]
  * [[NN Training]]
  * [[Optimizers]]
  * [[Optimization in Deep Learning]]
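The Smith 2015 entry above proposes cycling the learning rate between two bounds rather than holding it fixed. Below is a minimal sketch of the paper's triangular policy; the function name and the default values of ''base_lr'', ''max_lr'', and ''step_size'' are illustrative choices, not prescriptions from the paper.

<code python>
import math

def triangular_clr(step, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    """Triangular cyclical learning rate (Smith 2015, arXiv:1506.01186).

    The rate ramps linearly from base_lr up to max_lr over step_size
    iterations, then back down to base_lr, and the cycle repeats.
    """
    cycle = math.floor(1 + step / (2 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

# The rate peaks at max_lr mid-cycle and returns to base_lr at cycle ends.
assert abs(triangular_clr(0) - 1e-4) < 1e-12
assert abs(triangular_clr(2000) - 1e-2) < 1e-12
assert abs(triangular_clr(4000) - 1e-4) < 1e-12
</code>

In a training loop, ''triangular_clr(step)'' would stand in for a fixed learning rate; the paper's LR range test (train briefly while increasing the rate linearly) is one way to pick ''base_lr'' and ''max_lr''.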