ml:optimization [2024/03/06 21:57] (current) jmflanig
===== Lectures and Books =====
  * [[https://web.stanford.edu/~jduchi/PCMIConvex/Duchi16.pdf|Duchi - Introductory Lectures on Stochastic Optimization]]
  * **[[https://arxiv.org/pdf/1606.04838.pdf|Bottou et al. 2016 - Optimization Methods for Large-Scale Machine Learning]]** Survey that includes theory results with proofs, including the rate of convergence of SGD on non-convex objectives
  * [[https://www.lix.polytechnique.fr/~dambrosio/blackbox_material/Cassioli_1.pdf|Black-box Optimization]]
  * [[https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf|Boyd & Vandenberghe - Convex Optimization]]
  * [[https://web.stanford.edu/class/msande310/310trialtext.pdf|Luenberger - Linear and Nonlinear Programming]] A great book
  
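The SGD convergence results surveyed in Bottou et al. concern exactly this kind of loop; as a concrete point of reference, here is a minimal finite-sum SGD sketch (the least-squares toy objective and all names are my own illustration, not taken from any of the linked texts):

```python
import random

def sgd(grads, x0, lr=0.01, steps=5000, seed=0):
    """Minimise a finite-sum objective f(x) = (1/n) * sum_i f_i(x)
    by following the gradient of one randomly sampled term per step."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        i = rng.randrange(len(grads))  # sample one component uniformly
        x = x - lr * grads[i](x)       # unbiased estimate of the full gradient
    return x

# Toy finite sum: f(x) = (1/3) * sum_i (x - a_i)^2, minimised at mean(a) = 3.0.
a = [1.0, 2.0, 6.0]
grads = [lambda x, ai=ai: 2.0 * (x - ai) for ai in a]
x_star = sgd(grads, x0=0.0)
print(x_star)  # settles near 3.0, fluctuating because the step size is constant
```

With a constant step size the iterates hover in a noise ball around the minimiser rather than converging exactly, which is the behaviour the decreasing-step-size schedules in the texts above are designed to fix.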
===== Theory =====
  * [[http://proceedings.mlr.press/v37/agarwal15.pdf|Agarwal & Bottou 2015 - A Lower Bound for the Optimization of Finite Sums]]
  * [[https://arxiv.org/pdf/1309.5549.pdf|Ghadimi & Lan 2013 - Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming]] Cited by Jin et al. 2021 for Theorem 2.10
  * [[https://dl.acm.org/doi/pdf/10.1145/3418526|Jin et al. 2021 - On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points]] Overview of results
  
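The zeroth-order methods analysed by Ghadimi & Lan build gradient estimates from function values alone. A minimal two-point random-direction estimator of the kind that analysis covers (the function name and test objective are my own, a sketch rather than their exact scheme):

```python
import random

def zo_gradient(f, x, mu=1e-4, seed=None):
    """Two-point zeroth-order gradient estimate: sample a Gaussian direction u
    and return ((f(x + mu*u) - f(x)) / mu) * u, which is an (approximately)
    unbiased estimate of grad f(x) using only function evaluations."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_shift = [xi + mu * ui for xi, ui in zip(x, u)]
    scale = (f(x_shift) - f(x)) / mu   # directional derivative along u
    return [scale * ui for ui in u]

# Sanity check on f(x) = x0^2 + x1^2 at (1, 1): the true gradient is (2, 2).
g = zo_gradient(lambda v: v[0]**2 + v[1]**2, [1.0, 1.0], seed=0)
print(g)  # a single noisy sample; the average over many directions approaches (2, 2)
```

A single sample is high-variance, which is why the zeroth-order rates in that paper pay an extra dimension-dependent factor compared with the first-order ones.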
===== Papers =====
  * [[https://arxiv.org/pdf/1506.01186.pdf|2015 - Cyclical Learning Rates for Training Neural Networks]]
  * [[https://arxiv.org/pdf/2002.12414.pdf|2020 - On the Convergence of Nesterov's Accelerated Gradient Method in Stochastic Settings]]
  * [[https://proceedings.mlr.press/v202/mei23a/mei23a.pdf|Mei et al. 2023 - Stochastic Gradient Succeeds for Bandits]]
  
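The cyclical-learning-rates paper's default "triangular" policy oscillates the rate linearly between a base and a maximum value. A sketch of that schedule (function and parameter names are my own; the default values here are illustrative, not the paper's):

```python
def triangular_lr(step, base_lr=0.001, max_lr=0.006, stepsize=2000):
    """Triangular cyclical learning rate: climbs linearly from base_lr to
    max_lr over `stepsize` iterations, then descends back, repeating with
    period 2 * stepsize."""
    cycle = step // (2 * stepsize)            # which cycle this step falls in
    x = abs(step / stepsize - 2 * cycle - 1)  # in [0, 1]: 1 at cycle edges, 0 at the peak
    return base_lr + (max_lr - base_lr) * (1.0 - x)

print(triangular_lr(0))     # base_lr at the start of a cycle
print(triangular_lr(2000))  # max_lr at the peak
print(triangular_lr(4000))  # back to base_lr as the next cycle begins
```

The schedule needs no extra state beyond the step counter, which is why it drops easily into an existing training loop.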
===== Courses =====
See also [[:Courses]] and [[:Seminars]].
  * [[http://www.princeton.edu/~yc5/ele522_optimization/|Large-Scale Optimization for Data Science @ Princeton]]
  * [[http://www.princeton.edu/~yc5/reading_group/index.html|Mathematical Data Science Reading Group @ Princeton]]
===== Blog Posts =====
  * [[https://medium.com/intuitionmachine/the-peculiar-behavior-of-deep-learning-loss-surfaces-330cb741ec17|2017 - The Two Phases of Gradient Descent in Deep Learning]]
  * [[https://medium.com/inveterate-learner/deep-learning-book-chapter-8-optimization-for-training-deep-models-part-i-20ae75984cb2|Optimization For Training Deep Models Part I]]

===== People =====
  * [[https://scholar.google.com/citations?user=xaQuPloAAAAJ&hl=en|Dale Schuurmans]]

===== Related Pages =====
  * [[application_optimization|Application: Optimization]]
  * [[NN Training]]
  * [[Optimizers]]
  * [[Optimization in Deep Learning]]
ml/optimization.1614076447.txt.gz · Last modified: 2023/06/15 07:36 (external edit)
