
Generalization in Deep Learning

The theory of generalization in deep learning remains poorly understood and is an active area of research.

Overviews

Key Papers

Sample Complexity Bounds

Phenomena

Double Descent

From Nakkiran et al. (2020), “Optimal Regularization Can Mitigate Double Descent”:

Recent works have demonstrated a ubiquitous “double descent” phenomenon present in a range of machine learning models, including decision trees, random features, linear regression, and deep neural networks (Opper, 1995, 2001; Advani & Saxe, 2017; Spigler et al., 2018; Belkin et al., 2018; Geiger et al., 2019b; Nakkiran et al., 2020; Belkin et al., 2019; Hastie et al., 2019; Bartlett et al., 2019; Muthukumar et al., 2019; Bibas et al., 2019; Mitra, 2019; Mei & Montanari, 2019; Liang & Rakhlin, 2018; Liang et al., 2019; Xu & Hsu, 2019; Dereziński et al., 2019; Lampinen & Ganguli, 2018; Deng et al., 2019; Nakkiran, 2019). The phenomenon is that models exhibit a peak of high test risk when they are just barely able to fit the train set, that is, to interpolate. For example, as we increase the size of models, test risk first decreases, then increases to a peak around when effective model size is close to the training data size, and then decreases again in the overparameterized regime. Also surprising is that Nakkiran et al. (2020) observe a double descent as we increase sample size, i.e. for a fixed model, training the model with more data can hurt test performance.

Grokking

ml/theory/generalization_in_deep_learning.txt · Last modified: 2025/05/29 07:00 by jmflanig
