ml:theory:generalization_in_deep_learning

This is an old revision of the document!


Generalization in Deep Learning

The theory of generalization in deep learning is not well understood, and is an active area of research.

Summary

Key Papers

Sample Complexity Bounds

Phenomena

Double Descent

From Nakkiran et al 2020 - Optimal Regularization Can Mitigate Double Descent:

Recent works have demonstrated a ubiquitous “double descent” phenomenon present in a range of machine learning models, including decision trees, random features, linear regression, and deep neural networks (Opper, 1995, 2001; Advani & Saxe, 2017; Spigler et al., 2018; Belkin et al., 2018; Geiger et al., 2019b; Nakkiran et al., 2020; Belkin et al., 2019; Hastie et al., 2019; Bartlett et al., 2019; Muthukumar et al., 2019; Bibas et al., 2019; Mitra, 2019; Mei & Montanari, 2019; Liang & Rakhlin, 2018; Liang et al., 2019; Xu & Hsu, 2019; Dereziski et al., 2019; Lampinen & Ganguli, 2018; Deng et al., 2019; Nakkiran, 2019). The phenomenon is that models exhibit a peak of high test risk when they are just barely able to fit the train set, that is, to interpolate. For example, as we increase the size of models, test risk first decreases, then increases to a peak around when effective model size is close to the training data size, and then decreases again in the overparameterized regime. Also surprising is that Nakkiran et al. (2020) observe a double descent as we increase sample size, i.e. for a fixed model, training the model with more data can hurt test performance.

ml/theory/generalization_in_deep_learning.1656112039.txt.gz · Last modified: 2023/06/15 07:36 (external edit)

Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki