ml:regularization (current revision 2024/03/14 07:38 by jmflanig)
===== Regularization in Deep Learning =====

==== Dropout ====
  * Dropout: [[https://
  * DropConnect:
  * [[https://
  * [[https://
  * [[https://
  * [[https://
  * LayerDrop: [[https://
  * [[https://
  * [[https://
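The dropout variants listed above all build on the same basic operation. A minimal NumPy sketch of standard (inverted) dropout, with illustrative function and parameter names:

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p and rescale
    the survivors by 1/(1-p), so the expected activation is unchanged
    and no rescaling is needed at test time."""
    if not training or p == 0.0:
        return x
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(x.shape) >= p  # keep each unit with probability 1-p
    return x * mask / (1.0 - p)

x = np.ones((2, 4))
y = dropout(x, p=0.5, rng=np.random.default_rng(0))
# surviving entries are rescaled to 2.0, dropped entries are 0.0
```

At test time (`training=False`) the input passes through unchanged, which is why the inverted form is preferred over rescaling at inference.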
==== Early-stopping ====
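The usual patience-based stopping rule can be sketched in a few lines (names and defaults here are illustrative; real training loops also restore the best checkpoint):

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch at which training should stop: the first epoch
    at which the best validation loss has not improved for `patience`
    consecutive epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1

# improvement stalls after epoch 2, so training stops 3 epochs later
early_stopping([1.0, 0.8, 0.7, 0.75, 0.72, 0.74, 0.73], patience=3)  # -> 5
```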
Although L2 regularization (weight decay) wasn't popular in early deep learning models in NLP, it has become popular in pre-trained transformer models.
  * AdamW: [[https://
==== Max-Norm Regularization ====
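Max-norm regularization constrains each weight vector to lie inside a ball of radius $c$, typically by projecting after every update. A minimal NumPy sketch (the radius $c$ and per-row grouping are illustrative choices):

```python
import numpy as np

def max_norm(W, c=1.0, axis=1):
    """Max-norm constraint: rescale any weight row whose L2 norm
    exceeds c back onto the ball of radius c; rows already inside
    the ball are left untouched."""
    norms = np.linalg.norm(W, axis=axis, keepdims=True)
    return W * np.minimum(1.0, c / np.maximum(norms, 1e-12))

W = np.array([[3.0, 4.0], [0.1, 0.2]])
W = max_norm(W, c=1.0)
# first row (norm 5) is rescaled to norm 1; second row is unchanged
```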
==== $L_p$ Regularization ====
$L_p$ regularization adds an $L_p$-norm penalty on the weights to the loss. Popular choices of $p$ are $L_2$ (weight decay), $L_1$ (the Lasso penalty, which induces sparsity while remaining convex), $L_0$ (which counts the number of non-zero parameters and is very non-convex, making it difficult to optimize), and $L_\infty$ (max-norm regularization).
  * $L_0$ regularization
  * [[https://
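The penalties listed above can be computed in a few lines. This sketch uses the $p$-th power of the norm for finite $p > 0$, as is conventional for $L_1$ and $L_2$ penalties (the regularization strength $\lambda$ is omitted, and names are illustrative):

```python
import numpy as np

def lp_penalty(w, p):
    """L_p penalty on a weight vector: p=2 gives weight decay (squared
    L2 norm), p=1 the Lasso, p=0 counts non-zero entries, and p=inf
    penalizes only the largest-magnitude weight (max norm)."""
    a = np.abs(w)
    if p == 0:
        return float(np.count_nonzero(w))
    if np.isinf(p):
        return float(a.max())
    return float((a ** p).sum())

w = np.array([0.0, -1.0, 2.0])
lp_penalty(w, 0)       # -> 2.0  (two non-zero entries)
lp_penalty(w, 1)       # -> 3.0  (|0| + |-1| + |2|)
lp_penalty(w, 2)       # -> 5.0  (0 + 1 + 4)
lp_penalty(w, np.inf)  # -> 2.0  (largest magnitude)
```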
==== Sparsity-Inducing Regularizers ====
  * **Structured Sparsity**
    * [[https://
    * [[https:// Networks]]
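One common way to induce structured sparsity is a group-lasso ($L_{2,1}$) penalty over groups of weights such as rows, filters, or layers, which drives entire groups to exactly zero. A minimal sketch (the row grouping is an illustrative choice):

```python
import numpy as np

def group_lasso_penalty(W, axis=1):
    """Group-lasso (L_{2,1}) penalty: the sum of L2 norms of weight
    groups (here: rows). Unlike a plain L1 penalty on entries, it
    zeroes out whole groups, which hardware can actually exploit."""
    return float(np.linalg.norm(W, axis=axis).sum())

W = np.array([[3.0, 4.0], [0.0, 0.0]])
group_lasso_penalty(W)  # -> 5.0 (the zeroed second row adds nothing)
```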
==== Label Smoothing ====
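Label smoothing replaces the one-hot training target with a mixture of the one-hot vector and the uniform distribution over classes. A minimal sketch ($\epsilon = 0.1$ is a common but illustrative choice):

```python
import numpy as np

def smooth_labels(labels, num_classes, eps=0.1):
    """Label smoothing: the true class gets 1 - eps + eps/K and every
    other class gets eps/K, so each target still sums to 1."""
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - eps) * one_hot + eps / num_classes

t = smooth_labels(np.array([2]), num_classes=4, eps=0.1)
# -> [[0.025, 0.025, 0.925, 0.025]]
```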
</code>
=== Papers ===
  * [[http://
  * [[https://
  * [[https://
ml/regularization.1635846399.txt.gz · Last modified: 2023/06/15 07:36 (external edit)