====== Loss Functions ======
A function that is minimized during training (using gradient descent or Adam, for example) is called a loss function.

==== Code Examples ====
  * Hugging Face
    * Custom loss in Hugging Face trainer: [[https://huggingface.co/docs/transformers/main_classes/trainer|Trainer]]

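The Trainer docs linked above support swapping in a custom loss by overriding ''compute_loss''. A minimal sketch of that pattern (the two-class weights and the ''**kwargs'' catch-all here are hypothetical choices, and the exact ''compute_loss'' signature varies across transformers versions):

```python
import torch
from transformers import Trainer

class CustomLossTrainer(Trainer):
    """Sketch: replace the default loss with a class-weighted cross-entropy.

    The [1.0, 2.0] weights are illustrative, not a recommendation.
    """

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        # Hypothetical per-class weights for a 2-label task.
        loss_fct = torch.nn.CrossEntropyLoss(
            weight=torch.tensor([1.0, 2.0], device=logits.device)
        )
        loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
```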
  
==== List of Loss Functions ====
    * Lots of different ways to write this loss function.  One way is to minimize $L(\mathcal{D}) = -\sum_{i=1}^{N} \log p(y_i|x_i)$, where $p(y|x) = \frac{e^{score(x,y)}}{\sum_{y'} e^{score(x,y')}}$.
    * The cross-entropy version writes it as $L(\mathcal{D}) = -\sum_{i=1}^{N}\sum_{y} p(y|x_i) \log p_\theta(y|x_i)$, but usually we plug in the empirical distribution $p(y|x_i) = I[y=y_i]$, which gives the log-loss above.
    * Cross-entropy loss can also be written as
\[
L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \Big( -score_\theta(x_i,y_i) + \log \sum_{y \in \mathcal{Y}(x_i)} e^{score_\theta(x_i,y)} \Big)
\]
    * This is often called the Conditional Random Field (CRF) loss.
    * The minimum of cross-entropy loss does not always exist; in particular, it does not exist if the training data can be completely separated.  See, for example, Section 1.1 of [[https://arxiv.org/pdf/1804.09753.pdf|this paper]].
  * Perceptron loss \[
L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \Big( -score_\theta(x_i,y_i) + \max_{y \in \mathcal{Y}(x_i)} score_\theta(x_i,y) \Big)
\]
    * [[https://www.aclweb.org/anthology/N10-1112.pdf|Gimpel & Smith 2010 - Softmax-Margin CRFs: Training Log-Linear Models with Cost Functions]]
    * [[https://arxiv.org/pdf/1612.02295.pdf|Large-Margin Softmax Loss for Convolutional Neural Networks]] (L-Softmax).  Doesn't cite [[https://www.aclweb.org/anthology/N10-1112.pdf|Gimpel & Smith]].  I suspect it may be different, but need to check.
    * The softmax-margin loss is obtained by replacing the max in the max-margin (SVM) loss with a log-sum-exp: \[
L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \Big( -score_\theta(x_i,y_i) + \log \sum_{y \in \mathcal{Y}(x_i)} e^{score_\theta(x_i,y) + cost(y_i,y)} \Big)
\]
  * Risk \[
L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \frac{\sum_{y\in\mathcal{Y}(x_i)} cost(y_i,y)\, e^{score_\theta(x_i,y)}}{\sum_{y\in\mathcal{Y}(x_i)} e^{score_\theta(x_i,y)}}
\]
  * Squentropy (Cross-entropy + squared error)
    * [[https://arxiv.org/pdf/2302.03952.pdf|Hui et al 2023 - Cut your Losses with Squentropy]]
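The equivalence between the log-loss and the score-based CRF form above can be checked numerically. A small sketch (NumPy; the scores are made-up values for one example with three candidate labels):

```python
import numpy as np

scores = np.array([2.0, 0.5, -1.0])  # hypothetical scores over 3 labels
gold = 0                             # index of the gold label y_i

# Log-loss: -log p(y_i | x_i), with p given by a softmax over the scores.
p = np.exp(scores) / np.exp(scores).sum()
log_loss = -np.log(p[gold])

# Score-based (CRF) form: -score(x_i, y_i) + log sum_y exp(score(x_i, y)).
crf_form = -scores[gold] + np.log(np.exp(scores).sum())

assert np.isclose(log_loss, crf_form)
```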
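The softmax-margin and risk definitions above can be sanity-checked the same way (NumPy; the scores and costs are made up, with cost 0 for the gold label):

```python
import numpy as np

scores = np.array([2.0, 0.5, -1.0])  # hypothetical scores over 3 labels
cost = np.array([0.0, 1.0, 1.0])     # cost(y_i, y); zero for the gold label
gold = 0

# Softmax-margin: the cost goes inside the log-sum-exp.
softmax_margin = -scores[gold] + np.log(np.exp(scores + cost).sum())

# Risk: expected cost under the model distribution p_theta(y|x).
p = np.exp(scores) / np.exp(scores).sum()
risk = (cost * p).sum()

# With cost >= 0, softmax-margin upper-bounds the plain log-loss,
# and risk is a convex combination of the costs.
log_loss = -scores[gold] + np.log(np.exp(scores).sum())
assert softmax_margin >= log_loss
assert 0.0 <= risk <= cost.max()
```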
  
===== Related Pages =====
  * [[NN Training|Training]]
  
ml:loss_functions · Last modified: 2024/07/23 00:32 by jmflanig
