
Loss Functions

A loss function is a function that is minimized during training, for example with gradient descent or Adam.

Code Examples

  • Hugging Face
    • Custom loss in the Hugging Face Trainer: subclass Trainer and override compute_loss

List of Loss Functions

  • Cross-entropy (aka log loss, conditional log-likelihood, CRF loss)
    • There are many equivalent ways to write this loss. One way is to minimize $L(\mathcal{D}) = -\sum_{i=1}^{N} log(p(y_i|x_i))$, where $p(y|x) = \frac{e^{score(x,y)}}{\sum_{y'} e^{score(x,y')}}$
    • The cross-entropy version writes it as $L(\mathcal{D}) = -\sum_{i=1}^{N}\sum_{y} p(y|x_i) log(p_\theta(y|x_i))$, but usually we plug in the empirical distribution $p(y|x_i) = I[y=y_i]$, which recovers the log loss above.
    • Cross-entropy loss can be written as

\[ L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \Big( -score_\theta(x_i,y_i) + log( \sum_{y \in \mathcal{Y}(x_i)} e^{score_\theta(x_i,y)} ) \Big) \]
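A minimal numeric sketch of the score-based cross-entropy above, using only the standard library; the label set and scores are made up for illustration, and it checks that the score-based form agrees with the $-log(p(y_i|x_i))$ form.

```python
import math

def cross_entropy_loss(scores, gold):
    """CRF / log loss for one example: -score(x, gold) + log sum_y exp(score(x, y))."""
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return -scores[gold] + log_z

scores = {"A": 2.0, "B": 0.5, "C": -1.0}  # hypothetical score_theta(x, y)
loss = cross_entropy_loss(scores, "A")

# Same value as -log p(gold | x) under a softmax, the first formulation above.
z = sum(math.exp(s) for s in scores.values())
p_gold = math.exp(scores["A"]) / z
assert abs(loss - (-math.log(p_gold))) < 1e-12
```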

  • This is often called the Conditional Random Field (CRF) loss
  • The minimum of cross-entropy loss does not always exist; in particular, it does not exist if the training data can be completely separated. See, for example, section 1.1 of this paper.
  • Perceptron loss \[ L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \Big( -score_\theta(x_i,y_i) + \max_{y \in \mathcal{Y}(x_i)} score_\theta(x_i,y) \Big) \]
  • Hinge (SVM) loss \[ L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \Big( -score_\theta(x_i,y_i) + \max_{y \in \mathcal{Y}(x_i)} \big(score_\theta(x_i,y) + cost(y_i,y)\big) \Big) \]
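Both losses above differ from cross-entropy only in how the second term aggregates over labels (a max instead of a logsumexp). A stdlib-only sketch with made-up scores and a 0/1 cost:

```python
def perceptron_loss(scores, gold):
    # -score(gold) plus the highest score over all labels.
    return -scores[gold] + max(scores.values())

def hinge_loss(scores, gold, cost):
    # Cost-augmented max: high-cost labels must be beaten by a margin.
    return -scores[gold] + max(s + cost(gold, y) for y, s in scores.items())

def zero_one(y_true, y):
    return 0.0 if y == y_true else 1.0

scores = {"A": 2.0, "B": 1.5, "C": -1.0}  # hypothetical score_theta(x, y)
print(perceptron_loss(scores, "A"))       # 0.0: gold already has the max score
print(hinge_loss(scores, "A", zero_one))  # 0.5: "B" is inside the margin of 1
```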
  • Softmax margin
  • Large-Margin Softmax Loss for Convolutional Neural Networks (L-Softmax). It doesn't cite Gimpel & Smith; I suspect it may be different, but this needs checking.
  • The softmax margin loss is obtained by replacing the max in the SVM loss with a softmax: \[ L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \Big( -score_\theta(x_i,y_i) + log( \sum_{y \in \mathcal{Y}(x_i)} e^{score_\theta(x_i,y) + cost(y_i,y)} ) \Big) \]
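A sketch of the softmax-margin loss in the same stdlib style (scores and cost are made up). Since logsumexp upper-bounds max, this loss upper-bounds the hinge loss on the same example:

```python
import math

def softmax_margin_loss(scores, gold, cost):
    # Hinge loss with the max replaced by a softmax (logsumexp).
    log_z = math.log(sum(math.exp(s + cost(gold, y)) for y, s in scores.items()))
    return -scores[gold] + log_z

def zero_one(y_true, y):
    return 0.0 if y == y_true else 1.0

scores = {"A": 2.0, "B": 1.5, "C": -1.0}  # hypothetical score_theta(x, y)
loss = softmax_margin_loss(scores, "A", zero_one)

# logsumexp >= max, so this upper-bounds the hinge loss on this example.
hinge = -scores["A"] + max(s + zero_one("A", y) for y, s in scores.items())
assert loss >= hinge
```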
  • Risk \[ L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \frac{\sum_{y\in\mathcal{Y}(x_i)} cost(y_i,y) e^{score_\theta(x_i,y)}}{\sum_{y\in\mathcal{Y}(x_i)} e^{score_\theta(x_i,y)}} \]

Equivalently, risk is the expected cost under the model distribution: \[ L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \sum_{y\in\mathcal{Y}(x_i)} cost(y_i,y) p_\theta(y|x_i) \]
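The equivalence of the two risk expressions is easy to check numerically. A stdlib-only sketch with made-up scores and a 0/1 cost, under which the risk for one example is just $1 - p_\theta(y_i|x_i)$:

```python
import math

def risk(scores, gold, cost):
    # Expected cost under the model distribution p_theta(y | x).
    z = sum(math.exp(s) for s in scores.values())
    return sum(cost(gold, y) * math.exp(s) / z for y, s in scores.items())

def zero_one(y_true, y):
    return 0.0 if y == y_true else 1.0

scores = {"A": 2.0, "B": 1.5, "C": -1.0}  # hypothetical score_theta(x, y)
r = risk(scores, "A", zero_one)

# With a 0/1 cost, risk = total probability of wrong labels = 1 - p(gold | x).
z = sum(math.exp(s) for s in scores.values())
assert abs(r - (1 - math.exp(scores["A"]) / z)) < 1e-12
```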

ml/loss_functions.txt · Last modified: 2024/07/23 00:32 by jmflanig
