ml:loss_functions
    * Cross-entropy loss can be written as
\[
L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \Big( -score_\theta(x_i,y_i) + \log \sum_{y \in \mathcal{Y}(x_i)} e^{score_\theta(x_i,y)} \Big)
\]
    * This is often called the Conditional Random Field (CRF) loss.
    * The minimum of cross-entropy loss does not always exist; in particular, it does not exist if the training data can be completely separated.  See, for example, section 1.1 of [[https://arxiv.org/pdf/1804.09753.pdf|this paper]].
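A minimal sketch of the cross-entropy/CRF loss above in plain Python, computing the log-sum-exp term with the usual max-subtraction for numerical stability. The function and argument names (''scores'', ''gold'') are illustrative, not from any particular library:

```python
import math

def cross_entropy_loss(scores, gold):
    """Cross-entropy (CRF) loss over a dataset.

    scores[i][y] plays the role of score_theta(x_i, y), with the candidate
    outputs Y(x_i) indexed 0..K-1; gold[i] is the index of the gold label y_i.
    """
    total = 0.0
    for s, y in zip(scores, gold):
        # log-sum-exp over candidate outputs, stabilized by subtracting the max
        m = max(s)
        log_z = m + math.log(sum(math.exp(v - m) for v in s))
        # per-example term: -score of the gold label + log partition function
        total += -s[y] + log_z
    return total
```

Each per-example term equals the negative log-probability of the gold label under the softmax, so the loss is always non-negative.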
  * Perceptron loss \[
    * [[https://www.aclweb.org/anthology/N10-1112.pdf|Gimpel & Smith 2010 - Softmax-Margin CRFs: Training Log-Linear Models with Cost Functions]]
    * [[https://arxiv.org/pdf/1612.02295.pdf|Large-Margin Softmax Loss for Convolutional Neural Networks]] (L-Softmax).  Doesn't cite [[https://www.aclweb.org/anthology/N10-1112.pdf|Gimpel & Smith]].  I suspect it may be different, but need to check.
    * The softmax margin loss is obtained by replacing the max in the SVM loss with a softmax: \[
L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \Big( -score_\theta(x_i,y_i) + \log \sum_{y \in \mathcal{Y}(x_i)} e^{score_\theta(x_i,y) + cost(y_i,y)} \Big)
\]
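The softmax-margin sum above differs from cross-entropy only by the cost term added inside the log-sum-exp. A plain-Python sketch, with illustrative names (''cost[i][y]'' standing in for cost(y_i, y), typically zero when y is the gold label):

```python
import math

def softmax_margin_loss(scores, gold, cost):
    """Softmax-margin loss: cross-entropy with cost(y_i, y) added to each
    candidate's score inside the log-sum-exp. Names are illustrative."""
    total = 0.0
    for s, y, c in zip(scores, gold, cost):
        # cost-augmented scores: score_theta(x_i, y) + cost(y_i, y)
        aug = [sv + cv for sv, cv in zip(s, c)]
        m = max(aug)
        log_z = m + math.log(sum(math.exp(v - m) for v in aug))
        # the gold score itself is NOT cost-augmented
        total += -s[y] + log_z
    return total
```

With an all-zero cost this reduces exactly to the cross-entropy loss; a positive cost on wrong labels pushes the model toward a larger margin against high-cost mistakes.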
  * Risk \[
ml/loss_functions.1721694585.txt.gz · Last modified: 2024/07/23 00:29 by jmflanig
