ml:loss_functions
    * Cross-entropy loss can be written as
\[
L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \Big( -score_\theta(x_i,y_i) + \log \sum_{y \in \mathcal{Y}(x_i)} e^{score_\theta(x_i,y)} \Big)
\]
    * This is often called the Conditional Random Field (CRF) loss.
    * The minimum of cross-entropy loss does not always exist; in particular, it does not exist if the training data can be completely separated.  See, for example, section 1.1 of [[https://arxiv.org/pdf/1804.09753.pdf|this paper]].
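A minimal sketch of the cross-entropy/CRF loss above in plain Python, computing the log-sum-exp term with the usual max-subtraction for numerical stability. The function and argument names (''scores'', ''gold'') are illustrative, not from any particular library:

```python
import math

def cross_entropy_loss(scores, gold):
    """Cross-entropy (CRF) loss over a dataset.

    scores[i][y] plays the role of score_theta(x_i, y), with the candidate
    outputs Y(x_i) indexed 0..K-1; gold[i] is the index of the gold label y_i.
    """
    total = 0.0
    for s, y in zip(scores, gold):
        # log-sum-exp over candidate outputs, stabilized by subtracting the max
        m = max(s)
        log_z = m + math.log(sum(math.exp(v - m) for v in s))
        # per-example term: -score of the gold label + log partition function
        total += -s[y] + log_z
    return total
```

Each per-example term equals the negative log-probability of the gold label under the softmax, so the loss is always non-negative.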
  * Perceptron loss \[
    * [[https://www.aclweb.org/anthology/N10-1112.pdf|Gimpel & Smith 2010 - Softmax-Margin CRFs: Training Log-Linear Models with Cost Functions]]
    * [[https://arxiv.org/pdf/1612.02295.pdf|Large-Margin Softmax Loss for Convolutional Neural Networks]] (L-Softmax).  Doesn't cite [[https://www.aclweb.org/anthology/N10-1112.pdf|Gimpel & Smith]].  I suspect it may be different, but need to check.
    * The softmax margin loss is obtained by replacing the max in the SVM loss with a softmax: \[
L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \Big( -score_\theta(x_i,y_i) + \log \sum_{y \in \mathcal{Y}(x_i)} e^{score_\theta(x_i,y) + cost(y_i,y)} \Big)
\]
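The softmax-margin sum above differs from cross-entropy only by the cost term added inside the log-sum-exp. A plain-Python sketch, with illustrative names (''cost[i][y]'' standing in for cost(y_i, y), typically zero when y is the gold label):

```python
import math

def softmax_margin_loss(scores, gold, cost):
    """Softmax-margin loss: cross-entropy with cost(y_i, y) added to each
    candidate's score inside the log-sum-exp. Names are illustrative."""
    total = 0.0
    for s, y, c in zip(scores, gold, cost):
        # cost-augmented scores: score_theta(x_i, y) + cost(y_i, y)
        aug = [sv + cv for sv, cv in zip(s, c)]
        m = max(aug)
        log_z = m + math.log(sum(math.exp(v - m) for v in aug))
        # the gold score itself is NOT cost-augmented
        total += -s[y] + log_z
    return total
```

With an all-zero cost this reduces exactly to the cross-entropy loss; a positive cost on wrong labels pushes the model toward a larger margin against high-cost mistakes.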
  * Risk \[
ml/loss_functions.1721694585.txt.gz · Last modified: 2024/07/23 00:29 by jmflanig
