ml:loss_functions

    * Cross-entropy loss can be written as
\[
L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \Big( -score_\theta(x_i,y_i) + \log \sum_{y \in \mathcal{Y}(x_i)} e^{score_\theta(x_i,y)} \Big)
\]
    * This is often called the Conditional Random Field (CRF) loss
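As a minimal sketch of the formula above — assuming a hypothetical `score(x, y)` function returning a real-valued score and a `candidates(x)` function enumerating \(\mathcal{Y}(x)\) (both names are illustrative, not from any particular library):

```python
import math

def crf_loss(score, data, candidates):
    """Cross-entropy (CRF) loss: for each example, subtract the gold score
    and add the log of the summed exponentiated scores over all candidates."""
    total = 0.0
    for x, y_gold in data:
        scores = [score(x, y) for y in candidates(x)]
        m = max(scores)  # max-shift for numerical stability (log-sum-exp trick)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += -score(x, y_gold) + log_z
    return total
```

The max-shift inside the log-sum-exp is the standard way to avoid overflow when scores are large; it does not change the value of the loss.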
    * [[https://arxiv.org/pdf/1612.02295.pdf|Large-Margin Softmax Loss for Convolutional Neural Networks]] L-Softmax.  Doesn't cite [[https://www.aclweb.org/anthology/N10-1112.pdf|Gimpel & Smith]].  I suspect it may be different, but need to check.
    * The softmax margin loss is obtained by replacing the max in the SVM loss with a softmax: \[
L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \Big( -score_\theta(x_i,y_i) + \log \sum_{y \in \mathcal{Y}(x_i)} e^{score_\theta(x_i,y) + cost(y_i,y)} \Big)
\]
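The cost-augmented version differs from the CRF loss only in adding \(cost(y_i, y)\) to each candidate's score inside the exponent. A sketch, with the same hypothetical `score`/`candidates` interface plus a `cost(y_gold, y)` function (all names illustrative):

```python
import math

def softmax_margin_loss(score, cost, data, candidates):
    """Softmax-margin loss: like the CRF loss, but each candidate's score is
    augmented with its cost against the gold output before the log-sum-exp."""
    total = 0.0
    for x, y_gold in data:
        aug = [score(x, y) + cost(y_gold, y) for y in candidates(x)]
        m = max(aug)  # max-shift for numerical stability
        log_z = m + math.log(sum(math.exp(a - m) for a in aug))
        total += -score(x, y_gold) + log_z
    return total
```

With a zero cost function this reduces exactly to the CRF loss above; a nonzero cost pushes the model to separate the gold output from high-cost candidates by a larger margin.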
   * Risk\[   * Risk\[
ml/loss_functions.1721694656.txt.gz · Last modified: 2024/07/23 00:30 by jmflanig
