\[
L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \Big( -score_\theta(x_i,y_i) + \log( \sum_{y \in \mathcal{Y}(x_i)} e^{score_\theta(x_i,y)} ) \Big)
\]
This is often called the Conditional Random Field (CRF) loss.
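The per-example term can be sketched in Python. This sketch assumes \(\mathcal{Y}(x_i)\) is small enough to enumerate and that the scores \(score_\theta(x_i,y)\) have already been computed. The dictionary representation, the function names and the tag sequences are illustrative assumptions, not part of these notes. The log-sum-exp subtracts the maximum score before exponentiating, a standard way to avoid overflow in \(e^{score}\).

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))): shift by the max before exponentiating."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def crf_loss(scores, gold):
    """Per-example CRF loss: -score(x, gold) + log sum_y exp(score(x, y)).

    scores: hypothetical dict mapping every candidate output y in Y(x)
            to its precomputed score_theta(x, y).
    gold:   the reference output y_i; must be one of the keys of scores.
    """
    return -scores[gold] + logsumexp(list(scores.values()))

def dataset_crf_loss(examples):
    """L(theta, D): sum of per-example losses over (scores, gold) pairs."""
    return sum(crf_loss(scores, gold) for scores, gold in examples)

# Illustrative scores for three candidate tag sequences of one two-word input.
scores = {("N", "V"): 2.0, ("V", "N"): 0.0, ("N", "N"): -1.0}
print(crf_loss(scores, ("N", "V")))
```

Explicit enumeration is only for illustration. For sequence CRFs the number of outputs grows exponentially with length, so the sum over \(\mathcal{Y}(x_i)\) is normally computed by dynamic programming (the forward algorithm) instead.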
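The minimum of the cross-entropy loss does not always exist; in particular, it does not exist if the training data can be perfectly separated. Each term of the sum equals \(\log\big(1 + \sum_{y \in \mathcal{Y}(x_i),\, y \ne y_i} e^{score_\theta(x_i,y) - score_\theta(x_i,y_i)}\big)\), which is strictly positive whenever \(\mathcal{Y}(x_i)\) contains more than one output. If some \(\theta\) scores every gold output \(y_i\) above all other outputs and the scores can be scaled up without bound (as with a linear model), every term can be pushed arbitrarily close to 0, so the infimum 0 is never attained. See, for example, Section 1.1 of this paper.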
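The missing minimum can be seen numerically. The sketch below assumes a single hypothetical example whose gold output already has the highest score, and a linear score, so that replacing \(\theta\) by \(c\,\theta\) multiplies every score by \(c\). The names and numbers are illustrative.

```python
import math

def crf_loss(scores, gold):
    """Per-example CRF loss, rewritten as log sum_y exp(score(y) - score(gold)).

    The two forms are equal; this one is stable here because gold has the
    highest score, so every exponent is <= 0.
    """
    s_gold = scores[gold]
    return math.log(sum(math.exp(s - s_gold) for s in scores.values()))

# Hypothetical separable example: the gold output "a" strictly outscores the rest.
base_scores = {"a": 2.0, "b": 0.0, "c": -1.0}

losses = []
for c in [1, 2, 5, 10]:
    # With a linear score, replacing theta by c * theta multiplies every score by c.
    scaled = {y: c * s for y, s in base_scores.items()}
    losses.append(crf_loss(scaled, "a"))
    print(c, losses[-1])
```

The loss keeps decreasing as \(c\) grows but stays positive. In floating point it will eventually underflow to exactly 0 for very large \(c\), which is an artifact of finite precision, not a true minimum.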
Perceptron loss \[
L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \Big( -score_\theta(x_i,y_i) + \max_{y \in \mathcal{Y}(x_i)} score_\theta(x_i,y) \Big)
\]
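The per-example perceptron loss can be sketched as follows. As before, this assumes \(\mathcal{Y}(x_i)\) can be enumerated and scored; the dictionary representation and names are illustrative.

```python
def perceptron_loss(scores, gold):
    """Per-example perceptron loss: -score(x, gold) + max_y score(x, y).

    scores: hypothetical dict mapping each candidate y in Y(x) to score_theta(x, y).
    """
    return -scores[gold] + max(scores.values())

# Illustrative scores: the gold output "a" is outscored by "b".
scores = {"a": 1.0, "b": 3.0, "c": 0.5}
print(perceptron_loss(scores, "a"))  # 2.0: gold trails the best output by 2
print(perceptron_loss(scores, "b"))  # 0.0: gold is already the highest-scoring output
```

Because \(y_i \in \mathcal{Y}(x_i)\), the loss is never negative, and it is 0 as soon as the gold output is (possibly tied for) the highest-scoring one. It demands no margin beyond that.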
Hinge (SVM) loss \[
L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \Big( -score_\theta(x_i,y_i) + \max_{y \in \mathcal{Y}(x_i)} \big(score_\theta(x_i,y) + cost(y_i,y)\big) \Big)
\]
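The hinge loss differs from the perceptron loss only by the cost term inside the max; computing that max is often called cost-augmented decoding. The sketch below assumes Hamming distance between equal-length tag sequences as \(cost(y_i,y)\). That choice, like the function names, is a hypothetical illustration, and any cost with \(cost(y_i,y_i)=0\) can be passed in instead.

```python
def hamming_cost(y_gold, y):
    """Hypothetical cost: number of positions where two equal-length sequences differ."""
    return sum(a != b for a, b in zip(y_gold, y))

def hinge_loss(scores, gold, cost=hamming_cost):
    """Per-example structured hinge loss: -score(gold) + max_y [score(y) + cost(gold, y)]."""
    return -scores[gold] + max(s + cost(gold, y) for y, s in scores.items())

# Illustrative scores: gold already scores highest, so the perceptron loss would be 0.
scores = {("N", "V"): 2.0, ("V", "N"): 0.5, ("N", "N"): 1.5}
gold = ("N", "V")
print(hinge_loss(scores, gold))  # 0.5
```

Here ("N", "N") trails the gold output by only 0.5 while its cost is 1, so the hinge loss charges the 0.5 margin violation even though the gold output is already ranked first.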
Softmax margin
The softmax margin loss is obtained by replacing the max in the SVM loss with a softmax (that is, a log-sum-exp): \[
L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \Big( -score_\theta(x_i,y_i) + \log( \sum_{y \in \mathcal{Y}(x_i)} e^{score_\theta(x_i,y) + cost(y_i,y)} ) \Big)
\]
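The softmax margin loss can be sketched as a stable log-sum-exp over cost-augmented scores. This assumes enumerable \(\mathcal{Y}(x_i)\) and uses the Hamming cost purely as a hypothetical example; the names are illustrative.

```python
import math

def hamming_cost(y_gold, y):
    """Hypothetical cost: number of positions where two equal-length sequences differ."""
    return sum(a != b for a, b in zip(y_gold, y))

def softmax_margin_loss(scores, gold, cost=hamming_cost):
    """Per-example softmax margin: -score(gold) + log sum_y exp(score(y) + cost(gold, y))."""
    augmented = [s + cost(gold, y) for y, s in scores.items()]
    m = max(augmented)  # shift by the max before exponentiating, for stability
    return -scores[gold] + m + math.log(sum(math.exp(a - m) for a in augmented))

scores = {("N", "V"): 2.0, ("V", "N"): 0.5, ("N", "N"): 1.5}
gold = ("N", "V")
print(softmax_margin_loss(scores, gold))
# With a zero cost, the softmax margin loss reduces to the CRF loss.
print(softmax_margin_loss(scores, gold, cost=lambda y_gold, y: 0.0))
```

Since log-sum-exp is always at least the max, the softmax margin loss upper-bounds the hinge loss on the same scores and cost. Setting the cost to zero recovers the CRF loss.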
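Risk \[
L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \frac{\sum_{y\in\mathcal{Y}(x_i)} cost(y_i,y)\, e^{score_\theta(x_i,y)}}{\sum_{y\in\mathcal{Y}(x_i)} e^{score_\theta(x_i,y)}}
\]
Equivalently, writing \(p_\theta(y|x_i) = e^{score_\theta(x_i,y)} / \sum_{y'\in\mathcal{Y}(x_i)} e^{score_\theta(x_i,y')}\) for the model's conditional distribution, the risk is the expected cost under the model:
\[
L(\theta,\mathcal{D}) = \sum_{(x_i,y_i)\in\mathcal{D}} \sum_{y\in\mathcal{Y}(x_i)} cost(y_i,y)\, p_\theta(y|x_i)
\]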
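The risk can be sketched directly from the second form: compute \(p_\theta(y|x_i)\) as a softmax over the scores, then average the cost under it. As in the other sketches, enumerable \(\mathcal{Y}(x_i)\), the Hamming cost, and all names are hypothetical choices for illustration.

```python
import math

def hamming_cost(y_gold, y):
    """Hypothetical cost: number of positions where two equal-length sequences differ."""
    return sum(a != b for a, b in zip(y_gold, y))

def model_distribution(scores):
    """p_theta(y | x): softmax of the scores over Y(x), shifted by the max for stability."""
    m = max(scores.values())
    weights = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(weights.values())
    return {y: w / z for y, w in weights.items()}

def risk(scores, gold, cost=hamming_cost):
    """Per-example risk: expected cost(gold, y) under p_theta(y | x)."""
    p = model_distribution(scores)
    return sum(cost(gold, y) * p_y for y, p_y in p.items())

scores = {("N", "V"): 2.0, ("V", "N"): 0.5, ("N", "N"): 1.5}
print(risk(scores, ("N", "V")))
```

Unlike the other losses, the risk never uses \(score_\theta(x_i,y_i)\) on its own; the gold output enters only through \(cost(y_i,y)\).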