We can extend the idea of negative log-likelihood directly to multi-class classification with $K$ classes, where the training label is represented with a one-hot vector $y = [y_1, \ldots, y_K]^{\mathsf{T}}$, with $y_k = 1$ if the example is of class $k$ and $y_j = 0$ otherwise.

Assume that our network uses softmax as the activation function in the last layer, so that the output is $g = [g_1, \ldots, g_K]^{\mathsf{T}}$, which represents a probability distribution over the $K$ possible classes. Then the probability that our network assigns to the correct class for this example is $\prod_{k=1}^{K} g_k^{y_k}$, and the log of the probability that it is correct is $\sum_{k=1}^{K} y_k \log g_k$, so

$$\mathcal{L}_{\text{NLLM}}(g, y) = -\sum_{k=1}^{K} y_k \log g_k.$$
We’ll call this NLLM for negative log-likelihood multiclass.
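As a quick sanity check of the formula, here is a minimal sketch in Python/NumPy; the names `softmax` and `nllm_loss` and the example numbers are ours, not from the text, and a practical implementation would typically combine the log and softmax into a single log-softmax for numerical stability.

```python
import numpy as np

def softmax(z):
    # Shift by the max before exponentiating for numerical stability;
    # softmax is unchanged by adding a constant to every logit.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def nllm_loss(g, y):
    # NLLM: -sum_k y_k * log(g_k). Because y is one-hot, this picks out
    # minus the log of the probability assigned to the correct class.
    return -np.sum(y * np.log(g))

# Hypothetical example: K = 3 classes, correct class is index 1.
z = np.array([2.0, 1.0, 0.1])   # raw last-layer outputs (logits)
g = softmax(z)                  # predicted distribution over the classes
y = np.array([0.0, 1.0, 0.0])   # one-hot training label

print(g)                        # approx. [0.659, 0.242, 0.099]
print(nllm_loss(g, y))          # -log(0.242), approx. 1.417
```

Note that the loss depends only on the probability the network puts on the true class: the closer $g_k$ is to $1$ for the correct $k$, the closer the loss is to $0$.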