The maximum likelihood criterion is not very practical computationally. Each likelihood term can be small, so the product of many such terms can be extremely tiny. It may be difficult to represent this quantity with finite-precision arithmetic.
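A minimal sketch of this underflow problem (the probabilities here are illustrative, not from the text): multiplying 1000 likelihood terms of 0.01 each in float64 arithmetic collapses to zero.

```python
# Illustrative likelihood terms: 1000 probabilities of 0.01 each.
probs = [0.01] * 1000

likelihood = 1.0
for p in probs:
    likelihood *= p

# The true product, 10^-2000, is far below the smallest positive
# float64 (about 5e-324), so the running product underflows to zero.
print(likelihood)  # 0.0
```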

As an alternative, we can maximize the logarithm of the likelihood:

$$\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmax}}\left[\log\left(\prod_{i=1}^{I} Pr(y_i \mid f[x_i, \boldsymbol{\phi}])\right)\right] = \underset{\boldsymbol{\phi}}{\mathrm{argmax}}\left[\sum_{i=1}^{I} \log Pr(y_i \mid f[x_i, \boldsymbol{\phi}])\right].$$

This log-likelihood criterion is equivalent because the logarithm is a monotonically increasing function.

  • If $z' > z$, then we also have $\log[z'] > \log[z]$.
  • Thus, when we change the model parameters to improve the log-likelihood criterion, we also improve the original maximum likelihood criterion. The overall maxima of the two criteria are in the same place, so the best model parameters are the same in both cases.

The advantage of the log-likelihood criterion is that it uses a sum of terms rather than a product, so representing it with finite-precision arithmetic is not problematic.
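To illustrate, a minimal sketch (with the same style of illustrative probabilities): the log of a product of 1000 terms of 0.01 each is just a sum of 1000 modest negative numbers, which float64 handles comfortably.

```python
import math

# Illustrative likelihood terms: 1000 probabilities of 0.01 each.
probs = [0.01] * 1000

# Summing log-probabilities replaces the underflowing product with a
# well-scaled sum: 1000 * log(0.01), about -4605.17.
log_likelihood = sum(math.log(p) for p in probs)
print(log_likelihood)
```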

Minimizing negative log-likelihood

By convention, model fitting problems are framed in terms of minimizing a loss. To convert the maximum log-likelihood criterion to a minimization problem, we use the negative log-likelihood criterion instead:

$$\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmin}}\left[-\sum_{i=1}^{I} \log Pr(y_i \mid f[x_i, \boldsymbol{\phi}])\right].$$
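As a hypothetical sketch of minimizing a negative log-likelihood loss (the data, the unit-variance Gaussian model, and the step size are all illustrative, not from the text): gradient descent on the NLL of a Gaussian's mean recovers the sample mean.

```python
import math

# Illustrative observations.
data = [1.2, 0.8, 1.5, 0.9, 1.1]

def nll(mu):
    # Negative log-likelihood of the data under a Gaussian with mean mu
    # and variance 1: -sum_i log Pr(y_i | mu).
    return sum(0.5 * (y - mu) ** 2 + 0.5 * math.log(2 * math.pi) for y in data)

mu, step = 0.0, 0.05
for _ in range(200):
    grad = sum(mu - y for y in data)  # derivative of the NLL w.r.t. mu
    mu -= step * grad

# Minimizing the NLL puts mu at the sample mean.
print(mu, sum(data) / len(data))
```

Framing the fit as minimization means the same optimizer machinery applies to any loss, which is why the negative sign is adopted by convention.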