The maximum likelihood criterion is not very practical computationally. Each likelihood term can be small, so the product of many such terms can be extremely tiny. It may be difficult to represent this quantity with finite-precision arithmetic.
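A minimal sketch of this underflow problem (the probabilities here are illustrative, not from the text): multiplying 1000 likelihood terms of 0.01 each in float64 arithmetic collapses to zero.

```python
# Illustrative likelihood terms: 1000 probabilities of 0.01 each.
probs = [0.01] * 1000

likelihood = 1.0
for p in probs:
    likelihood *= p

# The true product, 10^-2000, is far below the smallest positive
# float64 (about 5e-324), so the running product underflows to zero.
print(likelihood)  # 0.0
```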

As an alternative, we can maximize the logarithm of the likelihood:

$$\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmax}}\left[\log\left(\prod_{i=1}^{I} Pr(y_i \mid f[x_i, \boldsymbol{\phi}])\right)\right] = \underset{\boldsymbol{\phi}}{\mathrm{argmax}}\left[\sum_{i=1}^{I} \log Pr(y_i \mid f[x_i, \boldsymbol{\phi}])\right].$$

This log-likelihood criterion is equivalent because the logarithm is a monotonically increasing function.

  • If $z' > z$, then we also have $\log[z'] > \log[z]$.
  • Thus, when we change the model parameters to improve the log-likelihood criterion, we also improve the original maximum likelihood criterion. The overall maxima of the two criteria are in the same place, so the best model parameters are the same in both cases.

The advantage of the log-likelihood criterion is that it uses a sum of terms rather than a product, so representing it with finite-precision arithmetic is not problematic.
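To illustrate, a minimal sketch (with the same style of illustrative probabilities): the log of a product of 1000 terms of 0.01 each is just a sum of 1000 modest negative numbers, which float64 handles comfortably.

```python
import math

# Illustrative likelihood terms: 1000 probabilities of 0.01 each.
probs = [0.01] * 1000

# Summing log-probabilities replaces the underflowing product with a
# well-scaled sum: 1000 * log(0.01), about -4605.17.
log_likelihood = sum(math.log(p) for p in probs)
print(log_likelihood)
```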

Minimizing negative log-likelihood

By convention, model fitting problems are framed in terms of minimizing a loss. To convert the maximum log-likelihood criterion to a minimization problem, we use the negative log-likelihood criterion instead:

$$\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmin}}\left[-\sum_{i=1}^{I} \log Pr(y_i \mid f[x_i, \boldsymbol{\phi}])\right].$$
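As a hypothetical sketch of minimizing a negative log-likelihood loss (the data, the unit-variance Gaussian model, and the step size are all illustrative, not from the text): gradient descent on the NLL of a Gaussian's mean recovers the sample mean.

```python
import math

# Illustrative observations.
data = [1.2, 0.8, 1.5, 0.9, 1.1]

def nll(mu):
    # Negative log-likelihood of the data under a Gaussian with mean mu
    # and variance 1: -sum_i log Pr(y_i | mu).
    return sum(0.5 * (y - mu) ** 2 + 0.5 * math.log(2 * math.pi) for y in data)

mu, step = 0.0, 0.05
for _ in range(200):
    grad = sum(mu - y for y in data)  # derivative of the NLL w.r.t. mu
    mu -= step * grad

# Minimizing the NLL puts mu at the sample mean.
print(mu, sum(data) / len(data))
```

Framing the fit as minimization means the same optimizer machinery applies to any loss, which is why the negative sign is adopted by convention.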