Maximum likelihood estimation determines the parameters of a probability distribution from an observed data set by finding the parameter values that maximize the likelihood function, which usually takes a form like this:

$$L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$$

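As a minimal sketch of the definition, the Python below evaluates a likelihood directly as a product over the observations and keeps the parameter value that makes it largest, using a brute-force grid search. The Bernoulli (coin-flip) model, the data, and the `likelihood` helper are hypothetical choices for illustration, not taken from the text.

```python
import math

# Hypothetical coin-flip data: 1 = heads, 0 = tails (6 heads out of 8).
data = [1, 0, 1, 1, 0, 1, 1, 1]

def likelihood(p, xs):
    """Bernoulli likelihood: product of p for each head and (1 - p) for each tail."""
    return math.prod(p if x == 1 else 1 - p for x in xs)

# Brute-force search over candidate values of p in (0, 1).
candidates = [i / 100 for i in range(1, 100)]
best_p = max(candidates, key=lambda p: likelihood(p, data))

print(best_p)  # 0.75, the fraction of heads
```

A grid search like this is only practical for one or two parameters with a known range; the methods described next scale much better.
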
The most convenient way to maximize the likelihood is to take its log. Because the logarithm is monotonically increasing, maximizing the log of a function is equivalent to maximizing the function itself, and the log turns the product into a sum, which simplifies the mathematical analysis. It is also easier to compute: multiplying many small probabilities can underflow to zero in floating point, while summing their logs does not.

$$\ell(\theta) = \sum_{i=1}^{n} \log p(x_i \mid \theta)$$

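The underflow problem is easy to reproduce. This sketch assumes 1,000 hypothetical observations, each assigned probability 0.001: their product (10^-3000) is far below the smallest positive double, so it rounds to zero, while the sum of their logs stays an ordinary finite number.

```python
import math

# Hypothetical per-observation likelihoods: 1,000 observations, each with probability 0.001.
probs = [0.001] * 1000

# Direct product: 1e-3000 is far below the smallest positive double (about 4.9e-324),
# so the running product underflows to exactly 0.0.
product = math.prod(probs)

# Log-likelihood: summing logs keeps the value representable.
log_likelihood = sum(math.log(p) for p in probs)

print(product)         # 0.0
print(log_likelihood)  # about -6907.76
```

Once the product has collapsed to 0.0, every candidate parameter looks equally (im)probable, so the log form is what makes the optimization usable in practice.
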
The log-likelihood can then be maximized in several ways:

  • Analytical solution: Take the partial derivative of the log-likelihood with respect to each parameter, set each derivative to zero, and solve the resulting equations.
  • Learning solution: Define an error function, such as the negative log-likelihood, and minimize it numerically, for example with gradient descent.
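
Both approaches can be compared on a concrete case. This sketch assumes the data are drawn i.i.d. from a normal distribution with unknown mean μ and standard deviation σ (an assumption; the text does not name the distribution) and uses a small hypothetical data set. The analytical branch uses the standard closed-form estimates obtained by setting the partial derivatives to zero; the learning branch runs plain gradient descent on the average negative log-likelihood, with the learning rate, iteration count, and starting point chosen by hand.

```python
import math

# Hypothetical observations, assumed drawn i.i.d. from a normal distribution.
data = [2.1, 1.9, 2.5, 2.3, 1.7, 2.0, 2.4, 1.8]
n = len(data)

# --- Analytical solution ---
# Setting the partial derivatives of the Gaussian log-likelihood to zero gives:
#   mu_hat    = sample mean
#   sigma_hat = sqrt(mean squared deviation from mu_hat)   (divides by n, not n - 1)
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)

# --- Learning solution ---
# Error function: average negative log-likelihood, written in terms of s = log(sigma):
#   E(mu, s) = s + mean((x - mu)^2) / (2 * exp(2s)) + const
# Optimizing s instead of sigma keeps sigma positive without extra constraints.
def gradient(mu, s):
    var = math.exp(2 * s)                                   # sigma^2
    d_mu = -sum(x - mu for x in data) / (n * var)           # dE/dmu
    d_s = 1 - sum((x - mu) ** 2 for x in data) / (n * var)  # dE/ds
    return d_mu, d_s

mu, s = 0.0, 0.0          # initial guess: mu = 0, sigma = 1
learning_rate = 0.1
for _ in range(2000):
    d_mu, d_s = gradient(mu, s)
    mu -= learning_rate * d_mu
    s -= learning_rate * d_s
sigma = math.exp(s)

print(f"analytical:       mu = {mu_hat:.4f}, sigma = {sigma_hat:.4f}")
print(f"gradient descent: mu = {mu:.4f}, sigma = {sigma:.4f}")
```

The two pairs of estimates should agree to several decimal places. Note that the maximum likelihood variance divides by n, so it is slightly smaller than the usual unbiased sample variance, which divides by n - 1.
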