Under the conditional probabilistic perspective of learning, the model now computes different distribution parameters for each training input $\mathbf{x}_i$.

Each observed training output $\mathbf{y}_i$ should have high probability under its corresponding distribution $Pr(\mathbf{y}_i \mid \boldsymbol{\theta}_i)$, where $\boldsymbol{\theta}_i$ denotes the distribution parameters that the model computes from input $\mathbf{x}_i$. Hence, we choose the model parameters $\boldsymbol{\phi}$ so that they maximize the combined probability across all $I$ training samples:

$$
\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\operatorname{argmax}} \left[ \prod_{i=1}^{I} Pr(\mathbf{y}_i \mid \boldsymbol{\theta}_i) \right],
$$

where each $\boldsymbol{\theta}_i$ depends on $\boldsymbol{\phi}$ through the model.

The combined probability term is the likelihood of the parameters. Thus, the above is known as the maximum likelihood criterion.
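To make the criterion concrete, here is a minimal sketch in NumPy. It assumes a univariate Gaussian output distribution with a fixed standard deviation and a one-parameter model that maps each input to the Gaussian's mean; the toy data, the model form, and the grid search are illustrative choices, not part of the text.

```python
import numpy as np

# Toy training data (illustrative values only): outputs roughly follow y = 2x.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

def gaussian_pdf(y, mu, sigma=0.5):
    """Density of y under a Gaussian with mean mu and fixed standard deviation sigma."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def likelihood(phi):
    """Combined probability of all training outputs when the model computes
    a distribution parameter (here, the mean) mu_i = phi * x_i for each input x_i."""
    mu = phi * x
    return np.prod(gaussian_pdf(y, mu))   # product over all training samples

# Evaluate the likelihood on a grid of candidate parameters; the maximizer
# is the maximum likelihood estimate of phi.
candidates = np.linspace(0.0, 4.0, 401)
likelihoods = np.array([likelihood(phi) for phi in candidates])
phi_hat = candidates[np.argmax(likelihoods)]
print(f"maximum likelihood estimate: phi = {phi_hat:.2f}")   # close to 2.0
```

A grid search is only feasible here because there is a single parameter; for real models the same criterion is maximized with gradient-based optimization.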

A more practical version of the maximum likelihood criterion is the log-likelihood criterion: the likelihood is a product of many probabilities and so can become vanishingly small, whereas taking its logarithm turns the product into a sum and, because the logarithm is monotonic, does not change which parameters maximize it.
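The sketch below illustrates this point, again assuming a Gaussian output distribution with known standard deviation and a single unknown mean (the sample size, seed, and grid are illustrative): with a few thousand samples the product of densities underflows to zero in floating point, while the sum of log densities stays finite and has the same maximizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# A few thousand outputs drawn around a true mean of 2.0 (illustrative data).
y = rng.normal(loc=2.0, scale=1.0, size=5000)

def gaussian_pdf(y, mu, sigma=1.0):
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Each density is a modest number below 1, so the product of 5000 of them
# underflows to exactly 0.0 in double precision.
print(np.prod(gaussian_pdf(y, 2.0)))

# The log-likelihood replaces the product with a sum and remains finite.
print(np.sum(np.log(gaussian_pdf(y, 2.0))))

# Because log is monotonic, maximizing the log-likelihood selects the same parameter.
candidates = np.linspace(1.0, 3.0, 201)
log_liks = [np.sum(np.log(gaussian_pdf(y, m))) for m in candidates]
print(f"argmax of log-likelihood: mu = {candidates[np.argmax(log_liks)]:.2f}")
```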