Under the conditional probabilistic perspective of learning, the model now computes different distribution parameters $\boldsymbol{\theta}_i = \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]$ for each training input $\mathbf{x}_i$.
Each observed training output $\mathbf{y}_i$ should have high probability under its corresponding distribution $Pr(\mathbf{y}_i \mid \boldsymbol{\theta}_i)$. Hence, we choose the model parameters $\boldsymbol{\phi}$ so that they maximize the combined probability across all $I$ training samples:

\[
\hat{\boldsymbol{\phi}} \;=\; \underset{\boldsymbol{\phi}}{\operatorname{argmax}}\left[\,\prod_{i=1}^{I} Pr\bigl(\mathbf{y}_i \mid \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]\bigr)\right].
\]
The combined probability term is the likelihood of the parameters, and hence this is known as the maximum likelihood criterion.
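To make the criterion concrete, here is a minimal sketch (not from the text, with hypothetical data and helper names): a toy model whose output is the mean of a normal distribution with fixed standard deviation. It evaluates the product of per-sample probabilities for a few candidate parameter settings and keeps the one with the largest likelihood.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1D training data: inputs x_i and observed outputs y_i.
x = np.array([0.1, 0.4, 0.7, 0.9])
y = np.array([0.3, 0.8, 1.4, 1.7])

def model(x, phi):
    """Toy model f[x, phi]: predicts the mean of a normal distribution.
    Here phi = (slope, intercept)."""
    slope, intercept = phi
    return slope * x + intercept

def likelihood(phi, sigma=0.2):
    """Product over training pairs of Pr(y_i | f[x_i, phi])."""
    mu = model(x, phi)
    return np.prod(norm.pdf(y, loc=mu, scale=sigma))

# The maximum likelihood criterion picks the phi with the largest product.
candidates = [(1.0, 0.0), (1.8, 0.1), (2.0, 0.0)]
best = max(candidates, key=likelihood)
print(best, likelihood(best))
```

In practice the parameters would be found by an optimizer rather than by comparing a handful of candidates; the sketch only illustrates which quantity the criterion maximizes.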
In practice, multiplying many probabilities, each of which may be small, quickly produces values too small to represent accurately. A more practical version of the maximum likelihood criterion is therefore the log-likelihood criterion: because the logarithm is monotonically increasing, maximizing the log of the likelihood selects the same parameters while replacing the product with a better-behaved sum.
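Assuming the same notation as above (model $\mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]$ predicting the distribution parameters for input $\mathbf{x}_i$), the log-likelihood criterion can be written as

\[
\hat{\boldsymbol{\phi}} \;=\; \underset{\boldsymbol{\phi}}{\operatorname{argmax}}\left[\,\sum_{i=1}^{I} \log\Bigl[Pr\bigl(\mathbf{y}_i \mid \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]\bigr)\Bigr]\right],
\]

which attains its maximum at the same $\hat{\boldsymbol{\phi}}$ as the product form.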