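To construct a loss function for training data $\{x_i, y_i\}$ using the maximum likelihood approach, we:
- Choose a suitable probability distribution $Pr(y|\theta)$ defined over the domain of the predictions $y$, with distribution parameters $\theta$.
- Set the model $f[x, \phi]$ to predict one or more of these parameters, so that $\theta = f[x, \phi]$ and $Pr(y|\theta) = Pr(y|f[x, \phi])$.
- To train the model, find the network parameters $\hat{\phi}$ that minimize the negative log-likelihood loss function $L[\phi]$ over the training dataset pairs $\{x_i, y_i\}$:

$$\hat{\phi} = \underset{\phi}{\operatorname{argmin}} \, L[\phi] = \underset{\phi}{\operatorname{argmin}} \left[ -\sum_{i=1}^{I} \log Pr\big(y_i \,|\, f[x_i, \phi]\big) \right]$$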
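
The recipe above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the distribution is taken to be a univariate Gaussian with a fixed standard deviation `sigma`, the model is a straight line that predicts only the mean, the data are synthetic, and training uses plain gradient descent. The names `model`, `negative_log_likelihood`, `nll_gradient`, and `fit` are hypothetical and chosen only for this example.

```python
import numpy as np


def model(x, phi):
    """Hypothetical model f[x, phi]: a line that predicts the Gaussian mean."""
    return phi[0] + phi[1] * x


def negative_log_likelihood(phi, x, y, sigma=1.0):
    """Sum over training pairs of -log Pr(y_i | f[x_i, phi]).

    Assumes a Gaussian with mean f[x_i, phi] and fixed standard deviation sigma.
    """
    mu = model(x, phi)
    per_example = 0.5 * np.log(2.0 * np.pi * sigma**2) + (y - mu) ** 2 / (2.0 * sigma**2)
    return np.sum(per_example)


def nll_gradient(phi, x, y, sigma=1.0):
    """Analytic gradient of the loss above with respect to phi."""
    residual = model(x, phi) - y
    return np.array([np.sum(residual), np.sum(residual * x)]) / sigma**2


def fit(x, y, steps=2000, learning_rate=0.01):
    """Minimize the negative log-likelihood with plain gradient descent."""
    phi = np.zeros(2)
    for _ in range(steps):
        phi = phi - learning_rate * nll_gradient(phi, x, y)
    return phi


# Synthetic training pairs (assumed for illustration): y = 0.5 + 2x plus Gaussian noise.
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=50)
y_train = 0.5 + 2.0 * x_train + rng.normal(0.0, 0.1, size=50)

phi_hat = fit(x_train, y_train)
print("estimated parameters:", phi_hat)
print("loss at estimate:", negative_log_likelihood(phi_hat, x_train, y_train))
```

With the standard deviation held fixed, the terms of this loss that depend on the parameters reduce to a scaled sum of squared errors, so the estimate should agree with an ordinary least-squares fit of the same line.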
Examples of this: