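To construct a loss function for training data $\{x_i, y_i\}_{i=1}^{I}$ using the maximum likelihood approach, we:

  1. Choose a suitable probability distribution $\Pr(y \mid \theta)$ defined over the domain of the predictions $y$, with distribution parameters $\theta$.
  2. Set the model $f(x, \phi)$ to predict one or more of these parameters, so that $\theta = f(x, \phi)$ and $\Pr(y \mid \theta) = \Pr(y \mid f(x, \phi))$.
  3. To train the model, find the network parameters $\hat{\phi}$ that minimize the negative log-likelihood loss function over the training dataset pairs $\{x_i, y_i\}$:

     $$\hat{\phi} = \underset{\phi}{\operatorname{argmin}} \left[ -\sum_{i=1}^{I} \log \Pr\big(y_i \mid f(x_i, \phi)\big) \right]$$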
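
The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the distribution is taken to be a univariate normal with a fixed standard deviation `sigma`, the model is a hypothetical two-parameter linear function that predicts only the mean, the data are synthetic, and plain gradient descent stands in for whatever optimizer would be used in practice. The names `model`, `negative_log_likelihood`, `nll_gradient`, and `fit` are chosen here for illustration.

```python
import numpy as np


def model(x, phi):
    """Step 2: hypothetical linear model f(x, phi) that predicts the mean of y."""
    return phi[0] + phi[1] * x


def negative_log_likelihood(phi, x, y, sigma=1.0):
    """Step 3: sum over i of -log Normal(y_i | f(x_i, phi), sigma^2).

    Step 1 (the choice of distribution) is baked in here: a normal
    distribution with fixed standard deviation sigma (an assumption).
    """
    mu = model(x, phi)
    per_example = 0.5 * np.log(2.0 * np.pi * sigma**2) + (y - mu) ** 2 / (2.0 * sigma**2)
    return np.sum(per_example)


def nll_gradient(phi, x, y, sigma=1.0):
    """Analytic gradient of the negative log-likelihood with respect to phi."""
    residual = model(x, phi) - y
    return np.array([np.sum(residual), np.sum(residual * x)]) / sigma**2


def fit(x, y, lr=0.1, steps=2000):
    """Minimize the mean negative log-likelihood with plain gradient descent."""
    phi = np.zeros(2)
    for _ in range(steps):
        phi = phi - lr * nll_gradient(phi, x, y) / len(x)
    return phi


# Synthetic training pairs (assumed for the example): y = 0.5 + 2x + noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 0.5 + 2.0 * x + rng.normal(0.0, 1.0, size=200)

phi_hat = fit(x, y)
print("estimated parameters:", phi_hat)
print("loss at estimate:", negative_log_likelihood(phi_hat, x, y))
```

With a normal distribution and fixed variance, the terms of the negative log-likelihood that depend on the parameters reduce to a scaled sum of squared errors, so the estimate should agree with an ordinary least-squares fit of the same data.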

Examples of this: