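Consider fitting a model $f[x, \phi]$ with parameters $\phi$ using a training set $\{x_i, y_i\}_{i=1}^{I}$ of input/output pairs. We seek the parameters $\hat{\phi}$ that minimize the loss function $L[\phi]$:
$$\hat{\phi} = \operatorname*{argmin}_{\phi} \Big[ L[\phi] \Big] = \operatorname*{argmin}_{\phi} \left[ \sum_{i=1}^{I} \ell_i[x_i, y_i] \right]$$
where the individual terms $\ell_i[x_i, y_i]$ measure the mismatch between the network predictions $f[x_i, \phi]$ and output targets $y_i$ for each training pair. To bias this minimization toward certain solutions, we add a regularization term:
$$\hat{\phi} = \operatorname*{argmin}_{\phi} \left[ \sum_{i=1}^{I} \ell_i[x_i, y_i] + \lambda \cdot g[\phi] \right]$$
where:
- $g[\phi]$ is a function that returns a scalar that takes larger values when the parameters are less preferred.
- $\lambda$ is a positive scalar that controls the relative contribution of the original loss function and the regularization term.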
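As a rough illustration of how the two terms combine, the sketch below evaluates a regularized loss for a toy problem. The linear model, the squared-error loss, the squared L2 penalty, and the names `data_loss`, `penalty`, `regularized_loss`, and `lam` are all assumptions chosen for this example; the text does not prescribe any of them.

```python
def data_loss(phi, xs, ys):
    # Sum over training pairs of the squared mismatch between the
    # prediction phi[0] + phi[1] * x (an assumed linear model) and the target y.
    return sum((phi[0] + phi[1] * x - y) ** 2 for x, y in zip(xs, ys))


def penalty(phi):
    # Assumed regularizer: squared L2 norm of the parameters.
    # It returns a larger scalar for parameters far from zero (less preferred).
    return sum(p ** 2 for p in phi)


def regularized_loss(phi, xs, ys, lam):
    # lam is the positive scalar weighting the penalty against the data loss.
    return data_loss(phi, xs, ys) + lam * penalty(phi)


xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]
phi_fit = [1.0, 2.0]  # fits these three points exactly

plain = data_loss(phi_fit, xs, ys)
reg = regularized_loss(phi_fit, xs, ys, lam=0.1)
print(plain, reg)
```

Here the parameters fit the data exactly, so the data loss is zero and the whole regularized loss comes from the penalty. Increasing `lam` makes that contribution larger relative to the data term.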
The minima of the regularized loss function usually differ from those of the original loss function, so the training procedure converges to different parameter values.
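A one-parameter toy case makes the shift in the minimum concrete. The sketch below assumes a quadratic loss with its minimum at 2, a squared penalty, a weight of 1, and plain gradient descent; none of these choices come from the text, and `gradient_descent`, `grad_plain`, and `grad_reg` are hypothetical names.

```python
def gradient_descent(grad, phi=0.0, lr=0.1, steps=200):
    # Repeatedly step against the gradient from a fixed starting point.
    for _ in range(steps):
        phi -= lr * grad(phi)
    return phi


lam = 1.0  # assumed regularization weight


def grad_plain(phi):
    # Gradient of the assumed original loss (phi - 2)^2.
    return 2.0 * (phi - 2.0)


def grad_reg(phi):
    # Gradient of the regularized loss (phi - 2)^2 + lam * phi^2.
    return 2.0 * (phi - 2.0) + lam * 2.0 * phi


phi_plain = gradient_descent(grad_plain)
phi_reg = gradient_descent(grad_reg)
print(phi_plain, phi_reg)  # approximately 2.0 and 1.0
```

Setting the regularized gradient to zero gives a minimum at 2 / (1 + `lam`), so with `lam = 1` the penalty pulls the solution from 2 to 1, halfway toward the value the penalty prefers.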