Given an objective function or loss function, how do we optimize it?
We have a training set of input/output pairs. We seek parameters for the model that map the inputs to the outputs as closely as possible. To this end, we define a loss (or objective) function that returns a single number quantifying the mismatch in this mapping. The goal of an optimization algorithm is to find parameters that minimize this loss:
- The objective/loss function defines a surface over the model parameters, and we want to find the parameter values at the lowest point on that surface.
Most standard optimization algorithms are iterative: they initialize the parameters heuristically and then adjust them repeatedly in such a way that the loss decreases.
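This iterative scheme can be sketched with gradient descent, one of the simplest such algorithms. The example below is a minimal illustration, not a method prescribed by the text: the toy data, the linear model, the least-squares loss, and the step size are all assumptions made for the sketch.

```python
import numpy as np

# Toy training set: inputs x and outputs y, roughly following y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

def loss(params):
    """Least-squares loss: a single number quantifying the mismatch."""
    slope, intercept = params
    pred = slope * x + intercept
    return np.mean((pred - y) ** 2)

def grad(params):
    """Gradient of the loss with respect to the two parameters."""
    slope, intercept = params
    residual = slope * x + intercept - y
    return np.array([2 * np.mean(residual * x), 2 * np.mean(residual)])

# Heuristic initialization, then repeated adjustments that decrease the loss.
params = np.array([0.0, 0.0])
lr = 0.05  # step size (an assumed value; tuned in practice)
for _ in range(2000):
    params -= lr * grad(params)

print(params, loss(params))
```

Each step moves the parameters a small distance downhill on the loss surface; after enough steps the parameters settle near the lowest point the procedure can reach from its starting position.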