Instead of trying to intuit learning algorithms like the Perceptron, we can introduce a general framework for solving machine learning problems that lets us derive algorithms for arbitrarily complicated problems. This is done by framing machine learning as an optimization problem: using computational methods to find the minimum (or maximum) of a given function.

Fundamentally, we define an objective function $J(\Theta)$, where $\Theta$ are the parameters of our model. For a Linear Classifier, we would have $\Theta = (\theta, \theta_0)$. We also write $J(\Theta; \mathcal{D})$ to indicate dependence on the data $\mathcal{D}$. Generally, the optimization problem is to find $\Theta^*$ such that:

$$\Theta^* = \arg\min_{\Theta} J(\Theta)$$

which is to say that we want to find the $\Theta$ that minimizes $J(\Theta)$.
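Once learning is framed as $\arg\min$, any numerical minimizer applies. As a minimal sketch, here is gradient descent on a one-parameter objective; the quadratic $J$ and the step size are illustrative assumptions, not part of the framework itself:

```python
# Minimal sketch: find theta* = argmin_theta J(theta) by gradient descent.
# The objective J(theta) = (theta - 3)^2 is an illustrative stand-in for
# a real model's objective.

def J(theta):
    return (theta - 3.0) ** 2

def dJ(theta):
    # Derivative of J; for a real model this would come from calculus
    # or automatic differentiation.
    return 2.0 * (theta - 3.0)

def gradient_descent(theta0, step=0.1, iters=100):
    theta = theta0
    for _ in range(iters):
        theta -= step * dJ(theta)  # move against the gradient
    return theta

theta_star = gradient_descent(theta0=0.0)
print(round(theta_star, 4))  # converges toward the minimizer theta = 3
```

The same loop, with the gradient of the actual objective substituted in, is the template behind most of the learning algorithms derived later.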

A common objective function for machine learning is:

$$J(\Theta) = \frac{1}{n} \sum_{i=1}^{n} L\big(h(x^{(i)}; \Theta),\, y^{(i)}\big) + \lambda R(\Theta)$$

where:

  • $L$ is a Loss Function, which makes the first term equivalent to training error.
  • $R$ is a regularizer, and $\lambda \ge 0$ is a constant hyperparameter governing how strongly we trade off fitting the training data against keeping the model simple.
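This objective can be written directly in code. Below is a sketch for a linear classifier $h(x) = \theta \cdot x + \theta_0$; the hinge loss for $L$ and the squared norm $\|\theta\|^2$ for $R$ are common illustrative choices, not the only ones:

```python
# Sketch of J(theta, theta_0) = (1/n) * sum_i L(h(x_i), y_i) + lambda * R(theta),
# assuming hinge loss L(z, y) = max(0, 1 - y*z) and R(theta) = ||theta||^2.
# These specific choices of L and R are illustrative assumptions.

def hinge_loss(z, y):
    return max(0.0, 1.0 - y * z)

def objective(theta, theta_0, X, Y, lam):
    n = len(X)
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    # First term: average loss over the training data (training error surrogate)
    training_term = sum(hinge_loss(dot(theta, x) + theta_0, y)
                        for x, y in zip(X, Y)) / n
    # Second term: regularizer R(theta) = ||theta||^2, weighted by lambda
    regularizer = dot(theta, theta)
    return training_term + lam * regularizer

# Toy data: two 2-D points with labels in {+1, -1}
X = [[1.0, 2.0], [-1.0, -1.0]]
Y = [1, -1]
print(objective(theta=[0.5, 0.5], theta_0=0.0, X=X, Y=Y, lam=0.01))
```

Raising `lam` makes the regularizer dominate, pulling the minimizer toward smaller parameters at the expense of training error; setting `lam = 0` recovers pure training-error minimization.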