Regression is a supervised learning problem. It has data of the form:

$$D = \left\{ \left(x^{(1)}, y^{(1)}\right), \ldots, \left(x^{(n)}, y^{(n)}\right) \right\}, \qquad x^{(i)} \in \mathbb{R}^d, \; y^{(i)} \in \mathbb{R}.$$
Unlike in classification problems, where the values $y^{(i)}$ are discrete, here they are real-valued. Regression is appropriate for predicting numerical quantities, like height, stock value, etc.

Thus, our hypotheses will have the form:

$$h : \mathbb{R}^d \rightarrow \mathbb{R}.$$
An example hypothesis for linear regression would be:

$$h(x; \theta, \theta_0) = \theta^T x + \theta_0.$$
  • Note that for classification, we would have applied something like the sign function or a sigmoid to $\theta^T x + \theta_0$. Here, we let the hypothesis output the value $\theta^T x + \theta_0$ directly.
  • Furthermore, note that we can get a rich class of hypotheses by performing a non-linear feature transformation $\varphi$ before doing the regression: $\theta^T \varphi(x) + \theta_0$ is a linear regression of $\varphi(x)$, but it is a non-linear function of $x$ if $\varphi$ is a non-linear function of $x$ (see the sketch after this list).
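
To make the feature-transformation idea concrete, here is a minimal sketch in NumPy; the cubic feature map `phi`, the toy data, and the use of `np.linalg.lstsq` are illustrative assumptions, not part of the notes.

```python
import numpy as np

def phi(x):
    # Hypothetical non-linear feature map: x -> (x, x^2, x^3).
    return np.stack([x, x**2, x**3], axis=1)

x = np.linspace(-1.0, 1.0, 50)    # toy 1-d inputs
y = 2.0 * x**2 - x + 0.5          # toy targets, non-linear in x

# Fit a *linear* regression on phi(x); the resulting predictor
# theta^T phi(x) + theta_0 is linear in phi(x) but non-linear in x.
Phi = np.hstack([phi(x), np.ones((x.size, 1))])  # column of 1s for theta_0
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```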

A typical loss function for regression is squared loss:

$$\mathcal{L}(g, a) = (g - a)^2,$$

where $g$ is the guess and $a$ is the actual value.
  • This penalizes guesses that are too high by the same amount as guesses that are too low.
  • It has a good mathematical justification in the case that your data are generated from an underlying linear hypothesis, but with Gaussian-distributed noise added to the $y$ values (see the sketch after this list).
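
A short sketch of that justification, using standard maximum-likelihood reasoning (this derivation is an assumption about what the notes have in mind): if $y^{(i)} = \theta^T x^{(i)} + \theta_0 + \epsilon^{(i)}$ with independent noise $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$, then the log-likelihood of the data is

$$\log p\left(y^{(1)}, \ldots, y^{(n)} \mid \theta, \theta_0\right) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(y^{(i)} - \theta^T x^{(i)} - \theta_0\right)^2 + \text{const},$$

so maximizing the likelihood over $(\theta, \theta_0)$ is exactly minimizing the sum of squared errors.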

With the above hypothesis and loss function, we can treat regression as an optimization problem in which, for a given dataset $D$, we wish to find a linear hypothesis that minimizes mean squared error. This mean squared error objective is:

$$J(\theta, \theta_0) = \frac{1}{n} \sum_{i=1}^{n} \left( \theta^T x^{(i)} + \theta_0 - y^{(i)} \right)^2,$$
resulting in the closed-form solution

$$\theta = \left(\tilde{X}^T \tilde{X}\right)^{-1} \tilde{X}^T \tilde{Y},$$

where $\tilde{X}$ is the $n \times (d+1)$ matrix whose $i$-th row is $x^{(i)T}$ with a 1 appended (so that the offset $\theta_0$ is absorbed into $\theta$), and $\tilde{Y}$ is the column vector of the $y^{(i)}$.
This classical formulation of the regression problem is called ordinary least squares (OLS).
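
Here is a minimal NumPy sketch of the closed-form OLS solution above; the toy data, variable names, and the choice of `np.linalg.solve` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (illustrative): n points in d dimensions with a noisy linear target.
n, d = 100, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.7 + 0.1 * rng.normal(size=n)

# Append a column of ones so theta_0 is absorbed into theta.
X_tilde = np.hstack([X, np.ones((n, 1))])

# Closed-form OLS: theta = (X~^T X~)^{-1} X~^T y.
# Solving the normal equations is preferred over explicitly inverting the matrix.
theta = np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ y)

print(theta)  # approximately [1.5, -2.0, 0.5, 0.7]
```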