Regression is a supervised learning problem. It has data of the form:
$$\mathcal{D} = \left\{\left(x^{(1)}, y^{(1)}\right), \ldots, \left(x^{(n)}, y^{(n)}\right)\right\}$$
where $x^{(i)} \in \mathbb{R}^d$ and $y^{(i)} \in \mathbb{R}$.
In contrast to classification problems, where the target values are discrete, in regression the $y^{(i)}$ are real-valued. Regression is appropriate for predicting numerical quantities, like height, stock value, etc.
Thus, our hypotheses will have the form:
$$h: \mathbb{R}^d \rightarrow \mathbb{R}\;.$$
An example hypothesis for linear regression would be:
$$h(x; \theta, \theta_0) = \theta^T x + \theta_0\;,$$
with parameters $\theta \in \mathbb{R}^d$ and $\theta_0 \in \mathbb{R}$.
- Note that for classification, we would have applied a further function to this linear combination, such as the sign function or a sigmoid, to produce a discrete label. Here, we use the value of the linear combination directly as the prediction.
- Furthermore, note that we can get a rich class of hypotheses by performing a non-linear feature transformation $\varphi$ before doing the regression, so that $h(x) = \theta^T \varphi(x) + \theta_0$ is a linear function of $\varphi(x)$, but a non-linear function of $x$ if $\varphi$ is a non-linear function of $x$ (see the sketch after this list).
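As a concrete illustration of such a feature transformation, here is a minimal sketch; the polynomial features and parameter values are assumptions chosen purely for illustration:

```python
import numpy as np

def phi(x):
    """Map a scalar input x to polynomial features [1, x, x^2]."""
    return np.array([1.0, x, x ** 2])

# Hypothetical parameters; the constant feature in phi absorbs the offset.
theta = np.array([0.5, -1.0, 2.0])

def h(x, theta):
    """Linear in phi(x), but non-linear in x itself."""
    return theta @ phi(x)

print(h(3.0, theta))  # 0.5 - 1.0*3 + 2.0*9 = 15.5
```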
A typical loss function for regression is squared loss:
$$L(g, a) = (g - a)^2\;,$$
where $g$ is the guess and $a$ is the actual value.
- This penalizes guesses that are too high by the same amount as guesses that are too low (illustrated in the sketch after this list).
- It has a good mathematical justification in the case that your data are generated from an underlying linear hypothesis, but with Gaussian-distributed noise added to the $y$ values.
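A quick numerical check of the symmetry noted above, as a minimal sketch:

```python
def squared_loss(guess, actual):
    """Squared loss: penalizes over- and under-estimates symmetrically."""
    return (guess - actual) ** 2

# Guessing 2 above the actual value costs the same as guessing 2 below.
print(squared_loss(12.0, 10.0))  # 4.0
print(squared_loss(8.0, 10.0))   # 4.0
```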
With the above hypothesis and loss function, we can treat regression as an optimization problem in which, for a given dataset $\mathcal{D}$, we wish to find a linear hypothesis that minimizes mean squared error. This mean squared error objective is:
$$J(\theta, \theta_0) = \frac{1}{n} \sum_{i=1}^{n} \left(\theta^T x^{(i)} + \theta_0 - y^{(i)}\right)^2\;,$$
resulting in the analytical solution
$$\theta^* = \left(X^T X\right)^{-1} X^T Y\;,$$
where $X \in \mathbb{R}^{n \times d}$ is the matrix whose rows are the inputs $x^{(i)T}$, $Y \in \mathbb{R}^{n}$ is the vector of outputs $y^{(i)}$, and the offset $\theta_0$ has been absorbed into $\theta$ by appending a constant feature of value 1 to each input.
This classical formulation of the regression problem is called ordinary least squares (OLS).
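The closed-form solution can be computed directly. The following is a minimal sketch; the synthetic data, true parameter values, and noise scale are assumptions for illustration:

```python
import numpy as np

# Synthetic data from an assumed underlying linear hypothesis
# with Gaussian-distributed noise added to the y values.
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
theta_true = np.array([2.0, -1.0, 0.5])
theta0_true = 4.0
Y = X @ theta_true + theta0_true + rng.normal(scale=0.1, size=n)

# Absorb the offset theta_0 by appending a constant feature of 1.
X_aug = np.hstack([X, np.ones((n, 1))])

# theta* = (X^T X)^{-1} X^T Y; np.linalg.solve is used instead of an
# explicit matrix inverse for numerical stability.
theta_star = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ Y)
print(theta_star)  # approximately [2.0, -1.0, 0.5, 4.0]
```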