Regression is a supervised learning problem. It has data of the form:

$$D = \left\{ \left(x^{(1)}, y^{(1)}\right), \ldots, \left(x^{(n)}, y^{(n)}\right) \right\}, \qquad x^{(i)} \in \mathbb{R}^d, \; y^{(i)} \in \mathbb{R}.$$
Unlike in classification problems, where the values $y^{(i)}$ are discrete, here they are real-valued. Regression is appropriate for predicting numerical quantities, like height, stock value, etc.

Thus, our hypotheses will have the form:

$$h : \mathbb{R}^d \rightarrow \mathbb{R}.$$
An example hypothesis for linear regression would be:

$$h(x; \theta, \theta_0) = \theta^T x + \theta_0.$$
  • Note that for classification, we would have applied something like the sign function or a sigmoid to $\theta^T x + \theta_0$. Here, we let the hypothesis output the value $\theta^T x + \theta_0$ directly.
  • Furthermore, note that we can get a rich class of hypotheses by performing a non-linear feature transformation $\varphi$ before doing the regression: $\theta^T \varphi(x) + \theta_0$ is a linear regression of $\varphi(x)$, but it is a non-linear function of $x$ if $\varphi$ is a non-linear function of $x$ (see the sketch after this list).
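
To make the feature-transformation idea concrete, here is a minimal sketch in NumPy; the cubic feature map `phi`, the toy data, and the use of `np.linalg.lstsq` are illustrative assumptions, not part of the notes.

```python
import numpy as np

def phi(x):
    # Hypothetical non-linear feature map: x -> (x, x^2, x^3).
    return np.stack([x, x**2, x**3], axis=1)

x = np.linspace(-1.0, 1.0, 50)    # toy 1-d inputs
y = 2.0 * x**2 - x + 0.5          # toy targets, non-linear in x

# Fit a *linear* regression on phi(x); the resulting predictor
# theta^T phi(x) + theta_0 is linear in phi(x) but non-linear in x.
Phi = np.hstack([phi(x), np.ones((x.size, 1))])  # column of 1s for theta_0
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```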

A typical loss function for regression is squared loss:

$$\mathcal{L}(g, a) = (g - a)^2,$$

where $g$ is the guess and $a$ is the actual value.
  • This penalizes guesses that are too high by the same amount as guesses that are too low.
  • It has a good mathematical justification in the case that your data are generated from an underlying linear hypothesis, but with Gaussian-distributed noise added to the $y$ values (see the sketch after this list).
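
A short sketch of that justification, using standard maximum-likelihood reasoning (this derivation is an assumption about what the notes have in mind): if $y^{(i)} = \theta^T x^{(i)} + \theta_0 + \epsilon^{(i)}$ with independent noise $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$, then the log-likelihood of the data is

$$\log p\left(y^{(1)}, \ldots, y^{(n)} \mid \theta, \theta_0\right) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(y^{(i)} - \theta^T x^{(i)} - \theta_0\right)^2 + \text{const},$$

so maximizing the likelihood over $(\theta, \theta_0)$ is exactly minimizing the sum of squared errors.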

With the above hypothesis and loss function, we can treat regression as an optimization problem in which, for a given dataset $D$, we wish to find a linear hypothesis that minimizes mean squared error. This mean squared error objective is:

$$J(\theta, \theta_0) = \frac{1}{n} \sum_{i=1}^{n} \left( \theta^T x^{(i)} + \theta_0 - y^{(i)} \right)^2,$$
resulting in the closed-form solution

$$\theta = \left(\tilde{X}^T \tilde{X}\right)^{-1} \tilde{X}^T \tilde{Y},$$

where $\tilde{X}$ is the $n \times (d+1)$ matrix whose $i$-th row is $x^{(i)T}$ with a 1 appended (so that the offset $\theta_0$ is absorbed into $\theta$), and $\tilde{Y}$ is the column vector of the $y^{(i)}$.
This classical formulation of the regression problem is called ordinary least squares (OLS).
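
Here is a minimal NumPy sketch of the closed-form OLS solution above; the toy data, variable names, and the choice of `np.linalg.solve` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (illustrative): n points in d dimensions with a noisy linear target.
n, d = 100, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.7 + 0.1 * rng.normal(size=n)

# Append a column of ones so theta_0 is absorbed into theta.
X_tilde = np.hstack([X, np.ones((n, 1))])

# Closed-form OLS: theta = (X~^T X~)^{-1} X~^T y.
# Solving the normal equations is preferred over explicitly inverting the matrix.
theta = np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ y)

print(theta)  # approximately [1.5, -2.0, 0.5, 0.7]
```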