Consider applying gradient descent to a 1D linear regression model. The model $f[x, \boldsymbol{\phi}]$ maps a scalar input $x$ to a scalar output $y$ and has parameters $\boldsymbol{\phi} = [\phi_0, \phi_1]^T$, which represent the $y$-intercept and the slope:

$$f[x, \boldsymbol{\phi}] = \phi_0 + \phi_1 x.$$
Given a dataset $\{x_i, y_i\}_{i=1}^{I}$ containing $I$ input/output pairs, we choose the least squares loss function:

$$L[\boldsymbol{\phi}] = \sum_{i=1}^{I} \ell_i = \sum_{i=1}^{I} \left(\phi_0 + \phi_1 x_i - y_i\right)^2,$$
where the term $\ell_i = \left(\phi_0 + \phi_1 x_i - y_i\right)^2$ is the individual contribution to the loss from the $i$-th training example.
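To make this concrete, here is a minimal NumPy sketch of the model and loss; the function name `loss` and the array-based interface are our own choices for illustration, not from the text:

```python
import numpy as np

def loss(phi, x, y):
    """Least squares loss L[phi]: sum of squared residuals over all I examples."""
    residuals = phi[0] + phi[1] * x - y   # f[x_i, phi] - y_i for every example
    return np.sum(residuals ** 2)
```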
The derivative of the loss function with respect to the parameters can be decomposed into the sum of the derivatives of the individual contributions:

$$\frac{\partial L}{\partial \boldsymbol{\phi}} = \frac{\partial}{\partial \boldsymbol{\phi}} \sum_{i=1}^{I} \ell_i = \sum_{i=1}^{I} \frac{\partial \ell_i}{\partial \boldsymbol{\phi}},$$
where these are given by:

$$\frac{\partial \ell_i}{\partial \boldsymbol{\phi}} = \begin{bmatrix} \dfrac{\partial \ell_i}{\partial \phi_0} \\[6pt] \dfrac{\partial \ell_i}{\partial \phi_1} \end{bmatrix} = \begin{bmatrix} 2\left(\phi_0 + \phi_1 x_i - y_i\right) \\ 2 x_i \left(\phi_0 + \phi_1 x_i - y_i\right) \end{bmatrix}.$$
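Continuing the sketch above, the two gradient components can be computed in the same style; `loss_gradient` is a hypothetical helper name:

```python
def loss_gradient(phi, x, y):
    """Gradient dL/dphi = sum_i dl_i/dphi, stacked as [dL/dphi0, dL/dphi1]."""
    residuals = phi[0] + phi[1] * x - y
    return np.array([np.sum(2.0 * residuals),       # sum_i 2(phi0 + phi1 x_i - y_i)
                     np.sum(2.0 * x * residuals)])  # sum_i 2 x_i (phi0 + phi1 x_i - y_i)
```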
The figure below shows the progression of this algorithm as we iteratively compute the derivatives according to the equations above and then update the parameters. In this case, we have used a line search procedure to find the value of the step size $\alpha$ that decreases the loss the most at each iteration.
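Putting the pieces together, here is a sketch of the full loop under stated assumptions: the exact line search procedure is not specified above, so a crude grid search over candidate step sizes stands in for it, and the toy data, starting point, and iteration count are invented for illustration. It reuses `loss` and `loss_gradient` from the snippets above:

```python
def gradient_descent(x, y, n_iters=20):
    """Iterate phi <- phi - alpha * gradient, choosing alpha by line search."""
    phi = np.zeros(2)                     # arbitrary starting point (assumption)
    alphas = np.logspace(-5.0, 0.0, 50)   # candidate step sizes for the search
    for _ in range(n_iters):
        g = loss_gradient(phi, x, y)
        # Line search: among the candidate steps, keep the one that
        # decreases the loss the most at this iteration.
        phi = min((phi - a * g for a in alphas), key=lambda p: loss(p, x, y))
    return phi

# Hypothetical toy data: points scattered around the line y = 0.5 + 1.2 x.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 30)
y = 0.5 + 1.2 * x + 0.1 * rng.standard_normal(30)
print(gradient_descent(x, y))   # should land near [0.5, 1.2]
```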