Consider applying gradient descent to a 1D linear regression model. The model $f[x, \boldsymbol{\phi}]$ maps a scalar input $x$ to a scalar output $y$ and has parameters $\boldsymbol{\phi} = [\phi_0, \phi_1]^\top$, where $\phi_0$ represents the $y$-intercept and $\phi_1$ the slope:

$$ f[x, \boldsymbol{\phi}] = \phi_0 + \phi_1 x. $$

Given a dataset $\{x_i, y_i\}$ containing $I$ input/output pairs, we choose the least squares loss function:

$$ L[\boldsymbol{\phi}] = \sum_{i=1}^{I} \ell_i = \sum_{i=1}^{I} \left( \phi_0 + \phi_1 x_i - y_i \right)^2, $$

where the term $\ell_i = (\phi_0 + \phi_1 x_i - y_i)^2$ is the individual contribution to the loss from the $i$-th training example.
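
As a concrete illustration, here is a minimal NumPy sketch of this model and loss; the function and variable names are our own, not from the text:

```python
import numpy as np

def model(x, phi):
    """Linear model: phi[0] is the y-intercept, phi[1] is the slope."""
    return phi[0] + phi[1] * x

def loss(phi, x, y):
    """Least squares loss: the sum of squared residuals over all I examples."""
    residuals = model(x, phi) - y   # f[x_i, phi] - y_i for every example
    return np.sum(residuals ** 2)
```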

The derivative of the loss function with respect to the parameters can be decomposed into the sum of the derivatives of the individual contributions:

$$ \frac{\partial L}{\partial \boldsymbol{\phi}} = \frac{\partial}{\partial \boldsymbol{\phi}} \sum_{i=1}^{I} \ell_i = \sum_{i=1}^{I} \frac{\partial \ell_i}{\partial \boldsymbol{\phi}}, $$

where these are given by:

$$ \frac{\partial \ell_i}{\partial \boldsymbol{\phi}} = \begin{bmatrix} \partial \ell_i / \partial \phi_0 \\ \partial \ell_i / \partial \phi_1 \end{bmatrix} = \begin{bmatrix} 2\,(\phi_0 + \phi_1 x_i - y_i) \\ 2\,x_i\,(\phi_0 + \phi_1 x_i - y_i) \end{bmatrix}. $$
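
Continuing the sketch above, these per-example derivatives translate directly into a gradient routine that sums the two components over the dataset:

```python
def gradient(phi, x, y):
    """Gradient of the loss: the sum over i of [dl_i/dphi_0, dl_i/dphi_1]."""
    residuals = model(x, phi) - y               # phi_0 + phi_1 * x_i - y_i
    dl_dphi0 = np.sum(2.0 * residuals)          # derivative w.r.t. the intercept
    dl_dphi1 = np.sum(2.0 * x * residuals)      # derivative w.r.t. the slope
    return np.array([dl_dphi0, dl_dphi1])
```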
The figure below shows the progression of this algorithm as we iteratively compute the derivatives according to the equations above and then update the parameters. In this case, we have used a line search procedure to find the value of the step size $\alpha$ that decreases the loss the most at each iteration.
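
The text does not specify which line search procedure is used; as a stand-in, the sketch below continues the code above and simply evaluates a fixed grid of candidate step sizes at each iteration, keeping whichever one lowers the loss the most:

```python
def gradient_descent_with_line_search(x, y, phi_init, n_iters=20):
    """Gradient descent where each step size is chosen by a coarse line search."""
    phi = np.array(phi_init, dtype=float)
    alphas = np.geomspace(1e-5, 1.0, 40)   # candidate step sizes (assumed range)
    for _ in range(n_iters):
        grad = gradient(phi, x, y)
        # Try every candidate step and keep the one with the lowest loss.
        candidates = [phi - alpha * grad for alpha in alphas]
        phi = min(candidates, key=lambda c: loss(c, x, y))
    return phi

# Example usage on synthetic data with true intercept 0.5 and slope 2.0.
rng = np.random.default_rng(0)
x = rng.uniform(size=30)
y = 0.5 + 2.0 * x + 0.1 * rng.standard_normal(30)
print(gradient_descent_with_line_search(x, y, phi_init=[0.0, 0.0]))
```

A more careful line search would minimize the loss along the descent direction to higher precision; the coarse grid here just keeps the sketch short.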