Recall that Ordinary Least Squares is a regression problem aiming to minimize the mean squared loss, such that we have objective function:

$$J(\theta, \theta_0) = \frac{1}{n}\sum_{i=1}^{n}\left(\theta^T x^{(i)} + \theta_0 - y^{(i)}\right)^2$$
This problem actually has an analytical solution based on calculus.

Analytical/Closed-Form Solution

We can approach this like a minimization problem by taking the derivative of $J$ with respect to $\theta$, setting it to 0, and then solving for $\theta$. It’s possible to do this by:

  • Finding $\frac{\partial J}{\partial \theta_k}$ for each $k$ in $1, \ldots, d$
  • Constructing a set of equations of the form $\frac{\partial J}{\partial \theta_k} = 0$
  • Solving the system for the values of $\theta_k$ (a sketch of one such equation follows this list).
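
For concreteness, here is what one of those component equations looks like, assuming the mean squared loss written above (the index $k$ and the dimension count $d$ are just the notation assumed here):

$$\frac{\partial J}{\partial \theta_k} = \frac{2}{n}\sum_{i=1}^{n} x_k^{(i)}\left(\theta^T x^{(i)} + \theta_0 - y^{(i)}\right) = 0$$

This is just the chain rule applied to each squared term; collecting one such equation per $\theta_k$ (plus the analogous one for $\theta_0$) gives a linear system in the parameters.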

We can work through this in a cool matrix view. Let’s also assume that the inputs $x^{(i)}$ have been augmented with an extra input dimension with a value of 1, so that we can fold $\theta_0$ into $\theta$ and ignore it.

Let us think of our training data in terms of matrices $\tilde{X}$ and $\tilde{Y}$, where each column of $\tilde{X}$ is an example and each column of $\tilde{Y}$ is the corresponding output target value:

$$\tilde{X} = \begin{bmatrix} x_1^{(1)} & \cdots & x_1^{(n)} \\ \vdots & \ddots & \vdots \\ x_d^{(1)} & \cdots & x_d^{(n)} \end{bmatrix} \qquad \tilde{Y} = \begin{bmatrix} y^{(1)} & \cdots & y^{(n)} \end{bmatrix}$$

To make this easier and more consistent with most textbooks, we define a new matrix and vector, $X$ and $Y$, which are just the transposes of our $\tilde{X}$ and $\tilde{Y}$:

$$X = \tilde{X}^T = \begin{bmatrix} x_1^{(1)} & \cdots & x_d^{(1)} \\ \vdots & \ddots & \vdots \\ x_1^{(n)} & \cdots & x_d^{(n)} \end{bmatrix} \qquad Y = \tilde{Y}^T = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(n)} \end{bmatrix}$$

Now, each row of $X$ corresponds to a sample, so $X$ is $n \times d$ and $Y$ is $n \times 1$.

We can then write our objective as:

$$J(\theta) = \frac{1}{n}(X\theta - Y)^T(X\theta - Y)$$

Here $X\theta$ is an $n \times 1$ vector of predictions, so $X\theta - Y$ is a vector of differences between predictions and labels. When we compute $(X\theta - Y)^T(X\theta - Y)$, we’re basically taking the dot product of the vector of differences with itself, which achieves a summing and squaring effect.
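
As a quick sanity check, here is a minimal NumPy sketch (the data and variable names are made up for illustration) confirming that the matrix form matches the per-example sum of squared errors:

```python
import numpy as np

# Toy data just for illustration; the names X, Y, theta mirror the text.
rng = np.random.default_rng(0)
n, d = 5, 3                        # 5 samples, 3 features (last one is the appended 1)
X = rng.normal(size=(n, d))
X[:, -1] = 1.0                     # the augmented constant feature
Y = rng.normal(size=(n, 1))
theta = rng.normal(size=(d, 1))    # an arbitrary candidate parameter vector

# Matrix form: (1/n) (X theta - Y)^T (X theta - Y)
diff = X @ theta - Y
J_matrix = (diff.T @ diff).item() / n

# Per-example form: (1/n) * sum of squared prediction errors
J_sum = 0.0
for i in range(n):
    err = (X[i] @ theta).item() - Y[i, 0]   # prediction error on example i
    J_sum += err ** 2
J_sum /= n

print(J_matrix, J_sum)             # the two numbers agree
```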

To solve this problem, we take the gradient (which we will then set to zero):

$$\nabla_\theta J = \frac{2}{n} X^T (X\theta - Y)$$

The gradient has the same shape as $\theta$. The simplified version is basically just the power rule and chain rule applied to the matrix form of the objective above. Verifying the shapes, we see:

$$\underbrace{X^T}_{d \times n}\ \underbrace{(X\theta - Y)}_{n \times 1}$$

so our result would have shape $d \times 1$, which is the shape of $\theta$, which is the shape we want!
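
To build confidence in the gradient formula itself (not just its shape), here is a hedged NumPy sketch comparing the analytic gradient $\frac{2}{n} X^T (X\theta - Y)$ against a finite-difference approximation; all names and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 4
X = rng.normal(size=(n, d))
Y = rng.normal(size=(n, 1))
theta = rng.normal(size=(d, 1))

def J(th):
    """Mean squared error objective in matrix form."""
    diff = X @ th - Y
    return (diff.T @ diff).item() / n

# Analytic gradient: (2/n) X^T (X theta - Y); same shape as theta, (d, 1).
grad_analytic = (2.0 / n) * X.T @ (X @ theta - Y)

# Central finite differences, one coordinate of theta at a time.
eps = 1e-6
grad_numeric = np.zeros_like(theta)
for k in range(d):
    e = np.zeros_like(theta)
    e[k, 0] = eps
    grad_numeric[k, 0] = (J(theta + e) - J(theta - e)) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_numeric)))  # should be ~1e-9 or smaller
```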

Setting $\nabla_\theta J$ to $0$ and solving, we get:

$$\frac{2}{n} X^T (X\theta - Y) = 0 \quad\Longrightarrow\quad X^T X \theta = X^T Y \quad\Longrightarrow\quad \theta = (X^T X)^{-1} X^T Y$$

And the dimensions work out: $(X^T X)^{-1}$ is $d \times d$ and $X^T Y$ is $d \times 1$, so $\theta$ is $d \times 1$!
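
Here is a minimal NumPy sketch of using this closed-form solution on synthetic data (the data and names are illustrative); note that solving the normal equations $X^T X \theta = X^T Y$ directly is numerically preferable to forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 3
X_raw = rng.normal(size=(n, d))
X = np.hstack([X_raw, np.ones((n, 1))])     # append the constant-1 feature
true_theta = np.array([[2.0], [-1.0], [0.5], [3.0]])
Y = X @ true_theta + 0.01 * rng.normal(size=(n, 1))

# theta = (X^T X)^{-1} X^T Y, computed by solving the normal equations
# X^T X theta = X^T Y rather than forming the inverse explicitly.
theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

print(theta_hat.ravel())                    # close to [ 2. -1.  0.5  3. ]
```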

This is cool because it’s a rare closed-form solution!

To be really good and proper, we should also check that this solution yields a minimum, not just a critical point. Also, what if $X^T X$ is not invertible?