Recall that Ordinary Least Squares is a regression problem aiming to minimize the mean squared loss, so that we have the objective function:

$$J(\theta, \theta_0) = \frac{1}{n}\sum_{i=1}^{n}\left(\theta^T x^{(i)} + \theta_0 - y^{(i)}\right)^2$$
This problem actually has an analytical solution based on calculus.
Analytical/Closed-Form Solution
We can approach this as a minimization problem by taking the derivative of $J$ with respect to $\theta$, setting it to 0, and then solving for $\theta$. It's possible to do this by:
- Finding $\frac{\partial J}{\partial \theta_k}$ for each $k$ in $1, \ldots, d$ (sketched just below)
- Constructing a set of equations of the form $\frac{\partial J}{\partial \theta_k} = 0$
- Solving the system for the values of $\theta_k$.
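As a sketch of what that first step looks like (this follows directly from the objective above, before we introduce any matrix notation), differentiating $J$ with respect to a single $\theta_k$ gives:

$$\frac{\partial J}{\partial \theta_k} = \frac{2}{n}\sum_{i=1}^{n}\left(\theta^T x^{(i)} + \theta_0 - y^{(i)}\right) x_k^{(i)}$$

Setting each of these (and the corresponding derivative for $\theta_0$) to 0 gives the system of equations; the matrix view below is just a cleaner way to organize that bookkeeping.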
We can work through this in a cool matrix view. Let's also assume that our $x^{(i)}$'s have been augmented with an extra input dimension with a value of $1$, so that we can ignore $\theta_0$.
Let us think of our training data in terms of matrices $X$ and $Y$, where each column of $X$ is an example and each column of $Y$ is the corresponding output target value:

$$X = \begin{bmatrix} x^{(1)} & \cdots & x^{(n)} \end{bmatrix} \in \mathbb{R}^{d \times n}, \qquad Y = \begin{bmatrix} y^{(1)} & \cdots & y^{(n)} \end{bmatrix} \in \mathbb{R}^{1 \times n}$$

To make this easier and more standard with most textbooks, we define a new matrix and vector, $W$ and $T$, which are just transposes of our $X$ and $Y$:

$$W = X^T \in \mathbb{R}^{n \times d}, \qquad T = Y^T \in \mathbb{R}^{n \times 1}$$

Now, each row of $W$ corresponds to a sample.
We can then write our objective as:

$$J(\theta) = \frac{1}{n}\,(W\theta - T)^T (W\theta - T)$$

Here $W\theta$ is an $n \times 1$ vector of predictions, so $W\theta - T$ is a vector of differences between predictions and labels. When we compute $(W\theta - T)^T (W\theta - T)$, we're basically taking the dot product of the vector of differences with itself, which achieves a summing and squaring effect.
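As a quick sanity check, here is a minimal NumPy sketch (the toy data and variable names are made up; `W`, `T`, and `theta` mirror the notation above) showing that the matrix form agrees with the explicit per-sample loop:

```python
import numpy as np

# Hypothetical toy problem: n = 5 samples, d = 2 features,
# where the last column of W is the augmented all-ones feature.
rng = np.random.default_rng(0)
n, d = 5, 2
W = np.hstack([rng.normal(size=(n, 1)), np.ones((n, 1))])  # n x d, rows are samples
T = rng.normal(size=(n, 1))                                 # n x 1 labels
theta = rng.normal(size=(d, 1))                             # d x 1 parameters

# Loop form: mean of squared per-sample errors.
J_loop = sum((W[i] @ theta - T[i]).item() ** 2 for i in range(n)) / n

# Matrix form: (1/n) (W theta - T)^T (W theta - T).
diff = W @ theta - T
J_matrix = (diff.T @ diff).item() / n

print(np.isclose(J_loop, J_matrix))  # expect True
```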
To solve this problem, we take the gradient and set it to zero:

$$\nabla_\theta J = \frac{2}{n}\, W^T (W\theta - T)$$

The gradient has the same shape as $\theta$. The simplified version is basically just the power rule and chain rule (applied to the objective above). Verifying the shapes, we see:

$$\underbrace{W^T}_{d \times n}\, \underbrace{(W\theta - T)}_{n \times 1}$$

so our result would have shape $d \times 1$, which is the shape we want!
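To gain a bit more confidence in that gradient expression, here is a small sketch (again with hypothetical data) comparing it against a finite-difference approximation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
W = rng.normal(size=(n, d))       # n x d data matrix (rows are samples)
T = rng.normal(size=(n, 1))       # n x 1 labels
theta = rng.normal(size=(d, 1))   # d x 1 parameters

def J(th):
    diff = W @ th - T
    return (diff.T @ diff).item() / n

# Analytic gradient from the text: (2/n) W^T (W theta - T), shape d x 1.
grad_analytic = (2.0 / n) * (W.T @ (W @ theta - T))

# Central finite differences, one coordinate at a time.
eps = 1e-6
grad_numeric = np.zeros((d, 1))
for k in range(d):
    e = np.zeros((d, 1))
    e[k] = eps
    grad_numeric[k] = (J(theta + e) - J(theta - e)) / (2 * eps)

print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))  # expect True
```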
Setting $\nabla_\theta J$ to $0$ and solving, we get:

$$\frac{2}{n}\, W^T (W\theta - T) = 0$$
$$W^T W \theta = W^T T$$
$$\theta = (W^T W)^{-1} W^T T$$
And the dimensions work out!
This is cool because it’s a rare closed-form solution!
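Here is a minimal sketch of the closed-form solution on synthetic data (the names `theta_hat`, `true_theta`, etc. are just illustrative), cross-checked against NumPy's built-in least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 3
features = rng.normal(size=(n, d - 1))
W = np.hstack([features, np.ones((n, 1))])          # augment with a column of ones
true_theta = np.array([[2.0], [-1.0], [0.5]])
T = W @ true_theta + 0.1 * rng.normal(size=(n, 1))  # noisy labels

# theta = (W^T W)^{-1} W^T T, written out literally.
theta_hat = np.linalg.inv(W.T @ W) @ (W.T @ T)

# Cross-check with NumPy's least-squares routine.
theta_lstsq, *_ = np.linalg.lstsq(W, T, rcond=None)
print(np.allclose(theta_hat, theta_lstsq))          # expect True
```

(In practice, `np.linalg.solve(W.T @ W, W.T @ T)` or `np.linalg.lstsq` is preferred over forming the explicit inverse, for numerical stability.)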
To be really good and proper, we should also check that this solution yields a minimum, not just a critical point. Also, what if $W^T W$ is not invertible?
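On the second question, here is a small hypothetical sketch of how the invertibility issue can arise: when the columns of $W$ are linearly dependent, $W^T W$ is rank-deficient and the formula above can't be applied literally, though a least-squares solver still returns a solution:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
col = rng.normal(size=(n, 1))
W = np.hstack([col, 2 * col, np.ones((n, 1))])  # second column duplicates the first
T = rng.normal(size=(n, 1))

# W^T W is only rank 2 (not 3), so it has no inverse and the
# closed-form expression above breaks down.
print(np.linalg.matrix_rank(W.T @ W))           # 2

# A least-squares solver still returns a (minimum-norm) solution.
theta_min_norm, *_ = np.linalg.lstsq(W, T, rcond=None)
print(theta_min_norm.shape)                     # (3, 1)
```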