How do we quantify the quality of fit?
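The sum of the squares of the residuals ($S_r$) quantifies the error between the measured and predicted values after regression:

$$S_r = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$

where $y_i$ are the measured values and $\hat{y}_i$ are the values predicted by the regression line.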
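The total sum of squares about the mean ($S_t$) is the magnitude of the residual error associated with the dependent variable prior to regression:

$$S_t = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2$$

where $\bar{y}$ is the mean of the measured values.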
We can then use the difference between $S_t$ and $S_r$ to quantify the improvement, or error reduction, due to the regression. To do this, we define the coefficient of determination:

$$r^2 = \frac{S_t - S_r}{S_t}$$
If we have $r^2 = 1$, the line explains all of the variability of the data and is therefore a “perfect fit”. On the other hand, $r^2 = 0$ would be a poor fit: the line describes the data no better than the mean value does.
Something like $r^2 = 0.86$ means that 86% of the original uncertainty is explained by the linear model.
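As a minimal sketch of this calculation, the snippet below fits a line with NumPy and computes the residual sum of squares, the total sum of squares about the mean, and the resulting coefficient of determination. The helper name `r_squared` and the data values are hypothetical, made up only for illustration.

```python
import numpy as np

def r_squared(y, y_pred):
    """Coefficient of determination from measured and predicted values."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    s_r = np.sum((y - y_pred) ** 2)      # sum of squared residuals after regression
    s_t = np.sum((y - np.mean(y)) ** 2)  # total sum of squares about the mean
    return (s_t - s_r) / s_t

# Hypothetical measurements, invented for this example
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 6.1])

# Least-squares straight line; polyfit returns the highest-degree coefficient first
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept

r2 = r_squared(y, y_pred)
print(f"r^2 = {r2:.4f}")
```

For a least-squares straight line with an intercept, this value should match the square of the correlation coefficient returned by `np.corrcoef(x, y)[0, 1]`, which makes a convenient cross-check.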
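It’s still important to check results visually, as $r$ (the correlation coefficient) and $r^2$ can trick you! A high value does not guarantee that a straight line is the right model.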
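One way to see how a high value can mislead is to fit a straight line to data that are not linear at all. The sketch below uses synthetic, noise-free quadratic data (an assumption chosen for illustration, not data from these notes) and prints both the coefficient of determination and the residuals.

```python
import numpy as np

# Synthetic data: exactly quadratic, no noise
x = np.arange(0, 11, dtype=float)
y = x ** 2

# Force a straight-line fit through clearly curved data
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept
residuals = y - y_pred

s_r = np.sum(residuals ** 2)       # sum of squared residuals after regression
s_t = np.sum((y - y.mean()) ** 2)  # total sum of squares about the mean
r2 = (s_t - s_r) / s_t

print(f"r^2 = {r2:.3f}")
print("residuals:", np.round(residuals, 1))
```

Here the coefficient of determination comes out at roughly 0.93, yet the residuals are positive at both ends and negative in the middle. That systematic pattern is obvious in a plot of the data or the residuals, but invisible in the single number.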