Problem 2.1
To walk “downhill” on the least squares loss function

$$L[\boldsymbol{\phi}] = \sum_{i=1}^{I} \left(\phi_0 + \phi_1 x_i - y_i\right)^2,$$

we measure its gradient with respect to the parameters $\phi_0$ and $\phi_1$. Calculate expressions for the slopes $\partial L/\partial \phi_0$ and $\partial L/\partial \phi_1$.
For $\partial L/\partial \phi_0$:

$$\frac{\partial}{\partial \phi_0}\left(\phi_0 + \phi_1 x_i - y_i\right)^2 = 2\left(\phi_0 + \phi_1 x_i - y_i\right)$$

Taking the summation into account, we have:

$$\frac{\partial L}{\partial \phi_0} = \sum_{i=1}^{I} 2\left(\phi_0 + \phi_1 x_i - y_i\right)$$
For $\partial L/\partial \phi_1$:

$$\frac{\partial}{\partial \phi_1}\left(\phi_0 + \phi_1 x_i - y_i\right)^2 = 2x_i\left(\phi_0 + \phi_1 x_i - y_i\right)$$

Taking the summation into account, we have:

$$\frac{\partial L}{\partial \phi_1} = \sum_{i=1}^{I} 2x_i\left(\phi_0 + \phi_1 x_i - y_i\right)$$
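As a quick sanity check, the analytic slopes can be compared against finite differences. This is a minimal sketch; the data points and parameter values are arbitrary assumptions for illustration.

```python
import numpy as np

def loss(phi0, phi1, x, y):
    """Least squares loss L = sum_i (phi0 + phi1*x_i - y_i)^2."""
    return np.sum((phi0 + phi1 * x - y) ** 2)

def grads(phi0, phi1, x, y):
    """Analytic slopes dL/dphi0 and dL/dphi1 derived above."""
    r = phi0 + phi1 * x - y          # residuals
    return 2 * np.sum(r), 2 * np.sum(x * r)

# Arbitrary example data and parameters (assumptions for illustration)
x = np.array([0.0, 2.0, 4.0])
y = np.array([1.0, 3.0, 4.0])
phi0, phi1, eps = 0.5, 0.5, 1e-6

g0, g1 = grads(phi0, phi1, x, y)
fd0 = (loss(phi0 + eps, phi1, x, y) - loss(phi0 - eps, phi1, x, y)) / (2 * eps)
fd1 = (loss(phi0, phi1 + eps, x, y) - loss(phi0, phi1 - eps, x, y)) / (2 * eps)
print(g0, fd0)  # analytic and finite-difference slopes should agree closely
print(g1, fd1)
```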
Problem 2.2
Show that we can find the minimum of the loss function in closed form by setting the expressions for the derivatives from Problem 2.1 to zero and solving for $\phi_0$ and $\phi_1$. Note that this works for linear regression but not for more complex models; this is why we use iterative model fitting methods like gradient descent.
For $\partial L/\partial \phi_0 = 0$:

$$\sum_{i=1}^{I} 2\left(\phi_0 + \phi_1 x_i - y_i\right) = 0 \quad\Longrightarrow\quad \phi_0 = \frac{\sum_{i=1}^{I} y_i - \phi_1 \sum_{i=1}^{I} x_i}{I}$$

For $\partial L/\partial \phi_1 = 0$:

$$\sum_{i=1}^{I} 2x_i\left(\phi_0 + \phi_1 x_i - y_i\right) = 0 \quad\Longrightarrow\quad \phi_0 \sum_{i=1}^{I} x_i + \phi_1 \sum_{i=1}^{I} x_i^2 = \sum_{i=1}^{I} x_i y_i$$

Substitute $\phi_0$ into the second equation:

$$\frac{\sum_i y_i - \phi_1 \sum_i x_i}{I}\sum_i x_i + \phi_1 \sum_i x_i^2 = \sum_i x_i y_i$$

Then:

$$\phi_1 = \frac{I\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{I\sum_i x_i^2 - \left(\sum_i x_i\right)^2}, \qquad \phi_0 = \frac{\sum_i y_i - \phi_1 \sum_i x_i}{I}$$
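As a numeric check of the closed form, here is a sketch using the same assumed data points as in the gradient check above; the gradient of the loss is zero at the resulting parameters.

```python
import numpy as np

# Same assumed data points as in the gradient check above
x = np.array([0.0, 2.0, 4.0])
y = np.array([1.0, 3.0, 4.0])
I = len(x)

# Closed-form least squares solution derived above
phi1 = (I * np.sum(x * y) - np.sum(x) * np.sum(y)) / (I * np.sum(x**2) - np.sum(x)**2)
phi0 = (np.sum(y) - phi1 * np.sum(x)) / I
print(phi0, phi1)  # both slopes dL/dphi0 and dL/dphi1 vanish here
```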
Problem 2.3
Consider reformulating linear regression as a generative model, so we have $x = g[y, \boldsymbol{\phi}]$. What is the new loss function? Find an expression for the inverse function $y = g^{-1}[x, \boldsymbol{\phi}]$ that we would use to perform inference. Will this model make the same predictions as the discriminative version for a given training dataset $\{x_i, y_i\}$? One way to establish this is to write code that fits a line to three data points using both methods and see if the result is the same.
The generative model is

$$x = g[y, \boldsymbol{\phi}] = \phi_0 + \phi_1 y$$

Here, $x$ is generated as a function of $y$ and the parameters $\boldsymbol{\phi} = [\phi_0, \phi_1]^T$.

The new least squares loss becomes:

$$L[\boldsymbol{\phi}] = \sum_{i=1}^{I}\left(\phi_0 + \phi_1 y_i - x_i\right)^2$$

We want to find the inverse function $y = g^{-1}[x, \boldsymbol{\phi}]$, starting from:

$$x = \phi_0 + \phi_1 y \quad\Longrightarrow\quad y = g^{-1}[x, \boldsymbol{\phi}] = \frac{x - \phi_0}{\phi_1}$$
- The discriminative model directly minimizes the loss on $y$ given $x$.
- The generative model minimizes the loss on $x$ given $y$, and we then invert it to predict $y$ from $x$.

Because the two models minimize different residuals (vertical distances for the discriminative model, horizontal distances for the generative one), they do not in general make the same predictions, as the fits from the code sketch below confirm.
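A minimal sketch that fits a line to three data points using both methods. The specific points $(0, 1)$, $(2, 3)$, $(4, 4)$ are an assumption, chosen to be consistent with the printed fits; `np.polyfit` performs the least squares fits. Running it prints the three lines below.

```python
import numpy as np

# Assumed three data points (consistent with the fits printed below)
x = np.array([0.0, 2.0, 4.0])
y = np.array([1.0, 3.0, 4.0])

# Discriminative model: fit y = phi0 + phi1 * x (minimizes errors in y)
phi1, phi0 = np.polyfit(x, y, 1)
print(f"Discriminative Model: y = {phi0:.2f} + {phi1:.2f}x")

# Generative model: fit x = theta0 + theta1 * y (minimizes errors in x)
theta1, theta0 = np.polyfit(y, x, 1)
print(f"Generative Model: x = {theta0:.2f} + {theta1:.2f}y")

# Invert the generative model to predict y from x: y = (x - theta0) / theta1
inv0, inv1 = -theta0 / theta1, 1.0 / theta1
print(f"Inverse Generative Model: y = {inv0:.2f} + {inv1:.2f}x")
```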
Discriminative Model: y = 1.17 + 0.75x
Generative Model: x = -1.43 + 1.29y
Inverse Generative Model: y = 1.11 + 0.78x