UDL Chapter 9 Problems

Problem 9.1

Consider a model where the prior distribution over the parameters is a normal distribution with mean zero and variance $σ_{ϕ}^{2}$, so that
$Pr(ϕ) = \prod_{j=1}^{J} \text{Norm}_{ϕ_{j}}[0, σ_{ϕ}^{2}]$
where $j$ indexes the model parameters. We now maximize $\prod_{i=1}^{I} Pr(y_{i} \mid x_{i}, ϕ) Pr(ϕ)$. Show that the associated loss function of this model is equivalent to L2 regularization.

Recall from Probabilistic interpretation that the regularization term can be considered a prior $P r (ϕ)$ representing some knowledge we have about the parameters.

The posterior objective is then:

$$\hat{ϕ} = \underset{ϕ}{\operatorname{argmax}} \left[ \prod_{i=1}^{I} Pr(y_{i} \mid x_{i}, ϕ) \, Pr(ϕ) \right]$$

We use the log to convert from product to sum:

$$\hat{ϕ} = \underset{ϕ}{\operatorname{argmax}} \left[ \sum_{i=1}^{I} \log Pr(y_{i} \mid x_{i}, ϕ) + \log Pr(ϕ) \right]$$

The prior term is:

$$\log Pr(ϕ) = \log \prod_{j=1}^{J} \frac{1}{\sqrt{2πσ_{ϕ}^{2}}} \exp\left(-\frac{ϕ_{j}^{2}}{2σ_{ϕ}^{2}}\right) = \sum_{j=1}^{J} \left[ -\frac{1}{2} \log(2πσ_{ϕ}^{2}) - \frac{ϕ_{j}^{2}}{2σ_{ϕ}^{2}} \right] = \sum_{j=1}^{J} \left[ C - \frac{ϕ_{j}^{2}}{2σ_{ϕ}^{2}} \right]$$

where we collapsed the first term into $C$ because it does not depend on the parameters $ϕ$ .

Therefore, maximizing the log posterior is equivalent to

$$\hat{ϕ} = \underset{ϕ}{\operatorname{argmax}} \left[ \sum_{i=1}^{I} \log Pr(y_{i} \mid x_{i}, ϕ) - \frac{1}{2σ_{ϕ}^{2}} \sum_{j=1}^{J} ϕ_{j}^{2} \right] = \underset{ϕ}{\operatorname{argmax}} \left[ \sum_{i=1}^{I} \log Pr(y_{i} \mid x_{i}, ϕ) - \frac{1}{2σ_{ϕ}^{2}} \lVert ϕ \rVert_{2}^{2} \right]$$

or equivalently minimizing

$$L(ϕ) = -\sum_{i=1}^{I} \log Pr(y_{i} \mid x_{i}, ϕ) + λ \lVert ϕ \rVert_{2}^{2}$$

with $λ = \frac{1}{2σ_{ϕ}^{2}}$, which is positive, so a smaller prior variance gives a stronger L2 penalty.
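As a quick sanity check on the derivation, the sketch below compares the negative log of the Gaussian prior with the penalty $λ \lVert ϕ \rVert_{2}^{2}$ using $λ = \frac{1}{2σ_{ϕ}^{2}}$. If the two differ only by a constant that is independent of $ϕ$, they lead to the same minimizer. The prior variance, the number of parameters, and all function names are assumptions chosen for illustration.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal with the given mean and variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def log_prior(phi, var_phi):
    """Log of the zero-mean Gaussian prior, summed over independent parameters."""
    return sum(log_normal_pdf(p, 0.0, var_phi) for p in phi)


def l2_penalty(phi, var_phi):
    """lambda * ||phi||_2^2 with lambda = 1 / (2 * var_phi)."""
    lam = 1.0 / (2.0 * var_phi)
    return lam * sum(p * p for p in phi)


def prior_offset(phi, var_phi):
    """-log Pr(phi) minus the L2 penalty; this should not depend on phi."""
    return -log_prior(phi, var_phi) - l2_penalty(phi, var_phi)


random.seed(0)
var_phi = 0.7  # hypothetical prior variance
J = 5          # hypothetical number of parameters

# Evaluate the offset at several random parameter vectors.
offsets = [
    prior_offset([random.gauss(0.0, 2.0) for _ in range(J)], var_phi)
    for _ in range(4)
]

# The constant collapsed into C in the derivation, summed over J parameters.
expected_constant = 0.5 * J * math.log(2.0 * math.pi * var_phi)

print(offsets)
print(expected_constant)
```

Every entry of `offsets` should match `expected_constant` up to floating-point error, which is the numerical counterpart of dropping $C$ from the objective.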

Problem 9.2

How do the gradients of the loss function change when L2 regularization is added?

L2 regularization incentivizes the parameters to stay small (near zero), because a larger norm increases the loss. Write the regularized loss as

$$L(ϕ) = L_{0}(ϕ) + λ \lVert ϕ \rVert_{2}^{2}$$

where $L_{0} (ϕ)$ is the regular NLL objective.

Then the gradient becomes:

$$\nabla_{ϕ} L(ϕ) = \nabla_{ϕ} L_{0}(ϕ) + 2λϕ$$

Thus, the gradient with respect to each parameter $ϕ_{j}$ gains an extra term $2λϕ_{j}$, so every parameter is pulled toward zero in proportion to its magnitude.
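The gradient formula can be checked numerically. The sketch below uses a simple quadratic as a hypothetical stand-in for $L_{0}$ (not the NLL of any particular model), compares the analytic gradient $\nabla_{ϕ} L_{0}(ϕ) + 2λϕ$ with central finite differences, and then writes one gradient-descent step with learning rate $α$ in two equivalent ways: directly, and as a shrinkage of $ϕ$ by $(1 - 2αλ)$ followed by the unregularized step. The values of `lam`, `alpha`, and `phi` are arbitrary assumptions.

```python
TARGET = [1.0, -2.0, 3.0]  # hypothetical minimizer of the stand-in loss


def base_loss(phi):
    """Hypothetical stand-in for L0: squared distance to TARGET."""
    return sum((p - t) ** 2 for p, t in zip(phi, TARGET))


def base_grad(phi):
    """Analytic gradient of the stand-in L0."""
    return [2.0 * (p - t) for p, t in zip(phi, TARGET)]


def regularized_loss(phi, lam):
    """L(phi) = L0(phi) + lam * ||phi||_2^2."""
    return base_loss(phi) + lam * sum(p * p for p in phi)


def regularized_grad(phi, lam):
    """Analytic gradient: grad L0(phi) + 2 * lam * phi."""
    return [g + 2.0 * lam * p for g, p in zip(base_grad(phi), phi)]


def numerical_grad(f, phi, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at phi."""
    grad = []
    for j in range(len(phi)):
        up = list(phi)
        down = list(phi)
        up[j] += eps
        down[j] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


phi = [0.5, -1.5, 2.0]
lam = 0.1
alpha = 0.05

analytic = regularized_grad(phi, lam)
numeric = numerical_grad(lambda p: regularized_loss(p, lam), phi)

# One gradient-descent step on the regularized loss, written two ways.
step_direct = [p - alpha * g for p, g in zip(phi, analytic)]
step_decay = [
    (1.0 - 2.0 * alpha * lam) * p - alpha * g0
    for p, g0 in zip(phi, base_grad(phi))
]

print(analytic)
print(numeric)
print(step_direct)
print(step_decay)
```

The two step computations agree, which shows why the extra gradient term acts as a multiplicative shrinkage of the parameters at every update.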

Problem 3

Problem 4

Problem 5

Problem 6

Problem 7

Problem 8

Problem 9

Problem 10
