Problem 5.1

Show that the logistic sigmoid function $\text{sig}[z]$ becomes $0$ as $z \to -\infty$, is $0.5$ when $z = 0$, and becomes $1$ as $z \to \infty$, where

$$\text{sig}[z] = \frac{1}{1 + \exp[-z]}.$$

For $z \to -\infty$: $\exp[-z] \to \infty$, so

$$\text{sig}[z] = \frac{1}{1 + \exp[-z]} \to 0.$$

For $z = 0$: $\exp[-0] = 1$, so

$$\text{sig}[0] = \frac{1}{1 + 1} = 0.5.$$

For $z \to \infty$: $\exp[-z] \to 0$, so

$$\text{sig}[z] = \frac{1}{1 + \exp[-z]} \to \frac{1}{1 + 0} = 1.$$
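
As a quick numerical check (a minimal sketch; the function and the test values are my own), evaluating the sigmoid at a large negative input, at zero, and at a large positive input reproduces these three limits:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid sig[z] = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative, zero, and large positive inputs approximate the three limits.
for z in (-50.0, 0.0, 50.0):
    print(f"sig[{z:+.0f}] = {sigmoid(z):.6f}")
# Prints approximately 0.000000, 0.500000, 1.000000
```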

Problem 5.2

The loss $L$ for binary classification for a single training pair $\{x, y\}$ is

$$L = -(1 - y)\log\bigl[1 - \text{sig}[f[x,\phi]]\bigr] - y\log\bigl[\text{sig}[f[x,\phi]]\bigr].$$

Plot this loss as a function of the transformed output $\text{sig}[f[x,\phi]] \in [0,1]$ (i) when the training label $y = 0$ and (ii) when $y = 1$.

When $y = 0$, we just have $L = -\log\bigl[1 - \text{sig}[f[x,\phi]]\bigr]$. With $y = 1$, we just have $L = -\log\bigl[\text{sig}[f[x,\phi]]\bigr]$:
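
A minimal plotting sketch of these two curves (assuming NumPy and matplotlib are available; the variable names are my own), sweeping the transformed output over $(0, 1)$:

```python
import numpy as np
import matplotlib.pyplot as plt

# Transformed network output yhat = sig[f[x, phi]], swept over (0, 1).
yhat = np.linspace(1e-4, 1 - 1e-4, 500)

loss_y0 = -np.log(1.0 - yhat)  # loss when the training label y = 0
loss_y1 = -np.log(yhat)        # loss when the training label y = 1

plt.plot(yhat, loss_y0, label="y = 0")
plt.plot(yhat, loss_y1, label="y = 1")
plt.xlabel("sig[f[x, phi]]")
plt.ylabel("loss")
plt.legend()
plt.show()
```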

Problem 5.3

Suppose we want to build a model that predicts the direction $y$ in radians of the prevailing wind based on local measurements of barometric pressure $x$. A suitable distribution over circular domains is the von Mises distribution:

$$Pr(y|\mu,\kappa) = \frac{\exp\bigl[\kappa\cos(y-\mu)\bigr]}{2\pi\, I_0[\kappa]},$$

where:

  • $\mu$ is a measure of the mean direction

  • $\kappa$ is a measure of concentration (i.e. inverse of variance)

  • The term $I_0[\kappa]$ is a modified Bessel function of the first kind of order $0$.

Use the loss function recipe to develop a loss function for learning the parameter $\mu$ of a model $f[x,\phi]$ to predict the most likely wind direction. Your solution should treat the concentration $\kappa$ as a constant. How would you perform inference?

We set $\mu = f[x,\phi]$, so

$$Pr\bigl(y \mid f[x,\phi],\kappa\bigr) = \frac{\exp\bigl[\kappa\cos\bigl(y - f[x,\phi]\bigr)\bigr]}{2\pi\, I_0[\kappa]}.$$

Then the negative log-likelihood loss function is

$$L[\phi] = -\sum_{i=1}^{I}\log Pr\bigl(y_i \mid f[x_i,\phi],\kappa\bigr) = -\sum_{i=1}^{I}\Bigl(\kappa\cos\bigl(y_i - f[x_i,\phi]\bigr) - \log\bigl[2\pi I_0[\kappa]\bigr]\Bigr).$$

Since $\kappa$ is treated as a constant, the Bessel term does not affect the minimizer, and minimizing $L[\phi]$ is equivalent to minimizing $-\sum_i \cos\bigl(y_i - f[x_i,\phi]\bigr)$.

To perform inference we just take the maximum of the distribution, which is just the predicted parameter $\hat{\mu} = f[x,\hat{\phi}]$. This might be outside the range $(-\pi, \pi]$, in which case we would add or subtract multiples of $2\pi$ until it is in the right range.
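
A minimal sketch of this loss (the function name and example values are my own; $\kappa$ is passed as a constant as the problem requires, and the constant $\log[2\pi I_0[\kappa]]$ term is kept only for completeness):

```python
import numpy as np
from scipy.special import i0  # modified Bessel function of the first kind, order 0

def von_mises_nll(y, mu, kappa=1.0):
    """Negative log-likelihood of circular targets y under von Mises
    distributions with predicted means mu and fixed concentration kappa."""
    log_prob = kappa * np.cos(y - mu) - np.log(2.0 * np.pi * i0(kappa))
    return -np.sum(log_prob)

# Example: targets and predictions in radians (illustrative values).
y = np.array([0.1, -2.9, 1.5])
mu = np.array([0.2, 3.0, 1.4])   # note -2.9 and 3.0 are close on the circle
print(von_mises_nll(y, mu, kappa=2.0))
```

Because the loss depends on $y - \mu$ only through a cosine, angles that differ by $2\pi$ are automatically treated as identical, so no explicit wrapping is needed during training.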

Problem 5.4

Sometimes, the outputs $y$ for input $x$ are multimodal; there is more than one valid prediction for a given input. Here, we might use a sum of normal components as the distribution over the output. This is known as a mixture of Gaussians model. For example, a mixture of two Gaussians has parameters $\{\lambda, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2\}$:

$$Pr\bigl(y \mid \lambda,\mu_1,\mu_2,\sigma_1^2,\sigma_2^2\bigr) = \lambda\,\text{Norm}_y\bigl[\mu_1,\sigma_1^2\bigr] + (1-\lambda)\,\text{Norm}_y\bigl[\mu_2,\sigma_2^2\bigr],$$

where $\lambda \in [0,1]$ controls the relative weight of the two components, which have means $\mu_1, \mu_2$ and variances $\sigma_1^2, \sigma_2^2$, respectively. This model can represent a distribution with two peaks or a distribution with one peak but a more complex shape.

Use the loss function recipe to construct a loss function for training a model $f[x,\phi]$ that takes input $x$, has parameters $\phi$, and predicts a mixture of two Gaussians. The loss should be based on $I$ training data pairs $\{x_i, y_i\}$. What problems do you foresee when performing inference?

Let the network $f[x,\phi]$ produce five outputs $f_1, \ldots, f_5$, which we map to the mixture parameters as

$$\lambda = \text{sig}[f_1], \quad \mu_1 = f_2, \quad \mu_2 = f_3, \quad \sigma_1^2 = \exp[f_4], \quad \sigma_2^2 = \exp[f_5],$$

so that the weight lies in $[0,1]$ and the variances are positive.

Then the loss is the negative log-likelihood

$$L[\phi] = -\sum_{i=1}^{I}\log\Bigl[\lambda_i\,\text{Norm}_{y_i}\bigl[\mu_{1i},\sigma_{1i}^2\bigr] + (1-\lambda_i)\,\text{Norm}_{y_i}\bigl[\mu_{2i},\sigma_{2i}^2\bigr]\Bigr],$$

where the parameters for the $i$-th term are computed from the network outputs $f[x_i,\phi]$.

Inference is a bit trickier in this case since there is no simple closed form for the mode of this distribution; the maximum would have to be found numerically (for example, by starting an ascent from each component mean).
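
A minimal sketch of this loss under the parameterization above (the function names and the use of logsumexp for numerical stability are my own choices, not a prescribed implementation):

```python
import numpy as np
from scipy.special import logsumexp

def mog2_nll(y, f):
    """Negative log-likelihood of targets y (shape [I]) under a two-component
    Gaussian mixture whose parameters come from five network outputs f (shape [I, 5])."""
    lam = 1.0 / (1.0 + np.exp(-f[:, 0]))           # sig[f1] -> mixture weight in [0, 1]
    mu1, mu2 = f[:, 1], f[:, 2]                    # component means
    var1, var2 = np.exp(f[:, 3]), np.exp(f[:, 4])  # exp[.] keeps variances positive

    def log_norm(y, mu, var):
        return -0.5 * (np.log(2.0 * np.pi * var) + (y - mu) ** 2 / var)

    # log[ lam * N(y; mu1, var1) + (1 - lam) * N(y; mu2, var2) ], computed stably
    log_mix = logsumexp(
        np.stack([np.log(lam) + log_norm(y, mu1, var1),
                  np.log(1.0 - lam) + log_norm(y, mu2, var2)]), axis=0)
    return -np.sum(log_mix)

# Illustrative usage with random stand-ins for the network outputs.
rng = np.random.default_rng(0)
y = rng.normal(size=8)
f = rng.normal(size=(8, 5))
print(mog2_nll(y, f))
```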

Problem 5.5

Consider extending the model from problem 5.3 to predict the wind direction using a mixture of two von Mises distributions. Write an expression for the likelihood for this model. How many outputs will the network produce?

Each von Mises distribution is parametrized by $\{\mu, \kappa\}$. Thus, for a mixture of two von Mises distributions, the parameters will be

$$\{\lambda, \mu_1, \mu_2, \kappa_1, \kappa_2\},$$

where $\lambda \in [0,1]$ is the relative weight of the two distributions. The likelihood will then be:

$$Pr\bigl(y \mid \lambda,\mu_1,\mu_2,\kappa_1,\kappa_2\bigr) = \lambda\,\frac{\exp\bigl[\kappa_1\cos(y-\mu_1)\bigr]}{2\pi I_0[\kappa_1]} + (1-\lambda)\,\frac{\exp\bigl[\kappa_2\cos(y-\mu_2)\bigr]}{2\pi I_0[\kappa_2]}.$$

Like the mixture of Gaussians above, we would need five outputs, unless we consider $\kappa_1$ and $\kappa_2$ to be constants, in which case we would need three.
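
A minimal sketch of this likelihood (names are my own; the concentrations are passed explicitly so they can either be predicted or held constant):

```python
import numpy as np
from scipy.special import i0  # modified Bessel function of the first kind, order 0

def von_mises_pdf(y, mu, kappa):
    """Density of the von Mises distribution at angle(s) y."""
    return np.exp(kappa * np.cos(y - mu)) / (2.0 * np.pi * i0(kappa))

def mix2_von_mises_likelihood(y, lam, mu1, mu2, kappa1, kappa2):
    """Likelihood of y under a two-component von Mises mixture."""
    return lam * von_mises_pdf(y, mu1, kappa1) + (1.0 - lam) * von_mises_pdf(y, mu2, kappa2)

# Example: a bimodal wind-direction distribution (illustrative parameters).
y = np.linspace(-np.pi, np.pi, 5)
print(mix2_von_mises_likelihood(y, lam=0.7, mu1=0.0, mu2=2.5, kappa1=4.0, kappa2=2.0))
```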

Problem 5.6

Consider building a model to predict the number of pedestrians $y$ that will pass a given point in the city in the next minute, based on data $x$ that contains information about the time of day, the longitude and latitude, and the type of neighborhood. A suitable distribution for modeling counts is the Poisson distribution. This has a single parameter $\lambda > 0$ called the rate that represents the mean of the distribution.
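
Although the solution is not worked out here, a minimal sketch of the Poisson distribution the problem refers to (using SciPy; the rate value is purely illustrative) shows the standard probability mass function $Pr(y = k) = \lambda^k e^{-\lambda}/k!$ and the fact that the mean and variance both equal the rate:

```python
import numpy as np
from scipy.stats import poisson

# Poisson PMF: Pr(y = k) = lam**k * exp(-lam) / k!, with rate lam > 0.
lam = 4.2  # illustrative rate, e.g., the mean pedestrian count per minute
k = np.arange(0, 10)
print(poisson.pmf(k, mu=lam))               # probabilities of counts 0..9
print(poisson.mean(lam), poisson.var(lam))  # both equal lam
```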

Problem 5.7

Problem 5.8

Problem 5.9

Problem 5.10