Problem 4.1

Consider composing the two neural networks in figure 4.8. Draw a plot of the relationship between the input $x$ of the first network and the output $y'$ of the second network for $x \in [-1, 1]$.

Problem 4.2

Identify the four hyperparameters in figure 4.6.

Problem 4.3

Using the non-negative homogeneity property of the ReLU function (see problem 3.5), show that:

$$\operatorname{ReLU}\!\bigl[\boldsymbol{\beta}_1 + \lambda_1 \cdot \boldsymbol{\Omega}_1 \operatorname{ReLU}[\boldsymbol{\beta}_0 + \lambda_0 \cdot \boldsymbol{\Omega}_0\mathbf{x}]\bigr] = \lambda_0\lambda_1 \cdot \operatorname{ReLU}\!\Bigl[\tfrac{1}{\lambda_0\lambda_1}\boldsymbol{\beta}_1 + \boldsymbol{\Omega}_1 \operatorname{ReLU}\!\bigl[\tfrac{1}{\lambda_0}\boldsymbol{\beta}_0 + \boldsymbol{\Omega}_0\mathbf{x}\bigr]\Bigr],$$

where $\lambda_0$ and $\lambda_1$ are non-negative scalars. From this, we see that the weight matrices can be rescaled by any magnitude as long as the biases are also adjusted, and the scale factors can be re-applied at the end of the network.

The non-negative homogeneity property states that, for any non-negative scalar $\lambda$:

$$\operatorname{ReLU}[\lambda \cdot \mathbf{z}] = \lambda \cdot \operatorname{ReLU}[\mathbf{z}].$$

We have

$$\begin{aligned}
\lambda_0\lambda_1 \cdot \operatorname{ReLU}\!\Bigl[\tfrac{1}{\lambda_0\lambda_1}\boldsymbol{\beta}_1 + \boldsymbol{\Omega}_1 \operatorname{ReLU}\!\bigl[\tfrac{1}{\lambda_0}\boldsymbol{\beta}_0 + \boldsymbol{\Omega}_0\mathbf{x}\bigr]\Bigr]
&= \operatorname{ReLU}\!\Bigl[\boldsymbol{\beta}_1 + \lambda_0\lambda_1 \cdot \boldsymbol{\Omega}_1 \operatorname{ReLU}\!\bigl[\tfrac{1}{\lambda_0}\boldsymbol{\beta}_0 + \boldsymbol{\Omega}_0\mathbf{x}\bigr]\Bigr] \\
&= \operatorname{ReLU}\!\Bigl[\boldsymbol{\beta}_1 + \lambda_1 \cdot \boldsymbol{\Omega}_1 \cdot \lambda_0\operatorname{ReLU}\!\bigl[\tfrac{1}{\lambda_0}\boldsymbol{\beta}_0 + \boldsymbol{\Omega}_0\mathbf{x}\bigr]\Bigr] \\
&= \operatorname{ReLU}\!\bigl[\boldsymbol{\beta}_1 + \lambda_1 \cdot \boldsymbol{\Omega}_1 \operatorname{ReLU}[\boldsymbol{\beta}_0 + \lambda_0 \cdot \boldsymbol{\Omega}_0\mathbf{x}]\bigr],
\end{aligned}$$

where the first step applies homogeneity with $\lambda = \lambda_0\lambda_1$ (pulling the scalar inside the outer ReLU) and the last step applies it with $\lambda = \lambda_0$ (pushing $\lambda_0$ inside the inner ReLU). This is the left-hand side, as desired.
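
As a quick sanity check, here is a small NumPy sketch of the identity. The layer sizes and scale factors are arbitrary choices for illustration, not anything specified in the problem:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

rng = np.random.default_rng(0)

# Arbitrary sizes and non-negative scale factors (illustrative assumptions).
x = rng.standard_normal((4, 1))
Omega0, beta0 = rng.standard_normal((6, 4)), rng.standard_normal((6, 1))
Omega1, beta1 = rng.standard_normal((3, 6)), rng.standard_normal((3, 1))
lam0, lam1 = 2.5, 0.7

# Left-hand side: scale factors applied inside the network.
lhs = relu(beta1 + lam1 * Omega1 @ relu(beta0 + lam0 * Omega0 @ x))

# Right-hand side: unscaled weights, rescaled biases, scale re-applied at the end.
rhs = lam0 * lam1 * relu(beta1 / (lam0 * lam1) + Omega1 @ relu(beta0 / lam0 + Omega0 @ x))

print(np.allclose(lhs, rhs))  # True
```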

Problem 4.4

Write out the equations for a deep neural network that takes $D_i = 5$ inputs, $D_o = 4$ outputs, and has three hidden layers of sizes $D_1 = 20$, $D_2 = 10$, and $D_3 = 7$, respectively, in both the forms of equations 4.15 and 4.16. What are the sizes of each weight matrix $\boldsymbol{\Omega}_k$ and bias vector $\boldsymbol{\beta}_k$?

Individual equations (like 4.15):

$$\begin{aligned}
\mathbf{h}_1 &= \mathrm{a}[\boldsymbol{\beta}_0 + \boldsymbol{\Omega}_0\mathbf{x}] \\
\mathbf{h}_2 &= \mathrm{a}[\boldsymbol{\beta}_1 + \boldsymbol{\Omega}_1\mathbf{h}_1] \\
\mathbf{h}_3 &= \mathrm{a}[\boldsymbol{\beta}_2 + \boldsymbol{\Omega}_2\mathbf{h}_2] \\
\mathbf{y} &= \boldsymbol{\beta}_3 + \boldsymbol{\Omega}_3\mathbf{h}_3
\end{aligned}$$

One equation (like 4.16):

$$\mathbf{y} = \boldsymbol{\beta}_3 + \boldsymbol{\Omega}_3\,\mathrm{a}\bigl[\boldsymbol{\beta}_2 + \boldsymbol{\Omega}_2\,\mathrm{a}[\boldsymbol{\beta}_1 + \boldsymbol{\Omega}_1\,\mathrm{a}[\boldsymbol{\beta}_0 + \boldsymbol{\Omega}_0\mathbf{x}]]\bigr]$$

Sizes:

  • $\boldsymbol{\Omega}_0$: $20 \times 5$, $\boldsymbol{\beta}_0$: $20 \times 1$
  • $\boldsymbol{\Omega}_1$: $10 \times 20$, $\boldsymbol{\beta}_1$: $10 \times 1$
  • $\boldsymbol{\Omega}_2$: $7 \times 10$, $\boldsymbol{\beta}_2$: $7 \times 1$
  • $\boldsymbol{\Omega}_3$: $4 \times 7$, $\boldsymbol{\beta}_3$: $4 \times 1$
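
A minimal NumPy sketch of this network; the layer sizes come from the answer above, and the random weights are just placeholders to confirm the shapes line up:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

# Input, three hidden layers, output (sizes from the answer above).
sizes = [5, 20, 10, 7, 4]

rng = np.random.default_rng(0)
# Omega_k maps layer k (size sizes[k]) to layer k+1 (size sizes[k+1]).
Omegas = [rng.standard_normal((sizes[k + 1], sizes[k])) for k in range(4)]
betas = [rng.standard_normal((sizes[k + 1], 1)) for k in range(4)]

x = rng.standard_normal((5, 1))
h1 = relu(betas[0] + Omegas[0] @ x)
h2 = relu(betas[1] + Omegas[1] @ h1)
h3 = relu(betas[2] + Omegas[2] @ h2)
y = betas[3] + Omegas[3] @ h3

for k, (Om, be) in enumerate(zip(Omegas, betas)):
    print(f"Omega_{k}: {Om.shape}, beta_{k}: {be.shape}")
print("y:", y.shape)  # (4, 1)
```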

Problem 4.5

Consider a deep neural network with $D_i = 5$ inputs, $D_o = 1$ output, and $K = 20$ hidden layers containing $D = 30$ hidden units each. What is the depth of this network? What is the width?

  • Depth is 20 (the number of hidden layers).
    • Or 21 if we count the output layer; in that case the definition being used is "number of layers with parameters" rather than the number of hidden layers.
  • Width is 30 (the number of hidden units in each layer).

Problem 4.6

Consider a network with $D_i = 1$ input, $D_o = 1$ output, and $K = 10$ hidden layers, with $D = 10$ hidden units in each. Would the number of weights increase more if we increased the depth by one or the width by one?

Original (1 input, 10 hidden layers of 10 units, 1 output):

  • Input to first hidden layer: $1 \times 10 = 10$ weights
  • Between the 9 pairs of adjacent hidden layers: $9 \times 10 \times 10 = 900$ weights
  • Last hidden layer to output: $10 \times 1 = 10$ weights
  • Each hidden layer has 10 biases: $10 \times 10 = 100$ biases
  • Output layer: 1 bias
  • Total: $10 + 900 + 10 = 920$ weights and $100 + 1 = 101$ biases

Increase depth by 1 (11 hidden layers of 10 units):

  • Input to first hidden layer: $1 \times 10 = 10$ weights
  • Between the 10 pairs of adjacent hidden layers: $10 \times 10 \times 10 = 1000$ weights
  • Last hidden layer to output: $10 \times 1 = 10$ weights
  • Each hidden layer has 10 biases: $11 \times 10 = 110$ biases
  • Output layer: 1 bias
  • Total: $10 + 1000 + 10 = 1020$ weights and $110 + 1 = 111$ biases, an increase of $100$ weights

Increase width by 1 (10 hidden layers of 11 units):

  • Input to first hidden layer: $1 \times 11 = 11$ weights
  • Between the 9 pairs of adjacent hidden layers: $9 \times 11 \times 11 = 1089$ weights
  • Last hidden layer to output: $11 \times 1 = 11$ weights
  • Each hidden layer has 11 biases: $10 \times 11 = 110$ biases
  • Output layer: 1 bias
  • Total: $11 + 1089 + 11 = 1111$ weights and $110 + 1 = 111$ biases, an increase of $191$ weights

So increasing the width by one adds more weights ($191$) than increasing the depth by one ($100$); the sketch below checks these counts.
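
To double-check the arithmetic, here is a short sketch using small helper functions written just for this check (they are not from the book):

```python
def count_weights(D_i, D_o, K, D):
    """Weights in a fully connected net with K hidden layers of D units each."""
    return D_i * D + (K - 1) * D * D + D * D_o

def count_biases(D_o, K, D):
    """One bias per hidden unit plus one per output unit."""
    return K * D + D_o

base = count_weights(1, 1, 10, 10)    # 920
deeper = count_weights(1, 1, 11, 10)  # 1020
wider = count_weights(1, 1, 10, 11)   # 1111

print(base, deeper - base, wider - base)  # 920 100 191
```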

Problem 4.7

Choose values for the parameters $\boldsymbol{\phi} = \{\phi_0, \phi_1, \phi_2, \phi_3, \theta_{10}, \theta_{11}, \theta_{20}, \theta_{21}, \theta_{30}, \theta_{31}\}$ for the shallow neural network in equation 3.1 (with ReLU activation functions) that will define an identity function over a finite range $x \in [a, b]$.

The function is:

$$y = \phi_0 + \phi_1\operatorname{ReLU}[\theta_{10} + \theta_{11}x] + \phi_2\operatorname{ReLU}[\theta_{20} + \theta_{21}x] + \phi_3\operatorname{ReLU}[\theta_{30} + \theta_{31}x].$$

We want it to be the identity function, such that $y = x$ for all $x \in [a, b]$.

If we have $\phi_1 = 1$, $\theta_{10} = 0$, $\theta_{11} = 1$, and every other parameter equal to zero, we would get $y = \operatorname{ReLU}[x]$, which only equals $x$ for $x \geq 0$ and has slope $0$ instead of $1$ below zero, so it is not the identity over an arbitrary range $[a, b]$.

Instead, we need

$$\phi_0 + \phi_1\operatorname{ReLU}[\theta_{10} + \theta_{11}x] + \phi_2\operatorname{ReLU}[\theta_{20} + \theta_{21}x] + \phi_3\operatorname{ReLU}[\theta_{30} + \theta_{31}x] = x \quad \text{for all } x \in [a, b],$$

which expands to a sum of clipped lines whose slopes add to $1$ and whose offsets add to $0$ everywhere on $[a, b]$.

If we have:

$$\phi_0 = a,\quad \phi_1 = 1,\quad \phi_2 = -1,\quad \phi_3 = 0,\quad \theta_{10} = -a,\quad \theta_{11} = 1,\quad \theta_{20} = -b,\quad \theta_{21} = 1,\quad \theta_{30} = \theta_{31} = 0,$$

so that $y = a + \operatorname{ReLU}[x - a] - \operatorname{ReLU}[x - b]$, then:

  • For $x < a$: both ReLUs are 0, so $y = a$.
  • For $a \leq x < b$: the first ReLU is active, so we have $y = a + (x - a) = x$.
  • For $x \geq b$: both ReLUs are active, so we have $y = a + (x - a) - (x - b) = b$.

In particular, $y = x$ on the whole range $[a, b]$, as required (and the network clips to $a$ and $b$ outside it).
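
A quick NumPy check of this construction; the range $[a, b] = [-2, 3]$ is an arbitrary example choice:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def shallow_net(x, a, b):
    # Parameters from the solution above: phi_0 = a, phi_1 = 1, phi_2 = -1, phi_3 = 0,
    # theta_10 = -a, theta_11 = 1, theta_20 = -b, theta_21 = 1, theta_30 = theta_31 = 0.
    return a + 1.0 * relu(-a + 1.0 * x) - 1.0 * relu(-b + 1.0 * x) + 0.0 * relu(0.0 + 0.0 * x)

a, b = -2.0, 3.0                                         # example range (arbitrary)
x = np.linspace(a, b, 101)
print(np.allclose(shallow_net(x, a, b), x))              # True: identity on [a, b]
print(shallow_net(np.array([a - 1.0, b + 1.0]), a, b))   # [a, b]: clipped outside the range
```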

Problem 4.8

Problem 4.9

Problem 4.10

Problem 4.11