Problem 7.1

A two-layer network with two hidden units in each layer can be defined as:

$$
\begin{aligned}
y = \phi_0 &+ \phi_1\, a\bigl[\psi_{01} + \psi_{11}\, a[\theta_{01} + \theta_{11} x] + \psi_{21}\, a[\theta_{02} + \theta_{12} x]\bigr] \\
&+ \phi_2\, a\bigl[\psi_{02} + \psi_{12}\, a[\theta_{01} + \theta_{11} x] + \psi_{22}\, a[\theta_{02} + \theta_{12} x]\bigr],
\end{aligned}
$$

where the functions $a[\cdot]$ are ReLU functions. Compute the derivatives of the output $y$ with respect to each of the 13 parameters $\phi_\bullet$, $\psi_{\bullet\bullet}$, and $\theta_{\bullet\bullet}$ directly. The derivative of the ReLU function with respect to its input is the indicator function $\mathbb{I}[z > 0]$, which returns one if the argument is greater than zero and zero otherwise.
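
Before differentiating, it can help to see the network as code. Below is a minimal NumPy sketch of the forward pass; the parameter containers `phi`, `psi`, and `theta` and their shapes are my own illustrative convention, not part of the problem statement:

```python
import numpy as np

def relu(z):
    # ReLU activation a[z] = max(z, 0)
    return np.maximum(z, 0.0)

def forward(x, phi, psi, theta):
    """Forward pass of the two-layer, two-hidden-unit network.

    phi:   (3,)   output weights  [phi_0, phi_1, phi_2]
    psi:   (2,3)  second-layer weights, row k-1 = [psi_0k, psi_1k, psi_2k]
    theta: (2,2)  first-layer weights,  row k-1 = [theta_0k, theta_1k]
    """
    # First hidden layer:  h_k = a[theta_0k + theta_1k * x]
    h = relu(theta[:, 0] + theta[:, 1] * x)
    # Second hidden layer: h'_k = a[psi_0k + psi_1k h_1 + psi_2k h_2]
    hp = relu(psi[:, 0] + psi[:, 1] * h[0] + psi[:, 2] * h[1])
    # Output: y = phi_0 + phi_1 h'_1 + phi_2 h'_2
    return phi[0] + phi[1] * hp[0] + phi[2] * hp[1]
```

With this layout, hidden unit $k$ of each layer occupies row $k-1$ of the corresponding array, which keeps the code aligned with the subscripts in the derivation below.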

Output layer:

$$
\frac{\partial y}{\partial \phi_0} = 1, \qquad
\frac{\partial y}{\partial \phi_1} = a\bigl[\psi_{01} + \psi_{11}\, a[\theta_{01} + \theta_{11} x] + \psi_{21}\, a[\theta_{02} + \theta_{12} x]\bigr], \qquad
\frac{\partial y}{\partial \phi_2} = a\bigl[\psi_{02} + \psi_{12}\, a[\theta_{01} + \theta_{11} x] + \psi_{22}\, a[\theta_{02} + \theta_{12} x]\bigr].
$$

Second hidden layer: Let us first define the pre-activations

$$
f_1 = \psi_{01} + \psi_{11}\, a[\theta_{01} + \theta_{11} x] + \psi_{21}\, a[\theta_{02} + \theta_{12} x], \qquad
f_2 = \psi_{02} + \psi_{12}\, a[\theta_{01} + \theta_{11} x] + \psi_{22}\, a[\theta_{02} + \theta_{12} x],
$$

so that $y = \phi_0 + \phi_1 a[f_1] + \phi_2 a[f_2]$. Then:

$$
\begin{aligned}
\frac{\partial y}{\partial \psi_{01}} &= \phi_1\,\mathbb{I}[f_1 > 0], &
\frac{\partial y}{\partial \psi_{02}} &= \phi_2\,\mathbb{I}[f_2 > 0], \\
\frac{\partial y}{\partial \psi_{11}} &= \phi_1\,\mathbb{I}[f_1 > 0]\; a[\theta_{01} + \theta_{11} x], &
\frac{\partial y}{\partial \psi_{12}} &= \phi_2\,\mathbb{I}[f_2 > 0]\; a[\theta_{01} + \theta_{11} x], \\
\frac{\partial y}{\partial \psi_{21}} &= \phi_1\,\mathbb{I}[f_1 > 0]\; a[\theta_{02} + \theta_{12} x], &
\frac{\partial y}{\partial \psi_{22}} &= \phi_2\,\mathbb{I}[f_2 > 0]\; a[\theta_{02} + \theta_{12} x].
\end{aligned}
$$

First hidden layer: Let us first define the pre-activations

$$
g_1 = \theta_{01} + \theta_{11} x, \qquad g_2 = \theta_{02} + \theta_{12} x,
$$

so that $f_1 = \psi_{01} + \psi_{11} a[g_1] + \psi_{21} a[g_2]$ and $f_2 = \psi_{02} + \psi_{12} a[g_1] + \psi_{22} a[g_2]$. Then we have:

$$
\begin{aligned}
\frac{\partial y}{\partial \theta_{01}} &= \bigl(\phi_1\,\mathbb{I}[f_1 > 0]\,\psi_{11} + \phi_2\,\mathbb{I}[f_2 > 0]\,\psi_{12}\bigr)\,\mathbb{I}[g_1 > 0], \\
\frac{\partial y}{\partial \theta_{11}} &= \bigl(\phi_1\,\mathbb{I}[f_1 > 0]\,\psi_{11} + \phi_2\,\mathbb{I}[f_2 > 0]\,\psi_{12}\bigr)\,\mathbb{I}[g_1 > 0]\, x, \\
\frac{\partial y}{\partial \theta_{02}} &= \bigl(\phi_1\,\mathbb{I}[f_1 > 0]\,\psi_{21} + \phi_2\,\mathbb{I}[f_2 > 0]\,\psi_{22}\bigr)\,\mathbb{I}[g_2 > 0], \\
\frac{\partial y}{\partial \theta_{12}} &= \bigl(\phi_1\,\mathbb{I}[f_1 > 0]\,\psi_{21} + \phi_2\,\mathbb{I}[f_2 > 0]\,\psi_{22}\bigr)\,\mathbb{I}[g_2 > 0]\, x.
\end{aligned}
$$
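
All of the expressions above follow from repeated application of the chain rule, and they can be sanity-checked numerically against central finite differences. The sketch below reuses `forward` and the NumPy import from the earlier snippet; the helper `analytic_grads`, the parameter layout, and the test values are illustrative assumptions, not part of the original solution:

```python
import numpy as np  # reuses forward() from the sketch above

def analytic_grads(x, phi, psi, theta):
    """Evaluate the closed-form derivatives derived above (illustrative helper)."""
    g = theta[:, 0] + theta[:, 1] * x                      # g_k = theta_0k + theta_1k x
    h = np.maximum(g, 0.0)                                 # h_k = a[g_k]
    f = psi[:, 0] + psi[:, 1] * h[0] + psi[:, 2] * h[1]    # f_k = psi_0k + psi_1k h_1 + psi_2k h_2
    If, Ig = (f > 0).astype(float), (g > 0).astype(float)  # indicator functions I[. > 0]

    # Output layer: dy/dphi_0 = 1, dy/dphi_k = a[f_k].
    d_phi = np.concatenate(([1.0], np.maximum(f, 0.0)))
    # Second layer: row k-1 holds dy/dpsi_0k, dy/dpsi_1k, dy/dpsi_2k
    #               = phi_k I[f_k > 0] * (1, h_1, h_2).
    d_psi = np.outer(phi[1:] * If, np.concatenate(([1.0], h)))
    # First layer: back_k = phi_1 I[f_1 > 0] psi_k1 + phi_2 I[f_2 > 0] psi_k2.
    back = np.array([np.sum(phi[1:] * If * psi[:, 1]),
                     np.sum(phi[1:] * If * psi[:, 2])])
    # Row k-1 holds dy/dtheta_0k, dy/dtheta_1k = back_k I[g_k > 0] * (1, x).
    d_theta = np.stack([back * Ig, back * Ig * x], axis=1)
    return d_phi, d_psi, d_theta

# Compare against central finite differences at an arbitrary test point.
rng = np.random.default_rng(0)
x, eps = 0.7, 1e-6
phi, psi, theta = rng.normal(size=3), rng.normal(size=(2, 3)), rng.normal(size=(2, 2))
d_phi, _, _ = analytic_grads(x, phi, psi, theta)
for k in range(3):  # the same pattern checks d_psi and d_theta
    e = np.zeros(3); e[k] = eps
    fd = (forward(x, phi + e, psi, theta) - forward(x, phi - e, psi, theta)) / (2 * eps)
    assert abs(fd - d_phi[k]) < 1e-5
```

Away from the ReLU kinks the network is piecewise linear in its parameters, so the central differences should match the analytic values to floating-point precision.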

Problem 7.2

Problem 7.3

Problem 7.4

Problem 7.5

Problem 7.6

Problem 7.7

Problem 7.8

Problem 7.9

Problem 7.10

Problem 7.11

Problem 7.12

Problem 7.13

Problem 7.14

Problem 7.15

Problem 7.16

Problem 7.17