Convolution layers perform the operation of a Convolution Filter. The goal is to have each bank of an Image Filter Bank to correspond to a single neural network layer. The values in the filters/kernels are the weights in this case (plus an offset bias for each filter); these weights can then be trained with Gradient Descent.

The same weights are used many many times in the computation of each layer. This weight sharing means that we can express a transformation on a large image with relatively few parameters; it also means we’ll have to take care in figuring out exactly how to train it.

A convolution layer/filter layer is formally defined with:

  • Number of filters
  • Size of one filters is (plus 1 bias value for this kernel)
  • Stride is the spacing at which we apply the filter to the image
  • Input tensor size
  • Padding is how many extra pixels (usually with value ) are added around the edges of the input. For an input of size , our new effective input size with padding becomes

This layer will produce an output of size , where

Any bias terms are simply applied with element-wise addition.