Activation functions introduce non-linearity, or other desired characteristics, into the output of artificial neurons.
- They transform the summed weighted input of a node into an output value that is fed to the next hidden layer (see the sketch after this list).
- They are also used to transform the summed weighted input into the final output, often for the sake of interpretability or normalization.
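As a minimal sketch of the first point (assuming a NumPy environment; the layer sizes and the choice of ReLU here are illustrative, not from the text), a layer computes the summed weighted input of each unit and passes it through an activation:

```python
import numpy as np

def relu(z):
    # Element-wise non-linearity: max(0, z)
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
x = rng.normal(size=3)          # input vector (3 features)
W = rng.normal(size=(4, 3))     # weights of a layer with 4 units
b = rng.normal(size=4)          # biases

z = W @ x + b                   # summed weighted input of each unit
h = relu(z)                     # activation output fed to the next layer
print(z)
print(h)
```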
Why Activation Functions?
Activation functions serve to increase the representational capacity of the network. What happens if we don't have an activation function, i.e. if we let the activation $f$ be the identity?
If $f$ is the identity, in a network with $L$ layers, we would have:

$$\hat{y} = W_L \big( W_{L-1} \cdots ( W_1 x ) \big)$$

Multiplying out the weight matrices:

$$\hat{y} = \underbrace{W_L W_{L-1} \cdots W_1}_{\tilde{W}}\, x = \tilde{W} x,$$

which is a linear function of $x$. Having all those layers did not change the representational capacity of the network; the non-linearity of the activation function is crucial.
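To see the collapse numerically, here is a small check (the layer sizes, seed, and ReLU are arbitrary illustrations): composing several weight matrices with the identity activation gives exactly the same map as the single product matrix $\tilde{W}$, whereas inserting a non-linearity between layers breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5)

# Three "layers" with identity activation (no non-linearity).
W1 = rng.normal(size=(8, 5))
W2 = rng.normal(size=(6, 8))
W3 = rng.normal(size=(4, 6))

deep_linear = W3 @ (W2 @ (W1 @ x))   # layer-by-layer forward pass
W_tilde = W3 @ W2 @ W1               # multiplied-out weight matrix
collapsed = W_tilde @ x

# The deep linear network and the single linear map agree.
print(np.allclose(deep_linear, collapsed))   # True

# With a ReLU between layers, the network is no longer linear,
# so it cannot be collapsed into one matrix.
relu = lambda z: np.maximum(0.0, z)
deep_nonlinear = W3 @ relu(W2 @ relu(W1 @ x))
print(np.allclose(deep_nonlinear, collapsed))  # almost surely False
```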