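Takes a whole vector $z = (z_1, \dots, z_K)$ and generates as output a vector of the same length with the property that every entry is positive and the entries sum to 1, which means we can interpret it as a probability distribution over the $K$ items:

$$\operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K$$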
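Softmax is similar to sigmoid in concept (used for outputting probabilities/confidences) but generalizes it to more than two outcomes: with two classes and scores $(x, 0)$, the softmax probability of the first class is exactly the sigmoid of $x$. Commonly used as the output layer for multi-class classification.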
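A minimal sketch of the idea in plain Python, to make the "sums to 1" property and the link to sigmoid concrete. The function names and example scores are illustrative assumptions, not taken from any particular library. Subtracting the maximum score before exponentiating is a common stability trick that is assumed here; it does not change the result, because the shift cancels between numerator and denominator.

```python
import math

def softmax(scores):
    """Map a list of real-valued scores to a probability distribution."""
    # Shift by the max so the largest exponent is 0, which avoids overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Standard logistic sigmoid, used here only for comparison."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical scores for a 3-class problem.
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))

# Two-class case: softmax over (x, 0) matches sigmoid(x) for the first class.
two_class = softmax([1.5, 0.0])
print(two_class[0], sigmoid(1.5))
```

The largest score always receives the largest probability, so taking the argmax of the output gives the same predicted class as taking the argmax of the raw scores.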