Softmax

Softmax takes a whole vector $Z \in \mathbb{R}^{n}$ and outputs a vector $A \in [0, 1]^{n}$ with the property that $\sum_{i = 1}^{n} A_{i} = 1$, which means we can interpret it as a probability distribution over $n$ items:

$$
\mathrm{softmax}(Z) = \begin{bmatrix} \exp(Z_{1}) / \sum_{j=1}^{n} \exp(Z_{j}) \\ \vdots \\ \exp(Z_{n}) / \sum_{j=1}^{n} \exp(Z_{j}) \end{bmatrix}
$$
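A minimal Python sketch of the definition above may make it more concrete. The function name `softmax` and the sample input are illustrative choices, and subtracting the maximum before exponentiating is a common numerical-stability trick that is not part of the definition itself (it cancels in the ratio, so the result is unchanged).

```python
import math

def softmax(z):
    """Return the softmax of a list of real numbers as a list of probabilities."""
    # Assumption (not in the note): shift by the max so exp() cannot overflow.
    # The shift cancels between numerator and denominator.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # three values, each in [0, 1], largest for the largest input
print(sum(probs))  # approximately 1.0
```

Larger inputs always receive larger probabilities, and equal inputs receive equal probabilities, so the output preserves the ranking of the input entries.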

Softmax is similar to sigmoid in concept (both are used for outputting probabilities/confidences), but it operates on a vector rather than a single scalar. It is commonly used for multi-class classification.
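The connection to sigmoid can be made explicit in the two-class case. As a short worked derivation, assuming the usual definition $\sigma(x) = 1 / (1 + e^{-x})$ (which this note does not state), the first softmax output for $n = 2$ is a sigmoid of the difference of the two inputs:

$$
\mathrm{softmax}(Z)_{1} = \frac{e^{Z_{1}}}{e^{Z_{1}} + e^{Z_{2}}} = \frac{1}{1 + e^{-(Z_{1} - Z_{2})}} = \sigma(Z_{1} - Z_{2})
$$

The second output is then $1 - \sigma(Z_{1} - Z_{2})$, since the two entries sum to 1.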
