Takes a whole vector as input and outputs a vector of the same length whose entries all lie between 0 and 1 and sum to 1, which means we can interpret it as a probability distribution over items:
\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}
Softmax is similar to sigmoid in concept (used for outputting probabilities/confidences) but in higher dimensions. Commonly used for multi-class classification.
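A minimal sketch of both points, assuming small inputs so that the direct formula is safe to use. The names `softmax_direct`, `sigmoid`, and the example scores are made up for illustration. It checks that the output behaves like a probability distribution and that the two-class case reduces to the sigmoid.

```python
import numpy as np

def softmax_direct(items_in):
    # Direct formula; fine here because the illustrative inputs are small.
    exps = np.exp(items_in)
    return exps / np.sum(exps)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical scores for a 3-class problem.
scores = np.array([2.0, 1.0, 0.1])
probs = softmax_direct(scores)
predicted_class = int(np.argmax(probs))  # index of the most probable class

# Two-class case: softmax over [x, 0] gives sigmoid(x) for the first class.
x = 1.5
two_class = softmax_direct(np.array([x, 0.0]))
print(probs.sum(), predicted_class, two_class[0], sigmoid(x))
```

The largest score always receives the largest probability, so taking the argmax of the probabilities picks the same class as taking the argmax of the raw scores.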
Stable Softmax
Naive implementation:
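import numpy as np

def softmax(items_in):
    # Exponentiate every element, then normalize so the outputs sum to 1.
    exps = np.exp(items_in)
    items_out = exps / np.sum(exps)
    return items_out

This has issues with numerical stability. If any element of items_in is large (above roughly 709 for float64), np.exp(items_in) overflows to inf, and the division then produces nan.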
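The failure is easy to reproduce. The sketch below assumes float64 inputs. The name `naive_softmax` and the example values are chosen only for illustration. `np.errstate` is used just to silence the overflow warnings so the result can be inspected.

```python
import numpy as np

def naive_softmax(items_in):
    exps = np.exp(items_in)
    return exps / np.sum(exps)

# Hypothetical large scores, e.g. unnormalized logits.
big = np.array([1000.0, 1001.0, 1002.0])

# Suppress the RuntimeWarnings that NumPy would otherwise emit here.
with np.errstate(over="ignore", invalid="ignore"):
    exps = np.exp(big)           # every entry overflows to inf
    result = naive_softmax(big)  # inf / inf evaluates to nan

print(exps, result)
```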
Numerically stable implementation:
def softmax(items_in):
    # Shift so the largest input becomes 0; every exponent is then <= 0.
    shifted = items_in - np.max(items_in)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

This works because subtracting the same constant from every input does not change the output of the softmax:
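\frac{e^{x_i-c}}{\sum_j e^{x_j-c}} = \frac{e^{x_i}}{\sum_j e^{x_j}}

Proof: the factor e^{-c} appears in the numerator and in every term of the denominator, so it cancels:
\frac{e^{x_i-c}}{\sum_j e^{x_j-c}} = \frac{e^{-c} e^{x_i}}{e^{-c} \sum_j e^{x_j}} = \frac{e^{x_i}}{\sum_j e^{x_j}}
Choosing c as the maximum input makes every exponent at most 0, so each exponential is at most 1 and the denominator is at least 1 (the maximum element contributes e^0 = 1).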
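The identity can also be spot-checked numerically, which complements the algebra but does not replace it. In this sketch the function names and test values are assumptions for illustration. It compares the direct formula on small inputs against the same inputs shifted by a constant, then against the max-shifted version on inputs large enough to break the direct formula.

```python
import numpy as np

def naive_softmax(items_in):
    exps = np.exp(items_in)
    return exps / np.sum(exps)

def stable_softmax(items_in):
    shifted = items_in - np.max(items_in)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

x = np.array([1.0, 2.0, 3.0])
c = 3.0  # arbitrary constant

# Shifting every input by c leaves the output unchanged.
shift_invariant = np.allclose(naive_softmax(x), naive_softmax(x - c))

# x + 999 would overflow in the naive version, but the stable version
# returns the same distribution as the naive version applied to x.
big = x + 999.0
stable_big = stable_softmax(big)
matches_small = np.allclose(stable_big, naive_softmax(x))
print(shift_invariant, matches_small, stable_big)
```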
dl Numerically stable softmax ? Subtracting the same constant from every input does not change the output of the softmax. So we subtract the maximum element to prevent overflow.
def softmax(items_in):
    shifted = items_in - np.max(items_in)
    exps = np.exp(shifted)
    return exps / np.sum(exps)
+++