softmax
Note
- normalized exponential function, converts a vector of K real numbers into a probability distribution of K possible outcomes
- used in multi-class classification problems (including next-token prediction) as a generalization of logistic regression from two classes to $K$ classes, see ^027f8a
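As a worked check of the logistic-regression connection: take $\operatorname{softmax}(\mathbf{z})_i = e^{z_i} / \sum_{j} e^{z_j}$ with $K = 2$ and divide numerator and denominator by $e^{z_1}$. Two-class softmax turns out to be the logistic sigmoid of the logit difference, so only $z_1 - z_2$ matters:

$$
\operatorname{softmax}(\mathbf{z})_1 = \frac{e^{z_1}}{e^{z_1} + e^{z_2}} = \frac{1}{1 + e^{-(z_1 - z_2)}} = \operatorname{sigmoid}(z_1 - z_2), \qquad \operatorname{softmax}(\mathbf{z})_2 = 1 - \operatorname{sigmoid}(z_1 - z_2)
$$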
Formula

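$$
\operatorname{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K
$$
- $\mathbf{z} = [z_1, \dots, z_K]$: vector of $K$ input elements (logits)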
- every output element is strictly positive, i.e. lies in $(0, 1)$, regardless of the sign of the inputs, and the elements sum to 1
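- not invariant under scaling: $\operatorname{softmax}(c\,\mathbf{z})$ is in general not equal to $\operatorname{softmax}(\mathbf{z})$ for a scalar $c \neq 1$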
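- temperature scaling: dividing the logits by a temperature $T > 0$ before exponentiating, $\operatorname{softmax}(\mathbf{z}/T)_i = e^{z_i/T} / \sum_{j=1}^{K} e^{z_j/T}$, controls the entropy of the output distribution: a large $T$ makes it more uniform (higher entropy), a small $T$ makes it sharper and more confident (lower entropy).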
- ==See also Briefly about transformer’s evolution or why is softmax cool==
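A minimal pure-Python sketch of the formula and the properties listed above. The helper names `softmax` and `entropy` and the example logits are illustrative choices, not taken from this note. The sketch subtracts the largest logit before exponentiating, which relies on softmax being unchanged when the same constant is added to every input; this is a common trick to keep `math.exp` from overflowing on large logits.

```python
import math
from typing import Sequence


def softmax(z: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Softmax of a sequence of logits, optionally divided by a temperature T > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in z]
    # Shifting every logit by the same constant leaves the result unchanged,
    # so subtract the max to keep every exponent <= 0 (no overflow).
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


logits = [2.0, 1.0, -1.0]
p = softmax(logits)
print(p)       # every entry lies in (0, 1), including the one for the negative logit
print(sum(p))  # 1 up to floating-point rounding

# Not scale-invariant: doubling the logits sharpens the distribution.
print(softmax([2 * x for x in logits]))

# Temperature: small T -> sharper (lower entropy), large T -> flatter (higher entropy).
for t in (0.5, 1.0, 5.0):
    q = softmax(logits, temperature=t)
    print(t, [round(x, 3) for x in q], round(entropy(q), 3))
```

Scaling the logits by $c$ and using temperature $T = 1/c$ give the same distribution, which is why the scaling and temperature bullets describe the same effect from two angles.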
Resources