softmax

scroll ↓ to Resources

Note

  • normalized exponential function: converts a vector of K real numbers into a probability distribution over K possible outcomes
  • used in multi-class classification problems (including next token prediction tasks) as a generalization of logistic regression, see ^027f8a

Formula

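  • $\text{softmax}(\mathbf{z})_i = \dfrac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$ of input elements $\mathbf{z} = [z_1, \dots, z_K]$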
  • Each softmax output is strictly positive, regardless of the sign of the corresponding input
  • not invariant under scaling: $\text{softmax}(c\,\mathbf{z})$ is in general not equal to $\text{softmax}(\mathbf{z})$ for $c \neq 1$
  • temperature scaling: dividing the exponent by a temperature parameter $T > 0$, i.e. computing $\text{softmax}(\mathbf{z}/T)$, controls the entropy of the output distribution, making it more uniform (large $T$) or sharper and more confident (small $T$).
  • ==See also Briefly about transformer’s evolution or why is softmax cool==
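
The properties listed above can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names `softmax` and `entropy`, the `temperature` argument, and the example vector `z` are illustrative assumptions. Subtracting the maximum before exponentiating is a common numerical-stability trick that is not part of this note; it leaves the result unchanged because softmax is invariant under adding a constant to every input.

```python
import math

def softmax(z, temperature=1.0):
    """Softmax of a list of real numbers with temperature T > 0 (illustrative helper)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in z]
    m = max(scaled)  # assumption: stability shift, does not change the output
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution (illustrative helper)."""
    return -sum(q * math.log(q) for q in p if q > 0)

z = [-1.0, 0.0, 2.0]          # hypothetical input with mixed signs
p = softmax(z)

print(p)                       # all entries positive, summing to 1
print(softmax([2 * x for x in z]))  # differs from p: not invariant under scaling

# Entropy grows with temperature: low T is sharp, high T is closer to uniform.
for T in (0.5, 1.0, 5.0):
    print(T, entropy(softmax(z, temperature=T)))
```

Note that scaling the input by a constant `c` is the same as using temperature `1 / c`, which is why the scaling and temperature bullets describe the same effect from two directions.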

Resources


table file.inlinks, filter(file.outlinks, (x) => !contains(string(x), ".jpg") AND !contains(string(x), ".pdf") AND !contains(string(x), ".png")) as "Outlinks" from [[]] and !outgoing([[]])  AND -"Changelog"