principal component analysis


Note

  • new features are linear combinations of existing ones with some weights
  • weights are chosen to maximize the variance of the new feature
    • the weight vector is constrained to unit length (the sum of squared weights is 1); otherwise the variance could be increased without bound by scaling all weights proportionally
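  • to get m new features, we pick them one at a time: each maximizes the variance subject to its weight vector being orthogonal to the earlier ones
    • the weight vectors of the m new features are therefore mutually orthogonal, and the new features are uncorrelated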
  • prior to PCA the dataset needs to be centered per feature: subtract that feature's mean
    • with centered data the variance of a new feature is just its mean square, so the objective can be written without an explicit mean subtraction
  • geometric interpretation: we orthogonally project the dataset onto the m-dimensional linear subspace (hyperplane) spanned by the weight vectors
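
The points above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the function name `pca`, the toy dataset, and the choice to eigendecompose the sample covariance matrix (rather than use an SVD) are assumptions made for the example.

```python
import numpy as np

def pca(X, m):
    """Return (W, Z, variances) for the top-m principal components of X.

    X is an (n_samples, n_features) array; `pca` is a hypothetical helper name.
    """
    # Center each feature (column) by subtracting its mean.
    Xc = X - X.mean(axis=0)
    # Sample covariance; after centering no further mean subtraction is needed.
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    # eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:m]
    W = eigvecs[:, order]   # columns: unit-length, mutually orthogonal weight vectors
    Z = Xc @ W              # new features: projection onto the subspace spanned by W
    return W, Z, eigvals[order]

# Toy data (made up for illustration): 200 samples, 3 correlated features.
rng = np.random.default_rng(0)
mix = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.1]])
X = rng.normal(size=(200, 3)) @ mix
W, Z, variances = pca(X, 2)
```

Here `W.T @ W` is the identity up to rounding (unit-length, orthogonal weights), and the sample variance of each column of `Z` equals the corresponding eigenvalue in `variances`.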
Robust PCA

Resources

