Data labeling

Tabular data

  • Use https://www.snorkel.org/ to create labeling heuristics
  • semi-supervised learning: when only a small part of the dataset is labeled, we can generate labels for the unlabeled data with a model trained on that small labeled set using supervised learning. This is called pseudo-labeling.
  • unsupervised learning using clustering methods such as K-means
    • Metrics
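      • inertia: the sum of squared distances from each point to the centroid of its assigned cluster. Points that sit close to their centroid are similar to each other, so a lower inertia indicates tighter clusters.
      • Dunn’s index: the ratio of the smallest distance between clusters to the largest distance within a cluster. Higher values indicate compact, well-separated clusters.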
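
The pseudo-labeling and clustering-metric bullets above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the nearest-centroid "model", the softmax-style confidence, the 0.9 threshold, and all function names are assumptions made for the example. The Dunn index shown is one common variant (closest pair of points across clusters over the widest pair within a cluster); other definitions exist.

```python
import math
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def fit_centroids(X, y):
    """Stand-in 'model' (an assumption): one centroid per class label."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: centroid(pts) for label, pts in groups.items()}

def predict(centroids, x):
    """Return (label, confidence) from a softmax over negative distances."""
    scores = {label: math.exp(-math.dist(x, c)) for label, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())

def pseudo_label(X_lab, y_lab, X_unlab, threshold=0.9):
    """Label unlabeled points with the model, keeping only confident predictions."""
    centroids = fit_centroids(X_lab, y_lab)
    predictions = ((x, *predict(centroids, x)) for x in X_unlab)
    return [(x, label) for x, label, conf in predictions if conf >= threshold]

def inertia(X, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    centroids = fit_centroids(X, labels)
    return sum(math.dist(x, centroids[k]) ** 2 for x, k in zip(X, labels))

def dunn_index(X, labels):
    """Smallest between-cluster distance over largest within-cluster distance."""
    pairs = list(combinations(zip(X, labels), 2))
    between = min(math.dist(a, b) for (a, ka), (b, kb) in pairs if ka != kb)
    within = max(math.dist(a, b) for (a, ka), (b, kb) in pairs if ka == kb)
    return between / within

# Toy data (made up for illustration): two well-separated groups.
X_lab = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
y_lab = [0, 0, 1, 1]
X_unlab = [(0.2, 0.4), (4.8, 5.7), (2.5, 3.0)]  # last point lies between the groups

new_labels = pseudo_label(X_lab, y_lab, X_unlab)
print(new_labels)                 # the in-between point is left unlabeled
print(inertia(X_lab, y_lab))      # lower means tighter clusters
print(dunn_index(X_lab, y_lab))   # higher means better separation
```

In practice the accepted pseudo-labels would be added to the labeled set and the model retrained, possibly over several rounds; the threshold trades label coverage against label noise.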

Image data

  • weak supervision
    • generate many noisy labels for each data point from different heuristics, then average them into a single label
    • for images, each label can be 0 or 1, indicating whether a given object of interest is present
  • transfer learning
  • focal loss
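
The weak-supervision idea in the list above (several noisy heuristics voting 0 or 1, then averaged) might look like the sketch below. Everything specific here is hypothetical: the image record fields (`filename`, `caption`, `detector_score`), the three heuristics, and the cut-offs are invented for illustration. Letting a heuristic abstain by returning `None` is also an assumption borrowed from common labeling-function practice, not something the notes specify.

```python
def lf_caption_mentions_cat(img):
    """Vote 1 if the caption mentions the object, else 0."""
    return 1 if "cat" in img["caption"].lower() else 0

def lf_detector_score(img):
    """Vote from a (hypothetical) detector score; abstain when it is uncertain."""
    score = img["detector_score"]
    if score >= 0.7:
        return 1
    if score <= 0.3:
        return 0
    return None  # abstain in the uncertain middle range

def lf_filename_hint(img):
    """Vote 1 if the filename hints at the object; otherwise abstain."""
    return 1 if "cat" in img["filename"].lower() else None

LFS = [lf_caption_mentions_cat, lf_detector_score, lf_filename_hint]

def soft_label(img, lfs):
    """Average the non-abstaining votes; None if every heuristic abstained."""
    votes = [v for v in (lf(img) for lf in lfs) if v is not None]
    return sum(votes) / len(votes) if votes else None

# Made-up records standing in for image metadata.
images = [
    {"filename": "cat_001.jpg", "caption": "A cat on a sofa", "detector_score": 0.55},
    {"filename": "img_17.jpg", "caption": "Empty street", "detector_score": 0.8},
    {"filename": "img_18.jpg", "caption": "Dog in a park", "detector_score": 0.1},
]

for img in images:
    print(img["filename"], soft_label(img, LFS))
```

The averaged value can be kept as a soft label or thresholded (for example at 0.5) into a hard 0/1 label; a plain average treats every heuristic as equally reliable, which tools such as Snorkel try to improve on by estimating per-heuristic accuracy.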

Text data

Video data

Audio data

Resources

