Ground Truth Data

Learn about the importance of ground truth data within the super.AI platform.

Ground truth data is input data together with its correct output. The better your ground truth data, the better your project’s output will be.

Ground truth data provides three benefits:

  • Higher quality
  • A guarantee of that quality
  • Cheaper and faster data labeling

How does ground truth data provide higher quality?

The first step in ensuring quality is being able to measure a project’s output. Ground truth data provides that method of measurement.

We feed tasks made with ground truth data to our labelers. We then evaluate their performance by comparing the labeled output they create to the ground truth output.

Comparing the labeler’s output to the ground truth output allows us to:

  • Train labelers on your data, automatically pointing out where they're making mistakes
  • Screen labelers by ability, so your data is only sent to the best-performing labelers
  • Continuously monitor the performance of the labelers working on your data
  • See how labelers’ performance varies over time (e.g., checking whether a labeler is fatigued)

All of these lead to higher quality output.
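
As a rough illustration of the comparison described above, the following Python sketch scores each labeler against ground truth and screens out those below a threshold. It is not super.AI's implementation: the task IDs, labeler names, data layout, the `score_labelers` and `screen_labelers` helpers, and the 0.9 threshold are all hypothetical, and exact-match accuracy is assumed as the quality measure.

```python
from collections import defaultdict

# Hypothetical ground truth: task ID -> correct output.
ground_truth = {"t1": "cat", "t2": "dog", "t3": "cat", "t4": "bird"}

# Hypothetical submissions: (labeler ID, task ID, submitted label).
submissions = [
    ("alice", "t1", "cat"), ("alice", "t2", "dog"),
    ("alice", "t3", "cat"), ("alice", "t4", "bird"),
    ("bob", "t1", "cat"), ("bob", "t2", "cat"),
    ("bob", "t3", "dog"), ("bob", "t4", "bird"),
]


def score_labelers(submissions, ground_truth):
    """Return each labeler's exact-match accuracy on ground truth tasks."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for labeler, task_id, label in submissions:
        if task_id not in ground_truth:
            continue  # only tasks with a known correct output can be scored
        total[labeler] += 1
        if label == ground_truth[task_id]:
            correct[labeler] += 1
    return {labeler: correct[labeler] / total[labeler] for labeler in total}


def screen_labelers(accuracies, min_accuracy):
    """Return the labelers whose accuracy meets the (assumed) threshold."""
    return sorted(name for name, acc in accuracies.items() if acc >= min_accuracy)


accuracies = score_labelers(submissions, ground_truth)
qualified = screen_labelers(accuracies, min_accuracy=0.9)
print(accuracies)  # {'alice': 1.0, 'bob': 0.5}
print(qualified)   # ['alice']
```

Running the same scoring over successive time windows of submissions would give the kind of ongoing monitoring mentioned in the list above.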


How does ground truth data guarantee quality?

Because we can measure quality accurately, we can also predict it. This is how we can guarantee the minimum quality of the labels we provide. The measurement and prediction process occurs on every project before you begin paying to have your data labeled.
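
The page does not say how this prediction is made. As one possible sketch, a conservative lower bound on accuracy can be estimated from a sample of ground truth tasks using the Wilson score interval. The choice of this interval, the `wilson_lower_bound` name, and the 95% confidence level (z = 1.96) are assumptions made for illustration only.

```python
import math


def wilson_lower_bound(correct, total, z=1.96):
    """Lower end of the Wilson score interval for a measured accuracy.

    correct: number of ground truth tasks labeled correctly
    total:   number of ground truth tasks evaluated
    z:       normal quantile (1.96 is roughly 95% two-sided confidence)
    """
    if total == 0:
        return 0.0  # nothing measured, so nothing can be promised
    p = correct / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


# The same 95% measured accuracy supports a stronger claim with more data.
small_sample = wilson_lower_bound(95, 100)
large_sample = wilson_lower_bound(950, 1000)
print(round(small_sample, 3), round(large_sample, 3))
```

Under these assumptions, the bound tightens toward the measured accuracy as more ground truth tasks are evaluated, which is one reason a larger ground truth set supports a stronger minimum-quality claim.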

How does ground truth data lead to faster and cheaper data labeling?

We can also use your ground truth data to train machine learning models. Under certain circumstances, these models are of high enough quality to replace human labelers, which lowers your costs and shortens turnaround time.

How can I add ground truth data to my project?

There are two ways to add ground truth to your super.AI project: