How to add and manage ground truth

Ground truth data is data that super.AI knows to be correctly labeled. We use ground truth to:

  • Measure and predict a project’s quality
  • Train machine learning models to assist in data labeling
  • Qualify new human labelers to work on your project
  • Audit existing labelers

Using ground truth in these ways yields higher quality output, a guarantee on that quality, and cheaper, faster data processing.

This section of the documentation covers the following:

  • How to add ground truth data
  • How to edit ground truth data
  • How to download ground truth data

How to add ground truth data

There are two ways to add ground truth data to your project:

  • Submit a new data point along with its correct output
  • Review a processed data point and mark it as correct

How to submit new ground truth data

  1. Head to your super.AI dashboard
  2. Open the relevant project
  3. Click Ground truth data in the left-hand menu
  4. Click Upload ground truth at the top right of the table
  5. There are three options available:
    • Through the super.AI API: consult our API reference documentation to learn more.
    • Quick create: this lets you upload a single piece of ground truth data using our web UI. It’s the most straightforward method, but it’s slower than using our API or uploading a JSON file, so we don’t advise it if you have a lot of ground truth data to upload. Once you’ve provided the input and output data, click Submit data on the right side of the screen.
    • Create from JSON: this lets you enter an array of inputs in one JSON file and a corresponding array of outputs in another, and then upload both files. We provide sample input and output JSON files for each project type. You can download these files and replace the examples with your data. Once you’ve uploaded the two JSON files, click Submit data on the right side of the screen.
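
The paired-file layout used by Create from JSON can be prepared with a short script. The following is a minimal sketch, assuming each ground truth item is one input object paired with one output object. The field names (`image_url`, `label`), the function name, and the file names are hypothetical; the sample files you download for your project type define the actual schema.

```python
import json
from pathlib import Path


def write_ground_truth_files(pairs, inputs_path, outputs_path):
    """Write paired inputs and outputs to two JSON files in matching order.

    `pairs` is a list of (input, output) tuples. The shape of each input and
    output is project-specific, so this function does not validate it.
    """
    # Splitting the pairs keeps item i of the inputs aligned with item i
    # of the outputs, which is what the two-file upload relies on.
    inputs = [inp for inp, _ in pairs]
    outputs = [out for _, out in pairs]
    Path(inputs_path).write_text(json.dumps(inputs, indent=2), encoding="utf-8")
    Path(outputs_path).write_text(json.dumps(outputs, indent=2), encoding="utf-8")
    return len(pairs)


# Hypothetical example data: these field names are placeholders,
# not the real schema for any project type.
example_pairs = [
    ({"image_url": "https://example.com/cat.jpg"}, {"label": "cat"}),
    ({"image_url": "https://example.com/dog.jpg"}, {"label": "dog"}),
]
```

Calling `write_ground_truth_files(example_pairs, "inputs.json", "outputs.json")` would produce two files ready to adapt to the sample format. Building both files from a single list of pairs avoids the two arrays drifting out of order.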

How to review a processed data point and mark it as correct

Whenever you review a completed job in your work queue and mark it as correct, its data point is converted to ground truth data. If you edit the output of a job, we also add the data point to your ground truth dataset.

  1. Head to your super.AI dashboard
  2. Open the relevant project
  3. Click on a completed job in the table, or, if you want to review all unreviewed outputs, click Review results at the top left of the table
  4. If the output is correct, simply click Correct at the top right of the details card
    If the output is not correct, you have the option to click Edit output and correct it. When you save edits to the output, the job is automatically marked as correct and added to your ground truth dataset.

How to edit ground truth data

You might need to edit ground truth data when you spot an error or when your project requirements change. To edit, correct, or review your ground truth data, follow the instructions below.

  1. Head to your super.AI dashboard
  2. Open the relevant project
  3. Click Ground truth data in the left-hand menu
  4. Choose a ground truth data point from the table by clicking on it
    • The Order option at the top of the table allows you to change which ground truth data appears at the top of the table
  5. Click Edit ground truth data on the right side of the card
  6. Here, you can change the output of the ground truth data
  7. When you’re happy with your changes, click Update data
  8. You can continue to cycle through your ground truth data using the arrow buttons or the arrow keys on your keyboard and repeat the editing process as necessary

How to download ground truth data

You can download a JSON file containing the details of any ground truth data points you select.

  1. Open your super.AI dashboard
  2. Open the relevant project
  3. Click Ground truth data from the left-hand menu
  4. Select any ground truth data points in the table that you want to download by using the checkboxes on the left of the table
  5. Hit Download selected at the top right of the table

Your download will contain the following information on each ground truth data point:

  • ID
  • Creation time
  • Modified time
  • Project ID (appId)
  • Owner ID
  • Status
  • Job ID (if you review a completed data point and mark it as correct, this is the ID of the reviewed job)
  • Inputs
  • Outputs
  • The number of times this data point has been used for quality assessment
  • Type (i.e., who created the ground truth data)
  • Universally unique identifier (UUID)

The data will be in JSON format. This raw data can be used for analysis outside of the super.AI dashboard.
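
As one example of analysis outside the dashboard, the sketch below counts downloaded records by the value of a chosen field. It assumes the downloaded file holds a JSON array of objects, one per ground truth data point. The exact key names in the file are not listed here, so the field name is passed in as a parameter rather than hard-coded; both function names are hypothetical.

```python
import json
from collections import Counter


def load_records(path):
    """Load a downloaded ground truth file, assumed to hold a JSON array."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of ground truth records")
    return data


def tally_by_field(records, field):
    """Count records by the value of one top-level field.

    Records that lack the field are counted under "<missing>".
    """
    return Counter(str(record.get(field, "<missing>")) for record in records)
```

For instance, `tally_by_field(load_records("ground_truth.json"), "status")` would summarize records per status, provided the file uses a key with that name.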

Converting JSON to CSV

You can convert the JSON into CSV format in several ways:

  • Use an online tool such as Convert JSON to CSV
  • Integrate our API with Google Sheets to import the JSON data, then export it in CSV format
  • Use a command line tool such as jq to reorganize the data format
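
A short script is another option for the conversion. The following is a minimal sketch using only the Python standard library, assuming the download is a JSON array of objects. The function name is hypothetical, and nested values such as inputs and outputs are kept as JSON strings in a single cell rather than flattened.

```python
import csv
import io
import json


def records_to_csv(records):
    """Convert a list of JSON objects to CSV text.

    Columns are the union of top-level keys in first-seen order.
    Nested dicts and lists are serialized as JSON strings.
    """
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = {}
        for key, value in record.items():
            if isinstance(value, (dict, list)):
                row[key] = json.dumps(value)
            else:
                row[key] = value
        # Keys missing from a record are written as empty cells.
        writer.writerow(row)
    return buffer.getvalue()
```

To convert a downloaded file, load it with `json.load` and write the returned text to a `.csv` file. Keeping nested values as JSON strings loses no information, at the cost of cells that need parsing again if you want individual input or output fields.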

