Project measurements
Learn about project measurements, a way to monitor and evaluate your long-term project performance.
Overview
The project measurement feature enables you to assess the quality of the data processing application by analyzing its performance against existing ground truth. This tool is designed to help optimize application settings by providing insightful metrics and comparisons.
Note that projects are referred to internally as applications; as a result, some APIs use the name application measurement for this feature.
Running measurements uses processing capacity of the super.AI platform and will incur costs.
Key Features
- Performance Measurement: Evaluate your application's performance using current settings against a predefined ground truth.
- Metrics Tracking: View predefined metrics over time to monitor progress and identify trends.
- Settings Comparison: Compare the performance of various application settings to determine which yields the best results.
- Performance Monitoring: Keep an eye on performance improvements or regressions over time to ensure optimal operation.
- Triggering: Trigger measurements manually via the UI or API, or on a fixed schedule.
- Result Integration: Upon completion, new metric records are stored in a database. They are shown on the Statistics page and are available via API.
Prerequisites
Before starting a measurement process, ensure that ground truth is available for your application. Ground truth is established by reviewing completed jobs and serves as the baseline for performance assessment.
Understanding Performance
In this context, performance refers to the quality of machine learning predictions (or human answers).
How to Use
- Prepare Ground Truth: Confirm that ground truth data is available and accurate.
- Initiate Measurement: There are three ways to start measurements:
  - UI: Manually trigger the measurement process from the Ground-truth page under Run measurement, or directly in the Save changes dialog when updating the job settings.
  - Automatic: Under Measurement settings, enable an automatic measurement frequency. This will schedule periodic cron jobs.
  - API: To start measurement operations manually, check out Create a new operation for the measurer (a sketch of this flow follows the list).
- Monitor Process: The process will take a few minutes. During this time, special measurement jobs are executed but not displayed in the job list. Currently, the progress is not shown in the user interface but can be checked via API: Get a specific operation for the measurer.
- View Results: After completion, check the Performance over time - by metric or Performance over time - by field panels on the Statistics page. These panels show long-term trends for the specific metrics. To use the metric values in your own data pipeline, check out this API to retrieve them.
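For the API option, the flow is: create a measurer operation, then poll it until it completes. Below is a minimal Python sketch of that flow using the requests library. The base URL, endpoint paths, request payload, and response field names (operation_id, status) are illustrative assumptions only; consult the linked API reference (Create a new operation for the measurer, Get a specific operation for the measurer) for the exact contract.

```python
import time
import requests

# Assumed values -- replace with your real API host, token, and project (app) ID.
BASE_URL = "https://api.super.ai"   # hypothetical base URL; check your API reference
API_TOKEN = "<YOUR_API_TOKEN>"
APP_ID = "<YOUR_APP_ID>"

headers = {"Authorization": f"Bearer {API_TOKEN}"}

# 1. Start a measurement ("Create a new operation for the measurer").
#    The exact path and payload are assumptions; see the API reference.
start = requests.post(
    f"{BASE_URL}/measurer/{APP_ID}/operations",
    headers=headers,
    json={},
)
start.raise_for_status()
operation_id = start.json()["operation_id"]  # assumed response field name

# 2. Poll the operation until it finishes ("Get a specific operation for the measurer").
while True:
    status = requests.get(
        f"{BASE_URL}/measurer/{APP_ID}/operations/{operation_id}",
        headers=headers,
    )
    status.raise_for_status()
    state = status.json().get("status")  # assumed response field name
    if state in ("COMPLETED", "FAILED", "CANCELED"):
        print(f"Measurement finished with status: {state}")
        break
    time.sleep(30)  # the process typically takes a few minutes
```

Polling every 30 seconds or so is usually sufficient, since a measurement run takes a few minutes to complete.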
Metrics
The available set of metrics can differ based on the application type. For our document extraction use case, the following metrics are available (a short illustrative sketch follows the list):
- Accuracy: Measures the proportion of true results among the total number of cases examined.
- Recall: Quantifies the number of correct positive predictions made out of all actual positives.
- Precision: Calculates the proportion of correct positive predictions in relation to all positive predictions made.
- F1: Provides a harmonic mean of precision and recall, balancing the two in situations of uneven class distributions.
- Lev_dist (Levenshtein Distance): Assesses the minimum number of edits required to change one string into another.
- bbox_iou (Bounding Box Intersection Over Union): Evaluates the overlap between predicted and actual bounding boxes as a measure of accuracy in object detection.
- bbox_iou_eq (Bounding Box IOU Equality): Determines the equality of the Intersection Over Union metric specifically for bounding boxes, useful in precise object location assessments.
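To make these definitions concrete, here is a small Python sketch showing how accuracy, Levenshtein distance, and bounding box IoU are commonly computed. It is an illustrative reference only, not the platform's internal implementation.

```python
from typing import List, Tuple

def accuracy(predictions: List[str], ground_truth: List[str]) -> float:
    """Proportion of predicted field values that exactly match ground truth."""
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return correct / len(ground_truth)

def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character edits turning string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def bbox_iou(box_a: Tuple[float, float, float, float],
             box_b: Tuple[float, float, float, float]) -> float:
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```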
Example Project measurement
An example output of the API (Get measurements for the given app_id) looks like this:
```json
{
  "app_id": "...",
  "records": [
    {
      "field_name": "Invoice Date",
      "metric_name": "accuracy",
      "value": 1,
      "created_at": "2024-05-30T10:15:01",
      "dataset_id": "a0cd8a81-538c-481c-836b-03decccacae7",
      "parameter_set_id": "66f6da15-6c0f-4133-a49c-cfe5beb67271"
    },
    {
      "field_name": "Invoice Number",
      "metric_name": "accuracy",
      "value": 1,
      "created_at": "2024-05-30T10:15:01",
      "dataset_id": "a0cd8a81-538c-481c-836b-03decccacae7",
      "parameter_set_id": "66f6da15-6c0f-4133-a49c-cfe5beb67271"
    }
  ]
}
```
Notice that each metric is recorded at a granular, per-field level. This allows you to evaluate performance for specific fields.
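Because records are keyed by field and metric, you can aggregate them in your own pipeline. The following Python sketch groups the accuracy values of a response like the one above by field name; the response variable simply mirrors the example payload and would normally come from the API call.

```python
from collections import defaultdict

# `response` is assumed to be the parsed JSON body shown above,
# e.g. response = requests.get(...).json().
response = {
    "app_id": "...",
    "records": [
        {"field_name": "Invoice Date", "metric_name": "accuracy",
         "value": 1, "created_at": "2024-05-30T10:15:01"},
        {"field_name": "Invoice Number", "metric_name": "accuracy",
         "value": 1, "created_at": "2024-05-30T10:15:01"},
    ],
}

# Group accuracy values per field so trends can be tracked over time.
accuracy_by_field = defaultdict(list)
for record in response["records"]:
    if record["metric_name"] == "accuracy":
        accuracy_by_field[record["field_name"]].append(
            (record["created_at"], record["value"])
        )

for field, points in accuracy_by_field.items():
    latest_time, latest_value = max(points)  # ISO timestamps sort lexicographically
    print(f"{field}: latest accuracy {latest_value} at {latest_time}")
```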
Launching a new measurement process will currently automatically cancel any previously running process.