Database Mapping

Link a database to specific fields within super.AI's General Document Processor (GDP) and match extracted data against the values in that database.

The General Document Processor (GDP) is a tool that allows users to process documents and extract information from them. This feature adds the ability to link a database to specific fields in GDP, so that the extracted data can be matched with values in the database.

How to set up the database connection

  1. When creating a new field in GDP, select the data type "Calculated".
  2. Enter the dataset URI. This is the location of the database you want to link to GDP. It can be a public URL or a data:// URI that points to your super.AI storage.
  3. Define which fields to match against the data in the dataset. Select the fields in GDP that you want to compare with the values in the database.
  4. Define the number of candidates to present for human review. This is the number of matches that will be returned for interactive selection. If set to 0, matching will be disabled.
  5. Define the matching threshold. This is the fraction of the selected fields that must match for a value to be assigned automatically. For example, a threshold of 0.8 means that 80% of the fields must match for the value to be automatically assigned.
  6. Define the Output Column Name. This is the name of the column in the output.
  7. Click "Setup" to link the database to GDP and begin matching values.
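As a rough illustration of how the settings from steps 2 to 6 fit together, the sketch below models them as a small Python structure with basic sanity checks. It is not part of the super.AI API: the class name, attribute names, the example URI, and the validation rules (including the assumption that the threshold lies between 0 and 1 and that public URLs use http or https) are hypothetical and only mirror the form fields described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatabaseMappingConfig:
    """Hypothetical local mirror of the settings entered in steps 2 to 6."""

    dataset_uri: str            # step 2: public URL or data:// URI
    match_fields: List[str]     # step 3: GDP fields compared with the database
    num_candidates: int         # step 4: 0 disables matching
    matching_threshold: float   # step 5: e.g. 0.8 means 80% of fields must match
    output_column_name: str     # step 6: column used for the output value

    def __post_init__(self) -> None:
        # Assumed checks; the product may validate differently.
        if not self.dataset_uri.startswith(("http://", "https://", "data://")):
            raise ValueError("dataset_uri should be a public URL or a data:// URI")
        if not self.match_fields:
            raise ValueError("select at least one field to match")
        if self.num_candidates < 0:
            raise ValueError("num_candidates cannot be negative")
        if not 0.0 <= self.matching_threshold <= 1.0:
            raise ValueError("matching_threshold is assumed to lie in [0, 1]")
        if not self.output_column_name:
            raise ValueError("output_column_name is required")

    @property
    def matching_enabled(self) -> bool:
        # Per step 4, setting the number of candidates to 0 disables matching.
        return self.num_candidates > 0


config = DatabaseMappingConfig(
    dataset_uri="data://vendors.csv",  # hypothetical storage path
    match_fields=["vendor_name", "city"],
    num_candidates=3,
    matching_threshold=0.8,
    output_column_name="vendor_id",
)
```

Writing the settings down this way before filling in the form can help catch an out-of-range threshold or a missing output column early.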

Once the fields are set up, GDP will look up values in the database and match them to the extracted data from your documents.
If the matching threshold is met, the corresponding value will be automatically assigned to the output field.
If multiple matches are found, up to the configured number of candidates is shown for human review, and the user selects the correct one.
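The lookup behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GDP's actual matching algorithm: it assumes a field "matches" when the values are equal after trimming and lowercasing, that the score is the fraction of selected fields that match, and that a value is auto-assigned only when exactly one row meets the threshold. The function names and the sample vendor data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]

# Hypothetical sample dataset; "vendor_id" plays the role of the output column.
DATASET: List[Row] = [
    {"vendor_name": "Acme Corp", "city": "Berlin", "vendor_id": "V-001"},
    {"vendor_name": "Acme Corp", "city": "Munich", "vendor_id": "V-002"},
    {"vendor_name": "Globex", "city": "Berlin", "vendor_id": "V-003"},
]


def match_score(extracted: Row, row: Row, match_fields: List[str]) -> float:
    """Fraction of match_fields whose values agree (assumed matching rule)."""
    if not match_fields:
        return 0.0
    hits = 0
    for name in match_fields:
        value = extracted.get(name, "").strip().lower()
        # Empty extracted values never count as a match.
        if value and value == row.get(name, "").strip().lower():
            hits += 1
    return hits / len(match_fields)


def lookup(
    extracted: Row,
    dataset: List[Row],
    match_fields: List[str],
    output_column: str,
    num_candidates: int,
    threshold: float,
) -> Tuple[Optional[str], List[str]]:
    """Return (auto_assigned_value, candidates_for_review)."""
    if num_candidates == 0:
        return None, []  # matching disabled
    scored = [(match_score(extracted, row, match_fields), row) for row in dataset]
    scored = [pair for pair in scored if pair[0] > 0]
    # Stable sort: rows with equal scores keep their dataset order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    confident = [row for score, row in scored if score >= threshold]
    if len(confident) == 1:
        return confident[0][output_column], []
    # No row, or more than one row, met the threshold: defer to a reviewer.
    return None, [row[output_column] for _, row in scored[:num_candidates]]


extracted = {"vendor_name": "acme corp", "city": "Berlin"}
auto_value, candidates = lookup(
    extracted, DATASET, ["vendor_name", "city"], "vendor_id", 3, 0.8
)
```

With the sample data, only the first row matches both fields, so at a threshold of 0.8 its vendor_id is assigned automatically. At a threshold of 0.5 all three rows qualify, so in this sketch the top candidates go to human review instead.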

The matching threshold controls how much confidence is required for automatic matching. Lowering the threshold increases the chance that a value is matched automatically but also increases the chance of mismatches. Raising the threshold reduces the chance of mismatches but also reduces the chance of an automatic match.

This feature is useful if you have a specific set of values that should be extracted from your documents, and you want to ensure that the extracted data matches one of those values. It improves extraction quality and lets you automate part of the process.

Writable Dataset

Overview

The Writable Dataset feature in super.AI lets users extend their datasets by appending new, previously unseen entries. The feature is designed primarily for append-only use, where new entries are added after a manual review.

Functionality

Append-Only Dataset: The dataset managed by super.AI allows new entries to be added. These entries go through a manual review before they are incorporated into the dataset.
Manual Review Threshold: A new setting that controls the manual review process, ensuring quality and relevance of the appended data.

Integration with Existing Datasets

Automatic Matching: The system tries to automatically match new entries with existing data in the user's datasets.
Manual Review Trigger: If the automatic matching score falls below the preset threshold, the system creates a human review task.
Human Review Task: Allows manual verification and addition of new entries, specifically for matching based on the output column name.

Editing and Annotation

Edit Job Feature: Users can add new entries to the dataset via the Edit Job feature, enhancing the dataset with new annotations.
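The append-only flow described in this section can be pictured with the minimal sketch below. It is a local model for illustration only, not super.AI code: the class, its method names, and the rule that an entry counts as "previously unseen" when its output column value is not yet in the dataset are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Row = Dict[str, str]


@dataclass
class WritableDataset:
    """Hypothetical in-memory model of an append-only dataset with review."""

    output_column: str
    rows: List[Row] = field(default_factory=list)
    pending_review: List[Row] = field(default_factory=list)

    def submit(self, entry: Row, match_score: float, review_threshold: float) -> str:
        """Route an entry: a score below the threshold opens a review task."""
        if match_score >= review_threshold:
            return "matched"
        self.pending_review.append(entry)
        return "needs_review"

    def approve(self, entry: Row) -> bool:
        """Reviewer confirms the entry; append it only if it is unseen."""
        self.pending_review.remove(entry)
        known = {row[self.output_column] for row in self.rows}
        if entry[self.output_column] in known:
            return False  # already present, nothing to append
        self.rows.append(dict(entry))  # existing rows are never modified
        return True


dataset = WritableDataset(
    output_column="vendor_id",
    rows=[{"vendor_name": "Acme Corp", "vendor_id": "V-001"}],
)
new_entry = {"vendor_name": "Initech", "vendor_id": "V-004"}
status = dataset.submit(new_entry, match_score=0.3, review_threshold=0.8)
appended = dataset.approve(new_entry)
```

In this model, existing rows are never edited or removed; the only way the dataset grows is through an approved review.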

API Interaction

CRUD Operations: The /v1/field_datasets/data/{app_uuid}/{field_id} endpoint allows for Create, Read, Update, and Delete operations.
Field ID: Corresponds to a unique identifier generated during the app creation process.
Data Format: The API interacts with the dataset in CSV string format.
Dataset Reset: Achievable by making a POST request with an empty CSV.
Appending Data: Done via PUT method, with the header row being ignored during the append.
Retrieving Data: The GET method returns the complete dataset in CSV format.
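The three documented calls (POST with an empty CSV to reset, PUT to append, GET to retrieve) can be sketched with Python's standard library as follows. The endpoint path and the HTTP methods come from the list above; everything else is an assumption for illustration: BASE_URL is a placeholder, the text/csv content type is a guess, and authentication headers are left to the caller because they are not described here. The functions only build the requests; sending them is shown in the note after the block.

```python
import csv
import io
import urllib.request
from typing import Dict, List, Optional

BASE_URL = "https://api.example.com"  # placeholder, replace with the real API host


def dataset_url(app_uuid: str, field_id: str) -> str:
    """Build the documented endpoint path for one field's dataset."""
    return f"{BASE_URL}/v1/field_datasets/data/{app_uuid}/{field_id}"


def rows_to_csv(header: List[str], rows: List[List[str]]) -> str:
    """Serialize rows to a CSV string, the format the API works with."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def build_request(
    method: str,
    app_uuid: str,
    field_id: str,
    csv_text: Optional[str] = None,
    auth_headers: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    headers = dict(auth_headers or {})  # auth scheme is not covered here
    data = None
    if csv_text is not None:
        data = csv_text.encode("utf-8")
        headers["Content-Type"] = "text/csv"  # assumed content type
    return urllib.request.Request(
        dataset_url(app_uuid, field_id), data=data, headers=headers, method=method
    )


def reset_request(app_uuid: str, field_id: str, **kw) -> urllib.request.Request:
    # Dataset reset: POST with an empty CSV.
    return build_request("POST", app_uuid, field_id, csv_text="", **kw)


def append_request(app_uuid, field_id, header, rows, **kw) -> urllib.request.Request:
    # Append: PUT; the header row is sent but ignored by the server on append.
    return build_request("PUT", app_uuid, field_id, rows_to_csv(header, rows), **kw)


def get_request(app_uuid: str, field_id: str, **kw) -> urllib.request.Request:
    # Retrieve: GET returns the complete dataset as CSV.
    return build_request("GET", app_uuid, field_id, **kw)
```

To send a request, pass it to urllib.request.urlopen, for example `urllib.request.urlopen(get_request(app_uuid, field_id, auth_headers=my_headers)).read().decode("utf-8")`, where `my_headers` stands for whatever credentials your account requires.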

Multi-Instance Support

Overview

Multi-Instance Support allows for the matching of a single field to multiple values. This feature is useful in scenarios where a field might correspond to multiple annotations or entries.

Functionality

List of Annotations: Enables the creation of lists of annotations for fields that have multiple corresponding values.

Programmatic Dataset Update

Overview

super.AI lets you update the dataset for each field programmatically.

Functionality

API Endpoint: Utilize the /field_datasets/params API endpoint for updating datasets.

Reprocessing Jobs: Option to reprocess completed jobs with the newly provided dataset via the same API call.

This functionality allows for efficient and dynamic updating of datasets, keeping them relevant and up-to-date.