How to extract field names with Python via the API

  • This notebook is a quick introduction to using the super.AI API to extract annotations from a job and transform them into a custom format.

Getting Started with Super.AI API

  • We will be interacting with the super.AI API.
  • We also need two open-source libraries: requests for the API calls and pandas for the data transformations.

Setting up the API client

Before interacting with super.AI to parse job responses, you need to set up a client. The client handles communication with the super.AI platform: it sends requests and returns the responses. The steps below show how to create and configure it.


  1. Create an api.py file and paste the following code:
import requests


class APIClient:
    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url
        self.session = requests.Session()
        self.session.headers.update({
            "API-KEY": api_key,
            "Content-Type": "application/json"
        })

    def _make_api_call(self, endpoint: str, method="GET", params=None, data=None):
        """
        Generalizes API calls to handle various methods and endpoints.

        :param endpoint: The API endpoint (relative to base_url)
        :param method: HTTP method (GET, POST, etc.)
        :param params: URL parameters (optional)
        :param data: Request payload (optional)
        :return: JSON response if successful, None if unsuccessful
        """
        url = f"{self.base_url}{endpoint}"
        try:
            # A timeout keeps a stalled connection from hanging the script (30 s is an arbitrary choice)
            if method.upper() == "GET":
                response = self.session.get(url, params=params, timeout=30)
            elif method.upper() == "POST":
                response = self.session.post(url, json=data, timeout=30)
            elif method.upper() == "PUT":
                response = self.session.put(url, json=data, timeout=30)
            elif method.upper() == "DELETE":
                response = self.session.delete(url, json=data, timeout=30)
            else:
                raise ValueError(f"Unsupported HTTP method: {method}")

            response.raise_for_status()  # Raise an HTTPError for bad responses
            return response.json()  # Return the JSON response

        except requests.exceptions.RequestException as e:
            print(f"Error making API call: {e}")
            return None

    def get_job_response(self, job_id: str):
        """
        Retrieves job response from the API.

        :param job_id: Job ID
        :return: JSON response or None if failed
        """
        endpoint = f"/v1/jobs/{job_id}/response"
        return self._make_api_call(endpoint)

    def get_app_schema(self, app_id: str):
        """
        Retrieves app schema from the API.

        :param app_id: Application ID
        :return: JSON response or None if failed
        """
        endpoint = f"/v1/apps/{app_id}"
        return self._make_api_call(endpoint)




This is a general-purpose API client that we will use to communicate with the super.AI platform.


  2. Import APIClient from api.py in your main Python program (e.g. main.py):
from api import APIClient

  3. Configure the client in main.py, making sure to enter your API key:
api_client = APIClient(base_url="https://api.super.ai", api_key="YOUR_API_KEY")
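Hard-coding the key in main.py makes it easy to leak through version control. As a minimal sketch, the key could instead be read from an environment variable. The variable name SUPERAI_API_KEY and the helper read_api_key are assumptions made for this illustration, not part of the super.AI API.

```python
import os


def read_api_key(env_var: str = "SUPERAI_API_KEY", environ=None) -> str:
    """Return the API key stored in `env_var`, or raise if it is missing.

    `SUPERAI_API_KEY` is a hypothetical variable name chosen for this sketch.
    `environ` defaults to os.environ and can be replaced by a dict in tests.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


# Usage in main.py (APIClient comes from api.py above):
# api_client = APIClient(base_url="https://api.super.ai", api_key=read_api_key())
```

Failing early with a clear message is usually easier to debug than an authentication error returned later by the API.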

Fetching the relevant data from Super.AI

We will need to call the super.AI API to get the result of a processed job and the output schema, which are necessary for the final transformation.


  1. Use the API client you just created to fetch an individual job response (main.py). Replace JOB_ID with the ID of a processed job:
job_response = api_client.get_job_response(job_id="JOB_ID")

  2. Use the client to fetch the output schema needed for the transformation (main.py). Replace APP_ID with the ID of the app the job belongs to:
app_schema = api_client.get_app_schema(app_id="APP_ID")

  3. Next we need the annotations and the annotation schema from the data we fetched. Extract the annotations from the job response (main.py). The client returns None when a call fails, so fall back to an empty dictionary:
annotations = (job_response or {}).get("response", {}).get("annotations", {})

  4. Extract the annotation schema from the app schema (main.py), again falling back to an empty dictionary if the call failed:
annotations_schema = (app_schema or {}).get("outputSchema", {}).get("definitions", {}).get("AnnotationModel", {}).get(
    "properties", {})
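The two lookups above rely on only a handful of nested keys. The sketch below uses invented sample payloads to show the shape the extraction expects, and one way to tolerate a failed call that returned None. The field id invoice_number and the helper dig are hypothetical, and real responses will likely carry many more keys.

```python
# Hypothetical payloads: only the nested key names come from the tutorial's lookups.
sample_job_response = {
    "response": {
        "annotations": {
            "invoice_number": [{"content": "14163-5"}],
        }
    }
}
sample_app_schema = {
    "outputSchema": {
        "definitions": {
            "AnnotationModel": {
                "properties": {
                    "invoice_number": {"title": "Invoice Number"},
                }
            }
        }
    }
}


def dig(payload, *keys):
    """Walk nested dictionaries, returning {} if the payload is None or a key is missing."""
    current = payload
    for key in keys:
        if not isinstance(current, dict):
            return {}
        current = current.get(key, {})
    return current if isinstance(current, dict) else {}


annotations = dig(sample_job_response, "response", "annotations")
annotations_schema = dig(
    sample_app_schema, "outputSchema", "definitions", "AnnotationModel", "properties"
)
```

Here annotations maps each field id to a list of items with a content entry, and annotations_schema maps the same ids to their human-readable titles. That shared id is what the transformation later joins on.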

Transforming Data

To transform the data, we will create some utility functions in a new file called util.py.


  1. Create the util.py file and paste the following:
import pandas as pd
from itertools import chain
from collections.abc import Iterable


class DataTransformationUtils:
    @staticmethod
    def _merge_arrays_by_id(array1, array2):
        """
        Merges two arrays of objects by their `id` property.

        :param array1: First array of objects.
        :param array2: Second array of objects.
        :return: Merged list of objects based on `id`.
        """
        df1 = pd.DataFrame(array1)
        df2 = pd.DataFrame(array2)

        # Merge on `id` and fill missing values
        merged = pd.merge(df1, df2, on="id", how="left")
        return merged.to_dict(orient="records")

    @staticmethod
    def _transform_to_table(cells_array):
        """
        Transforms a cell-based structure into a table-like array of objects.

        :param cells_array: The array containing cell-based data with row and column info.
        :return: Transformed data as a list of dictionaries representing the table.
        """
        is_horizontal = not cells_array.get("orientation") or cells_array["orientation"] == "horizontal"
        cells = cells_array["cells"]

        # Create a DataFrame from the cells
        df = pd.DataFrame(cells)

        # Create headers
        if is_horizontal:
            headers = df[df["rowIndex"] == 1].set_index("columnIndex")["content"].to_dict()
        else:
            headers = df[df["columnIndex"] == 1].set_index("rowIndex")["content"].to_dict()

        # Filter non-header rows and columns
        data_cells = df[
            (df["rowIndex"] != 1 if is_horizontal else df["columnIndex"] != 1)
        ].copy()  # copy so the column added below does not trigger SettingWithCopyWarning

        # Map the content into rows and headers
        data_cells["header"] = data_cells.apply(
            lambda x: headers.get(x["columnIndex"] if is_horizontal else x["rowIndex"], None),
            axis=1,
        )

        # Pivot data into a table format
        table = data_cells.pivot_table(
            index=(data_cells["rowIndex"] - 2) if is_horizontal else (data_cells["columnIndex"] - 2),
            columns="header",
            values="content",
            aggfunc="first",
        ).reset_index(drop=True)

        return table.to_dict(orient="records")

    @staticmethod
    def _transform_job_annotations(obj):
        """
        Transforms annotations into a standardized format.

        :param obj: Dictionary of annotations with `id` as the key.
        :return: List of standardized annotation dictionaries.
        """
        annotations = [
            {
                "id": key,
                "value": list(
                    chain.from_iterable(
                        DataTransformationUtils._transform_to_table(item["content"])
                        # Only dict content can be a table; `in` on a string would match substrings
                        if isinstance(item["content"], dict) and "cells" in item["content"]
                        else [item["content"]]
                        for item in annotation
                        if item.get("content") and isinstance(item["content"], Iterable)
                    )
                ),
            }
            for key, annotation in obj.items()
            if isinstance(annotation, list) and annotation
        ]
        return [ann for ann in annotations if ann["value"]]

    @staticmethod
    def _transform_job_schema(obj):
        """
        Transforms schema fields into a simplified format.

        :param obj: Dictionary containing schema fields.
        :return: List of dictionaries with `id` and `title` properties.
        """
        return [
            {"id": key, "title": field["title"]}
            for key, field in obj.items()
            if field.get("title")
        ]

    @classmethod
    def process_annotations(cls, annotations_schema, annotations):
        """
        Processes annotations using internal transformation methods and merges them with schema.

        :param annotations_schema: Dictionary containing schema fields.
        :param annotations: Dictionary of annotations with `id` as the key.
        :return: Dictionary mapping schema titles to corresponding annotation values.
        """
        schema = cls._transform_job_schema(annotations_schema) if annotations_schema else []
        data = cls._transform_job_annotations(annotations) if annotations else []
        # An empty side has no `id` column to merge on, so return early
        if not schema or not data:
            return {}
        result = {
            item["title"]: item["value"]
            for item in cls._merge_arrays_by_id(data, schema)
        }
        return result
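To make the cell-to-table step easier to follow, here is a rough pure-Python sketch of the same idea for the horizontal case only. It is not the pandas implementation above: cells_to_rows is a hypothetical helper, the sample cells are invented, and the sketch assumes the same 1-based rowIndex and columnIndex convention, with row 1 holding the headers.

```python
def cells_to_rows(cells):
    """Turn horizontally oriented cells into one dict per data row.

    Row 1 is treated as the header row; every later row becomes a dict
    that maps header text to cell content.
    """
    headers = {c["columnIndex"]: c["content"] for c in cells if c["rowIndex"] == 1}
    rows = {}
    for cell in cells:
        if cell["rowIndex"] == 1:
            continue  # header cells are not data
        header = headers.get(cell["columnIndex"])
        if header is not None:  # skip cells in columns that have no header
            rows.setdefault(cell["rowIndex"], {})[header] = cell["content"]
    return [rows[index] for index in sorted(rows)]


# Invented sample cells for illustration
sample_cells = [
    {"rowIndex": 1, "columnIndex": 1, "content": "Id"},
    {"rowIndex": 1, "columnIndex": 2, "content": "Quantity"},
    {"rowIndex": 2, "columnIndex": 1, "content": "ZU781298"},
    {"rowIndex": 2, "columnIndex": 2, "content": "3"},
    {"rowIndex": 3, "columnIndex": 1, "content": "ZU781432"},
    {"rowIndex": 3, "columnIndex": 2, "content": "5"},
]
table = cells_to_rows(sample_cells)
```

Each data row becomes one dictionary keyed by the header text, which is the shape of the "Shipped items list" entries in the example output.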


  2. Import the DataTransformationUtils class in your main.py file and apply the transformations:
from util import DataTransformationUtils

result = DataTransformationUtils.process_annotations(annotations_schema, annotations)

  3. Print the result:
print(result)

The result is a dictionary of key/value pairs: each key is a field name, and each value is a list holding the extracted values for that field. Table fields produce a list with one dictionary per row.


Example output:

{
  "Invoice Date": ["30.9.2022"],
  "Invoice Number": ["14163-5"],
  "PO number": ["PO-57392018"],
  "Consignor Name": ["Tyrell Inc."],
  "Consignor Address": ["Mill Rd, Worthing BN11 4GU, UK"],
  "Supplier Tax ID": ["54-1234567"],
  "Consignee Name": ["Monster GmbH"],
  "Consignee Address": ["Teststr. 13, 20100 Hamburg, Germany"],
  "Receiver Tax ID": ["98-7654321"],
  "Delivery Date": ["3.10.2023"],
  "Due Date": ["3.11.2023"],
  "Subtotal": ["USD 9050.00"],
  "Total Tax": ["USD 100.00"],
  "Total Amount": ["USD 10050.00"],
  "Shipped items list": [
    {
      "Description": "Lorem ipsum dolor sit\namet, consectetur\nipsum",
      "Id": "ZU781298",
      "Quantity": "3",
      "Total (USD)": "300",
      "Unit Price (USD)": "100"
    },
    {
      "Description": "Sed ut perspiciatis unde\nomnis iste natus",
      "Id": "ZU781432",
      "Quantity": "5",
      "Total (USD)": "1000",
      "Unit Price (USD)": "200"
    },
    {
      "Description": "Ut enim ad minima\nveniam, quis nostrum\nexercitationem",
      "Id": "ZU781753",
      "Quantity": "10",
      "Total (USD)": "2500",
      "Unit Price (USD)": "250"
    }
  ]
}
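Every scalar field in the output arrives wrapped in a one-element list. If a flatter structure is more convenient downstream, a small post-processing step could unwrap those values. This is a sketch only: unwrap_single_values is a hypothetical helper and not part of util.py.

```python
def unwrap_single_values(result):
    """Replace one-element lists of scalars with the scalar itself.

    Tables (lists of dicts) and lists with several values are left untouched.
    """
    flat = {}
    for title, value in result.items():
        is_single_scalar = (
            isinstance(value, list)
            and len(value) == 1
            and not isinstance(value[0], (dict, list))
        )
        flat[title] = value[0] if is_single_scalar else value
    return flat


# A shortened version of the example output above
sample_result = {
    "Invoice Number": ["14163-5"],
    "Shipped items list": [{"Id": "ZU781298", "Quantity": "3"}],
}
flat = unwrap_single_values(sample_result)
```

In this sketch a table with a single row stays a list, so downstream code can treat table fields uniformly.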