General Document Processor

This article applies to the latest version of the General Document Processor.

The General Document Processor (GDP) combines state-of-the-art Optical Character Recognition (OCR) techniques with the latest super.AI deep learning models to understand and extract data from a variety of document types, such as invoices, bills of lading, purchase orders, passports, business cards, or custom documents. Documents can vary in format and quality, including captured images, scans, and machine-readable PDFs.

Human input improves the accuracy per document type over time. Human-in-the-loop (HITL) review can be carried out by the customer's own staff or by certified super.AI personnel.

General Document Processor overview

  • The GDP offers pre-trained models for a variety of document types; it doesn't require labeling or training to get started
  • Users can extract nested key-value pairs, text, complex tables, and selection marks via the API or the UI
  • Human input improves the accuracy over time per document type
  • You can use your own reviewers or certified super.AI reviewers

[Figure: General Document Processor user interface showing a sample document]

Key-value pairs

Key-value pairs are groups of entities within a document that identify a key and its associated value (e.g. date of birth as the key and 2015-04-30 as its value). The super.AI model is trained to extract keys and values from a wide variety of document types, formats, and structures.

Keys can also exist without a corresponding value, e.g. a middle name field that is left blank on a form. Where the same key is labeled in different ways across documents, e.g. Phone Number and Telephone Number, the variants are harmonized to a single key.
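To make the harmonization idea concrete, here is a minimal sketch of mapping differently worded labels to one key and keeping keys that have no value. The alias table, the function names, and the snake_case output keys are assumptions made for illustration; they are not taken from the GDP API or its models.

```python
# Hypothetical alias table: the key vocabulary actually used by the GDP
# is not documented in this article.
KEY_ALIASES = {
    "phone number": "phone_number",
    "telephone number": "phone_number",
    "tel": "phone_number",
}


def harmonize_key(raw_key):
    """Map a label as printed on the document to a single canonical key."""
    cleaned = raw_key.lower().replace(".", "").replace(":", "")
    normalized = " ".join(cleaned.split())
    # Unknown labels fall back to a snake_case version of the cleaned text.
    return KEY_ALIASES.get(normalized, normalized.replace(" ", "_"))


def harmonize(pairs):
    """Harmonize (label, value) pairs; a blank value is kept as None."""
    return {harmonize_key(key): (value if value else None) for key, value in pairs}


extracted = [("Telephone Number:", "555-0100"), ("Middle Name", "")]
print(harmonize(extracted))
```

In this sketch a later pair with the same canonical key overwrites an earlier one; a real pipeline would need an explicit rule for such conflicts.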

Nested key-value pairs can be extracted as well. In this case a parent key contains its own key-value pairs, e.g. a Gender field with checkbox options for male, female, diverse, or not applicable.

Wherever feasible, values are standardized to an ISO format in the JSON output, e.g. a date is formatted as YYYY-MM-DD (ISO 8601).
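The sketch below shows what a result with a nested key and an ISO-formatted date might look like, together with one simple way to normalize dates to YYYY-MM-DD. The field names, the result shape, and the list of accepted input layouts are hypothetical; the actual GDP JSON schema is not described in this article.

```python
import json
from datetime import datetime

# Input date layouts this sketch tries, in order. Purely illustrative;
# ambiguous layouts such as 04/05/2015 are deliberately left out.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%B %d, %Y", "%d %b %Y")


def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or None if no known layout matches."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


# Hypothetical result shape; these field names are not the GDP schema.
result = {
    "date_of_birth": to_iso_date("April 30, 2015"),
    "middle_name": None,  # key present on the form, value left blank
    "gender": {  # parent key with nested selection marks
        "male": False,
        "female": True,
        "diverse": False,
        "not_applicable": False,
    },
}

print(json.dumps(result, indent=2))
```

Returning None for an unrecognized date, rather than guessing, keeps uncertain values visible so that they can be routed to a human reviewer.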

Input requirements

  • Image quality: garbage in, garbage out. The best results are achieved with a sharp, undistorted scanned image or a machine-readable document.
  • Max. number of pages: PDFs of up to 50 pages can be processed
  • Max. file size: the file size must be less than 50 MB
  • Image dimensions: between 50 x 50 pixels and 10,000 x 10,000 pixels
  • PDF dimensions: up to 17 x 17 inches, corresponding to Legal or A3 paper size, or smaller.
  • Font size: the minimum height of the text to be extracted is 12 pixels for a 1024 x 768 pixel image. This dimension corresponds to about 8-point text at 150 dots per inch (DPI).
  • Password-locked PDFs cannot be processed; you must remove the lock before uploading
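The limits above can be checked before upload. The following is a minimal sketch of such a pre-flight check, assuming the caller has already read the file's metadata (size, page count, dimensions) with a tool of their choice. The `FileInfo` class, the `check_input` function, and the reading of "50 MB" as 50 x 1024 x 1024 bytes are assumptions for illustration, not part of the GDP API.

```python
from dataclasses import dataclass
from typing import List

MAX_FILE_BYTES = 50 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes
MAX_PDF_PAGES = 50
MIN_IMAGE_PX = 50
MAX_IMAGE_PX = 10_000
MAX_PDF_INCHES = 17.0


@dataclass
class FileInfo:
    """Metadata the caller has already extracted from the file."""
    kind: str  # "pdf" or "image"
    size_bytes: int
    pages: int = 1
    width: float = 0.0  # pixels for images, inches for PDFs
    height: float = 0.0
    password_locked: bool = False


def check_input(info: FileInfo) -> List[str]:
    """Return a list of violated requirements; an empty list means OK."""
    problems = []
    if info.size_bytes >= MAX_FILE_BYTES:
        problems.append("file size must be less than 50 MB")
    if info.kind == "pdf":
        if info.pages > MAX_PDF_PAGES:
            problems.append("PDF has more than 50 pages")
        if info.width > MAX_PDF_INCHES or info.height > MAX_PDF_INCHES:
            problems.append("PDF page is larger than 17 x 17 inches")
        if info.password_locked:
            problems.append("remove the password lock before uploading")
    elif info.kind == "image":
        in_range = all(
            MIN_IMAGE_PX <= side <= MAX_IMAGE_PX
            for side in (info.width, info.height)
        )
        if not in_range:
            problems.append("image must be between 50 x 50 and 10,000 x 10,000 pixels")
    else:
        problems.append("unsupported file kind: " + info.kind)
    return problems


print(check_input(FileInfo(kind="image", size_bytes=2_000_000, width=1024, height=768)))
```

Image sharpness and minimum text height are not covered here, because they cannot be judged from file metadata alone.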

Supported file formats

| Application | PDF (scanned) | PDF (machine-readable) | Image (JPEG, PNG, BMP, and TIFF) |
| --- | --- | --- | --- |
| General Document Processor | ✓ | ✓ | ✓ |

Data Extraction

| Application | Text | Tables | Selection Marks | Nested Key-Value Pairs | Stamps | Signatures |
| --- | --- | --- | --- | --- | --- | --- |
| General Document Processor | ✓ | ✓ | ✓ | ✓ | ✓ * | ✓ * |

* Released in Q4 2022