Database Mapping (SQL)

Link a database to specific fields within super.AI's General Document Process (GDP). Match extracted data with values in the database.

The General Document Processor (GDP) is a tool that allows users to process documents and extract information from them. The data extracted from the document can then be matched to information in a dynamic external database.

Here, we show you how to link such a database to specific fields or table columns in GDP, so that the extracted data can be matched with values in the external database.


Field Parameters

Fields

At least two fields must be defined:

  1. Field of type “DB Key”. This is a string field, which will be extracted from input document and represents a key in an external table. It has the following required properties:
    1. DB Secret - name of secret with database credentials. This secret will be created and stored by administrators.
    2. Table name - name of a target table in the external database
    3. Column - name of a column in the table, which must be used to match extracted values of this field.
  2. Field of type “DB Column”. This is the field, which will be populated with values from the external DB. It has the following required properties:
    1. DB Key Field ID - ID of the corresponding field of type “DB Key”.
    2. Column - name of a column in the table that is used to extract value for this field.
    3. Content Type - type of values in the DB. Only String, Integer and Float types are supported.

🚧

Notes

  1. Data extraction happens during the post-processing step, see job states.
  2. It is possible to have multiple fields of type “DB Column” referenced the same field of type “DB Key”.
  3. Only the first extracted value of “DB Key” field is used. If there were extracted more than one value, all the rest are ignored.

Custom Table Headers

There are custom table headers of types “DB Key” and “DB Column” with exactly the same properties as fields. Extraction has the same logic except the following differences:

  1. If a table doesn’t have columns associated with table headers of type “DB Column”, they will be created and added to the end of column list.
  2. Values are extracted for each “non-header” row of the extracted table. Row is “non-header” if it contains no cells with cellType Header.

Example

Suppose that we defined the following fields:

  • “Invoice ID” (field ID: invoice_id) of type “DB Key”. We want to extract it from the document and find in the DB. Properties:
    • DB Secret : “dataprograms_example_db” (the secret with this name was created in advance by administrators)
    • Table name: “invoices”
    • Column: "invoice_pk”
  • “Compony Name” (field ID: company_name) of type “DB Column”. We want to fetch it from the DB and add to the job output. Properties:
    • DB Key Field ID: “invoice_id”
    • Column: “cmpn_name”
    • Content Type : “String”

Only the “Invoice ID” data will be extracted from the document, “Company Name” will be ignored during the extraction stage. Let’s suppose that the “Invoice ID” 123456-78 was extracted from the document. Job output looks like (simplified):

{"invoice_id": [{"content": "123456-78"}]}

The following will happen at the post-processing stage:

  • Data program will use value from property DB Secret of field invoice_id to connect to the database.
  • It will send the following request to the DB to extract Company Name from the DB (simplified):
    SELECT cmpn_name FROM invoices WHERE invoice_id = '123456-78'
    
  • Once the company name is received, it will be added to the job output.

For example, let’s say that DB contains company name “Example Company” for invoice ID 123456-78. Then the final job output will look like this:

 {
   "invoice_id": [{"content": "123456-78"}],
   "company_name": [{"content": "Example Company"}]
 }


Supported Databases

The following databases are supported out of the box:

  • PostgreSQL
  • MySQL and MariaDB
  • SQLite
  • Oracle
  • Microsoft SQL Server

Additional databases can be supported by request.


🚧

Current limitations

  • Credentials: Database credentials must be included to a connection URI. Here is an example of Postgres connection URI with username and password:
    postgresql://{username}:{password}@{host}:{port}/database
    
  • Supported types: keys must have type String, columns must have one of the following types: String, Integer or Float