Amazon S3 Integration

Learn to seamlessly connect an S3 bucket to your project on the super.AI platform.

Our S3 integration allows you to seamlessly connect your S3 bucket to your project on the super.AI platform, automating ingestion of files and collection of results. It also lets you process documents in batches by creating a top-level folder; once all documents in a batch have been processed, a CSV file is generated with the results of that batch.

To integrate your S3 bucket with super.AI, you will need to grant us access to a specific bucket. Please refer to the AWS S3 setup section of our documentation for step-by-step instructions on how to set up the integration.

How do I set it up?

Each super.AI project can be linked to an S3 bucket. Follow the instructions in the AWS S3 setup section below to grant super.AI read and write access to the desired bucket. Once your AWS account has been configured, share the bucket name and your project name with your account manager so that your bucket can be registered in super.AI's backend.

How does it work?

Once the link between your bucket and the project has been established, super.AI takes care of the heavy lifting. There are two processing modes: single document processing and batch processing.

Single document processing

In this basic processing mode, each document uploaded to the top level of the S3 bucket is automatically picked up by super.AI and submitted to your project for processing. Once the datapoint is processed, you can find the JSON output in your bucket under the .superai folder.
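
If you script your uploads, this flow can be automated with any S3 client. Below is a minimal sketch using Python and boto3; the bucket name, file name, and the exact key of the JSON output under .superai are illustrative assumptions, so check your own bucket for the actual output layout.

    import json

    import boto3

    s3 = boto3.client("s3")
    bucket = "bucketName"  # the bucket linked to your super.AI project

    # Upload a document to the top level of the bucket; super.AI picks it up
    # and submits it to the project automatically.
    s3.upload_file("invoice-001.pdf", bucket, "invoice-001.pdf")

    # Once the datapoint has been processed, the JSON output appears under the
    # .superai folder. The exact key shown here is an assumption.
    response = s3.get_object(Bucket=bucket, Key=".superai/invoice-001.pdf.json")
    print(json.loads(response["Body"].read()))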

Batch processing

Batch processing is also supported by the S3 integration. To process a batch of jobs, follow these steps (a scripted sketch follows the list):

  1. Create a folder within the root directory of the S3 bucket, e.g. s3://bucketName/firstBatch. (Note: only folders at the top level are taken into account; subfolders are ignored and their contents are not processed.)
  2. (Optional) Metadata: You can provide custom metadata fields for each document that is going to be processed. The metadata has to be provided as a CSV file called metadata.csv stored in the batch folder, e.g. s3://bucketName/firstBatch/metadata.csv. The first column of this file must contain the filename; the subsequent columns are treated as metadata and will be included in the CSV output (see step 5).
  3. Add all files that you want to process to the folder.
  4. The processed output is stored in the .superai folder under a folder with the same name, e.g. s3://bucketName/.superai/firstBatch
  5. Once all files in the batch are processed, a CSV file with the outputs of all files is created in that output folder (e.g. s3://bucketName/.superai/firstBatch). If metadata was provided, the CSV file will include the specified metadata fields.
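
As mentioned above, the batch flow can also be scripted. The sketch below uses Python and boto3; the batch name, file names, and metadata columns are placeholders, and whether metadata.csv expects a header row is an assumption you should verify for your project.

    import csv
    import io

    import boto3

    s3 = boto3.client("s3")
    bucket = "bucketName"
    batch = "firstBatch"
    files = ["invoice-001.pdf", "invoice-002.pdf"]

    # Optional metadata.csv: the first column is the filename, the remaining
    # columns are custom metadata echoed into the batch output CSV.
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["filename", "customer_id", "country"])  # hypothetical fields
    writer.writerow(["invoice-001.pdf", "C-1001", "DE"])
    writer.writerow(["invoice-002.pdf", "C-1002", "FR"])
    s3.put_object(Bucket=bucket, Key=f"{batch}/metadata.csv",
                  Body=buffer.getvalue().encode("utf-8"))

    # Upload the documents into the batch folder.
    for name in files:
        s3.upload_file(name, bucket, f"{batch}/{name}")

    # Results appear under s3://bucketName/.superai/firstBatch once all files
    # in the batch have been processed.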

With super.AI's S3 integration, you can automate your data processing and streamline your workflow, saving time and effort. For more detailed instructions on setting up the integration and using the different processing modes, please refer to the AWS S3 setup section below.

AWS S3 setup

Import your S3 bucket data via IAM Delegated Access.

  1. Create an S3 bucket (AWS docs)

    1. In your AWS account, go to your S3 Management Console
    2. Click the Create bucket button located at the top left of the screen
    3. Provide a bucket name
    4. Select the desired AWS Region (we recommend eu-central-1 to avoid incurring out-of-region transfer costs)
  2. Add bucket policy

    1. In your AWS account, go to your S3 Management Console

    2. Click the bucket name created in step 1 in the list of buckets

    3. Navigate to the Permissions tab

    4. Copy and paste the following bucket policy, substituting the following variable:

      1. <CUSTOMER.BUCKET_NAME> → Bucket name created in step 1
      {
         "Version": "2012-10-17",
         "Statement": [
            {
               "Sid": "Access to super-ai account",
               "Effect": "Allow",
               "Principal": {
                  "AWS": "arn:aws:iam::124137158684:root"
               },
               "Action": [
                  "s3:GetObject",
                  "s3:PutObject"
               ],
               "Resource": [
                  "arn:aws:s3:::<CUSTOMER.BUCKET_NAME>/*"
               ]
            }
         ]
      }
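
    If you prefer to apply the policy programmatically instead of through the console, a minimal boto3 sketch (assuming you run it with credentials allowed to change bucket policies) looks like this:

      import json

      import boto3

      bucket = "<CUSTOMER.BUCKET_NAME>"  # bucket name created in step 1
      policy = {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Sid": "Access to super-ai account",
                  "Effect": "Allow",
                  "Principal": {"AWS": "arn:aws:iam::124137158684:root"},
                  "Action": ["s3:GetObject", "s3:PutObject"],
                  "Resource": [f"arn:aws:s3:::{bucket}/*"],
              }
          ],
      }

      # Attach the bucket policy (equivalent to pasting it in the console).
      boto3.client("s3").put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))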
      
  3. Configure CORS headers

    • CORS allows super.AI to request resources from your cloud storage

      1. In your AWS account, go to your S3 Management Console

      2. Click the bucket name created in step 1 in the list of buckets.

      3. Go to the Permissions tab.

      4. In the Cross-origin resource sharing (CORS) section, click Edit.

      5. Paste the following configuration in the text field.

        [
            {
                "AllowedHeaders": [
                    "*"
                ],
                "AllowedMethods": [
                    "GET"
                ],
                "AllowedOrigins": [
                    "https://super.ai",
                    "https://app.super.ai",
                    "https://api.super.ai",
                    "https://production.super.ai"
                ],
                "ExposeHeaders": []
            }
        ]
        
      6. Click Save changes.
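
      As with the bucket policy, the same CORS rules can be set programmatically; a minimal boto3 sketch follows:

        import boto3

        bucket = "<CUSTOMER.BUCKET_NAME>"  # bucket name created in step 1
        cors_configuration = {
            "CORSRules": [
                {
                    "AllowedHeaders": ["*"],
                    "AllowedMethods": ["GET"],
                    "AllowedOrigins": [
                        "https://super.ai",
                        "https://app.super.ai",
                        "https://api.super.ai",
                        "https://production.super.ai",
                    ],
                    "ExposeHeaders": [],
                }
            ]
        }

        # Equivalent to pasting the JSON in the CORS editor and saving.
        boto3.client("s3").put_bucket_cors(
            Bucket=bucket, CORSConfiguration=cors_configuration
        )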

  4. Create a role for super.AI in your AWS account

    You will need to create a role for super.AI in your AWS account, specify permissions, and select a bucket. Follow the steps below to set this up; a scripted boto3 alternative is sketched after these steps.

    1. In your AWS account, create a permission policy for your bucket. If you already have a permission policy you plan to use, proceed to step 5. In your IAM Management Console, go to the Policies section, click Create policy, and enter your policy in the JSON tab. This sample policy restricts access to a specific S3 bucket.

      IMPORTANT: The Resource section of the policy indicates the bucket that you are granting access to, so please make sure to replace <CUSTOMER.BUCKET_NAME> with your bucket name.

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": [
                      "s3:GetObject",
                      "s3:PutObject",
                      "s3:ListBucket",
                      "s3:PutBucketNotification",
                      "s3:GetBucketNotification"
                  ],
                  "Resource": [
                      "arn:aws:s3:::<CUSTOMER.BUCKET_NAME>",
                      "arn:aws:s3:::<CUSTOMER.BUCKET_NAME>/*"
                  ]
              }
          ]
      }
      
    2. Click Next: Review to bypass the optional Add tags step. Tags are not required to set up this integration.

    3. In the Review policy step, name the policy you just created. We recommend naming it something like SuperaiReadWriteAccess

    4. To approve, click Create policy

    5. From the Roles page, follow these steps:

      a. Click Create role.

      b. Select AWS account followed by the radio button for Another AWS account.

      c. Paste the super.AI Account ID 124137158684

      d. Do not check the box for Require MFA.

      e. Click Next: Permissions.

    6. In the Attach permissions policies section, check the box next to the permission policy you created to attach it to your role. Alternatively, you can select a policy from the list provided (e.g., AmazonS3ReadOnlyAccess).

    7. Click Next: Tags

    8. Click Next: Review to bypass the optional Add tags step. Tags are not required to set up this integration.

    9. Name the role you created for super.AI. We recommend naming it something like SuperaiS3AccessRole

    10. When you are done reviewing, click Create role

    11. Click on the role you just created and copy the Role ARN at the top of the Summary tab
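
    As referenced above, the same policy and role can also be created with a script. The sketch below uses boto3 and the names suggested in the steps (SuperaiReadWriteAccess, SuperaiS3AccessRole); adjust them to your own conventions and replace <CUSTOMER.BUCKET_NAME> with your bucket name.

      import json

      import boto3

      iam = boto3.client("iam")
      bucket = "<CUSTOMER.BUCKET_NAME>"

      # Permission policy restricting access to the bucket (steps 1-4).
      permission_policy = {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": [
                      "s3:GetObject",
                      "s3:PutObject",
                      "s3:ListBucket",
                      "s3:PutBucketNotification",
                      "s3:GetBucketNotification",
                  ],
                  "Resource": [
                      f"arn:aws:s3:::{bucket}",
                      f"arn:aws:s3:::{bucket}/*",
                  ],
              }
          ],
      }

      # Trust policy allowing the super.AI account to assume the role (step 5).
      trust_policy = {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": {"AWS": "arn:aws:iam::124137158684:root"},
                  "Action": "sts:AssumeRole",
              }
          ],
      }

      policy = iam.create_policy(
          PolicyName="SuperaiReadWriteAccess",
          PolicyDocument=json.dumps(permission_policy),
      )
      role = iam.create_role(
          RoleName="SuperaiS3AccessRole",
          AssumeRolePolicyDocument=json.dumps(trust_policy),
      )
      iam.attach_role_policy(
          RoleName="SuperaiS3AccessRole",
          PolicyArn=policy["Policy"]["Arn"],
      )

      # Share this Role ARN with your super.AI account manager.
      print(role["Role"]["Arn"])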