For Amazon S3 pipelines, you connect to S3 buckets to select and store files.
On self-hosted instances, you also configure an S3 bucket, and the credentials used to access it, to store files for the following:

- Uploaded file pipelines. An S3 bucket is required for uploaded file pipelines. It is not used for pipelines that connect to Azure Blob Storage or to Databricks Unity Catalog.
- Dataset files. If you do not configure an S3 bucket, the files are stored in the application database.
- Individual files that you send to the SDK for redaction. If you do not configure an S3 bucket, the files are stored in the application database.
Here are examples of IAM roles that have the required permissions to connect to Amazon S3 to select or store files.
For uploaded file pipelines, datasets, and individual file redactions, the files are stored in a single S3 bucket. For information on how to configure the S3 bucket and the corresponding access credentials, go to Setting the S3 bucket for file uploads and redactions.
The IAM role that is used to connect to the S3 bucket must be able to read files from and write files to it.
Here is an example of an IAM role that has the permissions required to support uploaded file pipelines, datasets, and individual file redactions:
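The following is a minimal sketch of a permissions policy that you might attach to such a role, not a definitive configuration. It assumes the configured bucket is named example-uploads-bucket; that name is a placeholder, so replace both ARNs with the ARN of your actual bucket. The policy allows listing the bucket itself and reading and writing the objects in it:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListUploadsBucket",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::example-uploads-bucket"
    },
    {
      "Sid": "ReadWriteUploadsObjects",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::example-uploads-bucket/*"
    }
  ]
}
```

If your deployment also removes stored files, the role may additionally need s3:DeleteObject on the bucket's objects.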
The access credentials that you configure for an Amazon S3 pipeline must be able to navigate to and select files and folders in the appropriate S3 buckets. They must also be able to write output files to the configured output location.
Here is an example of an IAM role that has the permissions required to support Amazon S3 pipelines:
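Again, this is a sketch under stated assumptions rather than a definitive policy: example-source-bucket and example-output-bucket are placeholder names, and the output/ prefix stands in for your configured output location. The policy allows listing both buckets, reading objects from the source bucket, and writing objects under the output prefix:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListPipelineBuckets",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::example-source-bucket",
        "arn:aws:s3:::example-output-bucket"
      ]
    },
    {
      "Sid": "ReadSourceObjects",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-source-bucket/*"
    },
    {
      "Sid": "WriteOutputObjects",
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::example-output-bucket/output/*"
    }
  ]
}
```

If the pipeline's file selection step needs to browse your account's buckets before a bucket is chosen, the role may also need s3:ListAllMyBuckets.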