Configuring an Amazon S3 pipeline

Last updated 14 days ago


Required pipeline permissions:

  • Edit pipeline settings

  • Manage the pipeline file list

For an Amazon S3 pipeline, the settings include:

  • AWS credentials

  • Output location

  • Whether to also generate redacted versions of the original files

  • Selected files and folders

Changing the AWS credentials for a pipeline

When you create a pipeline that uses files from Amazon S3, you are prompted to provide the credentials that Textual uses to connect to Amazon S3.

From the Pipeline Settings page, to change the credentials:

  1. Click Update AWS Credentials.

  2. Provide the new credentials:

    1. In the Access Key field, provide an AWS access key that is associated with an IAM user or role. For an example of an IAM role that has the required permissions for an Amazon S3 pipeline, go to Example IAM role for Amazon S3 pipelines.

    2. In the Access Secret field, provide the secret key that is associated with the access key.

    3. From the Region dropdown list, select the AWS Region to send the authentication request to.

    4. In the Session Token field, provide the session token to use for the authentication request.

  3. To test the connection, click Test AWS Connection.

  4. To save the new credentials, click Update AWS Credentials.
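The credential fields above follow standard AWS conventions: access key IDs are 20 characters, long-term keys start with AKIA, and temporary keys start with ASIA and require a session token. As an illustration only (not Textual code), a local sanity check of the form values might look like this; `check_credential_fields` is a hypothetical helper:

```python
def check_credential_fields(access_key: str, access_secret: str,
                            region: str, session_token: str = "") -> list[str]:
    """Return a list of problems; an empty list means the fields look plausible.

    The 20-character length and the AKIA/ASIA prefixes are common AWS
    conventions for access key IDs, not rules enforced by Textual.
    """
    problems = []
    if len(access_key) != 20 or not access_key.startswith(("AKIA", "ASIA")):
        problems.append("Access key IDs are 20 characters and start with AKIA or ASIA")
    if not access_secret:
        problems.append("An access secret is required")
    if not region:
        problems.append("An AWS Region must be selected")
    # Temporary credentials (ASIA prefix) only work with a session token.
    if access_key.startswith("ASIA") and not session_token:
        problems.append("Temporary (ASIA) credentials require a session token")
    return problems
```

Test AWS Connection performs the authoritative check against AWS itself; a local check like this can only catch obviously malformed input before the round trip.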

Selecting a location for the output files

On the Pipeline Settings page, under Select Output Location, navigate to and select the folder in Amazon S3 where Textual writes the output files.

When you run a pipeline, Textual creates a folder in the output location. The folder name is the pipeline job identifier.

Within the job folder, Textual recreates the folder structure for the original files. It then creates the JSON output for each file. The name of the JSON file is <original filename>_<original extension>_parsed.json.

If the pipeline is also configured to generate redacted versions of the files, then Textual writes the redacted version of each file to the same location.

For example, for the original file Transaction1.txt, the output for a pipeline run contains:

  • Transaction1_txt_parsed.json

  • Transaction1.txt (the redacted version, if the pipeline is configured to generate redacted files)
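The naming rule above can be sketched as a small helper. This is illustrative only, not part of the Textual SDK, and how Textual treats filenames that contain multiple dots is an assumption here:

```python
from pathlib import PurePosixPath

def parsed_output_name(original: str) -> str:
    """Derive the pipeline's JSON output name:
    <original filename>_<original extension>_parsed.json
    """
    p = PurePosixPath(original)
    ext = p.suffix.lstrip(".")          # "Transaction1.txt" -> "txt"
    return f"{p.stem}_{ext}_parsed.json"
```

For example, `parsed_output_name("Transaction1.txt")` returns `Transaction1_txt_parsed.json`, matching the example above.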

Indicating whether to also redact the files

By default, when you run an Amazon S3 pipeline, Textual only generates the JSON output.

To also generate versions of the original files that redact or synthesize the detected entity values, toggle Synthesize Files to the on position.

For information on how to configure the file generation, go to Configuring file synthesis for a pipeline.

Filtering files in selected folders by file type

One option for selected folders is to filter the processed files based on the file extension. For example, in a selected folder, you might only want to process .txt and .csv files.

Under File Processing Settings, select the file extensions to include. To add a file type, select it from the dropdown list. To remove a file type, click its delete icon.

Note that this filter does not apply to individually selected files. Textual always processes those files regardless of file type.
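The filter behavior, including the exemption for individually selected files, can be sketched as follows. This is a hypothetical helper, not Textual's implementation:

```python
def files_to_process(folder_files: list[str], selected_files: list[str],
                     allowed_extensions: set[str]) -> list[str]:
    """Apply the file-type filter to files found in selected folders.

    Files from selected folders are kept only if their extension is in the
    filter; individually selected files are always processed, regardless
    of file type.
    """
    keep = [f for f in folder_files
            if f.rsplit(".", 1)[-1].lower() in allowed_extensions]
    # Individually selected files bypass the extension filter entirely.
    return keep + [f for f in selected_files if f not in keep]
```

With a filter of `{"txt", "csv"}`, a folder containing `a.txt`, `b.png`, and `c.csv` plus an individually selected `d.docx` yields `a.txt`, `c.csv`, and `d.docx`.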

Selecting files and folders to process

Under Select files and folders to add to run, navigate to and select the folders and individual files to process.

To add a folder or file to the pipeline, check its checkbox.

When you check a folder checkbox, Textual adds the folder to the Prefix Patterns list. It processes every applicable file in the folder: every file whose type Textual supports and that is included in the file type filter.

To display the contents of a folder, click the folder name.

When you select an individual file, Textual adds it to the Selected Files list.

To remove a file or folder from the pipeline, either:

  • In the navigation pane, uncheck the checkbox.

  • In the Prefix Patterns or Selected Files list, click its delete icon.
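Taken together, the Prefix Patterns and Selected Files lists determine which S3 keys a run picks up: a key qualifies if it sits under a checked folder or was selected individually. A minimal sketch of that decision (illustrative only; `is_included` is not a Textual API):

```python
def is_included(key: str, prefix_patterns: list[str],
                selected_files: list[str]) -> bool:
    """Decide whether an S3 key is picked up by the pipeline run.

    A key qualifies if it was selected individually, or if it falls under
    any checked folder (represented here as an S3 key prefix).
    """
    if key in selected_files:
        return True
    return any(key.startswith(prefix) for prefix in prefix_patterns)
```

For keys matched through a prefix pattern, the file type filter described above still applies; individually selected files are processed regardless of type.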

