Configuring handling of file components

Last updated 14 days ago

Required dataset permission: Edit dataset settings

The Dataset Settings page includes options for how Textual handles the following file components:

  • For .docx files, images, tables, and comments

  • For PDF files, scanned-in signatures

To display the Dataset Settings page, on the dataset details page, click Settings.

These options are not available for pipelines that also redact files.
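
Before you choose among these options, it can help to know which of these components a given .docx file contains. The following is a minimal sketch that is independent of Textual and does not use any Textual API. It relies on the fact that a .docx file is a ZIP package, and assumes the usual Office Open XML (Transitional) layout: embedded images under word/media/, comments in word/comments.xml, and tables as tbl elements in word/document.xml. The function name docx_components is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by Transitional .docx files (assumption:
# files saved in Strict Open XML format use a different namespace).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_components(path):
    """Report which file components a .docx package appears to contain.

    `path` can be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(path) as package:
        names = package.namelist()
        document_xml = package.read("word/document.xml")

    root = ET.fromstring(document_xml)
    return {
        # Embedded images are conventionally stored under word/media/.
        "images": any(name.startswith("word/media/") for name in names),
        # The comments part exists only when the document has comments.
        "comments": "word/comments.xml" in names,
        # Tables appear as <w:tbl> elements in the main document part.
        "tables": root.find(f".//{{{W_NS}}}tbl") is not None,
    }
```

For example, if docx_components("contract.docx") reports that the file has no comments or images, then only the table setting affects how that file is processed.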

Configuring how to handle .docx images

For .docx images, including .svg files, you can configure the dataset to do one of the following:

  • Redact the image content. When you select this option, Textual looks for and blocks out sensitive values in the images.

  • Ignore the images.

  • Replace the images with black boxes.

On the Dataset Settings page, under Image settings for DOCX files:

  • To redact the image content, click Redact contents of images using OCR. This is the default selection.

  • To ignore the images entirely, click Ignore images during scan.

  • To replace the images with black boxes, click Replace images from the output file with black boxes.

Configuring how to handle .docx tables

For .docx tables, you can configure the dataset to either:

  • Redact the table content. When you select this option, Textual detects sensitive values and replaces them based on the entity type configuration.

  • Block out all of the table cells. When you select this option, Textual places a black box over each table cell.

On the Dataset Settings page, under Table settings for DOCX files:

  • To redact the table content, click Redact content using the entity type configuration. This is the default selection.

  • To block out the table content, click Block out all table cell content.

Configuring how to handle .docx comments

For comments in a .docx file, you can configure the dataset to either:

  • Remove the comments from the file.

  • Ignore the comments and leave them in the file.

On the Dataset Settings page, to remove the comments, toggle Remove comments from the output file to the on position. This is the default configuration.

To ignore the comments, toggle Remove comments from the output file to the off position.

Configuring whether to redact PDF signatures

By default, Textual redacts scanned-in signatures in PDF files. You can configure the dataset to instead ignore the signatures.

On the Dataset Settings page:

  • To redact PDF signatures, toggle Detect and redact signatures in PDFs to the on position. This is the default configuration.

  • To ignore PDF signatures, toggle Detect and redact signatures in PDFs to the off position.
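
The options and defaults described on this page can be summarized as a small data model. The sketch below is illustrative only: the class and field names are hypothetical and are not part of the Textual SDK or REST API. The enum values are the option labels from the Dataset Settings page, and the defaults match the defaults stated above.

```python
from dataclasses import dataclass
from enum import Enum


class DocxImageHandling(Enum):
    # Values are the option labels shown under "Image settings for DOCX files".
    REDACT_WITH_OCR = "Redact contents of images using OCR"
    IGNORE = "Ignore images during scan"
    BLACK_BOXES = "Replace images from the output file with black boxes"


class DocxTableHandling(Enum):
    # Values are the option labels shown under "Table settings for DOCX files".
    REDACT = "Redact content using the entity type configuration"
    BLOCK_OUT = "Block out all table cell content"


@dataclass(frozen=True)
class FileComponentSettings:
    """Hypothetical model of the file component options for one dataset."""

    docx_images: DocxImageHandling = DocxImageHandling.REDACT_WITH_OCR
    docx_tables: DocxTableHandling = DocxTableHandling.REDACT
    remove_docx_comments: bool = True    # "Remove comments from the output file"
    redact_pdf_signatures: bool = True   # "Detect and redact signatures in PDFs"


# A dataset that keeps every default setting.
defaults = FileComponentSettings()

# A dataset that blocks out table cells and leaves comments in place.
custom = FileComponentSettings(
    docx_tables=DocxTableHandling.BLOCK_OUT,
    remove_docx_comments=False,
)
```

A model like this is only a convenient way to record the intended configuration for each dataset. The settings themselves are applied on the Dataset Settings page.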
