LogoLogo
Release notesPython SDK docsDocs homeTextual CloudTonic.ai
  • Tonic Textual guide
  • Getting started with Textual
  • Previewing Textual detection and redaction
  • Entity types that Textual detects
    • Built-in entity types
    • Managing custom entity types
  • Language support in Textual
  • Datasets - Create redacted files
    • Datasets workflow for text redaction
    • Creating and managing datasets
    • Assigning tags to datasets
    • Displaying the file manager
    • Adding and removing dataset files
    • Reviewing the sensitivity detection results
    • Configuring the redaction
      • Configuring added and excluded values for built-in entity types
      • Working with custom entity types
      • Selecting the handling option for entity types
      • Configuring synthesis options
      • Configuring handling of file components
    • Adding manual overrides to PDF files
      • Editing an individual PDF file
      • Creating templates to apply to PDF files
    • Sharing dataset access
    • Previewing the original and redacted data in a file
    • Downloading redacted data
  • Pipelines - Prepare LLM content
    • Pipelines workflow for LLM preparation
    • Viewing pipeline lists and details
    • Assigning tags to pipelines
    • Setting up pipelines
      • Creating and editing pipelines
      • Supported file types for pipelines
      • Creating custom entity types from a pipeline
      • Configuring file synthesis for a pipeline
      • Configuring an Amazon S3 pipeline
      • Configuring a Databricks pipeline
      • Configuring an Azure pipeline
      • Configuring a Sharepoint pipeline
      • Selecting files for an uploaded file pipeline
    • Starting a pipeline run
    • Sharing pipeline access
    • Viewing pipeline results
      • Viewing pipeline files, runs, and statistics
      • Displaying details for a processed file
      • Structure of the pipeline output file JSON
    • Downloading and using pipeline output
  • Textual Python SDK
    • Installing the Textual SDK
    • Creating and revoking Textual API keys
    • Obtaining JWT tokens for authentication
    • Instantiating the SDK client
    • Datasets and redaction
      • Create and manage datasets
      • Redact individual strings
      • Redact individual files
      • Transcribe and redact an audio file
      • Configure entity type handling for redaction
      • Record and review redaction requests
    • Pipelines and parsing
      • Create and manage pipelines
      • Parse individual files
  • Textual REST API
    • About the Textual REST API
    • REST API authentication
    • Redaction
      • Redact text strings
  • Datasets
    • Manage datasets
    • Manage dataset files
  • Snowflake Native App and SPCS
    • About the Snowflake Native App
    • Setting up the app
    • Using the app
    • Using Textual with Snowpark Container Services directly
  • Install and administer Textual
    • Textual architecture
    • Setting up and managing a Textual Cloud pay-as-you-go subscription
    • Deploying a self-hosted instance
      • System requirements
      • Deploying with Docker Compose
      • Deploying on Kubernetes with Helm
    • Configuring Textual
      • How to configure Textual environment variables
      • Configuring the number of textual-ml workers
      • Configuring the number of jobs to run concurrently
      • Configuring the format of Textual logs
      • Setting a custom certificate
      • Configuring endpoint URLs for calls to AWS
      • Enabling PDF and image processing
      • Setting the S3 bucket for file uploads and redactions
      • Required IAM role permissions for Amazon S3
      • Configuring model preferences
    • Viewing model specifications
    • Managing user access to Textual
      • Textual organizations
      • Creating a new account in an existing organization
      • Single sign-on (SSO)
        • Viewing the list of SSO groups in Textual
        • Azure
        • GitHub
        • Google
        • Keycloak
        • Okta
      • Managing Textual users
      • Managing permissions
        • About permissions and permission sets
        • Built-in permission sets and available permissions
        • Viewing the lists of permission sets
        • Configuring custom permission sets
        • Configuring access to global permission sets
        • Setting initial access to all global permissions
    • Textual monitoring
      • Downloading a usage report
      • Tracking user access to Textual
Powered by GitBook
On this page
  • Specifying the handling option for entity types
  • Specifying a default handling option
  • Providing added and excluded values for entity types
  • Including custom entity types

Was this helpful?

Export as PDF
  1. Textual Python SDK
  2. Datasets and redaction

Configure entity type handling for redaction

Required dataset permission: Edit dataset settings

By default, when you:

  • Configure a dataset

  • Redact a string

  • Retrieve a redacted file

Textual does the following:

  • For the string and file redaction, replaces detected values with tokens.

  • For LLM synthesis, generates realistic synthesized values.

When you make the request, you can:

  • Override the default behavior.

  • For individual files and text strings, specify custom entity types to include.

Specifying the handling option for entity types

For each entity type, you can choose to redact, synthesize, or ignore the value.

  • When you redact a value, Textual replaces the value with a token that consists of the entity type. For example, ORGANIZATION.

  • When you synthesize a value, Textual replaces the value with a different realistic value.

  • When you ignore a value, Textual passes through the original value.

To specify the handling option for entity types, you use the generator_config parameter.

generator_config={'<entity_type>':'<handling_option>'}

Where:

  • <entity_type> is the identifier of the entity type. For example, ORGANIZATION. For the list of built-in entity types that Textual scans for, go to Built-in entity types. For custom entity types, the identifier is the entity type name in all caps. Spaces are replaced with underscores, and the identifier is prefixed with CUSTOM_. For example, for a custom entity type named My New Type, the identifier is CUSTOM_MY_NEW_TYPE. From the Custom Entity Types page, to copy the identifier of a custom entity type, click its copy icon.

  • <handling_option> is the handling option to use for the specified entity type. The possible values are Redact, Synthesis, and Off.

For example, to synthesize organization values, and ignore languages:

generator_config={'ORGANIZATION':'Synthesis', 'LANGUAGE':'Off'}

Specifying a default handling option

For string and file redaction, you can specify a default handling option to use for entity types that are not specified in generator_config.

To do this, you use the generator_default parameter.

generator_default can be either Redact, Synthesis, or Off.

Providing added and excluded values for entity types

You can also configure added and excluded values for each entity type.

You add values that Textual does not detect for an entity type, but should. You exclude values that you do not want Textual to identify as that entity type.

  • To specify the added values, use label_allow_lists.

  • To specify the excluded values, use label_block_lists.

For each of these parameters, the value is a list of entity types to specify the added or excluded values for. To specify the values, you provide an array of regular expressions.

{'<entity_type>':['<regex>']}

The following example uses label_allow_lists to add values:

  • For NAME_GIVEN, adds the values There and Here.

  • For NAME_FAMILY, adds values that match the regular expression ([a-z]{2}).

(label_allow_lists={
    'NAME_GIVEN':['There','Here'], 
    'NAME_FAMILY':['([a-z]{2})']
    }
)

Including custom entity types

When you redact a string or download a redacted file, you can provide a comma-separated list of custom entity types to include. Textual then scans for and redacts those entity types based on the configuration in generator_config.

custom_entities="["<entity type identifier>"]

For example:

custom_entities=["CUSTOM_COGNITIVE_ACCESS_KEY", "CUSTOM_PERSONAL_GRAVITY_INDEX"]

Last updated 14 days ago

Was this helpful?

Copy identifier option for a custom entity type