Configuring synthesis options

Last updated 3 days ago

Required dataset permission: Edit dataset settings

When Textual generates replacement values, those values are always consistent. Consistency means that the same original value always produces the same replacement value. You can also enable consistency with some Tonic Structural output values.

For some entity types, you can configure additional options for how Tonic Textual generates the replacement values.

For custom entity types, you can select the generator to use.

You can also set whether to use the new synthesis process.

Enabling consistency with Tonic Structural

If you also use Tonic Structural, then you can configure Textual to enable selected synthesized values to be consistent between the two applications.

For example, a given source telephone number can produce the same replacement telephone number in both Structural and Textual.

To enable this consistency, you configure a statistics seed value as the value of the Textual environment variable SOLAR_STATISTICS_SEED. A statistics seed is a signed 32-bit integer.

The value must match a Structural statistics seed value, either:

  • The value of the Structural environment setting TONIC_STATISTICS_SEED.

  • A statistics seed configured for an individual Structural workspace.

The current statistics seed value is displayed on the System Settings page.
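Because a statistics seed is a signed 32-bit integer, a candidate value can be checked before it is set. The following is a minimal sketch of such a check. The function name is hypothetical and is not part of Textual or Structural.

```python
# Bounds of a signed 32-bit integer.
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def is_valid_statistics_seed(raw: str) -> bool:
    """Return True if raw parses as an integer in the signed 32-bit range.

    Hypothetical helper for illustration only.
    """
    try:
        value = int(raw.strip())
    except ValueError:
        return False
    return INT32_MIN <= value <= INT32_MAX
```

For example, 2147483647 is accepted, and 2147483648 is rejected because it is one past the largest signed 32-bit value.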

Using the new synthesis process

Textual has developed an updated synthesis process that is currently implemented for the following entity types:

  • URLs

  • Names

  • Custom entity types

In particular, the new synthesis process improves the display of the synthesized values in PDF files. The values better match the available space and the original font.

To configure whether to use the new process:

  1. On the dataset details page, click Settings.

  2. On the Dataset Settings page, under PDF Settings, the New PDF synthesis mode (experimental) toggle determines which process to use. To use the new process, toggle the setting to the on position.

  3. Click Save Dataset.

Configuring location synthesis options

Location values include the following types:

  • Location

  • Location Address

  • Location State

  • Location Zip

You can select whether to generate HIPAA or non-HIPAA addresses. Address values can be consistent with values generated in Structural.

For each location type other than Location State, you can specify whether to use a realistic replacement value. For Location State, based on HIPAA guidelines, both the Synthesis option and the Off option pass through the value.

For location types that include zip codes, you can also specify how to generate the new zip code values.

In the entity types list, to display the location synthesis options, click Options.

Selecting the type of address generator to use

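Under Address generator type, select the type of address generator to use:

  • HIPAA-compliant address generator. This option generates values similar to those generated by the HIPAA Address generator in Structural.

  • Non-HIPAA address generator. This option generates values similar to those generated by the Address generator in Structural.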

If you configured a Textual statistics seed that matches a Structural statistics seed, then the generated address values are consistent with values generated in Structural. A given address value produces the same output value in both applications.

For example, in both Textual and Structural, a source address value 123 Main Street might be replaced with 234 Oak Avenue.

Indicating whether to use realistic replacement values

By default, Textual replaces a location value with a realistic corresponding value. For example, "Main Street" might be replaced with "Fourth Avenue".

To instead scramble the values, uncheck Replace with realistic values.

Indicating how to generate replacement zip codes

By default, to generate a new zip code, Textual selects a real zip code that starts with the same three digits as the original zip code. For a low population area, Textual instead selects a random zip code from the United States.

To instead replace the last two digits of the zip code with zeros, check Replace zeroes for zip codes. For a low population area, Textual instead replaces all of the digits in the zip code with zeros.
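The zero-replacement rule can be sketched as follows. This is an illustration, not Textual's implementation. It assumes that a low population area is identified by the first three digits of the zip code, and that the caller supplies the set of such prefixes. Both the function name and the prefix set are hypothetical.

```python
def zero_out_zip(zip_code: str, low_population_prefixes: frozenset) -> str:
    """Apply the zero-replacement rule to a five-digit zip code.

    Assumption: low population areas are identified by three-digit prefix.
    """
    if len(zip_code) != 5 or not (zip_code.isascii() and zip_code.isdigit()):
        raise ValueError("expected a five-digit zip code")
    if zip_code[:3] in low_population_prefixes:
        # Low population area: replace every digit with zero.
        return "00000"
    # Otherwise keep the first three digits and zero the last two.
    return zip_code[:3] + "00"
```

With a prefix set of {"036"}, chosen here only for illustration, the value 30301 becomes 30300, and 03601 becomes 00000.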

Configuring datetime synthesis options

By default, when you select the Synthesis option for Date/Time and Date of Birth values, Textual shifts the datetime values to a value that occurs within 7 days before or after the original value.

To customize how Textual sets the new values, you can:

  • Set a different range within which Textual sets the new values

  • Indicate whether to scramble date values that Textual cannot parse

  • Add additional date formats for Textual to recognize

In the entity types list, to display the datetime synthesis options, click Options.

Adjusting the range for the replacement values

By default, Textual adjusts the dates to values that are within 7 days before or after the original date.

To change the range, in the # of Days To Shift +/- field, enter the number of days before and after the original date within which the replacement datetime value must occur. For example, if you enter 10, then the replacement datetime value must occur within 10 days before or after the original value.
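To illustrate how a shift can stay within the configured range and still be consistent, the following sketch derives the offset from a hash of the original value and a seed. This is not Textual's documented algorithm. The function name, the use of SHA-256, and the whole-day offset are assumptions made for the example.

```python
import hashlib
from datetime import datetime, timedelta


def shift_datetime(original: datetime, days: int = 7, seed: int = 0) -> datetime:
    """Shift original by a whole number of days in [-days, +days].

    The offset is derived from the seed and the original value, so the
    same input always produces the same output. Hypothetical sketch.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    key = f"{seed}:{original.isoformat()}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Map the hash onto the 2 * days + 1 possible offsets.
    offset = int.from_bytes(digest[:8], "big") % (2 * days + 1) - days
    return original + timedelta(days=offset)
```

Calling shift_datetime twice with the same original value, range, and seed returns the same result, and the result never falls outside the configured number of days.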

Indicating how to replace datetime values in unsupported formats

Textual can parse datetime values that use either a format in Default supported datetime formats in Textual or a format that you add.

The Scramble Unrecognized Dates checkbox indicates how Textual should handle datetime values that it does not recognize.

By default, the checkbox is checked, and Textual scrambles those values.

To instead pass through the values without changing them, uncheck Scramble Unrecognized Dates.
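The decision that this checkbox controls can be sketched as follows. The format list uses Python strptime codes that approximate a few of the supported patterns. It is an illustrative subset and does not reflect how Textual parses values. The function names are hypothetical, and month names are matched in the current locale.

```python
from datetime import datetime
from typing import Optional

# Approximate strptime equivalents of a few supported patterns.
KNOWN_FORMATS = (
    "%Y/%m/%d",   # roughly yyyy/M/d
    "%Y-%m-%d",   # roughly yyyy-M-d
    "%d-%b-%Y",   # roughly d-MMM-yyyy
    "%B %d, %Y",  # roughly MMMM d, yyyy
)


def parse_known(value: str) -> Optional[datetime]:
    """Return the parsed datetime, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None


def plan_replacement(value: str, scramble_unrecognized: bool = True) -> str:
    """Return 'shift', 'scramble', or 'pass-through' for a detected value."""
    if parse_known(value) is not None:
        return "shift"
    return "scramble" if scramble_unrecognized else "pass-through"
```

A recognized value is shifted regardless of the checkbox. The checkbox only affects values that match none of the known formats.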

Adding datetime formats

By default, Textual is able to recognize datetime values that use a format from Default supported datetime formats in Textual.

Under Additional Date Formats, you can add other datetime formats that you know are present in your data. The formats must use a Noda Time LocalDateTime pattern.

To add a format, type the format in the field, then click +.

To remove a format, click its delete icon.

Default supported datetime formats in Textual

By default, Textual supports the following datetime formats.

Date only formats

Format              Example value
yyyy/M/d            2024/1/17
yyyy-M-d            2024-1-17
yyyyMMdd            20240117
yyyy.M.d            2024.1.17
yyyy, MMM d         2024, Jan 17
yyyy-M              2024-1
yyyy/M              2024/1
d/M/yyyy            17/1/2024
d-MMM-yyyy          17-Jan-2024
dd-MMM-yy           17-Jan-24
d-M-yyyy            17-1-2024
d/MMM/yyyy          17/Jan/2024
d MMMM yyyy         17 January 2024
d MMM yyyy          17 Jan 2024
d MMMM, yyyy        17 January, 2024
ddd, d MMM yyyy     Wed, 17 Jan 2024
M/d/yyyy            1/17/2024
M/d/yy              1/17/24
M-d-yyyy            1-17-2024
MMddyyyy            01172024
MMMM d, yyyy        January 17, 2024
MMM d, ''yy         Jan 17, '24
MM-yyyy             01-2024
MMMM, yyyy          January, 2024

Date and time formats

Format                        Example value
yyyy-M-d HH:mm                2024-1-17 15:45
d-M-yyyy HH:mm                17-1-2024 15:45
MM-dd-yy HH:mm                01-17-24 15:45
d/M/yy HH:mm:ss               17/1/24 15:45:30
d/M/yyyy HH:mm:ss             17/1/2024 15:45:30
yyyy/M/d HH:mm:ss             2024/1/17 15:45:30
yyyy-M-dTHH:mm:ss             2024-1-17T15:45:30
yyyy/M/dTHH:mm:ss             2024/1/17T15:45:30
yyyy-M-d HH:mm:ss'Z'          2024-1-17 15:45:30Z
yyyy-M-d'T'HH:mm:ss'Z'        2024-1-17T15:45:30Z
yyyy-M-d HH:mm:ss.fffffff     2024-1-17 15:45:30.1234567
yyyy-M-dd HH:mm:ss.FFFFFF     2024-1-17 15:45:30.123456
yyyy-M-dTHH:mm:ss.fff         2024-1-17T15:45:30.123

Time only formats

Format           Example value
HH:mm            15:45
HH:mm:ss         15:45:30
HHmmss           154530
hh:mm:ss tt      03:45:30 PM
HH:mm:ss'Z'      15:45:30Z

Configuring age synthesis options

By default, when you select the Synthesis option for Age values, Textual shifts the age value to a value that is within seven years before or after the original value. For age values that it cannot synthesize, it scrambles the value.

In the entity types list, to display the age synthesis options, click Options.

To configure the synthesis:

  1. In the Range of Years +/- for the Shifted Age field, enter the number of years before and after the original value to use as the range for the synthesized value.

  2. By default, Textual scrambles age values that it cannot parse. To instead pass through the value unchanged, uncheck Scramble Unrecognized Ages.

Configuring telephone number synthesis options

For Phone Number values, you can choose whether to generate a realistic phone number. If you do, then the generated values can be consistent with values generated in Structural.

In the entity types list, to display the phone number synthesis options, click Options.

Selecting the generator type

From the Phone number generator type dropdown list:

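  • To replace each phone number with a randomly generated number, select Random Number.

  • To generate a realistic telephone number, select US Phone Number. The US Phone Number option generates values similar to those generated by the Phone generator in Structural.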

If you also configured a Textual statistics seed that matches a Structural statistics seed, then the synthesized values are consistent with values generated in Structural. A given source telephone number produces the same output telephone number in both applications.

For example, in both Textual and Structural, 123-456-6789 might be replaced with 154-567-8901.

Determining how to replace invalid telephone numbers

The Replace invalid numbers with valid numbers checkbox determines how Textual handles invalid telephone numbers in the data.

To replace invalid telephone numbers with valid telephone numbers, check the checkbox.

If you do not check the checkbox, then Textual randomly replaces the numeric characters.
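Random replacement of the numeric characters can be sketched as follows. This is an illustration of the general idea, not Textual's implementation. The function name is hypothetical, and the sketch assumes that non-numeric characters such as spaces, parentheses, and hyphens are left in place.

```python
import random
import string


def scramble_digits(value: str, rng: random.Random) -> str:
    """Replace each ASCII digit with a random digit; keep other characters."""
    return "".join(
        rng.choice(string.digits) if ch in string.digits else ch
        for ch in value
    )
```

The output has the same length and punctuation as the input. Passing a seeded random.Random instance makes the result repeatable in tests.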

Selecting and configuring the generator for custom entity types

By default, when you select the Synthesis option for a custom entity type, Textual scrambles the original value.

From the generator dropdown list, select the generator to use to create the replacement value.

The available generators are:

Generator                   Description
Scramble                    This is the default generator. Scrambles the original value.
CC Exp                      Generates a credit card expiration date.
Company Name                Generates a name of a business.
Credit Card                 Generates a credit card number.
CVV                         Generates a credit card security code.
Date Time                   Generates a datetime value. The Date Time generator has the same synthesis configuration options as the built-in Date/Time entity type.
Email                       Generates an email address.
HIPAA Address Generator     Generates a mailing address. The generator has the same configuration options for generator type and realistic replacements as the built-in location entity types.
IP Address                  Generates an IP address.
MICR Code                   Generates an MICR code.
Money                       Generates a currency amount.
Name                        Generates a person's name. You configure whether to generate the same replacement value from source values that have different capitalization, and whether the replacement value reflects the gender of the original value.
Numeric Value               Generates a numeric value. You configure whether to use the Integer Primary Key generator to generate the value.
Person Age                  Generates an age value. The Person Age generator has the same configuration options as the built-in Age entity type.
Phone Number                Generates a telephone number. The Phone Number generator has the same configuration options as the built-in Phone Number entity type.
SSN                         Generates a United States Social Security Number.
URL                         Generates a URL.
