
Contents

• Tonic Textual
• Textual Agent
• Entity types
• Create and manage datasets
• Manage dataset files
• Configure the redaction
• Preview and obtain output
• Guided redaction (Beta)

About the Textual Agent

What is the Textual Agent?

The Textual Agent is a chat-based tool that helps you explore the content of dataset files and configure the file processing.

Dataset details page with the Textual Agent

When does the Textual Agent start?

The Textual Agent starts when you create or edit a Textual dataset.

Each combination of dataset and user has its own Textual Agent chat.

Enabling the Textual Agent on a self-hosted instance

The Textual Agent is always enabled on Textual Cloud.

For information on how to enable the Textual Agent on a self-hosted instance, go to Enabling the Textual Agent.

Tonic Textual guide

Tonic Textual allows you to put your text-based data to work for you.

A Textual dataset is a collection of files from a local file system or cloud storage. Textual scans the dataset files to identify sensitive values. You can then choose to redact or replace those sensitive values, to produce output files in the same format that you can safely use for development and training.

A guided redaction project identifies and blocks out sensitive values in files. For example, you might use guided redaction to prepare documents in response to a Freedom of Information Act (FOIA) request. Textual scans the files to identify values. You can then review and adjust the results before you download the output files.

You can use the Textual SDK or the Textual REST API to manage datasets or to remove sensitive values from individual text strings and files.

Want to know what's in the latest Textual releases? Go to the Textual release notes.

To check the operational status of Textual Cloud, go to status.tonic.ai.

Need help with Textual? Contact [email protected].

Managing your user profile

The User Profile page displays a summary of information about your Textual account.

From the User Profile page, you can copy your organization identifier, configure a team name, and manage your Textual API keys. For more information about managing your Textual API keys, go to Creating and revoking Textual API keys.

Displaying the User Profile page

To display the User Profile page:

1. Click the user icon at the bottom of the navigation menu.

2. In the user menu, click User Profile.

Copying your organization identifier

The profile summary includes the identifier of your organization in Textual.

To copy the identifier, click the copy icon.

Adding a team name

From the User Profile page, you can specify a team name. For example, you might use the team name field to identify a specific department or project that you belong to.

To add a team name, type the name in the field.

    Getting started with Textual

    Sign up for a Textual account.

    Textual entity types

    Built-in entity types come with Textual. You can also configure custom entity types.

    Preview Textual detection and redaction

    Use the home page to see how Textual identifies sensitive values in text or a file.

    Textual Agent

    Use the AI-based agent to explore and configure a dataset.

    Datasets workflow

    Use Textual to detect and replace sensitive values in files.

    About guided redaction (beta)

    Use Textual to identify and block out sensitive values in files.

    Manage API keys

    Generate and revoke API keys for SDK and API authentication.

    SDK - Datasets and redaction

    Use the Textual Python SDK to redact text and manage datasets. Review redaction requests in the Request Explorer.

    REST API

    Use the Textual REST API to redact text strings, manage datasets, and manage user access.

    Textual integrations

    List of Textual integrations with other products.

    Getting started with Textual

    Note that these instructions are for setting up a new account on Textual Cloud. For a self-hosted instance, depending on how it is set up, you might either create an account manually or use single sign-on (SSO).

    Signing up for Textual

    To get started with a new Textual account:

    1. Go to https://textual.tonic.ai/.

    2. Click Sign up.

    3. Enter your email address.

    4. Create and confirm a password for your Textual account.

    5. Click Sign Up.

    Textual creates your account and sends you an email message to activate the account.

    After you activate your account and log in, Textual displays the Textual home page, which you can use to preview how Textual detects and replaces values. For more information, go to Previewing Textual detection and redaction.

    Using the Textual free trial

    When you set up an account on Textual Cloud, you start a Textual free trial.

    Using the Getting Started options

    On the home page, the Getting Started section provides links to tasks and information related to:

    • Using the Textual Python SDK

    • Working with datasets

    • Working with guided redaction

    • Working with custom entity types

    To view the available options for a panel, hover over it. Hover over an option to display a tooltip that identifies the action that the option performs.

    Word count limit

    During the free trial, Textual scans up to 100,000 words for free. Note that Textual counts actual words, not tokens. For example, "Hello, my name is John Smith." counts as six words.

    After you reach the 100,000-word limit, Textual disables scanning for your account. Until you purchase a pay-as-you-go subscription, you cannot add files to a dataset.
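As a rough illustration of the counting rule, a plain whitespace split reproduces the six-word example above. This is an assumption about how words are counted, not Textual's exact implementation:

```python
def count_words(text: str) -> int:
    """Count whitespace-separated words; split() discards empty fragments."""
    return len(text.split())

print(count_words("Hello, my name is John Smith."))  # 6
```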

    Viewing your current usage

    To display your current usage, in the navigation menu, click the usage icon.

    Next steps - pay-as-you-go or product demo

    Textual also prompts you to purchase a pay-as-you-go subscription, which allows you to scan an unlimited number of words for a flat rate per 1,000 words.

    You can also request a Textual product demo.

    Managing the Textual Agent and Agent chats

    Expanding and collapsing the Textual Agent

    The Textual Agent displays at the right side of the application.

    Dataset details with the Textual Agent expanded

    When the Textual Agent is collapsed, to expand it, click the agent icon.

    Collapsed Textual Agent with the agent icon to expand it

    When the Textual Agent is expanded, to collapse it, click the agent icon.

    About entity types

    An entity type is a category of entity value. For example, the entity value John might be an example of the Given Name entity type.

    Tonic Textual comes with a built-in set of entity types that it always detects.

    You can also configure custom entity types, which you can use to detect values that are not covered by the built-in entity types.

    When you create a custom entity type, you can either:

    • Use regular expressions to identify matching values. You might create this type of custom entity when there are a limited number of values or the values follow specific formats that can easily be identified with a regular expression.

    • Define and train a model to identify matching values. Training a model is an iterative process that can take hours or days, depending on your data. You might create this type of custom entity when there are a large number of values that do not follow a specific format. The values need to be identified more by context.

    You can also view this video overview of entity types and entity type handling.
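To make the regex option concrete, here is a minimal sketch of regex-based detection. The sample pattern (internal ticket IDs such as ABC-1234) and the `find_entities` helper are illustrative assumptions, not Textual's implementation:

```python
import re

# Hypothetical regex-based custom entity type: internal ticket IDs like "ABC-1234".
TICKET_ID = re.compile(r"\b[A-Z]{2,5}-\d{3,6}\b")

def find_entities(text: str, pattern: re.Pattern) -> list[dict]:
    """Return each match with its span - the information a detector needs
    to later redact or replace the value in place."""
    return [
        {"value": m.group(), "start": m.start(), "end": m.end()}
        for m in pattern.finditer(text)
    ]

hits = find_entities("Escalated in JIRA-8821, see also OPS-104.", TICKET_ID)
print([h["value"] for h in hits])  # ['JIRA-8821', 'OPS-104']
```

A regex-based type like this works well when values follow a fixed format; the model-based option covers values that only context can identify.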

    Starting a new model-based custom entity type

    To start a new custom entity type:

    1. On the Custom Entity Types page, click Create Custom Entity Type, then select Model-based entity type.

    Custom entity type creation dropdown

    2. On the next panel, in the Custom entity type name field, provide a name for the custom entity type.

    3. In the Annotation guidelines text area, provide instructions for how the model should identify values that you want to find. This is the first version of the model guidelines.

    4. Click Next.

    Textual displays the Test data setup page.

    Selecting the active model for the entity type

    Before you can use a model-based entity type, you must select the model to use.

    On the model list page, the active model is marked as Active.

    To change the active version:

    1. Hover the mouse over the model that you want to make the active model for the entity type.

    2. Click Activate.

    Renaming or deleting a model-based custom entity type

    Renaming the entity type

    To change the entity type name, on the entity type details page:

    1. Click the actions menu next to the entity type name.

    Actions menu for a model-based custom entity type

    2. In the menu, click Edit entity name.

    3. On the Edit Name panel, provide the new name for the entity type.

    4. Click Save.

    Deleting the entity type

    You cannot delete an entity type that is enabled for any datasets. For information on how to enable and disable an entity type in datasets, go to Enabling and disabling the entity type for datasets.

    To delete a model-based custom entity type, in the entity types list, click the delete icon.

    On the confirmation panel, click Delete.

    What can you ask the Textual Agent to do?

    You can ask the Textual Agent to provide information about the content of the dataset files and to update the entity handling configuration.

    The following are just a few examples:

    Deleting datasets


    Required dataset permission: Delete a dataset

    To delete a dataset:

    1. Either:

       • On the Datasets page, click the delete icon for the dataset.

       • On the dataset details page, click Project settings, then on the Dataset settings page, click Delete Dataset.

    2. Click Confirm Delete.

    Changing the dataset name


    Required dataset permission: Edit dataset settings

    The dataset name displays at the top left of the dataset details page.

    To change the dataset name:

    1. On the dataset details page, click Project settings.

    2. On the Dataset settings page, in the Dataset Name field, provide the new name for the dataset.

    3. Click Save Dataset.

    Displaying the synthesis options for an entity type

    To display the synthesis options for an entity type, click the icon next to the handling option dropdown list.

    Supported file types

    Tonic Textual can process the following types of files:

    • .txt
    • .csv
    • .tsv
    • .json
    • .docx
    • .html
    • .xlsx
    • .pdf
    • .png
    • .tif or .tiff
    • .jpg or .jpeg

    Datasets that write JSON output can also process the following types of files:

    • .rtf
    • .eml
    • .msg

    File preview for a JSON output file

    For a dataset that generates JSON output, the file preview displays the original file content.

    Text files

    For a text file, you can switch between Markdown and the JSON output.

    Generating cloud storage output files

    To generate original format output files for a cloud storage dataset, on the dataset details page, click Generate to <cloud storage type>.

    Tonic Textual generates the output files to the configured output location. If the output location is not configured, then the generate option is disabled.

    For datasets that produce JSON output, Textual generates the output files automatically as soon as the output location is configured.

    Name synthesis options

    For the Given Name and Family Name entity types, you can configure:

    • Whether to treat the same name with different casing as a different value.

    • Whether to replicate the gender of the original value.

    Summary results for the dataset

    The Project files and Entity settings pages display the following summary results for the dataset:

    • The number of detected entities
    • The percentage of dataset content that is sensitive
    • The number of entity types for which there are detected entities
    • The number of files in the dataset
    • The total number of words in the dataset files

    The results do not include entity types for which the entity type handling option is set to Ignore.

    Configuring guided redaction options

    Before you create guided redaction projects, configure the following options:

    • Configure status values - Manage the available status values for projects and files.
    • Configure reference codes - Manage the reference codes to assign to redactions.


    Differentiating source values by case

    To treat the same name with different casing as different source values, check Is Consistency Case Sensitive.

    For example, when this is checked, john and John are treated as different names, and can have different replacement values - john might be replaced with michael, and John might be replaced with Stephen.

    When this is not checked, then john and John are treated as the same source value, and get the same replacement.

    Preserving gender in names

    To replace source names with names that have the same gender, check Preserve Gender.

    For example, when this is checked, John might be replaced with Michael, since they are both traditionally male names. However, John would not be replaced with Mary, which is traditionally a female name.



    How Textual handles entity values that match multiple types

    A detected value might match multiple entity types.

    For example, a telephone number might match both the Phone Number and Numeric Value entity types.

    On most dataset details views, each value is counted only once, for the entity type that it is assigned in the output file. The Entities analysis on the Analytics page allows you to choose whether to include a value in the counts for all of the types that it matches.

    By default, a detected value is assigned the entity type that it most closely matches. For our example, the telephone number value most closely matches the Phone Number entity type, and so by default is included in the Phone Number count and values list.

    If the entity type is ignored, or the value is excluded, then Textual moves the value to the next matching type.

    In our example, if you set the handling option for Phone Number to Ignore, then the telephone number value is added to the count and values list for the Numeric Value entity type.
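The fallback behavior described above can be sketched as choosing the closest-matching type that is not ignored. The match scores, type names, and helper below are invented for illustration, not Textual's internals:

```python
from typing import Optional

def assign_type(matches: dict[str, float], ignored: set[str]) -> Optional[str]:
    """Pick the closest-matching entity type, skipping ignored types.

    `matches` maps each candidate entity type to a match score;
    a higher score means a closer match.
    """
    for etype in sorted(matches, key=matches.get, reverse=True):
        if etype not in ignored:
            return etype
    return None

# A telephone number matches Phone Number most closely, Numeric Value less so.
scores = {"Phone Number": 0.95, "Numeric Value": 0.60}
print(assign_type(scores, ignored=set()))             # Phone Number
print(assign_type(scores, ignored={"Phone Number"}))  # Numeric Value
```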

    Assigning tags to a project

    You can assign tags to each guided redaction project to help further identify and link projects. For example, you might use a tag to indicate that a project is for a FOIA request.

    On the Guided Redaction page, the Tags column contains the assigned tags for the project.

    On the project settings panel, the Tags field lists the project tags. To display the settings panel, either:

    • On the Guided Redaction page, click the settings icon for the project.

    • On the project details page, click More, then click Settings.

    To change the assigned tags:

    1. Click the Tags column or field.

    2. To add a tag, type the tag text, then press enter.

    3. To remove a tag, click its delete icon.

    4. To remove all of the tags, click the delete icon at the right of the tags field.

    Deleting a project


    Required guided redaction permission: Delete a project

    To delete a guided redaction project:

    1. Either:

       • On the Guided Redaction page, click the options menu for the project, then click Delete project.

       • On the project details page, click More, then click Delete project.

    2. On the confirmation panel, click Delete.

    Changing the project name and description

    From the project settings panel, you can change the project name and the optional description.

    To display the settings panel, either:

    • On the Guided Redaction page, click the settings icon for the project.

    • On the project details page, click More, then click Settings.

    On the project settings panel:

    1. In the Name field, provide the new name.

    2. In the Description field, provide the description.

    3. Click Save.

    Ask questions about specific values

    • Is the value John present in any of the files, and is it detected as a given name?
    • Do any of the files contain values that start with ABC?

    Ask questions about the current configuration

    • Are any of the entity types ignored or synthesized?

    Set the entity type handling for the dataset

    • Ignore all location-based entity types.
    • Synthesize name values.

    Configure details for synthesized values

    • Synthesize names and maintain the same capitalization.
    • Always replace the first name John with Michael.
    • Replace all addresses with HIPAA compliant values.

    Configure handling for specific document types

    • Ignore images in docx files.
    • Block out tables in docx files.

    Ask questions about the number and type of files

    • How many files are in the dataset?
    • How many PDFs are in the dataset?
    • What is the largest file in the dataset?
    • Are there any images in the dataset?

    Ask questions about the detected entity types in the files

    • How many files contain name entities?
    • Which file contains the largest count of different entity types?
    • Are there any occupation entities?
    • How many credit card numbers are in the files?

    CSV and RTF files

    For CSV and RTF files, you can display Markdown, HTML, or JSON output.

    File preview for a CSV file in JSON output dataset

    PDF and image files

    For a PDF or image file, you can display Markdown, HTML, or JSON output.

    If the file contains tables, then the Tables option displays, to allow you to view the tables specifically.

    If the file contains key-value pairs, then the Key-Values option displays, to allow you to view the key-value pairs specifically.

    File preview for a PDF file in a JSON output dataset

    File preview for a text file in a JSON output dataset

    Sharing dataset access


    Required permissions:

    • Global permission - View users and groups

    • Either:

      • Global permission - Manage access to datasets

      • Dataset permission - Share dataset access

    Tonic Textual uses dataset permission sets for role-based access control (RBAC) for each dataset.

    A dataset permission set is a set of dataset permissions. Each permission provides access to a specific dataset feature or function.

    Textual provides built-in dataset permission sets. Organizations can also configure custom permission sets.

    To share dataset access, you assign dataset permission sets to users and, if you use SSO to manage Textual users, to SSO groups. Before you assign a dataset permission set to an SSO group, make sure that you are aware of who is in the group. The permissions that are granted to an SSO group are automatically granted to all of the users in the group.
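As a rough model of how permission sets fan out to users and SSO groups, consider the sketch below. All names and the data layout are invented for illustration, not Textual's actual schema:

```python
# Illustrative dataset permission sets (names are invented, not Textual's built-ins).
PERMISSION_SETS = {
    "Viewer": {"View dataset"},
    "Editor": {"View dataset", "Edit dataset settings", "Upload files to a dataset"},
}

# Assignments of permission sets to users and SSO groups for one dataset.
ASSIGNMENTS = {"alice": "Editor", "analysts-group": "Viewer"}
GROUP_MEMBERS = {"analysts-group": {"bob", "carol"}}

def user_permissions(user: str) -> set[str]:
    """Union of permissions granted directly and via SSO group membership."""
    perms: set[str] = set()
    if user in ASSIGNMENTS:
        perms |= PERMISSION_SETS[ASSIGNMENTS[user]]
    for group, members in GROUP_MEMBERS.items():
        if user in members and group in ASSIGNMENTS:
            perms |= PERMISSION_SETS[ASSIGNMENTS[group]]
    return perms

print("Edit dataset settings" in user_permissions("alice"))  # True
print(user_permissions("bob"))  # {'View dataset'}
```

The group fan-out in the loop is why the warning above matters: granting a set to a group grants it to every member.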

    To change the current access to the dataset:

    1. Either:

       • On the Datasets page, click the share icon.

       • On the Dataset settings page of the dataset details, click Share.

    2. The dataset access panel contains the current list of users and groups who have access to the dataset, and displays their assigned dataset permission sets. To add a user or group to the list:

       1. In the search field, begin to type the user email address or group name.

       2. From the list of matching users or groups, select the user or group to add.

    Detected entity values

    The Entities Catalog page displays the list of detected entity values for the dataset. To display the Catalog, in the left menu on the dataset details page, click Catalog.

    Entities Catalog for a dataset

    Information in the Entities Catalog

    The Entities Catalog lists each instance of an entity value separately. For example, if the given name John is detected twice in one file and three times in another file, then the Entities Catalog contains five entries for John.

    The Entities Catalog does not include entities for entity types that have the entity type handling option set to Ignore.

    For each value instance, the Entities Catalog includes:

    • The entity value.

    • How the value appears in the output, based on the selected handling option for the value's entity type.

    • The entity type.

    • A confidence score to indicate how confident Textual is that the value is correctly detected and identified.

    Filtering the Entities Catalog

    Filtering by entity value

    To filter the list by text in the entity value, in the search field, begin to type the text.

    As you type, Textual filters the list to only include entity values that contain that text.

    Filtering by entity type

    By default, the Entities Catalog list includes all of the entity types. To filter the list to a specific entity type, click All types, then select the entity type. To remove the filter, select All types.

    Filtering by file

    By default, the Entities Catalog list includes values from all of the files. To filter the list to only include values detected in a specific file, click All files, then select the file. To remove the filter, select All files.

    Sorting the Entities Catalog

    You can sort the Entities Catalog by the value, transformation, entity type, and confidence score.

    To sort by a column, click the column heading.

    To reverse the sort order, click the column heading again.

    Viewing the list of custom entity types

    To display the list of custom entity types, in the navigation menu, click the custom entity types icon.

    Custom Entity Types page with regex-based and model-based custom entity types

    Information in the list

    For each custom entity type, the Custom Entity Types page includes the following information:

    • Name of the custom entity type.

    • Whether the custom entity type is regex-based or model-based.

    • For model-based custom entity types, the status. The status indicates where the entity type is in the creation process.

    • When the custom entity type was most recently updated.

    • The number of projects where the custom entity type is active.

    Filtering the list

    Filtering by name

    To filter the list by the entity type name, in the search field, begin to type text in the name. As you type, Textual filters the list to only include matching entity types.

    Filtering by type

    By default, the list includes both regex-based and model-based custom entity types.

    To filter the list to only include one of the formats:

    1. Click Filter by type.

    2. In the filter dropdown list, select the format to include.

    Filtering by creator

    By default, the list includes all of the custom entity types that were created by users in your organization.

    To filter the list to only include custom entity types that were created by specific users:

    1. Click Filter by creator.

    2. In the dropdown list, check the checkbox for each user to include.

    Sorting the list

    By default, the list is sorted alphabetically by the entity type name.

    You can sort the list by the name, status, or update date.

    To sort by a column, click the column heading.

    To reverse the sort order, click the heading again.

    Tracking and managing file processing

    When you add files to a local files dataset, or change the file selection for a cloud storage dataset, Tonic Textual automatically scans the files to identify the entities that they contain.

    When you change the dataset configuration, Textual also prompts you to run a new scan. For example, a new scan is required when you:

    • Configure added values

    • Change the available custom entity types

    The file list reflects the current scanning status for each file. A file is initially queued for scanning. When the scan starts, the status changes to scanning. When Textual finishes processing a file, it marks the file as scanned.

    When a file needs to be rescanned, a warning icon displays in front of the file name.

    As Textual processes each file, it updates the results.

    Pausing the file processing

    If needed, you can pause the file processing. To pause the processing, click the pause icon.

    The information in the results reflects only the files that have been scanned.

    For a cloud storage dataset, when you generate output, Textual only includes files that are scanned.

    Starting a scan on a paused file


    Required dataset permission: Start a scan of dataset files

    After you pause the scan, you can start a scan on individual files.

    To start a scan on a file, click the refresh icon for the file.

    Downloading logs for files that fail to process


    Required dataset permission: Start a scan of dataset files

    When Textual is unable to process a file, it displays an error for that file.

    To download log files for the failed file:

    1. Click the options menu for the file.

    2. Click Download Logs.

    Configuring entity type synthesis options


    Required dataset permission: Edit dataset settings

    When Tonic Textual generates replacement values, those values are always consistent within the dataset. Consistency means that the same original value always produces the same replacement value.

    You can configure how Textual generates the replacement values for each entity type.

    You can also configure synthesis options from the Textual Agent.
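One common way to achieve this kind of consistency is to derive the replacement deterministically from a hash of the original value, so the same input always maps to the same output. This is a generic sketch under that assumption, not Textual's actual generator; the replacement pool is invented:

```python
import hashlib

REPLACEMENTS = ["Michael", "Stephen", "David", "James"]  # illustrative pool

def consistent_replacement(original: str, pool: list[str]) -> str:
    """Derive a stable index from a hash of the original value, so the
    same original value always produces the same replacement value."""
    digest = hashlib.sha256(original.encode("utf-8")).digest()
    return pool[int.from_bytes(digest[:4], "big") % len(pool)]

# Every occurrence of "John" in the dataset gets the same replacement.
assert consistent_replacement("John", REPLACEMENTS) == consistent_replacement("John", REPLACEMENTS)
```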

    Analysis of detected entity types

    The Entities Analytics page displays a summary of the detected entity types. For each entity type, you can display the distribution across the dataset files.

    The page does not include entity types that have their entity type handling option set to Ignore.

    You can also use the Textual Agent to ask questions about the detected entity types and counts.

    Entities Analytics page on the dataset details page

    Selecting the value count option

    On the Entities Analytics page, you can choose how Textual determines the displayed value counts and entity types.

    • Match counts to redacted files - Displays value counts based on the output files. For this view, the counts do not include entity types that are ignored. The counts also resolve entity values that match multiple types and entity values that share some text. Each value is counted as a single type.

    • Show all detected entities - Displays the full detection value counts. For this view, the counts per entity type include all of the entities that Textual found during processing. This includes:

      • Values for ignored entity types

    Summary counts

    The panels at the top of the page provide summary information for the detected entity values. The displayed values are based on the selected value count option.

    The summary information includes:

    • The number of detected entity values.

    • The number of detected entity types.

    • The percentage of detected values that are redacted.

    • The percentage of detected values that are synthesized.

    Counts by entity type

    The entity types list on the Entities Analytics page displays a summary of the detected value counts for the detected entity types. The displayed entity types and counts are based on the selected value count option.

    For each entity type, the list includes:

    • The count of detected values

    • The percentage of detected values in the dataset that are of that type

    By default, the entity types are listed in descending order based on the value count.

    You can sort the list by the entity type, count, and percentage. To sort by a column, click the heading. To reverse the sort order, click the heading again.

    Displaying the top 10 file list for an entity type

    When you click an entity type, Textual displays a panel that lists the 10 files that contain the most detected values for that entity type.

    The panel also allows you to change the handling option for the entity type.

    Uploading and deleting local files

    For a local files dataset, you upload and remove files directly.

    On Tonic Textual Cloud, and by default for self-hosted instances, Textual stores the uploaded files in the application database.

    On a self-hosted instance, you can instead configure an S3 bucket where Textual stores the files. In the S3 bucket, the files are stored in a folder that is named for the dataset identifier.

    For more information, go to Setting the S3 bucket for file uploads and redactions.

    For an example of an IAM role with the required permissions, go to .

    Adding files to the dataset


    Required dataset permission: Upload files to a dataset

    From the dataset details page, to add files to the dataset:

    1. In the left menu, click Project files.

    2. On the dataset files page, click Upload Files.

    3. Search for and select the files.

    Textual uploads and then processes the files. For more information about file processing, go to Tracking and managing file processing.


    Do not leave the page while files are uploading. If you leave the page before the upload is complete, then the upload stops.

    You can leave the page while Textual is processing the file.

    On a self-hosted instance, when a file fails to upload, you can download the associated logs. To download the logs, click the options menu for the file, then select Download Logs.

    hashtag
    Removing files from the dataset

    circle-info

    Required dataset permission: Delete files from a dataset

    To remove a file from the dataset:

    1. In the file list, click the options menu for the file.

    2. In the options menu, click Delete File.

    Synthesis options for specific entity types

    The following built-in entity types have additional type-specific synthesis options.

    Providing specific replacement values

    For all entity types, you can:

    • Map original values to replacement values.

    • Provide a constant value to use to replace all of the original values, except for mapped values.

Synthesis options that apply to all entity types, including mapping specific values and providing a constant value.

    hashtag
    Mapping original values to replacement values

    You can map specific original values to specific replacement values.

    For example, for the Given Name entity type, you might indicate to always replace John with Michael and Mary with Melissa.

    For original values that are not in the mapping:

• If you also provide a constant value, then Textual uses that value.

    • Otherwise, Textual selects the values.

    hashtag
    Providing the mapping

    In the text area, provide a JSON object that maps the original values to the replacement values. For example:
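As an illustration, a mapping object for the Language entity type might look like the following. The keys are the original values, and the values are their replacements:

```json
{
  "French": "German",
  "English": "Japanese"
}
```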

    With the above configuration for the Language entity type:

    • All instances of French are changed to German.

    • All instances of English are changed to Japanese.

    hashtag
    Values are case-insensitive

    The values are case-insensitive.

For example, if you specify "John": "Michael", then Textual also replaces john with michael and JOHN with MICHAEL.

    hashtag
    Leading and trailing punctuation is ignored

Textual ignores leading and trailing punctuation.

To continue the example of "John": "Michael", Textual also replaces 'John' with 'Michael'.
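Taken together, the matching behavior amounts to normalizing each token before the lookup. The following is an illustrative sketch only, not Textual's actual implementation; the function and mapping names are invented for the example:

```python
import string

MAPPING = {"john": "Michael"}  # keys are compared case-insensitively

def match_case(replacement: str, original: str) -> str:
    # Mirror the casing of the original token onto the replacement.
    if original.isupper():
        return replacement.upper()
    if original.islower():
        return replacement.lower()
    return replacement

def replace_token(token: str) -> str:
    # Strip leading and trailing punctuation before looking up the mapping.
    core = token.strip(string.punctuation)
    replacement = MAPPING.get(core.lower())
    if replacement is None:
        return token
    # Preserve the surrounding punctuation in the output.
    leading = token[: len(token) - len(token.lstrip(string.punctuation))]
    trailing = token[len(token.rstrip(string.punctuation)) :]
    return leading + match_case(replacement, core) + trailing
```

With this sketch, `replace_token("JOHN")` returns `"MICHAEL"`, and `replace_token("'John'")` returns `"'Michael'"`.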

    hashtag
    Providing a constant replacement value

    You can provide a single constant value to use to replace all instances of the entity type. For example, you might replace all bank numbers with "0000000".

    In the Constant Value field, provide the value to use.

    When you provide a constant value, Textual ignores other .

    However, Textual does respect any .

    Date/time synthesis options

    By default, when you select the Synthesize option for Date/Time values, Textual shifts the datetime values to a value that occurs within 7 days before or after the original value.

    To customize how Textual sets the new values, you can:

    • Set a different range within which Textual sets the new values.

    • Indicate whether to scramble date values that Textual cannot parse.

    • Indicate whether to shift all of the original values by the same amount and in the same direction.

    • Add additional date formats for Textual to recognize.

    hashtag
    Adjusting the range for the replacement values

    By default, Textual adjusts the dates to values that are within 7 days before or after the original date.

    To change the range:

    1. In the Left bound on # of Days To Shift field, enter the number of days before the original date within which the replacement datetime value must occur. For example, if you enter 10, then the replacement datetime value cannot occur earlier than 10 days before the original value.

    2. In the Right bound on # of Days To Shift field, enter the number of days after the original date within which the replacement datetime value must occur. For example, if you enter 6, then the replacement datetime value cannot occur later than 6 days after the original value.
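The two bounds define a window around the original value from which the replacement is drawn. A minimal sketch of this kind of bounded shift (illustrative only; the function name and random selection are assumptions, not Textual's actual code):

```python
import random
from datetime import datetime, timedelta

def shift_within_bounds(value: datetime, left_days: int, right_days: int,
                        rng: random.Random) -> datetime:
    # The replacement falls in [value - left_days, value + right_days].
    offset = rng.uniform(-left_days, right_days)
    return value + timedelta(days=offset)

original = datetime(2024, 3, 15)
# With a left bound of 10 and a right bound of 6, the replacement
# lands between 2024-03-05 and 2024-03-21.
replacement = shift_within_bounds(original, 10, 6, random.Random(0))
```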

    hashtag
    Shifting all values in a file by the same amount

    By default, Textual applies different shifts to the original values. Some replacement dates might be earlier, and some might be later. The amount of shift might also vary.

    Within a given file, to shift all of the datetime values in the same way, check Apply same shift for entire document.

    For example, if this is checked, Textual might shift all datetime values in a file 3 days into the future.

    hashtag
    Replacing datetime values in unsupported formats

    Textual can parse datetime values that use either a format in or a format that you add.

    The Scramble Unrecognized Dates checkbox indicates how Textual should handle datetime values that it does not recognize.

    By default, the checkbox is checked, and Textual scrambles those values.

    To instead pass through the values without changing them, uncheck Scramble Unrecognized Dates.

    hashtag
    Adding datetime formats

    By default, Textual is able to recognize datetime values that use a format from .

    Under Additional Date Formats, you can add other datetime formats that you know are present in your data.

    The formats must use a .

    To add a format, type the format in the field, then click +.

    To remove a format, click its delete icon.

    Searching for text values

    circle-info

    To enable dataset text search, self-hosted instances must configure a connection to a search provider. For more information, go to Enabling dataset text search.

    The Entities Catalog lists the values that Textual detected in the dataset files.

    The Dataset search allows you to search for specific values to determine whether Textual detected them and assigned the correct entity type.

    You can also use the Textual Agent to ask whether the dataset includes specific values.

    Dataset search page

    hashtag
    Starting a search

    To start a search:

    1. On the dataset details page, click Dataset search.

    2. In the search field, provide the text to search for.

    When the text matches a full word in a file, Textual displays the results.

    Textual only searches for full words. For example, if you type "nu", it does not find the word "number". You must type the full word "number".

    hashtag
    Viewing the search results

    The search results include a separate row for each found instance of the search text.

    For each instance, the results include:

    • The instance itself, with the immediate context. The context includes the few words before and after the search text.

    • The assigned entity type, if Textual detected the search text as an entity. If it did not, then the entity type is None.

    • The name of the file where the instance was found.

    At the top of the list, Textual displays the approximate number of matches and number of files that contain matches.

    Depending on the number of matches, the list might not initially include all of the results. If there are additional results, then the Load more link displays. When you click Load more, Textual adds the next batch of results to the end of the list.

    hashtag
    Filtering the search results

    You can filter the results to only include matches in selected files.

    To filter the results:

    1. Click Filter by file.

2. In the file list, check the checkbox for each file to include in the results.

    Configuring PDF options

    circle-info

    Required dataset permission: Edit dataset settings

    The PDF Settings section of the Dataset settings page provides options to configure how to work with PDF files.

    PDF Settings section of the Dataset settings page

    You can also configure the signature, synthesis, and LLM classification mode options from the Textual Agent.

    hashtag
    Configuring whether to redact PDF signatures

    By default, Textual redacts scanned-in signatures in PDF files. You can configure the dataset to instead ignore the signatures.

    Under PDF Settings:

    • To redact PDF signatures, toggle Detect and redact signatures in PDFs to the on position. This is the default configuration.

    • To ignore PDF signatures, toggle Detect and redact signatures in PDFs to the off position.

    hashtag
    Using the new synthesis process

    Textual has developed an updated synthesis process that is currently implemented for the following entity types:

    • URLs

    • Names

    • Custom entity types

    In particular, the new synthesis process improves the display of the synthesized values in PDF files. The values better match the available space and the original font.

Under PDF Settings, the New PDF synthesis mode (experimental) setting determines which process to use.

    To use the new process, toggle the setting to the on position.

    hashtag
    Using an LLM to analyze structured components

    To improve the detection accuracy for sensitive values, you can optionally use an LLM to analyze structured components such as tables and form fields.

    To enable this, under PDF Settings, toggle Use LLM to classify structured data for PII detection to the on position.

    hashtag
    Selecting the OCR model to use (self-hosted only)

    For PDFs as well as images, if multiple optical character recognition (OCR) models are available, you can select the specific model to use in the dataset. For information on how to enable specific models, go to .

    Under PDF Settings, from the OCR Engine dropdown list, select the model to use.

    Downloading local output files

    circle-info

    Required dataset permission: Download redacted dataset files

    For each file in a dataset, you can download the output file.

    hashtag
    Downloading a single output file

    From the file list, to download a single output file, click the options menu for the file, then select the download option.

    The dataset file preview also provides options to download the file.

    hashtag
    Datasets that generate redacted output files

    For datasets that generate redacted versions of the source files, the option is Download File.

    hashtag
    Datasets that generate JSON output

    For datasets that generate JSON output, you can download either the JSON output or the redacted version of the Markdown content.

    For RTF files, you can also download the redacted content in HTML format.

    hashtag
    Downloading all of the output files

    To download all of the output files, click the download icon that is next to the file filter field.

    For datasets that generate redacted versions of the source files, the download happens immediately.

    For datasets that generate JSON output:

    • To download the JSON output for the files, select Download JSON.

    • To download the redacted Markdown content for all of the files, select Download Markdown.

    • If the dataset contains any RTF files, then to download only the redacted HTML content for all of the RTF files in the dataset, select Download HTML (RTF only). If you select this option, then the download does not include other types of files.

    Adding and removing manual redactions

From a text, PDF, or image file preview, you can add manual redactions. When you add a manual redaction, you select the entity type to assign to it.

    hashtag
    Adding manual redactions

    hashtag
    Text files

    To add a manual redaction to a text file:

    1. In the source text panel on the left, select the text to redact.

    2. On the redaction panel, from the dropdown list, select the entity type for the redaction.

    hashtag
    PDF and image files

    To add a manual redaction to a PDF or image file:

1. In Original view, to change to redaction mode, click Add Redaction.

    2. While in redaction mode, to add a redaction:

      1. Draw a box around the content to redact.

      2. On the redaction details panel, from the dropdown list, select the entity type for the redaction.

    3. To exit redaction mode, click Done.

    Textual displays the count of manual redactions on the current page.

    hashtag
    Removing a manual redaction

    hashtag
    Text files

    To remove a manual redaction from a text file:

    1. In either Redacted or Original view, click the redaction.

    2. Click Delete.

    hashtag
    PDF and image files

    To remove a manual redaction from a PDF or image file:

    1. In either Redacted or Original view, click the redaction.

    2. Click Delete Redaction.

    Location synthesis options

    Location values include the following types:

    • Location

    • Location Address

    • Location State

    • Location Zip

    For location types that include zip codes, you can also specify how to generate the new zip code values.

    hashtag
    Selecting the type of address generator to use

    Under Address generator type, select the type of address generator to use:

    • HIPAA-compliant address generator. This option generates values similar to those generated by the .

    • Non-HIPAA address generator. This option generates values similar to those generated by the .

    If you configured a Textual statistics seed that matches a Structural statistics seed, then the generated address values are consistent with values generated in Structural. A given address value produces the same output value in both applications.

    For example, in both Textual and Structural, a source address value 123 Main Street might be replaced with 234 Oak Avenue.

    hashtag
    Using realistic replacement values

    For Location State, based on HIPAA guidelines, both the Synthesize option and the Ignore option pass through the value.

    For location types other than Location State, by default, Textual replaces a location value with a realistic corresponding value. For example, "Main Street" might be replaced with "Fourth Avenue".

    To instead scramble the values, uncheck Replace with realistic values.

    hashtag
    Generating replacement zip codes

    hashtag
    Using zeroes for the last two digits

    By default, to generate a new zip code, Textual selects a real zip code that starts with the same three digits as the original zip code. For a low population area, Textual instead selects a random zip code from within the United States.

    To instead replace the last two digits of the zip code with zeros, check Replace zeroes for zip codes. For a low population area, Textual instead replaces all of the digits in the zip code with zeros.

This option is also used when you truncate zip codes to the first 3 digits. When you truncate zip codes and enable this option, Textual adds zeroes to fill out the zip code to 5 digits.

    hashtag
    Truncating zip codes to 3 digits

    To truncate zip codes to the first 3 digits, check Use 3-digit zip codes.

    If you also check Replace zeroes for zip codes, then Textual replaces the truncated digits with zeroes.
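The two checkboxes combine as follows. This sketch is illustrative only; the parameter names are invented, and the low-population fallback and the selection of a realistic replacement zip code are omitted:

```python
def transform_zip(zip_code: str, use_3_digit: bool = False,
                  replace_with_zeroes: bool = False) -> str:
    # Keep the first three digits; optionally truncate or zero out the rest.
    prefix = zip_code[:3]
    if use_3_digit:
        # Truncated; zero-fill back to 5 digits if requested.
        return prefix + "00" if replace_with_zeroes else prefix
    if replace_with_zeroes:
        return prefix + "00"
    # Otherwise, a realistic zip code with the same prefix would be selected.
    return zip_code

transform_zip("02139", replace_with_zeroes=True)  # "02100"
transform_zip("02139", use_3_digit=True)          # "021"
```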

    hashtag
    Replacing foreign zip codes with zeroes

To replace zip codes from outside of the United States with zeroes, check Replace foreign zip codes with zeroes.

    File preview for redacted files

    For a dataset that generates output files of the same type as the original file, the file preview displays the source and output versions of the file, and highlights the detected entity values.

    From a file preview, you can change the entity type handling. For some file types, you can also add manual redactions.

    Viewing and restoring dataset versions

    Tonic Textual tracks each change to the dataset configuration.

    You can view the list of changes to the dataset configuration, and revert the dataset configuration to a selected version.

    hashtag
    Displaying the dataset version history

    To display the version history, click the clock icon at the top right of the dataset details page.

    The Version history panel lists the versions in descending order, with the current version at the top of the list.

    hashtag
    Displaying details for a version

    For each version, the Version history list initially displays:

    • A summary description of the change.

    • The timestamp when the change occurred.

    • The name of the user who made the change.

    To display additional details about the change, expand the version description. For example, for a change to the handling option for an entity type, the details include the entity type, the previous handling option, and the new handling option.

    hashtag
    Restoring a version

    You can restore an earlier version of the dataset. When you restore a version, Textual reverts the changes that occurred since that version.

    On the expanded description of a version, to restore that version, click the Restore to <version number> button.

    Textual adds a version to the history. The new version is marked as Restored, and the description indicates the version that you restored.

    The expanded details for the version list the changes that Textual reverted when you restored the earlier version.

    Age synthesis options

    When you select the Synthesis option for the Age entity type, you can either:

    • Shift the age within a specific range.

    • Pass through ages younger than 90, and group ages 90 and older into a single value.

    Synthesis options for age values

    hashtag
    Shifting the age within a range

    To shift the age within a specified range:

    1. From the Age generator type dropdown list, select Age shift.

    2. In the Range of Years +/- for the Shifted Age field, enter the number of years before and after the original value to use as the range for the synthesized value.

    By default, Textual shifts the age value to a value that is within 7 years before or after the original value.

    hashtag
    Passing through or grouping the age value

    Based on the original age value, to either pass through or group the value, from the Age generator type dropdown list, select Passthrough or group age.

    With this option:

    • If the original age value is younger than 90, then the value is left as is in the output.

    • If the original age is 90 or older, then the value is replaced with 90+.

    hashtag
    Indicating whether to scramble values that Textual cannot parse

    By default, Textual scrambles age values that it cannot parse.

    To instead pass through the value unchanged, uncheck Scramble Unrecognized Ages.

    Assigning tags to datasets

    circle-info

    Required dataset permission: Edit dataset settings

    Tags can help you to organize your datasets. For example, you can use tags to indicate datasets that belong to different groups, or that deal with specific areas of your data.

    You can manage tags from the Datasets page or from the Project Settings tab on the dataset details.

    hashtag
    Datasets page

    On the Datasets page, the Tags column displays the currently assigned tags.

To change the tag assignment for a dataset:

    1. Click Tags.

2. On the dataset tags panel, to create a new tag, type the tag text, then press Enter. You can also select an existing tag from the dropdown list.

    3. To remove a tag, click its delete icon.

    4. To remove all of the tags, click the delete all icon.

    hashtag
    Dataset details

    On the Dataset settings page of the dataset details, the Tags section lists the currently assigned tags.

    To change the tag assignment for the dataset:

    1. Click the Tags field.

2. To create a new tag, type the tag text, then press Enter. You can also select an existing tag from the dropdown list.

    3. To remove a tag, click its delete icon.

    4. To remove all of the tags, click the delete all icon.

    Navigating the file list

    To display the list of files for the dataset, on the dataset details page, click Project files.

    File list for an uploaded file dataset

For a cloud storage dataset, the file list displays the full path to each file.

    File list for a cloud storage dataset

    hashtag
    Information in the file list

    For each file, the file list includes the following information:

    • The name of the file. For cloud storage datasets, the file name includes the full path to the file. When the file needs to be rescanned, a warning icon displays in front of the name.

    • The number of detected entity values in the file. Note that Textual does not display an entity value count for CSV files.

    • The total number of words in the file. Note that Textual does not display the word count for CSV files.

    • When the file was added to the dataset.

    hashtag
    Filtering the file list

    To filter the list based on the file name, in the search field, begin to type text from the file name.

    As you type, Textual updates the file list to only include matching files.

    hashtag
    Sorting the file list

    You can also sort the list by any of the columns.

    • To sort by a column, click the column heading.

    • To reverse the sort order, click the column heading again.

    Date of birth synthesis options

    When you select the Synthesis option for a Date of Birth, you can either:

    • Shift the date within a specific range.

    • Truncate all dates to January 1. The year is set based on whether the original year is more than 89 years ago.

    You can also configure how to handle dates that Textual does not recognize, and add additional date formats for Textual to recognize.

    Date of birth synthesis options

    hashtag
    Shifting the date within a range

    To shift the date within a specified range:

    1. From the Generator Type dropdown list, select Date shift generator.

    2. In the Left bound on # of Days To Shift field, enter the number of days before the original date within which the replacement datetime value must occur. For example, if you enter 10, then the replacement datetime value cannot occur earlier than 10 days before the original value.

3. In the Right bound on # of Days To Shift field, enter the number of days after the original date within which the replacement datetime value must occur. For example, if you enter 6, then the replacement datetime value cannot occur later than 6 days after the original value.

    hashtag
    Truncating the date

    To truncate the date, from the Generator Type dropdown list, select Date Truncation Generator.

    With this option, in the output:

    • The month and day are always January 1.

    • If the original year is less than 90 years ago, then the year remains as is in the output.

    • If the original year is 90 years or more ago, then the year is set to the current year minus 89.
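The truncation rule can be sketched as follows. This is an illustrative re-statement, not Textual's actual code; the current date is passed in explicitly for clarity:

```python
from datetime import date

def truncate_dob(dob: date, today: date) -> date:
    # The month and day always become January 1.
    year = dob.year
    if today.year - dob.year >= 90:
        # 90 or more years ago: clamp the year to the current year minus 89.
        year = today.year - 89
    return date(year, 1, 1)

truncate_dob(date(1980, 6, 15), date(2024, 1, 1))  # date(1980, 1, 1)
truncate_dob(date(1930, 6, 15), date(2024, 1, 1))  # date(1935, 1, 1)
```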

    hashtag
    Selecting how to handle invalid values

    Textual can parse datetime values that use either a format that Textual supports by default or a format that you add.

    The Scramble Unrecognized Dates checkbox indicates how Textual should handle datetime values that it does not recognize.

    By default, the checkbox is checked, and Textual scrambles those values.

    To instead pass through the values without changing them, uncheck Scramble Unrecognized Dates.

    hashtag
    Adding datetime formats

    By default, Textual is able to recognize a specific list of .

    Under Additional Date Formats, you can add other datetime formats that you know are present in your data.

    The formats must use a .

    To add a format, type the format in the field, then click +.

    To remove a format, click its delete icon.

    Displaying a dataset file preview

    circle-info

    You cannot preview TIF image files. You can preview PNG and JPG files.

    From the dataset file list, to display the preview, either:

    • Click the file name.

    • Click the options menu, then click Preview.

    Ignoring specific instances in PDF files

    From a PDF file preview, you can choose to ignore a specific value:

    1. In the source or results panel, click the value.

    2. On the details panel, to ignore the value, toggle Ignore this instance to the on position.

    Panel with the option to ignore a PDF value

    Iterating over the guidelines to use for model training

    On the Guidelines refinement page, you prepare the guidelines that define a model.

    To test each version of the guidelines, Textual uses the guidelines to detect entity values in the test data. It then generates scores to indicate how closely those detection results match the values that you established during your initial review.

    Textual also generates recommendations to improve the guidelines.

    hashtag
    Viewing the initial version of the guidelines

    Overview of the process to create a model-based custom entity type

    For a custom model entity type, the overall process is as follows:

    hashtag
    Select and annotate test files

    The first step is to identify the entity values that are in a small set of test files. The test files and established values are used both to iterate over the model guidelines and to assess how well your trained models perform.

    Sharing access to a guided redaction project

    circle-info

    Required permissions

    Global permission - View users and groups

    Either:

    Configuring and editing PDF redaction and synthesis

    circle-info

    Required dataset permission: Edit dataset settings

    You can configure how Textual works with PDFs. For an individual PDF file, you can add manual overrides to selected areas of a file. Manual overrides can ignore detected values from Tonic Textual, or add redactions.

    Telephone number synthesis options

    For Phone Number values, you can choose whether to generate a realistic phone number. If you do, then the generated values can be consistent with values generated in Structural.

    hashtag
    Selecting the generator type

    From the Phone number generator type dropdown list:

    Configuring column content types for CSV files

    hashtag
    Available column content types

    When Textual processes a CSV file, it identifies the content of each column as one of the following:

    • Sensitive - The column values are of a single type and format, and the values match an entity type.

    Configuring handling of .docx file components

    circle-info

    Required dataset permission: Edit dataset settings

In .docx and .xlsx files, as long as the URL entity type handling option is not set to Off, Textual automatically changes the destination of hyperlinks to google.com.

    On the Dataset settings page, the Word Document Settings section contains settings to determine how to manage .docx images, tables, and comments.

To display the Dataset settings page, on the dataset details page, click Project settings.

    Enabling and disabling entity types and values

    The entity types settings for a project determine the Textual entity types that Textual detects in new project files. You can enable or disable both . You cannot enable model-based custom entity types.

    For example, you can tell Textual to ignore all Given Name values in new project files.

    For built-in entity types, you can also configure specific values to add to or exclude from the detection. For example, you can keep the Given Name entity type active, but indicate to ignore the value "Mark".

    By default, all of the entity types are active, and there are no added or excluded values.

    Any changes to the project entity type settings only affect files that are added after the change. Files that were already scanned are not affected.

    You can override the entity type settings in individual files. For more information, go to .

    Setting the project status

    circle-info

    Required guided redaction permission: Edit status

    The project status indicates where the project is in the redaction and review process.

    For information about how to configure the available status values, go to .

    Each new project is assigned the built-in Not started status.

    You can set the status from either the Guided Redaction page or the project details page.

    Managing the list of project files

    The project details page contains the list of files in the project. To display the project details, on the Guided Redaction page, click the project name.

    hashtag
    Information in the file list

    For each file in the project, the list includes the following information:

    Viewing the list of guided redaction projects

    circle-info

    Required permissions

    Either:

    Creating and managing guided redaction projects

    hashtag
    Setting up projects

    hashtag
    Configure entity types and access

    Email address synthesis options

    circle-info

    Does not apply to PDF and image files in datasets that .

    For Email Address values, you can choose to preserve the original email address domains. For example, source email addresses for example.com remain example.com email addresses in the output.

• Name of the file

  • Number of redactions in the file

  • Number of pages in the file

  • Number of comments in the file

  • Status of the file

  • Name of the user who most recently made changes in the file

  • When the most recent change occurred

hashtag
    Filtering the file list

    You can filter the file list based on the file name.

To filter the list, in the search field, type text from the file name. As you type, Textual updates the list to only include matching files.

    hashtag
    Sorting the file list

    By default, the file list is sorted in descending order based on the update date. The most recently updated files are at the top of the list.

    You can sort the file list by any column except for the options column. To sort the list by a column, click the column heading. To reverse the sort order, click the column heading again.

    hashtag
    Supported file types

    You can use guided redaction for the following types of files:

    • .pdf

    • .txt

    • .docx - Note that the content is treated as text, and all images are removed.

    hashtag
    Adding files to the project

    circle-info

    Required guided redaction permission: Upload files to a project

    When you add files to a project, Textual automatically assigns the Not started status to those files. It also scans the files for built-in entity types, based on the project configuration.

    To add files to a project:

    1. On the project details page, click Upload Files.

    2. Search for and select the files to add.

    hashtag
    Removing a file from a project

    circle-info

    Required guided redaction permission: Delete files from a project

    To delete a file from the project, either:

    • On the project details page, click the delete icon for the file.

    • On the file details page, click More, then click Delete File.

    Enable consistency with Tonic Structural Generate the same values in both Textual and Structural.

    Display the synthesis options Display the synthesis configuration panel for an entity type.

    Provide replacement values Provide a single constant replacement value, or map original values to replacement values.

    Synthesis options by entity type Configure additional synthesis options for specific built-in entity types.

    Configure synthesis for custom entity types Select and configure the generator to use to synthesize values for a custom entity type.

    Name Configure synthesis options for name values.

    Location Configure synthesis options for location values.

    Email address Configure synthesis options for email address values.

    Age Configure synthesis options for age values.

    Telephone number Configure synthesis options for telephone number values.

    Date/time Configure synthesis options for date/time values.

    Date of birth Configure synthesis options for date of birth values.

    Default datetime formats For date/time and date of birth, the formats that Textual supports by default.

    Select the view Display either the output or original version of the file.

    Configure entity type handling Select the entity type handling option.

    Add manual redactions Select text to redact and assign an entity type to.

    Ignore entity instances in PDF files Ignore specific detected entity values in a PDF file.

    Configure CSV columns Display column format and configuration options from a CSV file preview.

    hashtag
    Displaying the project settings panel

    The entity type configuration for the project is part of the project settings panel.

    To display the project settings panel, either:

    • On the Guided Redaction page, click the settings icon.

    • On the project details panel, click More, then select Settings.

    After you change the project settings, Textual automatically rescans the files to add or remove automatically detected redactions that are affected by the changes.

    hashtag
    Determining the enabled entity types

    By default, a project includes all of the built-in and custom entity types. When you add a file, Textual automatically scans the file to identify instances of those entity types.

    From the settings panel, you can disable entity types. For example, if you disable the Occupation entity type, Textual does not scan for occupation values.

    On the settings panel, to exclude an entity type from the initial file scan, set the entity type toggle to the off position.

    hashtag
    Configuring added and excluded values for built-in entity types

    For each built-in entity type, you can configure added and excluded values.

    You might add values that Textual does not detect because, for example, they are specific to your organization or industry.

    You might exclude values that Textual redacts incorrectly.

    To display the panel to add and exclude values, click the add or exclude values icon.

    On the panel:

    • Use the Add to detection tab to add values.

    • Use the Remove from detection tab to remove values.

    Each value can be either a specific word or phrase to add or exclude, or a regular expression to identify the values to add or exclude. Regular expressions must be C# compatible.
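As an illustration of an added-value regular expression, the following hypothetical pattern matches organization-specific employee IDs such as "EMP-12345". The ID format is an assumption for the example; the pattern deliberately uses only constructs that are valid in both C# (.NET) and Python regular expressions, so Python can be used to try it out locally.

```python
import re

# Hypothetical example: an organization-specific employee ID format
# such as "EMP-12345". The pattern uses only syntax that is shared
# between C# (.NET) and Python regular expressions.
pattern = r"\bEMP-\d{5}\b"

text = "Contact EMP-10234 or EMP-99871 about the incident."
matches = re.findall(pattern, text)
```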

    hashtag
    From the Guided Redaction page

    From the Guided Redaction page, to change the status of a project:

    1. In the Status column, click the status value.

    2. From the status dropdown, select the new status.

    hashtag
    From the project details page

    From the project details page, to change the status of the project:

    1. In the page heading, click the status value.

    2. From the status dropdown list, select the new status.

    hashtag
    From the project settings panel

    You can also set the project status from the project settings panel. To display the project settings panel, either:

    • On the Guided Redaction page, click the settings icon.

    • On the project details page, click More, then select Settings.

    On the project settings panel, from the Status dropdown list, select the new project status.

    For a user or group, to change the assigned dataset permission sets:
    1. Click Access. The dropdown list displays the list of custom and built-in dataset permission sets.

    2. Under Custom Permission Sets, check the checkbox next to each dataset permission set to assign to the user or group. To remove an assigned dataset permission set, uncheck the checkbox.

    3. Under Built-In Permission Sets, click the dataset permission set to assign to the user or group. You can only assign one built-in permission set. By default, for an added user or group, the Viewer permission set is selected. To not grant any built-in permission set, select None.

    When you create the model-based custom entity type, you provide an initial description of the entity type. For example, "Scientific names of health conditions". The description is the first version of the model guidelines. The guidelines tell the model how to identify the entity type values.
• You then select a small set of test files that contain entity values. For example, if you typically use Textual to redact values in patient appointment reports, then you might upload a few of those reports to use as test files. The files should be no more than 5,000 words.

  • Textual uses your initial guidelines to identify values in the files.

  • You then review and correct the annotations to identify the definitive set of entity values that the test files contain.

hashtag
Iterate over model guidelines

    After you establish the entity values in your test files, you iterate over the guidelines for the model.

    For each version of the guidelines, Textual uses the guidelines to detect entity values in the test files.

    Textual then compares the values that the guidelines version detects against the values that you established when you annotated the test files.

    Textual generates scores to identify how well that version of the guidelines performed. If you are not satisfied with the results, you can update the guidelines to create a new version.

    Textual automatically generates suggestions to improve the guidelines, based on how well the current guidelines identified the values. For example, it might suggest more specific wording or additional text to describe exceptions.

    hashtag
    Select training data

    When you have guidelines that you are satisfied with, you select a larger set of data to use for model training.

    The training data should contain at least 1,000 entity values. The files should still be relatively small - no more than 5,000 words.

    For example, when setting up a custom entity type to identify health conditions, you might use 5 or 6 appointment reports for your test data, but several hundred reports for your training data.

    hashtag
    Train models

    When you create a model, you select the guidelines version to use for it.

    The model uses the guidelines to annotate the training data - in other words, to detect entity values in the training files. You review the annotation results to determine whether you are satisfied with the detections.

    If you are not satisfied, you can:

    1. Return to the guidelines refinement to edit the guidelines.

    2. Create a new guidelines version.

    3. Create a model that uses the new version.

    If you are satisfied, then you can start the model training. Model training can take a very long time - sometimes hours or days - depending on the data.

    When the model finishes training, it scans and identifies values in the original test data. Each trained model receives a score to identify how well its detections matched the definitive values that you established.

    hashtag
    Select a model to use

    To make the entity type available to use, you select the trained model to use.

    The custom entity type is then active and can be enabled or disabled within individual datasets.

    Flow to create a model-based custom entity type

Configure PDF options Determine the synthesis process to use, and how to manage PDF signatures.

Edit an individual file Add manual overrides to a PDF file. You can also apply a template.

Create PDF templates PDF templates allow you to add the same overrides to files that have the same structure.

    You can also use the Textual Agent to configure these options.

    hashtag
    Configuring how to handle .docx images

    For .docx images, including .svg files, you can configure the dataset to either:

    • Redact the image content. When you select this option, Textual looks for and blocks out sensitive values in the image.

    • Ignore the image.

    • Replace the images with black boxes.

    On the Dataset settings page, under Image settings for DOCX files:

    • To redact the image content, click Redact contents of images using OCR. This is the default selection.

    • To ignore the images entirely, click Ignore images during scan.

    • To replace the images with black boxes, click Replace images from the output file with black boxes.

    hashtag
    Configuring how to handle .docx tables

    For .docx tables, you can configure the dataset to either:

    • Redact the table content. When you select this option, Textual detects sensitive values and replaces them based on the entity type configuration.

    • Block out all of the table cells. When you select this option, Textual places a black box over each table cell.

    On the Dataset settings page, under Table settings for DOCX files:

    • To redact the table content, click Redact content using the entity type configuration. This is the default selection.

    • To block out the table content, click Block out all table cell content.

    hashtag
    Configuring how to handle .docx comments

    For comments in a .docx file, you can configure the dataset to either:

    • Remove the comments from the file.

    • Ignore the comments and leave them in the file.

    On the Dataset settings page, to remove the comments, toggle Remove comments from the output file to the on position. This is the default configuration.

    To ignore the comments, toggle Remove comments from the output file to the off position.

    Word Document Settings section on the Dataset settings page

    Share access to a project Add users or groups to a guided redaction project.

    View the projects list View the list of guided redaction projects.

    Create a project Start a new guided redaction project.

    Set the project status Track the status of a guided redaction project.

    Assign tags to a project Use tags to further identify a guided redaction project.

    Change the project name and description Give a guided redaction project a new name and add a more detailed description.

    Delete a project Remove a guided redaction project.

    Set the active entity types and values Identify the entity types to scan for. Add and exclude specific values.

  • The entity value instance in its immediate context.

  • The name of the file that contains the value instance.

• When the file was most recently scanned.

• For PDF and image files on self-hosted instances, the OCR model used to process the file.

For example, if the maximum shift is 6 days, then the replacement datetime value cannot occur later than 6 days after the original value.
  • To apply the same shift to all birth dates within a given file, check the Apply same shift for entire document checkbox.
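The bounded, per-document shift described above can be sketched as follows. This is a minimal illustration, not Textual's actual algorithm: the `doc_id` parameter and the seeding scheme are assumptions used to show how every date in the same document can receive the same shift within the configured bound.

```python
import random
from datetime import date, timedelta

def shift_date(original: date, doc_id: str, max_days: int = 6) -> date:
    # Seed the generator with the document identifier so that every
    # date in the same document receives the same shift ("Apply same
    # shift for entire document"). Illustrative assumption only.
    rng = random.Random(doc_id)
    shift = rng.randint(-max_days, max_days)
    return original + timedelta(days=shift)

# The replacement falls within 6 days of the original, in either direction.
d = shift_date(date(2024, 3, 15), "report-001.pdf")
```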

For all other language values, Textual selects the replacement values.

In the entity type-specific synthesis configuration, you can provide mappings of original values to replacement values. For example:
    {
      "French": "German",
      "English": "Japanese"
    }
    To work on the guidelines, click Guidelines refinement. The Guidelines refinement option is enabled when you complete the review on the initial set of test files.

    The first time you display the Guidelines refinement page, Textual uses the guidelines that you provided during the entity type creation to populate the Version 1 tab.

    Guidelines refinement page for a model-based custom entity type

    At the left are the guidelines.

    At the right is the list of test data files.

    hashtag
    File statuses for the guidelines refinement

    For each version of the guidelines, Textual uses the guidelines to detect entity values in the test data.

    The file statuses are:

    • Queued for annotation - Textual has not yet scanned the file.

    • Annotating - Textual is in the process of scanning the file.

    • Annotated - The scan is complete.

    hashtag
    Reviewing the test scores for the guidelines

    When Textual uses guidelines to detect entity values in the test files, it sets the number of detected entities and a set of scores. The scores reflect how well the detections match the entity values that you established in the test data setup. If you change the established values in the test data, Textual updates the scores for the guidelines.

    The overall entity count and scores across all files are displayed across the top of the page. The file list displays the entity count and scores for each file.

    Overall scores and file scores in the Guidelines Refinement list

    The scores are:

    • Precision score - Measures the accuracy of positive predictions. Indicates how many of the detected entities were correctly identified. For example, the guidelines detect 10 values. If only 3 of those are correct, then the precision score is lower than if 7 of those are correct.

    • Recall score - Measures the model's ability to find all of the entities. Indicates how many of the actual entities it detected. For example, the guidelines detect 10 correct values. If the total number of correct values is 20, then the recall score is lower than if the total number of correct values is 12.

    • F1 score - The harmonic mean of precision and recall. The goal is to have a balance between precision and recall. The guidelines should produce annotations that are both accurate and complete. Detecting all of the correct values is not useful if the guidelines also detect a large number of incorrect values. And detecting only correct values is not useful if the guidelines only detect a fraction of the total number of correct values.
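The three scores above can be computed from the counts in the examples: if the guidelines detect 10 values, 7 of those are correct, and the test data contains 20 actual entity values, the math works out as follows. This is a worked illustration of the standard formulas, not Textual's internal code.

```python
def scores(true_positives: int, detected: int, actual: int):
    # Precision: fraction of detected entities that are correct.
    precision = true_positives / detected
    # Recall: fraction of actual entities that were detected.
    recall = true_positives / actual
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# 7 correct detections out of 10, with 20 actual entity values:
p, r, f1 = scores(true_positives=7, detected=10, actual=20)
```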

    hashtag
    Reviewing the guideline detections

    To review the entity values that Textual detected based on the current version of the guidelines, click the file name.

    hashtag
    Editing the guidelines

    Based on how accurately Textual detected the entity values, Textual generates suggested changes to the guidelines.

    For example, it might suggest additional language to more specifically identify values that the previous version either missed or detected incorrectly.

    To start a new version of the guidelines:

    1. Click Edit. If there are suggestions, you can also click Review.

    Edit and Review options for the annotation guidelines
2. On the Annotation guidelines panel, the current guidelines are displayed in an editable text area on the left. On the right is a summary of the suggested updates to the guidelines. To display the proposed replacement guidelines, toggle Show diff to the on position.

    Annotation guidelines panel with AI suggestions to improve the guidelines in the next version
3. To update the guidelines, do one of the following:

      • Update the guidelines manually.

      • Accept all of the suggestions, and replace the current guidelines. To do this, click Accept changes.

      • Manually copy text from the suggestions and paste it into the guidelines.

4. To save the guidelines version, and start the detection and scoring, click Save new version.

    Textual creates a new tab for the new version of the guidelines. The tab label is Version n, where n is incremented for each new version. The most recent version is at the left.

    Guidelines version tabs on the Guidelines Refinement page
    Global permission - Control access to all guided redaction projects
  • Guided redaction permission - Share guided redaction access

Textual uses guided redaction permission sets for role-based access control (RBAC) of each project.

    A guided redaction permission set is a collection of guided redaction permissions.

    Textual provides built-in guided redaction permission sets. Organizations can also configure custom permission sets.

To share project access, you assign guided redaction permission sets to users and to SSO groups, if you use SSO to manage Textual users. Before you assign a guided redaction permission set to an SSO group, make sure that you are aware of who is in the group. The permissions that are granted to an SSO group are automatically granted to all of the users in the group.

    To change the current access to a project:

    1. Either:

      1. On the Guided Redaction page, click the share icon for the project.

      2. On the project details page, click More, then click Share.

    2. The project access panel contains the current list of users and groups who have access to the project, and displays their assigned guided redaction permission sets. To add a user or group to the list of users and groups:

      1. In the search field, begin to type the user email address or group name.

      2. From the list of matching users or groups, select the user or group to add.

    3. For a user or group, to change the assigned guided redaction permission sets:

      1. Click Access. The dropdown list displays the list of custom and built-in guided redaction permission sets.

      2. Under Custom Permission Sets, check the checkbox next to each guided redaction permission set to assign to the user or group. To remove an assigned guided redaction permission set, uncheck the checkbox.

• To replace each phone number with a randomly generated number, select Random Number.

• To generate a realistic telephone number, select US Phone Number. The US Phone Number option generates values similar to those generated by the .

If you also configured a Textual statistics seed that matches a Structural statistics seed, then the synthesized values are consistent with values generated in Structural. A given source telephone number produces the same output telephone number in both applications.

    For example, in both Textual and Structural, 123-456-6789 might be replaced with 154-567-8901.

    hashtag
    Replacing invalid telephone numbers

    The Replace invalid numbers with valid numbers checkbox determines how Textual handles invalid telephone numbers in the data.

To replace invalid telephone numbers with valid numbers, check the checkbox.

    If you do not check the checkbox, then Textual randomly replaces the numeric characters.

    hashtag
    Preserving the area code

    The Preserve US area code checkbox determines whether Textual preserves the area code for telephone numbers in the United States.

    To preserve the area codes from the source values in the output values, check the checkbox.

    If you do not check the checkbox, then Textual replaces the entire telephone number, including the area code.
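The consistency and area-code behavior described in this section can be sketched as follows. The seeding scheme is an illustrative assumption, not Textual's actual algorithm: seeding a generator with the statistics seed plus the original value makes a given source number always produce the same output, and the `preserve_area_code` flag keeps the first three digits of a 10-digit US number.

```python
import random
import re

def synthesize_phone(original: str, seed: str,
                     preserve_area_code: bool = False) -> str:
    # Illustrative sketch only: seed the generator with the statistics
    # seed and the original value so the replacement is consistent.
    rng = random.Random(seed + "|" + original)
    digits = re.sub(r"\D", "", original)
    new_digits = [str(rng.randint(0, 9)) for _ in digits]
    if preserve_area_code and len(digits) >= 10:
        new_digits[:3] = digits[:3]  # keep the US area code
    it = iter(new_digits)
    # Replace each digit in place, preserving the original formatting.
    return re.sub(r"\d", lambda _: next(it), original)

out = synthesize_phone("123-456-6789", seed="acme-2024",
                       preserve_area_code=True)
```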

    Synthesis options for telephone number values

• Not sensitive - The column values are of a single type and format, but the values do not match an entity type.

  • Unstructured - Each column value might contain multiple entities of different types. For example, a column that contains notes or a longer description.

hashtag
Selecting the column content type

    From the file preview, you can review and change the content type assignments for each column.

    To change the assigned CSV column content type:

    1. In the output table heading, click the dropdown icon.

    Column configuration panel for a CSV column
2. On the configuration panel, click the column content type.

3. If you clicked Sensitive, then from the Column entity type dropdown list, select the entity type for the column values. Textual displays the current configured handling type for the selected entity type.

4. If you clicked Sensitive or Not Sensitive, then to save the changes, click Save. Changing the content type to Unstructured requires a rescan of the file. If you clicked Unstructured, then:

      1. To save the changes and rescan the file, click Save & Scan.

      2. To save the changes, but not rescan the file, click Save. If you save without rescanning the file, then Textual marks the file as requiring a rescan.

    Global permission - either:
    • Use guided redaction projects

    • View all guided redaction projects

  • Guided redaction permission: Access to view or perform an action on one or more guided redaction projects

The Guided Redaction page contains the list of redaction projects.

    To display the Guided Redaction page, in the Textual navigation menu, click the guided redaction icon.

    The list displays the projects that you have access to.

    Guided Redaction page with the list of guided redaction projects

    hashtag
    Information in the list

    For each project, the list includes:

    • The name of the project

    • Any tags assigned to the project

    • The number of files in the project

    • The project status

    • When the project was created

    • The user who created the project

    hashtag
    Filtering the project list

    You can filter the list based on the project name and assigned tags.

    To filter by name, in the search field, type text from the project name. As you type, Textual updates the list to only include matching projects.

    To filter by assigned tags:

    1. In the Tags column heading, click the filter icon.

    2. On the filter panel, check the checkbox for each tag to include. You can use the search field to look for a specific tag.

    Panel to filter the guided redaction projects based on their assigned tags

    hashtag
    Sorting the project list

    By default, the list is sorted in descending order by the creation date. The most recently created projects are at the top of the list.

    You can also sort by the project name.

    To sort by a column, click the column heading. To reverse the sort order, click the column heading again.


    Selecting the training data for your models

    Before you start training models, on the Model data setup page, you select the training data to use.

    Model Data Setup page with selected training files

    hashtag
    About training data

    The training data is a much larger set of files than the test data, and can include hundreds of files or more. The data should ideally contain at least 1,000 values for the entity type. For example, for an entity type to identify health conditions, you might use 5 medical appointment reports in your test data, but several hundred medical reports for your training data.

    Similar to the test files, the training files should be relatively small - no more than 5,000 words.
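A simple local pre-upload check against the 5,000-word guidance could look like the following. This is a hypothetical helper, not a Textual feature; the file contents are assumed to already be loaded as strings.

```python
def check_word_limit(texts: dict[str, str], max_words: int = 5000) -> list[str]:
    # Return the names of files that exceed the recommended word limit
    # for test and training data. Local pre-upload check only; this is
    # not part of Textual.
    return [name for name, text in texts.items()
            if len(text.split()) > max_words]

files = {"short.txt": "word " * 100, "long.txt": "word " * 6000}
too_long = check_word_limit(files)
```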

    For training data, there is no option to paste in text. Training data files are either uploaded from a local file system or selected from a cloud storage solution.

    If you selected the test data from a cloud storage solution, then you must use the same cloud storage solution for the training data. For example, if you selected the test data from Amazon S3, then you must select the training data from Amazon S3.

    You can add files to the training data at any time. New files are only used for models that are trained after the files are added.

    hashtag
    Uploading files from a local file system

    If there are no training files, then on the Model data setup page, click Upload files, then search for and select the files to upload.

    To add more uploaded files to the training data:

1. Click Add training data.

2. Click Upload files.

3. Search for and select the files to add to the training data.

    hashtag
    Selecting files from cloud storage

    If there are no training files, then on the Model data setup page, click the cloud storage solution to use, then select the files to add.

    If the test data came from a cloud storage solution, then you must use the same cloud storage option for the training data.

    To add more cloud storage files to the training data:

1. Click Add training data.

2. Select the cloud storage solution.

3. Select the files to add to the training data.

    For training data, you can select entire folders. Textual then adds all of the files in the folder.

    hashtag
    Displaying the content of a training data file

    On the Model data setup page, to display the content of an uploaded file, click the file name.

    hashtag
    Training data file statuses

    Each training data file goes through the following statuses:

    • Queued for upload - The file is not yet uploaded.

    • Uploading - Textual is uploading the file.

    • Ready - The file is uploaded and is used for subsequent model training.

    Model training cannot start until all of the currently uploaded files are Ready.

    hashtag
    Deleting training files

    On the Model data setup page, to delete a training file:

    1. Click its delete icon.

    2. On the confirmation panel, you can choose to skip the confirmation when you delete training files. If you select this option, then the next time you delete a training file, the file is deleted immediately, and the panel does not display.

    3. Click Delete.

    When you delete a training file:

    • For existing models that annotated the file:

• The entity counts continue to reflect the entities that were detected in the file.

      • The file name remains in the list on the model details.

    Editing an individual PDF file

    circle-info

    Required dataset permission: Edit dataset settings

    For PDF files, you can add manual overrides to the initial detections, which are based on the detected data types and handling configuration.

    For each manual override, you select an area of the file.

    For the selected area, you can either:

    • Ignore any automatically detected values. For example, a scanned form might show an example or boilerplate content that doesn't actually contain sensitive values.

    • Redact that area. The file might contain sensitive content that Tonic Textual is unable to detect. For example, a scanned form might contain handwritten notes.

    You can also apply a template to the file.

    You can also .

    hashtag
    Selecting the manual override option for a file

    To manage the manual overrides for a PDF file:

    1. In the file list, click the options menu for the file.

    2. In the options menu, click Edit Redactions.

    The File Redactions panel displays the file content. The values that Textual detected are highlighted. The page also shows any manual overrides that were added to the file.

    hashtag
    Applying a PDF template to a file

    If a dataset contains multiple files that have the same format, then you can create a template to apply to those files. For more information, go to .

    On the File Redactions panel, to apply a template to the file, select it from the template dropdown list.

    When you apply a PDF template to a file, the manual overrides from that template are displayed on the file preview. The manual overrides are not included in the Redactions list.

    hashtag
    Adding a manual override

    On the File Redactions panel, to add a manual override to a file:

1. Select the type of override. To ignore any automatically detected values in the selected area, click Ignore Redactions. To redact the selected area, click Add Manual Redaction.

    2. Use the mouse to draw a box around the area to select.

    Textual adds the override to the Redactions list. The icon indicates the type of override.

    In the file content:

    • Overrides that ignore detected values within the selected area are outlined in red.

    • Overrides that redact the selected area are outlined in green.

    hashtag
    Navigating to a manual override

    To select and highlight a manual override in the file content, in the Redactions list, click the navigate icon for the override.

    hashtag
    Removing a manual override

    To remove a manual override, in the Redactions list, click the delete icon for the override.

    hashtag
    Saving the manual overrides

    To save the current manual overrides, click Save.

    Selecting the handling option for entity types

    circle-info

    Required dataset permission: Edit dataset settings

    For datasets that produce redacted files, for each entity type, you choose how to handle the detected values. This determines how each value displays in the output files.

    For datasets that create JSON output, the entity type handling determines the display in downloaded Markdown or HTML files.

    In addition to the dataset details page options, you can also use the Textual Agent to configure the entity type handling.

    hashtag
    Available handling options

    The available options are:

    • Synthesize - Indicates to replace the value with another realistic value. For example, the first name value Michael might be replaced with the value John. The synthesized values are always consistent, meaning that a given entity value always has the same replacement value. For example, if the first name Michael appears multiple times in the text, it is always replaced with John. Textual does not synthesize any excluded values. For custom entity types, Textual scrambles the values.

• Redact - This is the default option, except for the Full Mailing Address entity type, which is ignored by default. For text files, Redact indicates to tokenize the value - to replace it with a token that identifies the entity type followed by a unique identifier. For example, the first name value Michael might be replaced with NAME_GIVEN_12m5s. The identifiers are consistent, which means that for a given original value, the replacement always has the same unique identifier. For example, the first name Michael might always be replaced with NAME_GIVEN_12m5s.
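The consistency property of tokenized values can be sketched with a hash-based scheme. This is an illustrative assumption, not Textual's actual algorithm: deriving the identifier from a hash of the original value guarantees that the same value always maps to the same token.

```python
import hashlib

def redact_token(value: str, entity_type: str) -> str:
    # Illustrative sketch: derive a short, stable identifier from a
    # hash of the original value, so the same value always produces
    # the same token. Not Textual's actual tokenization scheme.
    suffix = hashlib.sha256(value.encode("utf-8")).hexdigest()[:5]
    return f"{entity_type}_{suffix}"

token = redact_token("Michael", "NAME_GIVEN")
```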

    hashtag
    Selecting the handling option for a specific entity type

    On the Entity settings page, to select the handling option for an entity type:

    1. In the De-identification Setting column, click the dropdown.

    2. In the dropdown list, click the option.

    On the Analytics page, the entity type details panel also provides an option to set the handling option.

    hashtag
    Selecting the handling option for all of the entity types

    On the Entity settings page, to select the same handling option for all of the entity types, from the Bulk Edit dropdown above the data type list, select the option.

    Configuring the available file and project statuses

    As you redact and review the project files, you set the status of each file and project. Projects and files use the same set of status values.

    Each status is associated with a color.

    hashtag
    Built-in statuses

    Textual comes with a set of built-in statuses. The built-in statuses are:

    • Not started - This is the default status. It is applied automatically to all new projects and files. You cannot delete this status.

    • In progress

    • Ready for review

    • Review in progress

    • Done - This is intended to be the final status for a project or file. You cannot delete this status.

    You can change the name and assigned color of all of the built-in statuses. You can delete the built-in statuses that you do not need, except for the Not started status and the Done status.

    You can also add custom statuses, to accommodate your particular redaction and review process.

    hashtag
    Displaying the status list

    On the Guided Redaction page, to display the status list, click Status settings.

    The status list shows the status name and the associated color.

    hashtag
    Adding a status

    circle-info

    Required global permission: Create guided redaction status values

    From the Status Settings panel, to add a status:

    1. Click Add a status.

    2. On the status configuration panel, in the field, provide a name for the status. Status names must be unique.

    3. Click the color to assign to the status.

    4. Click Save.

    hashtag
    Editing a status

    circle-info

    Required global permission: Edit guided redaction status values

    From the Status Settings panel, to change the status configuration:

    1. Click the status name.

    2. On the status configuration panel, you can change the status name and color. Remember that the status name must be unique.

    3. To save the changes, click Save.

    hashtag
    Deleting a status

    circle-info

    Required global permission: Edit guided redaction status values

    You cannot delete:

    • The built-in Not started status.

    • The built-in Done status.

    • A status that is currently assigned to a project or file.

    From the Status Settings panel, to delete a status:

    1. Click the status name.

    2. On the status configuration panel, click the delete icon.

    About guided redaction

    circle-info

    The guided redaction feature is currently in beta.

    The Textual guided redaction tool blocks out sensitive values in files. For example, you might use guided redaction to prepare documents to provide in response to a Freedom of Information Act request.

    Guided redaction supports built-in and custom entity types. Textual completes an initial scan of the files to identify sensitive values in the files. You can then manually add and remove redactions.

    The values are covered with a black or white box. You assign and display reference codes to identify the type of content for each redaction.

    The overall workflow is as follows:

    hashtag
    Pre-project setup

    Before you create a guided redaction project:

    1. Set up the list of statuses that can be assigned to a file or a project. Each status is associated with a color. Textual provides a set of built-in statuses, including a Not started status that is applied automatically to all new projects and files, and a Done status that indicates that the project or file is complete. You can configure custom statuses and change the names and colors of the built-in statuses. You can also delete the built-in statuses, except for Not started and Done.

    2. If you assign reference codes to the redactions, configure the reference codes. Reference codes identify the type of information that is redacted. You can optionally link each code to up to 5 Textual entity types. For example, if you use a single code for all name values, then you could link that code to both the Given Name and Family Name entity types. Be sure to map reference codes to all of the entity types that are present in your files.
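    The reference-code mapping in step 2 can be sketched as a small lookup table. The codes and entity type names below are illustrative examples, not values defined by Textual.

```python
# Hypothetical reference codes, each linked to at most 5 entity types.
REFERENCE_CODES = {
    "B6": ["NAME_GIVEN", "NAME_FAMILY"],   # one code for all name values
    "B7": ["LOCATION_ADDRESS"],
}

def validate_reference_codes(codes: dict) -> None:
    # Enforce the documented limit of up to 5 linked entity types per code.
    for code, entity_types in codes.items():
        if len(entity_types) > 5:
            raise ValueError(f"Code {code!r} links to more than 5 entity types")

def code_for_entity(codes: dict, entity_type: str):
    # Find the reference code mapped to a detected entity type, if any.
    for code, entity_types in codes.items():
        if entity_type in entity_types:
            return code
    return None

validate_reference_codes(REFERENCE_CODES)
assert code_for_entity(REFERENCE_CODES, "NAME_FAMILY") == "B6"
```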

    hashtag
    Project creation and population

    1. Create a guided redaction project.

    2. Add files to the project. For new files, Textual uses its built-in models to scan each file for entity values.

    hashtag
    File redaction

    For each file, in Redaction mode, you can add and remove redactions.

    Every redaction must be assigned a reference code. Redactions that do not have reference codes are not redacted in the output.

    As you work on the redaction, you can update the file and project statuses.

    Use the project audit log to track the project activity.

    hashtag
    Redaction review

    In Review mode, you review the redactions and mark each one as reviewed.

    As you work on the review, you can update the file and project statuses.

    Use the project audit log to track the project activity.

    hashtag
    Output preview and download

    In Preview mode, you preview the redacted output based on the current redactions.

    When the redaction and review are complete, you download the redacted files. All downloaded output is in PDF format, with the individual PDF files bundled into a .zip file.

    When you download files, you select whether the output is redacted or is to be used for review.

    • Review output does not cover the redactions, and always displays the reference codes.

    • For redacted output, you select the color of the box and whether to display the reference codes.

    Datasets flows

    You use a Textual dataset to detect sensitive values in files. The dataset output can be either:

    • Files in the same format as the original file, with the sensitive values replaced based on the dataset configuration.

    • JSON files that contain a summary of the detected values and replacements.

    You can use the Textual Agent to explore and configure a dataset.

    You can also create and manage datasets from the Textual SDK or the REST API.

    Creating templates to apply to PDF files

    circle-info

    Required dataset permission: Edit dataset settings

    A dataset might contain multiple files that have the same structure, such as a set of scanned-in forms.

    Instead of adding the same manual overrides for each file, you can use a PDF file in the dataset to create a template that you can apply to other PDF files in the dataset.

    When you edit a PDF file, you can apply a template.

    Configuring added and excluded values for built-in entity types

    circle-info

    Required dataset permission: Edit dataset settings

    In a dataset, for each built-in entity type, you can configure additional values to detect, and values to exclude. You cannot define added and excluded values for custom entity types.

    You might add values that Textual does not detect because, for example, they are specific to your organization or industry.

    You might exclude a value because:

    Setting the generator for custom entity types

    By default, when you select the Synthesize option for a custom entity type, Textual scrambles the original value.

    From the generator dropdown list, select the generator to use to create the replacement value.

    The available generators are:

    Generator
    Description

    Reviewing the sensitivity detection results

    circle-info

    Required dataset permission: View dataset settings

    The dataset details page provides information about the results of the sensitivity detection, including the overall results, the results per file, and the results per entity type.

    You can also use the Textual Agent to explore the dataset results.

    Previewing dataset file output

    circle-info

    Required dataset permission: Preview redacted dataset files

    circle-info

    You cannot preview TIF image files. You can preview PNG and JPG files.

    You cannot preview HTML files.

    Exporting and importing a model-based custom entity type

    You can export a model-based entity type to an encrypted file. The export includes the entire configuration.

    You can then import the entity type into another instance of Textual.

    hashtag
    Exporting an entity type

    circle-info

    Creating a guided redaction project

    circle-info

    Required global permission: Create guided redaction projects

    To create a project:

    1. On the Guided Redaction page, click New Project.

    Count of entities per file

    For each dataset file, the Project files page displays:

    • The number of detected entity values in the file. The count does not include values for entity types for which the entity type handling is set to Ignore.

    • The total number of words in the file.
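    The two counts can be sketched as follows, assuming a simple whitespace word count and a per-entity-type handling map (both illustrative):

```python
def file_counts(text: str, detections, handling: dict) -> dict:
    # detections: (entity_type, value) pairs; handling maps an entity
    # type to its option ("Redact", "Synthesize", or "Ignore").
    counted = [d for d in detections if handling.get(d[0]) != "Ignore"]
    return {"entities": len(counted), "words": len(text.split())}

stats = file_counts(
    "Michael lives at 10 Main Street.",
    [("NAME_GIVEN", "Michael"), ("LOCATION_ADDRESS", "10 Main Street")],
    {"NAME_GIVEN": "Redact", "LOCATION_ADDRESS": "Ignore"},
)
# The ignored address is excluded from the entity count.
assert stats == {"entities": 1, "words": 6}
```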

    Selecting the view

    hashtag
    Redacted view - output

    For a redacted file, the preview by default shows the Redacted view, which reflects the output version of the file.

    Sensitive values are redacted, synthesized, or ignored based on the dataset configuration.

    For PDF and image files, for entity types that use the Redact handling option, the values are covered by black boxes.

    Under Built-In Permission Sets, click the guided redaction permission set to assign to the user or group. You can only assign one built-in permission set. By default, for an added user or group, the Viewer permission set is selected. To not grant any built-in permission set, select None.

    2. On the new project panel, in the Project Name field, provide a name for the project.

    3. Click Save.

    Textual displays the project details page.


    • Entity values that match multiple types - How Textual counts entity values that match more than one entity type.

    • Summary results - Summary counts on the Project files and Entity settings pages.

    • Entity counts per file - Viewing the counts per file on the Project settings page.

    • Detected entity values - The Catalog page lists the detected entity values in the dataset.

    • Detected values per entity type - The Analytics page contains an analysis of the detected entity values for each entity type.

    • Entity type list and settings - The Entity settings page displays the entity type results and configuration.

    Value and word counts for dataset files
    The file name is dimmed, and you cannot display the file details.
  • For models that are created after the file is deleted, the file is not annotated and is not displayed in the list on the model details.


    hashtag
    Overall workflow

    At a high level, to use Textual to detect sensitive values and create redacted data:

    Diagram of the Tonic Textual dataset workflow

    hashtag
    Create and populate a dataset

    1. Create a Textual dataset, which is a set of files to redact. The files can be uploaded from a local file system, or can come from a cloud storage solution. When you create the dataset, you also choose the type of output, which can be either:

      • The redacted versions of the original files. Each file is in the same format as the original file.

      • JSON summaries of the files and the detected entities.

    2. Add files to the dataset. Textual supports almost any free-text file, PDF files, .docx files, and .xlsx files.

      For images, Textual supports PNG, JPG (both .jpg and .jpeg), and TIF (both .tif and .tiff) files.

    3. Textual uses its built-in models to scan the files and identify sensitive values. For JSON output, Textual also immediately generates the output files.

    hashtag
    Review the redaction results

    Review the types of entities that were detected in the scanned files.

    hashtag
    Configure entity type handling

    At any time, for datasets that produce redacted files, you can configure how Textual handles the detected values for each entity type.

    For all datasets, you can provide added and excluded values for each built-in entity type.

    You can also create and enable custom entity types.

    hashtag
    Select the handling option for each entity type

    For datasets that produce redacted output files, you configure how Textual redacts the values. This configuration does not apply to datasets that produce JSON output.

    For each entity type, you select the action to perform on detected values. The options are:

    • Redaction - By default, Textual redacts the entity values, which means to replace the values with a token that identifies the type of sensitive value, followed by a unique identifier. For example, NAME_GIVEN_12m5sb, LOCATION_j40pk6. The identifiers are consistent, which means that for the same original value, the redacted value always has the same identifier. For example, the first name Michael might always be replaced with NAME_GIVEN_12m5sb, while the first name Helen might always be replaced with NAME_GIVEN_9ha3m2. For PDF files, redaction means to either cover the value with a black box, or, if there is space, display the entity type and identifier. For image files, redaction means to cover the value with a black box.

    • Synthesis - For a given entity type, you can instead choose to synthesize the values, which means to replace the original value with a realistic replacement. The synthesized values are always consistent, meaning that a given original value always produces the same replacement value. For example, the first name Michael might always be replaced with the first name John. You can also identify specific replacement values.

    • Ignore - You can choose to ignore the values, and not replace them.

    Textual automatically updates the file previews and downloadable files to reflect the updated configuration.

    hashtag
    Define added and excluded values for entity types

    Optionally, for all datasets, you can create lists of values to add to or exclude from an entity type. You might do this to reflect values that are not detected or that are detected incorrectly.

    hashtag
    Manually update PDF files

    Datasets also provide additional options to redact PDF files.

    You can add manual overrides to a PDF file. When you add a manual override, you draw a box to identify the affected portion of the file.

    You can use manual overrides either to ignore the automatically detected redactions in the selected area, or to redact the selected area.

    To make it easier to process multiple files that have a similar format, such as a form, you can create templates that you can apply to PDF files in the dataset.

    hashtag
    Generate or download output files

    After you complete the redaction configuration and manual updates, to obtain the output files:

    • For local file datasets, you download the output files.

    • For cloud storage datasets, for datasets that produce original format files, you run a generation job that writes the output files to the configured output location. For datasets that produce JSON output, the files are generated to the output location as soon as the output location is configured.

    hashtag
    File upload and download flows

    For a local file dataset, the file upload and download flows are as follows. For a more general overview of the Textual architecture, go to Textual architecture.

    hashtag
    File upload flow

    When you upload a file to a local file dataset, the flow is as follows:

    File upload flow for a local file dataset
    1. The Textual user uploads the file.

    2. The API service stores the file in either Amazon S3 or the Textual application database. For more information, go to Setting the S3 bucket for file uploads and redactions.

    3. The API service starts a job in the worker.

    4. The worker sends any PDF and image files to the OCR service (Amazon Textract, Document Intelligence, or Tesseract) to extract the file text.

    5. The OCR service returns the PDF and image text to the worker.

    6. The worker submits the file text to the Textual machine learning service to detect and replace entity values.

    7. The machine learning service returns the results to the worker.

    8. The worker stores the results in the application database.
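    The steps above can be sketched as a single orchestration function. The service calls are passed in as stand-ins; none of these names are Textual's internal API.

```python
OCR_TYPES = (".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff")

def process_upload(filename: str, data: bytes, store, ocr, detect) -> dict:
    store("raw/" + filename, data)                 # step 2: persist the upload
    if filename.lower().endswith(OCR_TYPES):
        text = ocr(data)                           # steps 4-5: OCR for PDFs and images
    else:
        text = data.decode("utf-8")                # free-text files skip OCR
    entities = detect(text)                        # steps 6-7: entity detection
    store("results/" + filename, entities)         # step 8: store the results
    return {"file": filename, "entities": entities}

# Exercise the flow with in-memory stand-ins for the services.
saved = {}
result = process_upload(
    "note.txt", b"Michael called.",
    store=saved.__setitem__,
    ocr=lambda data: "",
    detect=lambda text: [("NAME_GIVEN", "Michael")],
)
assert "raw/note.txt" in saved and "results/note.txt" in saved
```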

    hashtag
    File download flow

    When you download a redacted file from a local file dataset, the flow is as follows:

    File download flow for a local file dataset
    1. The Textual user makes the request to download the file.

    2. The API service retrieves the file from where it is stored in either Amazon S3 or the application database.

    3. The API service retrieves the detected entities and entity handling settings from the application database.

    4. The API service applies those results to the file.

    5. The API service returns the redacted file to the Textual user.

    Textual Agent
    Textual SDK
    REST API
    hashtag
    Creating a PDF template

    To add a PDF template to a dataset:

    1. On the Dataset settings page, under PDF Settings, click PDF Templates.

    PDF Templates option on the PDF Settings section of the Dataset settings page
    2. On the template creation and selection panel, click Create a New Template.

    Panel with option to create a PDF template
    3. On the template details page:

      1. In the Name field, provide a name for the template.

      2. From the file dropdown list, select the dataset file to use to create the template.

      3. Add the manual overrides to the file.

    PDF template details panel for a new template
    4. When you finish adding the manual overrides, click Save New Template.

    hashtag
    Updating an existing PDF template

    When you update a PDF template, it affects any files that use the template.

    To update a PDF template:

    1. On the Dataset settings page, under PDF Settings, click PDF Templates.

    2. Under Edit an Existing Template, select the template, then click Edit Selected Template.

    3. On the template details panel, you can change the template name, and add or remove manual overrides.

    Template details panel for an existing template.
    4. To save the changes, click Update Template.

    hashtag
    Managing the manual overrides

    hashtag
    Adding a manual override

    On the template details panel, to add a manual override to a file:

    1. Select the type of override. To indicate to ignore any automatically detected values in the selected area, click Ignore Redactions. To indicate to redact the selected area, click Add Manual Redaction.

    2. Use the mouse to draw a box around the area to select.

    Tonic Textual adds the override to the Redactions list. The icon indicates the type of override.

    hashtag
    Navigating to a manual override

    To select and highlight a manual override in the file content, in the Redactions list, click the navigate icon for the override.

    Redactions list of manual overrides with a navigate icon highlighted

    hashtag
    Removing a manual override

    To remove a manual override, in the Redactions list, click the delete icon for the override.

    hashtag
    Deleting a PDF template

    When you delete a PDF template, the template and its manual overrides are removed from any files that the template was assigned to.

    To delete a PDF template:

    1. On the Dataset settings page, under PDF Settings, click PDF Templates.

    2. Under Edit an Existing Template, select the template, then click Edit Selected Template.

    3. On the template details panel, click Delete.

    edit a PDF file

  • Textual labeled the value incorrectly.

  • You do not want to redact a specific value. For example, you might want to preserve known test values.

  • Note that you can also add manual redactions from the file preview.

    hashtag
    Displaying the Configure Entity Detection panel

    From the Configure Entity Detection panel, you configure both added and excluded values for entity types.

    To display the panel, click the settings icon for the entity type.

    The panel contains an Add to detection tab for added values, and an Exclude from detection tab for excluded values.

    Configure Entity Detection panel to configure added and excluded entity values

    hashtag
    Selecting the entity type to add or exclude values for

    The entity type dropdown list at the top of the Configure Entity Detection panel indicates the entity type to configure added and excluded values for.

    The initial selected entity type is the entity type for which you clicked the icon. To configure values for a different entity type, select the entity type from the list.

    Entity type dropdown for Custom Entity Detection

    hashtag
    Configuring added values

    On the Add to detection tab, you configure the added values for the selected entity type.

    Each value can be a specific word or phrase, or a regular expression to identify the values to add. Regular expressions must be C# compatible.

    hashtag
    Configuring a new added value

    To add an added value:

    1. Click the empty entry.

    2. Type the value into the field.

    Adding an added value for an entity type

    hashtag
    Editing an added value

    To edit an added value:

    1. Click the value.

    2. Update the value text.

    hashtag
    Testing an added value

    For each added value, you can test whether Textual correctly detects it.

    To test a value:

    1. From the Test Entry dropdown list, select the number for the value to test.

    2. In the text field, type or paste content that contains a value or values that Textual should detect.

    The Results field displays the text and highlights matching values.

    Testing an added value

    hashtag
    Removing an added value

    To remove an added value, click its delete icon.

    hashtag
    Configuring excluded values

    On the Exclude from detection tab, you configure the excluded values for the selected entity type.

    Each value can be either a specific word or phrase to exclude, or a regular expression to identify the values to exclude. Regular expressions must be C# compatible, and can include context information for the values to exclude.

    Specific words and phrases can also provide a context within which to ignore a value. For example, in the phrase "one moment, please", you probably do not want the word "one" to be detected as a numeric value. If you specify "one moment, please" as an excluded value for the numeric entity type, then "one" is not identified as a number when it is seen in that context.
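    The "one moment, please" behavior can be sketched as a post-filter: any detection whose span falls inside an occurrence of an excluded phrase is dropped. The span representation here is an assumption for illustration.

```python
import re

def filter_detections(text, detections, excluded_phrases):
    # detections: (start, end, entity_type) spans found by the scan.
    excluded_spans = []
    for phrase in excluded_phrases:
        for m in re.finditer(re.escape(phrase), text, re.IGNORECASE):
            excluded_spans.append((m.start(), m.end()))
    # Keep a detection only if no excluded phrase fully contains it.
    return [
        d for d in detections
        if not any(s <= d[0] and d[1] <= e for s, e in excluded_spans)
    ]

text = "One moment, please. Gate one is closed."
detections = [(0, 3, "NUMERIC_VALUE"), (25, 28, "NUMERIC_VALUE")]
kept = filter_detections(text, detections, ["one moment, please"])
# Only the "one" outside the excluded phrase survives.
assert kept == [(25, 28, "NUMERIC_VALUE")]
```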

    hashtag
    Adding an excluded value

    To add an excluded value:

    1. Click the empty entry.

    2. Type the value into the field.

    Adding an excluded value for an entity type

    hashtag
    Editing an excluded value

    To edit an excluded value:

    1. Click the value.

    2. Update the value text.

    hashtag
    Testing an excluded value

    For each excluded value, you can test whether Textual correctly detects it.

    To test the value that you are currently editing:

    1. From the Test Entry dropdown list, select the number for the value to test.

    2. In the text field, type or paste content that contains a value or values to exclude.

    The Results field displays the text and highlights matching values.

    Testing an excluded value

    hashtag
    Removing an excluded value

    To remove an excluded value, click its delete icon.

    hashtag
    Saving the updated added and excluded values

    New added values are not reflected in the entity types list until Textual runs a new scan.

    When you save the changes, you can choose whether to immediately run a new scan on the dataset files.

    Save options for Custom Entity Detection

    To save the changes and also start a scan, click Save and Scan Files.

    To save the changes, but not run a scan, click Save Without Scanning Files. When you do not run the scan, then on the dataset details page, Textual displays a prompt to run a scan.

    Dataset files list when a rescan is required

    CC Exp

    Generates a credit card expiration date.

    Company Name

    Generates a name of a business.

    Credit Card

    Generates a credit card number.

    CVV

    Generates a credit card security code.

    Date Time

    Generates a datetime value.

    The Date Time generator has the same .

    Email

    Generates an email address.

    HIPAA Address Generator

    Generates a mailing address.

    The generator has the same .

    IP Address

    Generates an IP address.

    MICR Code

    Generates an MICR code.

    Money

    Generates a currency amount.

    Name

    Generates a person's name.

    You configure:

    • Whether to generate the same replacement value from source values that have different capitalization.

    • Whether the replacement value reflects the gender of the original value.

    Numeric Value

    Generates a numeric value.

    You configure whether to use the Integer Primary Key generator to generate the value.

    Person Age

    Generates an age value.

    The Person Age generator has the .

    Phone Number

    Generates a telephone number.

    The Phone Number generator has the same .

    SSN

    Generates a United States Social Security Number.

    URL

    Generates a URL.

    Scramble

    This is the default generator. It scrambles the original value.

    Required global permission: Edit any custom entity type

    From the details page for an entity type, to export the entity type:

    1. Click the actions menu next to the entity type name, then click Export Entity Type.

    Actions menu for a model-based custom entity type
    2. After Textual downloads the custom entity type to the encrypted file, it displays the encryption key that was used to encrypt the file. You will need this key when you import the entity type. To copy the key, click the copy icon.

    Export Entity panel with the encryption key for the exported custom entity type

    hashtag
    Importing an exported entity type

    circle-info

    Required global permission: Must have one of the following:

    • Create custom entity types

    • Edit any custom entity type

    After you export a model-based custom entity type, you can import it into another instance of Textual.

    When you import an entity type, you are prompted to provide the encryption key that Textual provided during the export.

    You cannot use the import to update or replace an existing entity type. If there is already an entity type with the same name, the import fails.

    Regardless of whether the imported entity type was complete and had an active model, it is inactive when it is imported. After the import, you can update the entity type to identify an active model.
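    The role of the key can be sketched with a toy symmetric cipher: without the key shown at export time, the blob cannot be decrypted. This XOR keystream is a stand-in for illustration only, not Textual's actual encryption scheme, and is not suitable for real use.

```python
import hashlib
import secrets

def _keystream(key: str, length: int) -> bytes:
    # Expand the key into a pseudo-random byte stream (toy construction).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(f"{key}:{counter}".encode()).digest()
        counter += 1
    return out[:length]

def export_entity_type(config: bytes) -> tuple:
    key = secrets.token_hex(16)  # shown once at export; copy it then
    blob = bytes(a ^ b for a, b in zip(config, _keystream(key, len(config))))
    return blob, key

def import_entity_type(blob: bytes, key: str) -> bytes:
    # XOR with the same keystream restores the original configuration.
    return bytes(a ^ b for a, b in zip(blob, _keystream(key, len(blob))))

blob, key = export_entity_type(b'{"name": "Employee ID"}')
assert import_entity_type(blob, key) == b'{"name": "Employee ID"}'
```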

    To import an exported entity type:

    1. On the Custom Entity Types page, click Import Entity Type.

    2. On the Import Entity panel, to search for and select the exported file, click Choose file.

    Import Entity panel to select the exported file and provide the decryption key
    3. In the Decryption key field, paste the encryption key that you copied during the export.

    4. Click Import.

    For other types of files, redacted values are replaced with the entity type followed by a unique identifier.
    Redacted view for a text file

    hashtag
    Original view - source

    To display the original version of the file, click Original. In Original view, the detected entities are highlighted.

    Original view for a PDF file
    Redacted view for a PDF file

    Managing regex-based custom entity types

    circle-info

    Required global permission - either:

    • Create custom entity types

    • Edit any custom entity type

    A regex-based custom entity type uses one or more regular expressions to identify values of that type. If a value matches a configured regular expression for the custom entity type, then it is identified as that entity type.

    Regex-based custom entity types are useful when the entity values have a standard format. For example, to detect an identifier that is specific to your organization, and that always uses the same format, you could create a regex-based custom entity type.

    For a more varied set of values that does not conform to one or a few formats, and that relies more on context, you would instead create a model-based custom entity type.
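    For example, an organization-specific identifier with a fixed format is a good fit for a regex-based type. The pattern below is hypothetical; it uses only constructs that behave the same in Python and in the C#-compatible regex syntax that Textual expects.

```python
import re

# Hypothetical custom entity type: an employee ID of the form EMP-######.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")

def find_custom_entities(text: str):
    # Return (start, end, value) for every match of the custom pattern.
    return [(m.start(), m.end(), m.group()) for m in EMPLOYEE_ID.finditer(text)]

matches = find_custom_entities("Badge EMP-004217 was issued to EMP-009330.")
assert [m[2] for m in matches] == ["EMP-004217", "EMP-009330"]
```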

    hashtag
    Creating, editing, and deleting a regex-based custom entity type

    hashtag
    Creating a regex-based custom entity type

    circle-info

    Required global permission: Create custom entity types

    To create a regex-based custom entity type, on the Custom Entity Types page:

    1. Click Create Custom Entity Type.

    2. In the dropdown, click Regex-based entity type.

    After you configure the new entity type:

    • To save the new type, but not scan dataset files for the new type, click Save Without Scanning Files.

    • To both save the new type and scan for it, click Save and Scan Files.

    To detect new custom entity types in a dataset, Textual needs to run a scan. If you do not run the scan when you save the custom entity type, then on the dataset details page, you are prompted to run a scan.

    hashtag
    Editing a regex-based custom entity type

    circle-info

    Required global permission: You can edit any custom entity type that you create.

    Users with the global permission Edit any custom entity type can edit any custom entity type.

    To edit a custom entity type, in the regex-based entity types list, click the edit icon for the entity type.

    You can also edit a regex-based custom entity type from the dataset details page.

    For an existing entity type, you can change the description, the regular expressions, and the enabled datasets.

    You cannot change the entity type name, which is used to produce the identifier to use to configure the entity type handling from the SDK.

    After you update the configuration:

    • To save the changes, but not scan dataset files based on the updated configuration, click Save Without Scanning Files.

• To both save the changes and scan based on the updated configuration, click Save and Scan Files.

    To reflect the changes to custom entity types in a dataset, Textual needs to run a scan. If you do not run the scan when you save the changes, then on the dataset details page, you are prompted to run a scan.

    hashtag
    Deleting a regex-based custom entity type

    You cannot delete a custom entity type that is active in a dataset.

    To delete a custom entity type:

    1. In the custom entity types list, click the delete icon for the entity type.

    2. On the confirmation panel, click Delete Entity Type.

    hashtag
    Configuration settings for regex-based custom entity types

    The configuration for a regex-based custom entity type includes:

    • Name and description

    • Regular expressions to identify matching values. From the configuration panel, you can test the expressions against text that you provide.

    • Datasets to make the entity type active for. You can also enable and disable custom entity types from the dataset details pages.

    hashtag
    Name and description

    In the Name field, provide a name for the entity type. Each custom entity type name:

    • Must be unique within an organization.

    • Can only contain alphanumeric characters and spaces. Custom entity type names cannot contain punctuation or other special characters.

    After you save the entity type, you cannot change the name. Textual uses the name as the basis for the identifier that you use to refer to the entity type in the SDK.

    In the Description field, provide a longer description of the custom entity type.

    hashtag
    Regular expressions to identify matching values

    Under Keywords, Phrases, or Regexes, provide expressions to identify matching values for the entity type.

    An entry can be as simple as a single word or phrase, or you can provide a more complex regular expression to identify the values.

    Textual maintains an empty row at the bottom of the list. When you type an expression into the last row, Textual adds a new empty row.

    To add an entry, begin to type the value in the empty row.

    To edit an entry, click the entry field, then edit the value.

    To remove an entry, click its delete icon.

    hashtag
    Testing an expression

    Under Test Entry, you can check whether Textual correctly identifies a value as the entity type based on the provided expression.

    To test an expression:

1. From the dropdown list, select the entry to test.

2. In the text area, provide the text to test.

    As you enter the text, Textual automatically scans the text for matches to the selected expression. The Result field displays the input text and highlights the matching values.
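The same check can be reproduced outside the UI with ordinary regular expression matching. A minimal sketch, assuming a hypothetical employee-ID pattern (`EMP-` followed by six digits) as the configured expression:

```python
import re

# Hypothetical expression for a custom entity type that matches
# organization-specific employee IDs such as "EMP-123456".
EXPRESSION = r"EMP-\d{6}"

def find_matches(expression: str, text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, value) for every match, mirroring how the
    Result field highlights matching values in the input text."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(expression, text)]

sample = "Route the request to EMP-104872, with EMP-000341 as backup."
print(find_matches(EXPRESSION, sample))
# Each tuple gives the span to highlight and the matched value.
```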

    hashtag
    Enabling and disabling the regex-based entity type for datasets and guided redaction projects

    Under Activate Custom Entity Type, you identify the datasets and guided redaction projects to make the entity active for.

    From the dataset details and guided redaction details, you can also enable and disable custom entity types for that dataset or guided redaction project.

    To make the entity active for all current and future datasets and guided redaction projects, check Automatically activate for all current, and new datasets and guided redaction projects.

    The rest of the panel is split into separate lists for datasets and guided redaction projects.

    For each list:

    • To make the entity active for a specific dataset or guided redaction project, set the toggle for the dataset or project to the on position.

    • To filter the list based on the dataset or project name, in the filter field for the list, begin to type text from the name. Textual updates the list to only include matching datasets or projects.

    • To update all of the currently displayed datasets or projects, click Bulk action, then click Enable or Disable.
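The filter-then-bulk-action flow amounts to updating the toggles for whichever items currently match the filter text. A small sketch with hypothetical dataset names:

```python
def bulk_set(toggles: dict[str, bool], name_filter: str, enabled: bool) -> dict[str, bool]:
    """Apply Enable or Disable to every dataset whose name matches the filter text.
    Datasets that do not match keep their current toggle state."""
    return {
        name: (enabled if name_filter.lower() in name.lower() else state)
        for name, state in toggles.items()
    }

# Hypothetical toggle states; filtering on "claims" enables only the matching datasets.
toggles = {"Claims 2023": False, "Claims 2024": False, "HR Reviews": True}
print(bulk_set(toggles, "claims", True))
```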

For information about enabling and disabling custom entity types from within a dataset, go to Working with custom entity types.

For information about enabling and disabling custom entity types from a guided redaction project, go to Enabling and disabling entity types and values.

    Creating a dataset

    circle-info

    Required global permission: Create datasets

    When you create a dataset, you specify:

• The type of output to produce.

    • The source location for the files.

    • If the files are in cloud storage, the connection credentials.

    hashtag
    Setting the name, source type, and output type

    To create a dataset:

1. On the Datasets page, click Create a Dataset.

2. In the Dataset Name field, provide a name for the dataset.

3. Under Output Format, select the type of output to generate.

4. Under File Source, select the source type. If the source type is a cloud storage option, then provide the required credentials.

    hashtag
    Providing credentials for Amazon S3

    circle-info

On self-hosted instances, we are deprecating the options to provide credentials on the dataset panel and to read credentials from environment variables.

    Instead, the credentials must be included in the configuration of an IAM role that has the correct permissions.

    If the source type is Amazon S3, provide the credentials to use to connect to Amazon S3.

    1. For a self-hosted instance, select the location of the credentials. You can either provide credentials manually, or use credentials that are configured in environment variables. Note that after you save the dataset, you cannot change the selection.

2. If you are not using environment variables, then in the Access Key field, provide an AWS access key that is associated with an IAM user or role. For an example of a role that has the required permissions for an Amazon S3 dataset, go to Required IAM role permissions for Amazon S3.

    3. In the Access Secret field, provide the secret key that is associated with the access key.

    hashtag
    Providing Azure credentials

    If the source type is Azure, provide the connection information:

    1. In the Account Name field, provide the name of your Azure account.

    2. In the Account Key field, provide the access key for your Azure account.

    3. To test the connection, click Test Azure Connection.

    hashtag
    Providing SharePoint credentials

    If the source type is SharePoint, provide the credentials for the Entra ID application.

    The credentials must have the following application permissions (not delegated permissions):

    • Files.Read.All - To see the SharePoint files

• Files.ReadWrite.All - To write redacted files and metadata back to SharePoint

    • Sites.ReadWrite.All - To view and modify the SharePoint sites

    To provide the credentials:

    1. In the Tenant ID field, provide the SharePoint tenant identifier for the SharePoint site.

    2. In the Client ID field, provide the client identifier for the SharePoint site.

    3. In the Client Secret field, provide the secret to use to connect to the SharePoint site.

    Changing cloud storage credentials and output location

    circle-info

    Required dataset permission: Edit dataset settings

    For a cloud storage dataset, you can:

    • Update the cloud storage credentials. Note that this option is only available if you provided the credentials manually. If you use the credentials set in environment variables, then you cannot change the credentials.

    • Change the output location for the generated output files.

    You configure the connection credentials and output location from the Dataset settings page. To display the Dataset settings page, on the dataset details page, click Project settings.

    After you update the configuration, click Save Dataset.

    hashtag
    Changing cloud storage credentials

From the credentials section, to update the cloud storage credentials, click Update <cloud storage solution> Credentials.

    hashtag
    Amazon S3

    To provide updated credentials for Amazon S3:

1. In the Access Key field, provide an AWS access key that is associated with an IAM user or role. For an example of a role that has the required permissions for an Amazon S3 dataset, go to Required IAM role permissions for Amazon S3.

    2. In the Access Secret field, provide the secret key that is associated with the access key.

    3. From the Region dropdown list, select the AWS Region to send the authentication request to.

    hashtag
    Azure

    To provide updated credentials for Azure:

    1. In the Account Name field, provide the name of your Azure account.

    2. In the Account Key field, provide the access key for your Azure account.

    3. To test the connection, click Test Azure Connection.

    hashtag
    SharePoint

    SharePoint credentials must have the following application permissions (not delegated permissions):

    • Files.Read.All - To see the SharePoint files

• Files.ReadWrite.All - To write redacted files and metadata back to SharePoint

    • Sites.ReadWrite.All - To view and modify the SharePoint sites

    To provide updated credentials for SharePoint:

    1. In the Tenant ID field, provide the SharePoint tenant identifier for the SharePoint site.

    2. In the Client ID field, provide the client identifier for the SharePoint site.

    3. In the Client Secret field, provide the secret to use to connect to the SharePoint site.

    hashtag
    Setting the output location

    The output location is where Textual writes the redacted files.

When you create a cloud storage dataset, after you select the initial set of files and folders, Textual prompts you to select the output location.

    For an existing dataset, you set the output location from the Output Location section of the Dataset settings page.

    Click the edit icon, then select the cloud storage folder where Textual writes the output files for the dataset.

    When you generate output for a cloud storage dataset, Textual creates a folder in the output location. The folder name is the identifier of the job that generated the files.

    Within the job folder, Textual recreates the folder structure for the original files.

    Textual then writes the output files to the corresponding folders.
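The resulting layout can be sketched as a simple path computation. A minimal illustration, assuming a hypothetical job identifier and bucket-relative file keys:

```python
from pathlib import PurePosixPath

def output_key(output_location: str, job_id: str, original_key: str) -> str:
    """Build the destination key: <output location>/<job id>/<original folder structure>."""
    return str(PurePosixPath(output_location) / job_id / original_key)

# Hypothetical values for illustration.
print(output_key("redacted-output", "job-7f3a", "reports/2024/visit_notes.txt"))
# The original folder structure (reports/2024/) is recreated under the job folder.
```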

    Configuring reference codes for guided redaction

    Reference codes are used to indicate the type of value that is redacted.

    Before you can start a guided redaction, you must set up the reference codes.

    You can optionally link each reference code to one or more Textual entity types. Make sure that you map reference codes to all of the entity types that appear in your files.

    hashtag
    Displaying the reference codes

    On the Guided Redaction page, to display the list of reference codes, click Reference Codes Settings.

For each reference code, the list includes:

    • Code value

    • Any Textual entity types that the code is mapped to

    • The user who most recently updated the code configuration

    • When the code was most recently updated

    hashtag
    Creating a reference code

    circle-info

    Required global permission: Create redaction reference codes

    To create a reference code:

    1. Click New Reference Code.

    2. In the Reference Code field, type the code.

3. Optionally, check the checkbox next to each built-in entity type that applies to the reference code. You can link up to 5 entity types to a reference code. For example, if a reference code is used for any name value, you would link the reference code to both the Given Name and Family Name entity types.

The list indicates when an entity type is already linked to a reference code. You can link the same entity type to multiple reference codes.

Linking entity types is optional. If the reference code represents a value that is not represented in the Textual entity types, then you do not link it.
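The linking rules above (at most 5 entity types per code, with reuse across codes allowed) can be sketched as a simple validation. The code values and entity type names here are illustrative:

```python
def validate_links(reference_codes: dict[str, list[str]]) -> None:
    """Check that no reference code links more than 5 entity types.
    The same entity type may appear under multiple codes, and a code
    may link no entity types at all."""
    for code, entity_types in reference_codes.items():
        if len(entity_types) > 5:
            raise ValueError(
                f"Reference code {code!r} links {len(entity_types)} entity types; the maximum is 5."
            )

# Hypothetical mapping: one code for any name value, one left unlinked.
codes = {
    "NAME": ["Given Name", "Family Name"],
    "INTERNAL-REF": [],  # value not represented by any Textual entity type
}
validate_links(codes)  # passes silently
```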

    hashtag
    Editing a reference code

    circle-info

    Required global permission: Edit redaction reference codes

    For a reference code, you can change the code value and the assigned built-in entity types.

    To edit a reference code:

    1. Click the settings icon for the code.

    2. On the details panel, update the code. You can change the code and the assigned entity types.

    3. Click Save.

    hashtag
    Deleting a reference code

    circle-info

    Required global permission: Edit redaction reference codes

    You cannot delete a reference code that is currently assigned to a redaction.

    To delete a reference code, click its delete icon.

    Enabling and disabling the entity type for datasets

    circle-info

    You cannot enable model-based custom entity types for guided redaction projects.

To include a model-based custom entity type in the entity types that Textual scans for in a dataset, you must enable the entity type for the dataset.

    Before you delete a model-based custom entity type, you must make sure that it is not enabled for any datasets.

    In the entity types list, the Activated for column displays the number of datasets that the entity type is enabled for.

    You cannot enable an inactive entity type.

    To change the selected datasets:

1. Click the database icon. The Activate custom entity panel displays the datasets that you have access to.

2. To filter the datasets by name, in the search field, type text from the name.

3. For each dataset, the toggle indicates whether the entity type is active for the dataset. When the toggle is in the on position, the entity type is enabled for the dataset. To enable or disable the entity type for a single dataset, set the toggle.

4. To enable or disable the entity type for all of the datasets that are currently included in the list, click Bulk Edit, then select whether to enable or disable the entity type.

You can also enable or disable entity types from the dataset details. For more information, go to Working with custom entity types.

    List of entity types

The Entity settings page displays the list of active entity types for the dataset. This includes:

    • All of the built-in entity types

    • Any custom entity types that are active for the dataset

    You can also use the Textual Agent to ask about the detected entity types.

    Entity settings page for a dataset

    hashtag
    Information in the entity types list

    For each entity type, the list includes:

    • The name of the entity type.

    • The number of detected values for that type in the dataset files. For ignored entity types, the detected value count is 0.

    • The selected handling option.

    The list is sorted in descending order by the number of detected values. The entity type with the highest number of detected values is at the top of the list.
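The ordering is a straightforward descending sort on the detection count. A small sketch with hypothetical counts:

```python
# Hypothetical detection counts per entity type in a dataset.
entity_counts = {"Given Name": 42, "Phone Number": 7, "Custom ID": 19, "Age": 0}

# Descending by number of detected values, matching the list order on the page.
ordered = sorted(entity_counts.items(), key=lambda item: item[1], reverse=True)
print([name for name, _ in ordered])
# The entity type with the most detected values comes first.
```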

    hashtag
    Filtering the entity types list

    You can filter the entity types list by:

    • Text in the type name or description.

    • Whether the entity type is built-in or custom.

    • Whether there are detected entities for the entity type.

• The handling option for the entity type.

    hashtag
    Filtering by name or description

    To filter by name or description, in the search field, begin to type text in the name or description. As you type, Textual filters the list to only include matching entity types.

    hashtag
    Applying other filters

    To apply other filters, click Filter options, then select the filters to apply.

    Changing entity type handling

    From the file preview, to select the entity type handling option for an entity type:

    1. On either Redacted or Original view, click a detected value.

    2. On the details panel, click the entity type handling option. Textual applies the same option to all entity values of that type.

    Selecting an entity type handling option

From the preview, you can only select the entity type handling option. For the Synthesis option, you cannot configure synthesis options for an entity type. You must configure those options from the dataset details page. For more information, go to Configuring entity type synthesis options.

    Selecting and reviewing test data

    For a model-based custom entity, you first select a set of test data. You annotate the test data to identify all of the entity values that are in those files.

The test data is a small set of files - up to around 5 files - that contain typical entity type values. Each file should also be relatively small - no more than 5,000 words.

    For example, for an entity type that identifies health conditions, you might select 5 or 6 medical appointment reports that contain a variety of typical values.

    When you iterate over the model guidelines, Textual uses those guidelines to scan the files, and generates scores to indicate how well its detections matched the set of values that you established during your review.

    When a model finishes training, Textual uses the model to scan the test files, and generates a score to indicate how well its detections matched your established values.
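A score of that kind is typically a precision/recall comparison between the detected spans and the spans that you annotated during review. A hedged sketch (Textual's exact scoring formula is not documented here; this shows the standard exact-match calculation):

```python
def span_scores(annotated: set[tuple[int, int]], detected: set[tuple[int, int]]) -> dict[str, float]:
    """Compare detected spans against reviewer-annotated spans using exact-match
    precision, recall, and F1. Spans are (start, end) character offsets."""
    true_positives = len(annotated & detected)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Illustrative spans: the model found two of three annotated values plus one false positive.
scores = span_scores({(0, 8), (20, 28), (40, 48)}, {(0, 8), (20, 28), (60, 65)})
print(scores)
```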

    Default supported datetime formats

    By default, Tonic Textual supports the following datetime formats.

    hashtag
    Date only formats

    Format
    Example value

    Working with custom entity types

    The entity types list includes any custom entity types that are active for the dataset. From the Entity settings page, you can enable and disable custom entity types.

    You can also update the configuration of a regex-based custom entity type, and go to the details page for a model-based custom entity type.

    hashtag
    Enabling and disabling custom entity types

    circle-info

  • For cloud storage datasets:

    1. Textual prompts you to configure the initial file selection. For more information, go to Selecting cloud storage files.

    2. After you select the files, it prompts you to select an output location. For more information, go to Changing cloud storage credentials and output location.

  • From the Region dropdown list, select the AWS Region to send the authentication request to.

  • In the Session Token field, provide the session token to use for the authentication request.

  • To test the credentials, click Test AWS Connection.

  • By default, connections to Amazon S3 use Amazon S3 encryption. To instead use AWS KMS encryption:

    1. Click Show Advanced Options.

    2. From the Server-Side Encryption Type dropdown list, select AWS KMS.

    3. In the Server-side Encryption AWS KMS ID field, provide the KMS key ID. Note that if the KMS key doesn't exist in the same account that issues the command, you must provide the full key ARN instead of the key ID.

    Note that after you save the new dataset, you cannot change the encryption type.

  • Click Save. Textual prompts you to select the dataset files.

  • Click Save. Textual prompts you to select the dataset files.
    To test the connection, click Test SharePoint Connection.
  • Click Save. Textual prompts you to select the dataset files.


    hashtag
    Selecting the initial set of test files

    On the Test data setup page, to select the files, you can do a combination of:

    • Paste text into a text field.

    • Upload files from a local system.

    • Select files from one and only one of the following cloud storage options:

      • An S3 bucket

      • Azure Blob Storage

      • A SharePoint repository

    Test Data Setup page with no files selected

    After you select the initial set of test files, Textual uses the draft guidelines that you provided to identify entity values in the files.

    hashtag
    Pasting text directly

    To paste text directly:

    1. Click Sample Text.

    Sample Text field to create a test file from pasted text
2. In the text field, paste the text.

3. Click Next.

    hashtag
    Uploading local files

    To upload local files for the draft model to annotate:

    1. Click File Upload.

    2. Click Upload Files.

    3. Search for and select the files.

    4. Click Next.

    hashtag
    Providing Amazon S3 credentials

    To provide credentials for Amazon S3:

    1. Click Amazon S3.

    Credentials fields to connect to Amazon S3
2. For a self-hosted instance, select the location of the credentials. You can either provide credentials manually, or use credentials that are configured in environment variables. Note that after you save the credentials, you cannot change the selection.

3. If you are not using environment variables, then in the Access Key field, provide an AWS access key that is associated with an IAM user or role. For an example of a role that has the required permissions for an Amazon S3 dataset, go to Required IAM role permissions for Amazon S3.

4. In the Access Secret field, provide the secret key that is associated with the access key.

5. From the Region dropdown list, select the AWS Region to send the authentication request to.

6. In the Session Token field, provide the session token to use for the authentication request.

7. To test the credentials, click Test AWS Connection.

8. Click Next. Textual prompts you to select the files.

    hashtag
    Providing Azure credentials

    To provide credentials for Azure:

    1. Click Azure.

    Credentials fields to connect to Azure
2. In the Account Name field, provide the name of your Azure account.

3. In the Account Key field, provide the access key for your Azure account.

4. To test the connection, click Test Azure Connection.

5. Click Next. Textual prompts you to select the files.

    hashtag
    Providing SharePoint credentials

    For SharePoint, click SharePoint, then provide the credentials for the Entra ID application.

    Credentials fields to connect to SharePoint

    The credentials must have the following application permissions (not delegated permissions):

    • Files.Read.All - To see the SharePoint files

• Files.ReadWrite.All - To write redacted files and metadata back to SharePoint

    • Sites.ReadWrite.All - To view and modify the SharePoint sites

    To provide the credentials:

    1. In the Tenant ID field, provide the SharePoint tenant identifier for the SharePoint site.

    2. In the Client ID field, provide the client identifier for the SharePoint site.

    3. In the Client Secret field, provide the secret to use to connect to the SharePoint site.

    4. To test the connection, click Test SharePoint Connection.

    5. Click Next. Textual prompts you to select the files.

    hashtag
    Selecting cloud storage files

    After you provide the credentials, you select the files to use.

    For test data, you cannot select folders. You must select individual files.

    hashtag
    Viewing the file list

    On the Test data setup page:

    • The list of test files displays at the left.

    • The content of the selected file displays at the right, with the entity values highlighted.

    Test Data Setup page with selected files

    hashtag
    Adding data to the list

    You can add to the test data at any time, including when you are iterating over the model guidelines.

    To add data, on the Test data setup page:

    1. Click Add test sample.

    Add test sample dropdown list with source type options to add test files
2. From the sample type menu, select the source type for the new data.

The Write sample text and Upload Files options are always available.

If you previously selected data from a cloud storage solution, then that cloud storage solution is available. You cannot add files from a different cloud storage solution. For example, if you initially selected files from Amazon S3, then you cannot select files from Azure or SharePoint. If you did not previously select data from a cloud storage solution, then you can select from any of the cloud storage solutions.

3. For a cloud storage solution, if needed, provide the credentials for the cloud storage solution, then select the additional files.

4. For sample text, provide the content.

5. For upload, search for and select the files.

    When you add to the test data, Textual uses the most recent version of the guidelines to identify entity values in the new data. You can then conduct the review.

    hashtag
    File review statuses

    Each file goes through the following statuses:

    • Queued for upload - Textual is uploading the file to the set of test files.

    • Ready for Review - The file is uploaded, but you have not yet reviewed the file to finalize the entity values that the file contains.

    • Reviewed - You completed the review.

    hashtag
    Reviewing a file and changing the detected values

    To review a file, click the file name. The file content displays to the right. The values from the initial detection are highlighted.

    • To add an instance of an entity value, select the value text.

    • To remove an instance, click its delete icon. On the confirmation panel, click Delete.

    To save the current annotation updates, but not mark the file as reviewed, click Save.

    When you finish the review and complete the changes, click Save and mark as reviewed.

    hashtag
    Deleting test files

    On the Test Data Setup page, to delete a test file:

    1. Click its delete icon.

    2. On the confirmation panel, you can choose to skip the confirmation when you delete test files. If you select this option, then the next time you delete a test file, the file is deleted immediately, and the panel does not display.

    3. Click Delete.

    When you delete a test file:

    • For existing guidelines versions, the file name and scores remain in the list of test files for those guidelines. The file name is dimmed, and you can no longer display a preview of the file content.

    • For existing models that annotated the deleted file during their training, the benchmark score does not change.

    • For new guidelines versions, the file is not used and is not listed.

    • For models that are trained after the file is deleted, the file is not annotated and is not included in the benchmark score.

yyyy/M/d - 2024/1/17
yyyy-M-d - 2024-1-17
yyyyMMdd - 20240117
yyyy.M.d - 2024.1.17
yyyy, MMM d - 2024, Jan 17
yyyy-M - 2024-1
yyyy/M - 2024/1
d/M/yyyy - 17/1/2024
d-MMM-yyyy - 17-Jan-2024
dd-MMM-yy - 17-Jan-24
d-M-yyyy - 17-1-2024
d/MMM/yyyy - 17/Jan/2024
d MMMM yyyy - 17 January 2024
d MMM yyyy - 17 Jan 2024
d MMMM, yyyy - 17 January, 2024
ddd, d MMM yyyy - Wed, 17 Jan 2024
M/d/yyyy - 1/17/2024
M/d/yy - 1/17/24
M-d-yyyy - 1-17-2024
MMddyyyy - 01172024
MMMM d, yyyy - January 17, 2024
MMM d, ''yy - Jan 17, '24
MM-yyyy - 01-2024
MMMM, yyyy - January, 2024

    hashtag
    Date and time formats

Format - Example value

yyyy-M-d HH:mm - 2024-1-17 15:45
d-M-yyyy HH:mm - 17-1-2024 15:45
MM-dd-yy HH:mm - 01-17-24 15:45

    hashtag
    Time only formats

Format - Example value

HH:mm - 15:45
HH:mm:ss - 15:45:30
HHmmss - 154530
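The format strings above use .NET-style tokens (yyyy, MMM, HH). When you reproduce them in other tools, they map onto that library's own directives; the mapping below is an illustrative sketch using Python's strptime, not an official equivalence table. Note that %m and %d accept both padded and unpadded numbers, so one directive covers both yyyy-M-d and yyyy-MM-dd:

```python
from datetime import datetime

# A few of the supported formats, mapped to strptime directives.
EQUIVALENTS = {
    "yyyy-M-d": "%Y-%m-%d",
    "d MMM yyyy": "%d %b %Y",
    "MMMM d, yyyy": "%B %d, %Y",
    "HH:mm:ss": "%H:%M:%S",
}

print(datetime.strptime("2024-1-17", EQUIVALENTS["yyyy-M-d"]).date())
print(datetime.strptime("17 Jan 2024", EQUIVALENTS["d MMM yyyy"]).date())
```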


    Required dataset permission: Edit dataset settings

    The entity types list includes the custom entity types that are active for the dataset.

    To manage which custom entity types are active for the dataset:

    1. Click Custom entity types.

    2. On the Enable custom entity types panel, to search for specific entity types, begin to type text in the entity type name. As you type, Textual updates the list to only display matching entity types.

    Enable custom entity types panel for a dataset
3. To enable a custom entity type for the dataset, set its toggle to the on position.

4. To disable a custom entity type for the dataset, set its toggle to the off position.

5. To enable all of the custom entity types for the dataset, click Bulk, then click Enable all.

6. To disable all of the custom entity types for the dataset, click Bulk, then click Disable all.

    hashtag
    Updating the configuration of a regex-based custom entity type

    circle-info

    Required global permission - either:

    • Create custom entity types

    • Edit any custom entity type

    From the dataset details, you can edit the configuration of a regex-based custom entity type. To edit a regex-based custom entity type, click the settings icon for that type.

    Note that any changes to the custom entity type settings affect all of the datasets that use the custom entity type.

    For information on how to configure a custom entity type, go to Configuration settings for regex-based custom entity types.

    hashtag
    Viewing the details for a model-based custom entity type

    circle-info

    Required global permission - either:

    • Create custom entity types

    • Edit any custom entity type

    For a model-based custom entity type, to display the details for the entity type, click the settings icon.

    Textual displays the entity type details in a new browser tab.

    hashtag
    Running a new scan to reflect custom entity type changes

    When you enable, disable, or edit custom entity types, the changes do not take effect until you run a new scan on each file.

    Textual marks each file as requiring a rescan, and displays a prompt to rescan all of the files.

    Dataset files when a rescan is required

    To run a new scan for a file, click its scan option.

    Rescan option for an individual dataset file

    Selecting cloud storage files

    circle-info

    Required dataset permission: Edit dataset settings

    For a cloud storage dataset, you manage files from the file selection panel.

    File selection panel for a cloud storage dataset

    When you create a dataset, after you provide the cloud storage credentials and save the dataset, Textual immediately prompts you to select dataset files. After you select the files and click Next, Textual prompts you to set the output location.

    For an existing dataset, to display the File Selection panel:

    1. On the dataset details page, click Project files.

    2. On the dataset files page, click Select Files.

    The file selection includes:

    • Whether to restrict the dataset to specific file types

    • The files or folders to include in the dataset

    When you change the file selection, Textual scans the files for entities. For more information, go to Tracking and managing file processing.

    hashtag
    Filtering files by file extension

    When you select files, you can filter the selectable files based on file extension.

    To limit the file extensions to include:

    1. Click File Extension Filter. By default, all file extensions are included, and none of the checkboxes are checked.

    2. Check the checkbox for each file extension to include. As you select the file extensions to include, Textual updates the navigation pane so that you can only select files that have one of those file extensions. It hides files that have other file extensions and folders that do not contain files with the selected file extensions.

    hashtag
    Selecting files and folders to include

    In the file selection area, you navigate to and select the folders and files to add to the dataset.

    hashtag
    Navigating through the folders

    In the navigation area, to display the contents of a folder, click the Open link for the folder.

    hashtag
    Selecting a file or folder

    To add a folder or file to the dataset, check its checkbox.

    hashtag
    Managing selected folders

    In the navigation pane, when you check a folder checkbox, Textual adds it to the Prefix Patterns list.

    hashtag
    Adding a folder manually

    Instead of navigating to a folder and selecting it, you can add the path to the list manually.

    To add a folder path:

    1. Click Add Prefix Pattern.

    2. In the field, type the path to the folder, then click the save icon.

    hashtag
    Removing folder paths

    To remove a folder path from the dataset, either:

    • In the navigation pane, uncheck its checkbox.

    • In the Prefix Patterns list, click its delete icon.

    For the selected folders, the dataset includes all of the applicable files in the folder that:

    • Are of a file type that Textual supports

    • Match the file extension filter
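Conceptually, the folder rules above are a prefix-and-extension match: a file is included when it falls under a selected prefix pattern and passes the file extension filter. The sketch below illustrates that logic only; the function and its names are illustrative, not Textual's implementation.

```python
def select_files(paths, prefix_patterns, allowed_extensions=None):
    """Return the paths under any selected folder prefix that also
    pass the optional file extension filter."""
    selected = []
    for path in paths:
        # A file qualifies if it sits under at least one selected prefix.
        if not any(path.startswith(prefix) for prefix in prefix_patterns):
            continue
        # With no extension filter, every file type is included.
        if allowed_extensions is not None:
            if not any(path.endswith(ext) for ext in allowed_extensions):
                continue
        selected.append(path)
    return selected

files = ["docs/a.txt", "docs/b.pdf", "img/c.png"]
print(select_files(files, ["docs/"], [".txt"]))  # ['docs/a.txt']
```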

    hashtag
    Managing selected files

    In the navigation pane, when you select an individual file, Textual adds it to the Selected Files list.

    To delete a file, either:

    • In the navigation pane, uncheck its checkbox.

    • In the Selected Files list, click its delete icon.

    Creating and training models for a model-based entity type

    After you select your training data, on the Model training page, you create one or more trained models.

    For each model, you select the version of the guidelines to use. Textual first uses those guidelines to annotate the training data. Based on how well the guidelines identified the values in the training data, you decide whether to start the model training.

    When the training is complete, the model scans the test data. The model is scored based on how well it detected the definitive values that you confirmed in the test data.
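One plausible way to turn that comparison into a single score is an F1-style measure over the confirmed values. The document does not specify Textual's exact benchmark formula, so the calculation below is an assumption for illustration only.

```python
def benchmark_score(detected: set, confirmed: set) -> float:
    """F1-style score: harmonic mean of precision (share of detections
    that are confirmed values) and recall (share of confirmed values found)."""
    if not detected or not confirmed:
        return 0.0
    true_positives = len(detected & confirmed)
    precision = true_positives / len(detected)
    recall = true_positives / len(confirmed)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

score = benchmark_score({"John", "Acme", "Paris"}, {"John", "Acme", "London"})
print(round(score, 2))  # 0.67
```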

    hashtag
    Information on the model list

    For each model, the model list includes:

    • Model - The model name. Models are automatically named Model n, where n is the number of the model. For example, the first model you create is Model 1, the second is Model 2, and so on.

    • Status - The model status. The possible statuses are:

      • Annotating - The model is using the selected guidelines to annotate the training data.

    hashtag
    Starting a new model

    To start a new model:

    1. Click Create new model.

    2. On the Create new model panel, from the Guideline version dropdown list, select the version of the guidelines to use for the model.

    3. Click Save.

    Textual adds the model to the list and uses the selected guidelines version to annotate the training data files.

    hashtag
    Reviewing the annotations for a model

    Before you train the model, you review the annotations to see how well the model performed.

    To review the annotations, click the model name. Models that are ready to review also display a Review and Train link next to the model name.

    On the model details page:

    • On the left is the list of training data files, with the number of entities detected in each file.

    • On the right is the list of the entities in the training files, in descending order by the number of occurrences.

    To display the content of a file with the annotations highlighted, click the file name.

    After you review the annotations, if you are not satisfied with the results, to return to the guidelines refinement:

    1. In the model list, in the Guideline version column, click the view icon.

    2. On the guidelines panel, click Go to guidelines refinement.

    For a model that is not trained yet, the model details page also displays a Modify guidelines option.

    Textual displays the Guidelines Refinement page, and selects that guidelines version. You can then edit the guidelines to create a new version, then create a new model that uses the new version.

    hashtag
    Training the model

    If you are satisfied with the annotation results, then on the model details page, to start the training, click Train model.

    hashtag
    Downloading a data package for a model

    To help troubleshoot issues with a trained model, you can download a model data package to send to Tonic.ai.

    The data package is a .zip file that contains the following:

    • General information about the custom entity type and model. Includes the entity type name, the entity type identifier, and the model identifier.

    • The set of test files, including the established entity values that you identified.

    • The set of training files, including the entity values that the model identified.

    To download the data package, either:

    • On the Model Training page, click the download icon for the model.

    • On the model details page, click Download Training Data.

    Built-in entity types

    Tonic Textual's built-in models identify a range of sensitive values, such as:

    • Locations and addresses

    • Names of people and organizations

    • Identifiers and account numbers

    • d/M/yy HH:mm:ss (for example, 17/1/24 15:45:30)

    • d/M/yyyy HH:mm:ss (for example, 17/1/2024 15:45:30)

    • yyyy/M/d HH:mm:ss (for example, 2024/1/17 15:45:30)

    • yyyy-M-dTHH:mm:ss (for example, 2024-1-17T15:45:30)

    • yyyy/M/dTHH:mm:ss (for example, 2024/1/17T15:45:30)

    • yyyy-M-d HH:mm:ss'Z' (for example, 2024-1-17 15:45:30Z)

    • yyyy-M-d'T'HH:mm:ss'Z' (for example, 2024-1-17T15:45:30Z)

    • yyyy-M-d HH:mm:ss.fffffff (for example, 2024-1-17 15:45:30.1234567)

    • yyyy-M-dd HH:mm:ss.FFFFFF (for example, 2024-1-17 15:45:30.123456)

    • yyyy-M-dTHH:mm:ss.fff (for example, 2024-1-17T15:45:30.123)

    • hh:mm:ss tt (for example, 03:45:30 PM)

    • HH:mm:ss'Z' (for example, 15:45:30Z)
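The patterns above use .NET-style format tokens. To experiment with them in Python, they can be translated to strptime directives; the small mapping below covers only two of the listed formats and is an illustrative assumption, not part of Textual.

```python
from datetime import datetime

# .NET-style pattern -> Python strptime directive (partial, illustrative mapping)
PATTERNS = {
    "d/M/yy HH:mm:ss": "%d/%m/%y %H:%M:%S",
    "yyyy-M-dTHH:mm:ss": "%Y-%m-%dT%H:%M:%S",
}

# strptime accepts non-zero-padded day and month values, matching the examples.
parsed = datetime.strptime("17/1/24 15:45:30", PATTERNS["d/M/yy HH:mm:ss"])
print(parsed.year, parsed.month, parsed.day)  # 2024 1 17
```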

      • Ready for training - The annotation is complete. For models with this status, Textual displays a Review option to allow you to review the annotations.

      • Training - The training is in progress. Textual displays the percentage of training data that the model has trained on.

      • Ready - The model is trained. You can select any trained model as the active model for the entity type.

    • Guideline version - The version of the guidelines used for the model. To view the guidelines text, click the view icon.

    • Benchmark score - A score that indicates how well the model performed when it annotated the test data after training.

    • Detected entities - The number of entity values that the model detected in the training data.

    • # of files - The number of training files that were used for the annotation and model training.

    The built-in entity types are:

    • CC Exp (CC_EXP) - The expiration date of a credit card.

    • Credit Card (CREDIT_CARD) - A credit card number.

    • CVV (CVV) - The card verification value for a credit card.

    • Date Time (DATE_TIME) - A date or timestamp.

    • DOB (DOB) - A person's date of birth.

    • Email Address (EMAIL_ADDRESS) - An email address.

    • Event (EVENT) - The name of an event.

    • Gender Identifier (GENDER_IDENTIFIER) - An identifier of a person's gender.

    • Healthcare Identifier (HEALTHCARE_ID) - An identifier associated with healthcare, such as a patient number.

    • IBAN Code (IBAN_CODE) - An international bank account number used to identify an overseas bank account.

    • IP Address (IP_ADDRESS) - An IP address.

    • Language (LANGUAGE) - The name of a spoken language.

    • Law (LAW) - A title of a law.

    • Location (LOCATION) - A value related to a location. Can include any part of a mailing address.

    • Occupation (OCCUPATION) - A job title or profession.

    • Street Address (LOCATION_ADDRESS) - A street address.

    • City (LOCATION_CITY) - The name of a city.

    • State (LOCATION_STATE) - A state name or abbreviation.

    • Zip (LOCATION_ZIP) - A postal code.

    • Country (LOCATION_COUNTRY) - The name of a country.

    • Full Mailing Address (LOCATION_COMPLETE_ADDRESS) - A full postal address. By default, the entity type handling option for this entity type is Off.

    • Medical License (MEDICAL_LICENSE) - The identifier of a medical license.

    • Money (MONEY) - A monetary value.

    • Given Name (NAME_GIVEN) - A given name or first name.

    • Family Name (NAME_FAMILY) - A family name or surname.

    • NRP (NRP) - A nationality, religion, or political group.

    • Numeric Identifier (NUMERIC_PII) - A numeric value that acts as an identifier.

    • Numeric Value (NUMERIC_VALUE) - A numeric value.

    • Organization (ORGANIZATION) - The name of an organization.

    • Password (PASSWORD) - A password used for authentication.

    • Person Age (PERSON_AGE) - The age of a person.

    • Phone Number (PHONE_NUMBER) - A telephone number.

    • Product (PRODUCT) - The name of a product.

    • URL (URL) - A URL to a web page.

    • US Bank Number (US_BANK_NUMBER) - The account number of a bank in the United States.

    • US Bank Routing Number (US_ROUTING_TRANSIT_NUMBER) - The routing number of a bank in the United States.

    • US ITIN (US_ITIN) - An Individual Taxpayer Identification Number in the United States.

    • US Passport (US_PASSPORT) - A United States passport identifier.

    • US SSN (US_SSN) - A United States Social Security number.

    • Username (USERNAME) - A username for an account.

    Previewing Textual detection and redaction

    circle-info

    Required global permission: Use the playground on the Home page

    The Tonic Textual home page provides a tool that allows you to see how Textual detects and replaces values in plain text or an uploaded file.

    It also provides a preview of the redaction configuration options, including:

    • How to replace the values for each entity type.

    • Added and excluded values for each entity type.

    The home page displays automatically when you log in to Textual. To return to the home page from other pages, in the navigation menu, click the home option.

    Initial view of the Textual Home page

    hashtag
    Providing the content to redact

    To provide the content to redact, you can enter text directly, or you can upload a file.

    hashtag
    Entering text

    As you enter or paste text in the Textual playground text area, Textual displays the redacted version in the Results panel at the right.

    Home page with redacted text

    hashtag
    Using one of the samples

    Textual also provides sample text options for some common use cases. To populate the text with a sample, click Try a sample, then select the sample to use.

    Sample text options for the Home page

    hashtag
    Uploading a file

    You can also redact .txt or .docx files.

    To provide a file, click Upload, then search for and select the file.

    Textual processes the file and then displays the redacted version in the Results panel. The Textual playground text area is removed.

    Home page with the content of an uploaded file

    If you try to upload a file type that isn't supported, such as a PDF file, Textual prompts you to create a dataset that contains the file.

    Dataset creation prompt when the file type is not supported for the preview tool

    hashtag
    Clearing the text

    To clear the text, click Clear.

    hashtag
    Selecting the handling option for an entity type

    The handling option indicates how Textual replaces a detected value for an entity type. You can experiment with different handling options.

    Note that the updated configuration is only used for the current redacted text. When you clear the text, Textual also clears the configuration.

    The options are:

    • Redaction - This is the default value. Textual replaces the value with the name of the entity type, followed by a token to distinguish values of the same type. The same value always has the same token. For example, the first name John might be replaced with NAME_GIVEN_dySb5. In the same file, the first name Mary might be replaced with NAME_GIVEN_zrL2f.

    • Synthesis - Textual replaces the value with a realistic generated value. For example, the first name John is replaced with the first name Michael. The replacement values are consistent, which means that a given value always has the same replacement. For example, Michael is always the replacement value for John.

    • Ignore - Textual ignores the value and copies it as is to the Results panel.
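The consistency property of the Redaction option (the same value always gets the same token) can be illustrated with a deterministic hash. This is a sketch of the property only, not Textual's actual token algorithm; the key and token length here are arbitrary.

```python
import hashlib

def redaction_token(entity_type: str, value: str, key: str = "demo-key") -> str:
    """Derive a short, stable suffix from the value so that equal values
    always produce equal tokens, and distinct values almost always differ."""
    digest = hashlib.sha256(f"{key}:{entity_type}:{value}".encode()).hexdigest()
    return f"{entity_type}_{digest[:5]}"

# The same input always yields the same token; different inputs differ.
print(redaction_token("NAME_GIVEN", "John") == redaction_token("NAME_GIVEN", "John"))  # True
print(redaction_token("NAME_GIVEN", "John") != redaction_token("NAME_GIVEN", "Mary"))  # True
```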

    To change the handling option for an entity type:

    1. In the Results panel, click an instance of the entity type.

    2. On the configuration panel, click the handling option to use.

    Selecting the handling option for an entity type

    Textual updates all instances of that entity type to use the selected handling option.

    For example, if you change the handling option for NAME_GIVEN to Synthesis, then all instances of first names are replaced with realistic values.

    Redacted text with given name value synthesized

    hashtag
    Providing specific synthesized values

    For text that you type in, or for the sample text, when you set the handling type to synthesize, you can provide specific replacement values. This option is not available for uploaded files.

    For example, if you set the given name entity type to Synthesis, then you can indicate to always replace John with Michael.

    To provide a specific replacement value:

    1. In the output panel, click the value to specify the override for.

    2. In the Custom override field, provide the replacement value.

    Providing a custom override for a synthesized entity value

    3. Click the save icon.

    hashtag
    Defining added and excluded values

    For each entity type in entered text, you can use regular expressions to define added and excluded values.

    • Added values are values that Textual does not detect for an entity type, but that you want to include. For example, you might have values that are specific to your company or industry.

    • Excluded values are values that you do not want Textual to identify as a given entity type.
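Conceptually, these rules post-process the detector's output: an exclude rule drops matching detections for an entity type, and an add rule promotes regex matches in the text to detections. The sketch below shows that flow; the function and data shapes are illustrative, not Textual's implementation.

```python
import re

def apply_rules(text, detections, rules):
    """detections: {value: entity_type}; rules: (entity_type, kind, pattern)."""
    result = dict(detections)
    for entity_type, kind, pattern in rules:
        if kind == "Exclude":
            # Drop detections of this entity type whose value matches the pattern.
            result = {v: t for v, t in result.items()
                      if not (t == entity_type and re.fullmatch(pattern, v))}
        elif kind == "Include":
            # Add every regex match in the text as a detection of this type.
            for match in re.finditer(pattern, text):
                result[match.group()] = entity_type
    return result

out = apply_rules("Ticket ABC-1234 was filed by John.",
                  {"John": "NAME_GIVEN"},
                  [("NUMERIC_PII", "Include", r"[A-Z]{3}-\d{4}")])
print(out)  # {'John': 'NAME_GIVEN', 'ABC-1234': 'NUMERIC_PII'}
```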

    Note that the configuration is only used for the current redacted text. When you clear the text, Textual also clears the configuration.

    Also, this option is only available for text that you enter directly. For an uploaded file, to do additional configuration or to download the file, you must create a dataset from the file.

    hashtag
    Displaying the configuration panel

    To display the configuration panel for added and excluded values, click Fine-tune Results.

    The Fine-Tune Results panel displays the list of configured rules for the current text. For each rule, the list includes:

    • The entity type.

    • Whether the rule adds or excludes values.

    • The regular expression to identify the added or excluded values.

    Fine-Tune Results panel for added and excluded values

    hashtag
    Adding a rule to add or exclude values

    On the Fine-Tune Results panel, to create a rule:

    1. Click Add Rule.

    Row to define a new rule for added or excluded values

    2. From the entity type dropdown list, select the entity type that the rule applies to.

    3. From the rule type dropdown list:

      • If the rule adds values, then select Include.

      • If the rule excludes values, then select Exclude.

    4. In the regular expression field, provide the regular expression to use to identify the values to add or exclude.

    5. To save the rule, click the save icon.

    hashtag
    Editing a rule

    To edit a rule:

    1. On the Fine-Tune Results panel, click the edit icon for the rule.

    2. Update the configuration.

    3. Click the save icon.

    hashtag
    Deleting a rule

    On the Fine-Tune Results panel, to delete a rule, click its delete icon.

    hashtag
    Creating a dataset from an uploaded file

    From an uploaded file, you can create a dataset that contains the file.

    You can then provide additional configuration, such as added and excluded values, and download the redacted file.

    To create a dataset from an uploaded file:

    1. Click Download.

    2. Click Create a Dataset.

    Textual displays the dataset details for the new dataset. The dataset name is Playground Dataset <number>, where the number reflects the number of datasets that were created from the Home page.

    The dataset contains the uploaded file.

    hashtag
    Viewing and copying the request code

    When Textual generates the redacted version of the text, it also generates the corresponding API request. The request includes the entity type configuration.

    To view the API request code, click Show Code.

    Code to create the redaction request, including the entity type handling and added and excluded values

    To hide the code, click Hide Code.

    hashtag
    Selecting the request code type

    On the code panel:

    • The Python tab contains the Python version of the request.

    • The cURL tab contains the cURL version of the request.

    hashtag
    Copying the request code

    To copy the currently selected version of the request code, click Copy Code.

    hashtag
    Enabling and using additional LLM processing of detected entities

    Textual offers an option to send detected entity information to a custom Large Language Model (LLM) to synthesize accurate replacements.

    circle-info

    The Textual LLM functionality runs only on the Textual Cloud infrastructure. It does not use any third-party LLM providers.

    hashtag
    LLM synthesis methods

    Textual provides the following LLM synthesis methods.

    hashtag
    ReplacementSynthesis

    ReplacementSynthesis redacts sensitive values. It uses the LLM to generate contextually appropriate replacements based on the surrounding text.

    When you use this method:

    1. Textual identifies sensitive values in the text.

    2. Textual redacts the values and sends the following to the LLM:

      • Redacted placeholders for the detected values, such as ORGANIZATION or NAME_GIVEN

      • The positions of the detected entities

      • The surrounding text context

      Textual does not send the original sensitive values to the LLM.

    3. The LLM analyzes the context.

    4. The LLM generates realistic replacement values that fit naturally within the text.

    hashtag
    GroupingSynthesis

    GroupingSynthesis does the following:

    • Groups related entities

    • Generates new entity names

    • Uses the LLM to reproduce the original format of the value

    When you use this method:

    1. Textual sends the detected entity values and surrounding text to the LLM. To enable grouping and format pattern recognition, Textual must send the original sensitive values.

    2. The LLM groups entities based on whether they refer to the same thing, concept, or person. Grouping is only done within each entity type. For example, Lyon the person and Lyon the city are never grouped together.

    3. The LLM chooses a representative value for each group. For example, if the content includes Will, William, and W.I.L.L, it chooses William as the most complete form.

    4. The representative value is sent to Textual's standard, non-LLM synthesis generators to get a replacement value.

    5. The LLM formats the replacement to match the original format. For example, if Will is replaced with Rob, then W.I.L.L becomes R.O.B.
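The final formatting step from the list above can be illustrated with a small function that reapplies the original value's case and single-character separators to a replacement (so W.I.L.L becomes R.O.B when Will is replaced with Rob). This is a toy illustration of the idea, not Textual's implementation.

```python
import re

def match_format(original: str, replacement: str) -> str:
    """Reapply the formatting of `original` (letter case and single-character
    separators such as periods) to `replacement`."""
    # Letters separated by a repeated non-word character, e.g. "W.I.L.L".
    m = re.fullmatch(r"(?:[A-Za-z](\W))+[A-Za-z]", original)
    if m:
        sep = m.group(1)
        return sep.join(ch.upper() for ch in replacement)
    if original.isupper():
        return replacement.upper()
    if original.islower():
        return replacement.lower()
    return replacement

print(match_format("W.I.L.L", "Rob"))  # R.O.B
```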

    hashtag
    Making the LLM processing available

    To enable the LLM processing, set the environment variable ENABLE_EXPERIMENTAL_SYNTHESIS to True. If this is not set to True, then the LLM processing does not work.

    You must also set up the Solar.LLM container.

    hashtag
    Configuring the Solar.LLM container

    To configure the container, you can use the following Docker Compose content as a reference:

    services:
      textual-llm:
        image: textual-llm:[textual-version-here]
        container_name: textual-llm
        volumes:
          - llm-models:/app/models
        ports:
          - "11443:11443"
        secrets:
          - llm_aws_key_id
          - llm_aws_access_key
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities: [gpu]
        restart: unless-stopped
        networks:
          - llm-network

    volumes:
      llm-models:

    networks:
      llm-network:
        driver: bridge

    secrets:
      llm_aws_key_id:
        environment: "LLM_AWS_KEY_ID"
      llm_aws_access_key:
        environment: "LLM_AWS_ACCESS_KEY"

    The AWS keys are used to download the Textual custom models. To obtain a copy of the keys, contact your Tonic.ai support representative.

    hashtag
    Enabling the LLM processing for entered text

    After you enter text in the Textual playground panel, to enable the LLM processing, in the Results panel, click Use an LLM to perform AI synthesis.

    You cannot use this option for text that contains more than 100 words.

    By default, the LLM processing applies the following synthesis methods to the entity types:

    generator_config = {
        "NUMERIC_VALUE": PiiState.ReplacementSynthesis,
        "LANGUAGE": PiiState.ReplacementSynthesis,
        "MONEY": PiiState.ReplacementSynthesis,
        "PRODUCT": PiiState.ReplacementSynthesis,
        "EVENT": PiiState.ReplacementSynthesis,
        "WORK_OF_ART": PiiState.ReplacementSynthesis,
        "LAW": PiiState.ReplacementSynthesis,
        "US_PASSPORT": PiiState.ReplacementSynthesis,
        "MEDICAL_LICENSE": PiiState.ReplacementSynthesis,
        "DATE_TIME": PiiState.GroupingSynthesis,
        "US_BANK_NUMBER": PiiState.ReplacementSynthesis,
        "NRP": PiiState.ReplacementSynthesis,
        "US_SSN": PiiState.GroupingSynthesis,
        "IP_ADDRESS": PiiState.Synthesis,
        "ORGANIZATION": PiiState.GroupingSynthesis,
        "PHONE_NUMBER": PiiState.GroupingSynthesis,
        "US_ITIN": PiiState.ReplacementSynthesis,
        "LOCATION": PiiState.GroupingSynthesis,
        "LOCATION_ADDRESS": PiiState.GroupingSynthesis,
        "LOCATION_CITY": PiiState.GroupingSynthesis,
        "LOCATION_STATE": PiiState.GroupingSynthesis,
        "LOCATION_ZIP": PiiState.GroupingSynthesis,
        "LOCATION_COUNTRY": PiiState.ReplacementSynthesis,
        "CREDIT_CARD": PiiState.GroupingSynthesis,
        "US_DRIVER_LICENSE": PiiState.ReplacementSynthesis,
        "EMAIL_ADDRESS": PiiState.ReplacementSynthesis,
        "IBAN_CODE": PiiState.ReplacementSynthesis,
        "URL": PiiState.ReplacementSynthesis,
        "NAME_GIVEN": PiiState.GroupingSynthesis,
        "NAME_FAMILY": PiiState.GroupingSynthesis,
        "PERSON": PiiState.GroupingSynthesis,
        "GENDER_IDENTIFIER": PiiState.ReplacementSynthesis,
        "OCCUPATION": PiiState.ReplacementSynthesis,
        "USERNAME": PiiState.ReplacementSynthesis,
        "PASSWORD": PiiState.ReplacementSynthesis,
        "PERSON_AGE": PiiState.GroupingSynthesis,
        "DOB": PiiState.GroupingSynthesis,
        "CC_EXP": PiiState.GroupingSynthesis,
        "CVV": PiiState.GroupingSynthesis,
        "PROJECT_NAME": PiiState.ReplacementSynthesis,
        "MICR_CODE": PiiState.ReplacementSynthesis,
        "HEALTHCARE_ID": PiiState.ReplacementSynthesis,
        "NUMERIC_PII": PiiState.ReplacementSynthesis,
        "LOCATION_COMPLETE_ADDRESS": PiiState.ReplacementSynthesis,
    }

    When you clear the text, Textual reverts to the default processing.

    hashtag
    Processing with the SDK

    In the Python SDK, to use LLM synthesis, call the redact function.

    Viewing the dataset list and details

    hashtag
    Displaying the Datasets page

    To display the Datasets page, in the navigation menu, click the datasets icon.

    Datasets page

    The datasets list only displays the datasets that you have access to.

    Users who have the global permission View all datasets can see the complete list of datasets.

    For each dataset, the Datasets page includes:

    • The name of the dataset

    • The number of files in the dataset

    • Any tags assigned to the dataset. For datasets that you can edit, there is also an option to assign tags. For more information, go to Assigning tags to datasets.

    hashtag
    Filtering the list of datasets

    hashtag
    Filtering the datasets by name

    To filter the datasets by name, in the search field, begin to type text that is in the dataset name.

    As you type, the list is filtered to only include datasets with names that contain the filter text.

    hashtag
    Filtering the datasets by tag

    You can assign tags to each dataset. Tags can help you to organize datasets and provide a quick glance into the dataset configuration.

    On the Datasets page, to filter the datasets based on whether they are assigned specific tags:

    1. Click Filters.

    2. On the filter panel, expand the Tags section.

    3. In the tag list, check each tag to include. The datasets list is filtered to only include datasets that have any of the checked tags. To find a specific tag, in the search field, type the tag name.

    hashtag
    Filtering the datasets by creator

    To filter the datasets based on the user who created them:

    1. Click Filters.

    2. On the filter panel, expand the Creator section. Your username is always at the top of the list.

    3. In the creator list, check each user to include.

      The datasets list is filtered to only include datasets that were created by one of the checked users. To find a specific user, in the search field, type the user email address.

    hashtag
    Filtering the datasets by source type

    To filter the datasets based on the location of the source files:

    1. Click Filters.

    2. On the filter panel, expand the Source Type section.

    3. Check each source type to include. The datasets list is filtered to only include datasets that contain files from one of the checked source types.

    hashtag
    Filtering the datasets by file type

    To filter the datasets based on whether they include specific types of files:

    1. Click Filters.

    2. On the filter panel, expand the File Type section.

    3. In the file type list, check each file type to include. The datasets list is filtered to only include datasets that contain files for at least one of the checked file types.

    hashtag
    Sorting the list of datasets

    You can sort the datasets list by:

    • Name

    • Number of files

    • Creation date

    • Most recent update date

    To sort by the name, file count, or creation date, you can click the column heading. To reverse the sort order, click the heading again.

    You can also select the sort option from the sort dropdown list. The sort dropdown list displays the current sort order. It also provides the option to sort by the date that the dataset was most recently updated.

    hashtag
    Displaying details for a dataset

    circle-info

    Required dataset permission: View dataset settings

    To display the details page for a dataset, on the Datasets page, click the dataset name.

    The dataset details page displays:

    • The user who most recently updated the dataset

    • When the dataset was created

    • The list of entity types, with options to configure how Textual transforms the entities for each type

    The menu across the top includes:

    • Project settings - Settings to configure:

      • Dataset name

      • Credentials for cloud storage

      • Output location for cloud storage

    • Project files - The list of files in the dataset. For a cloud storage dataset, where the files can be located across multiple folders, Textual navigates to the first folder that contains selected dataset files.

    • Entities Catalog - The list of detected entity values in the dataset.

    • Entities Analytics - Summarizes the count of entity values by entity type.

    • Dataset search - Allows you to search for a word or phrase, to determine whether it was detected.


    Entity settings

    Language support in Textual

    Tonic Textual supports languages in addition to English. Textual automatically detects the language and applies the correct model.

    On self-hosted instances, you configure whether to support multiple languages.

    hashtag
    Supported languages

    Textual can detect values in the following languages:

    • Afrikaans (af)

    • Albanian (sq)

    • Amharic (am)

    • Arabic (ar)

    • Armenian (hy)

    • Assamese (as)

    • Azerbaijani (az)

    • Basque (eu)

    • Belarusian (be)

    • Bengali (bn)

    • Bengali Romanized

    • Bosnian (bs)

    • Breton (br)

    • Bulgarian (bg)

    • Burmese (my)

    • Burmese (alternative)

    • Catalan (ca)

    • Chinese (Simplified) (zh)

    • Chinese (Traditional) (zh)

    • Croatian (hr)

    • Czech (cs)

    • Danish (da)

    • Dutch (nl)

    • English (en)

    • Esperanto (eo)

    • Estonian (et)

    • Filipino (tl)

    • Finnish (fi)

    • French (fr)

    • Galician (gl)

    • Georgian (ka)

    • German (de)

    • Greek (el)

    • Gujarati (gu)

    • Hausa (ha)

    • Hebrew (he)

    • Hindi (hi)

    • Hindi Romanized

    • Hungarian (hu)

    • Icelandic (is)

    • Indonesian (id)

    • Irish (ga)

    • Italian (it)

    • Japanese (ja)

    • Javanese (jv)

    • Kannada (kn)

    • Kazakh (kk)

    • Khmer (km)

    • Korean (ko)

    • Kurdish (Kurmanji) (ku)

    • Kyrgyz (ky)

    • Lao (lo)

    • Latin (la)

    • Latvian (lv)

    • Lithuanian (lt)

    • Macedonian (mk)

    • Malagasy (mg)

    • Malay (ms)

    • Malayalam (ml)

    • Marathi (mr)

    • Mongolian (mn)

    • Nepali (ne)

    • Norwegian (no)

    • Oriya (or)

    • Oromo (om)

    • Pashto (ps)

    • Persian (fa)

    • Polish (pl)

    • Portuguese (pt)

    • Punjabi (pa)

    • Romanian (ro)

    • Russian (ru)

    • Sanskrit (sa)

    • Scottish Gaelic (gd)

    • Serbian (sr)

    • Sindhi (sd)

    • Sinhala (si)

    • Slovak (sk)

    • Slovenian (sl)

    • Somali (so)

    • Spanish (es)

    • Sundanese (su)

    • Swahili (sw)

    • Swedish (sv)

    • Tamil (ta)

    • Tamil Romanized

    • Telugu (te)

    • Telugu Romanized

    • Thai (th)

    • Turkish (tr)

    • Ukrainian (uk)

    • Urdu (ur)

    • Urdu Romanized

    • Uyghur (ug)

    • Uzbek (uz)

    • Vietnamese (vi)

    • Welsh (cy)

    • Western Frisian (fy)

    • Xhosa (xh)

    • Yiddish (yi)

    hashtag
    Self-hosted instances

    On a self-hosted instance, you configure whether Textual supports multiple languages.

    When you enable multi-language support, you can also limit Textual to its multi-language model whenever it detects non-English content.

    hashtag
    Enabling multi-language support

    To enable support for languages other than English, set the environment variable TEXTUAL_MULTI_LINGUAL=true.

    The setting is used by the machine learning container.

    hashtag
    Using only the multi-language model for non-English content

    When TEXTUAL_MULTI_LINGUAL=true, then by default, when Textual detects any non-English content, it runs both its English model and its multi-language model.

    To instead use only the multi-language model, and not the English model, set the environment variable TEXTUAL_MULTI_LINGUAL_XLM_ONLY=true. This can improve the precision of detections in non-English text.

    Handling of .docx and PDF file components

    set the environment variable
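On a self-hosted instance, these settings are ordinary environment variables on the machine learning container. A minimal sketch; how you pass the variables (Docker, Helm values, systemd unit, and so on) depends on your deployment:

```shell
# Enable detection for languages other than English
# (read by the machine learning container).
export TEXTUAL_MULTI_LINGUAL=true

# Optional: for non-English content, skip the English model and
# run only the multi-language model.
export TEXTUAL_MULTI_LINGUAL_XLM_ONLY=true
```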

    Managing model-based custom entity types

    circle-info

    Required global permission - either:

    • Create custom entity types

    • Edit any custom entity type

Self-hosted instances must also configure a connection to the LLM to use.

    While a regex-based entity type identifies values that match regular expressions, for a model-based custom entity type, you train a model to identify the entity values.

    A model-based entity type is useful when the values are identified more by context than by format. For example, for an entity type that identifies the names of health conditions, it would not be possible to set up regular expressions that identify the values.

    You iterate over text-based guidelines that identify the entity type values in a smaller set of data, then use a larger set of data to train one or more models.

    Each trained model is based on a selected version of the guidelines.

    You select the trained model to use for the custom entity type, and select the datasets to enable the custom entity type for.

    hashtag
    Getting started

    hashtag
    Defining the entity type

    hashtag
    Activating and managing an entity type


Overview of the model definition process
General workflow to define a model-based custom entity type.

Start a new entity type
Begin the process of creating a new model-based custom entity type.

Select test data
Identify the entity values in a small set of files.

Refine model guidelines
Fine-tune the guidelines used to identify values for the entity type.

Select training data
Assemble a much larger set of files to use to train models for the entity type.

Create and train models
Create models that are based on a selected guidelines version, and train those models on the training data.

Select the active model
Identify the trained model to use for the entity type.

Rename or delete the entity type
Change the entity type name or delete the entity type.

Enable and disable the entity type in datasets
Identify the datasets that use the entity type.

Export and import an entity type
Export the entity type to an encrypted .zip file, and import the entity type into another instance.

    Structure of JSON output files

    The JSON output provides access to Markdown content and identifies the entities that were detected in the file.

    hashtag
    Common elements in the JSON output

    hashtag
    Information about the entire file

All JSON output files contain the following elements that contain information for the entire file:

fileType

The type of the original file.

content

Details about the file content. It includes:

• Hashed and Markdown content for the file

• Entities in the file

schemaVersion

An integer that identifies the version of the JSON schema that was used for the JSON output.

Textual uses this to convert content from older schemas to the most recent schema.

For specific file types, the JSON output includes additional objects and properties to reflect the file structure.

hashtag
Hashed and Markdown content

The JSON output contains hashed and Markdown content for the entire file and for individual file components.

hash

The hashed version of the file or component content.

text

The file or component content in Markdown notation.

hashtag
Entities

The JSON output contains entities arrays for the entire file and for individual file components.

Each entity in the entities array has the following properties:

start

Within the file or component, the location where the entity value starts.

For example, in the following text:

My name is John.

John is an entity that starts at 11.

end

Within the file or component, the location where the entity value ends.

For example, in the following text:

My name is John.

John is an entity that ends at 14.

label

The type of entity.

For a list of the built-in entity types that Textual detects, go to Built-in entity types.

text

The text of the entity.

score

The confidence score for the entity.

Indicates how confident Textual is that the value is an entity of the specified type.

language

The language code that identifies the language of the entity value. For example, en indicates that the value is in English.

hashtag
Plain text files

For plain text files, the JSON output only contains the information for the entire file.

hashtag
.csv files

For .csv files, the structure contains a tables array.

The tables array contains a table object that contains header and data arrays.

For each row in the file, the data array contains a row array.

For each value in a row, the row array contains a value object.

The value object contains the entities, hashed content, and Markdown content for the value.

hashtag
.xlsx files

For .xlsx files, the structure contains a tables array that provides details for each worksheet in the file.

For each worksheet, the tables array contains a worksheet object.

Each worksheet object contains a header array and a data array. For each row in the worksheet, the data array contains a row array.

For each cell in a row, the row array contains a cell object.

Each cell object contains the entities, hashed content, and Markdown content for the cell.

hashtag
.docx files

For .docx files, the JSON output structure adds:

• A footnotes array for content in footnotes.

• An endnotes array for content in endnotes.

• A header object for content in the page headers. Includes separate objects for the first page header, even page header, and odd page header.

• A footer object for content in the page footers. Includes separate objects for the first page footer, even page footer, and odd page footer.

These arrays and objects contain the entities, hashed content, and Markdown content for the notes, headers, and footers.

hashtag
PDF and image files

PDF and image files use the same structure. Textual extracts and scans the text from the files.

For PDF and image files, the JSON output structure adds the following content.

hashtag
pages array

The pages array contains all of the content on the pages. This includes content in tables and key-value pairs, which are also listed separately in the output.

For each page in the file, the pages array contains a page array.

For each component on the page - such as paragraphs, headings, headers, and footers - the page array contains a component object.

Each component object contains the component entities, hashed content, and Markdown content.

hashtag
tables array

The tables array contains content that is in tables.

For each table in the file, the tables array contains a table array.

For each row in a table, the table array contains a row array.

For each cell in a row, the row array contains a cell object.

Each cell object identifies the type of cell (header or content). It also contains the entities, hashed content, and Markdown content for the cell.

hashtag
keyValuePairs array

The keyValuePairs array contains key-value pair content. For example, for a PDF of a form with fields, a key-value pair might represent a field label and a field value.

For each key-value pair, the keyValuePairs array contains a key-value pair object.

The key-value pair object contains:

• An automatically incremented identifier. For example, id is 1 for the first key-value pair, 2 for the second, and so on.

• The start and end position of the key-value pair.

• The text of the key.

• A value object with the entities, hashed content, and Markdown content for the value.

hashtag
PDF and image JSON outline

hashtag
.eml and .msg files

For email message files, the JSON output structure adds the following content.

hashtag
Email message identifiers

The JSON output includes the following email message identifiers:

• The identifier of the current message.

• If the message was a reply to another message, the identifier of that message.

• An array of related email messages. This includes the email message that the message replied to, as well as any other messages in an email message thread.

hashtag
Recipients

The JSON output includes the email address and display name of the message recipients. It contains separate lists for the following:

• Recipients in the To line

• Recipients in the CC line

• Recipients in the BCC line

hashtag
Subject line

The subject object contains the message subject line. It includes:

• Markdown and hashed versions of the message subject line.

• The entities that were detected in the subject line.

hashtag
Message timestamp

sentDate provides the timestamp when the message was sent.

hashtag
Message body

The plainTextBodyContent object contains the body of the email message.

It contains:

• Markdown and hashed versions of the message body.

• The entities that were detected in the message body.

hashtag
Message attachments

The attachments array provides information about any attachments to the email message. For each attached file, it includes:

• The identifier of the message that the file is attached to.

• The identifier of the attachment.

• The name of the attachment file.

• The JSON output for the file.

• The count of words in the original file.

• The count of words in the redacted version of the file.

hashtag
Email message JSON outline

hashtag
RTF files

For RTF files, the JSON output structure adds the following content.

hashtag
htmlContent object

The htmlContent object contains details about the HTML version of the file content. It includes the following:

• innerTextEntities lists the entity values that are contained in the displayed content of the file.

• attributeEntities lists the entity values that are contained in the HTML attributes for the file.

• text contains the text of the HTML content.

• hash contains the hashed version of the HTML content.

hashtag
tables array

The tables array contains content that is in tables.

For each table in the file, the tables array contains a table array.

For each row in a table, the table array contains a row array.

For each cell in a row, the row array contains a cell object.

Each cell object identifies the type of cell (header or content). It also contains the entities, hashed content, and Markdown content for the cell.

hashtag
RTF file JSON outline

    {
      "fileType": "<file type>",
      "content": {
        "text": "<Markdown file content>",
        "hash": "<hashed file content>",
        "entities": [   //Entry for each entity in the file
          {
            "start": <start location>,
            "end": <end location>,
            "label": "<value type>",
            "text": "<value text>",
            "score": <confidence score>,
            "language": "<language code>"
          }
        ]
      },
      "schemaVersion": <integer schema version>
    }
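The start and end positions in the entities array can be used to slice the original text. A small sketch; in the "My name is John." example above, the reported end (14) appears to be inclusive, while Python slicing is end-exclusive, so the slice needs end + 1 (compare the pythonStart/pythonEnd fields that appear in the .docx outline):

```python
# The entity example from the start/end definitions above.
text = "My name is John."

# Positions as reported in the entities array: start 11, end 14.
start, end = 11, 14

# The reported end appears to be inclusive; Python slices are
# end-exclusive, so add 1 to recover the entity text.
assert text[start:end + 1] == "John"
```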
    {
      "fileType": "<file type>",
      "content": {
        "text": "<Markdown content>",
        "hash": "<hashed content>",
        "entities": [   //Entry for each entity in the file
          {
            "start": <start location>,
            "end": <end location>,
            "label": "<value type>",
            "text": "<value text>",
            "score": <confidence score>,
        "language": "<language code>"
      }
        ]
      },
      "schemaVersion": <integer schema version>
    }
    {
      "tables": [
        {
          "tableName": "csv_table",
          "header": [//Columns that contain heading info (col_0, col_1, and so on)
            "<column identifier>"
          ],
          "data": [  //Entry for each row in the file
            [   //Entry for each value in the row
              {    
                "entities": [   //Entry for each entity in the value
                  {
                "start": <start location>,
                "end": <end location>,
                    "label": "<value type>",
                    "text": "<value text>",
                    "score": <confidence score>,
                    "language": "<language code>"
                  }
                ],
                "hash": "<hashed value content>",
                "text": "<Markdown value content>"
              }
            ]
          ]
        }
      ],
      "fileType": "<file type>",
      "content": {
        "text": "<Markdown file content>",
        "hash": "<hashed file content>",
    "entities": [   //Entry for each entity in the file
      {
        "start": <start location>,
        "end": <end location>,
            "label": "<value type>",
            "text": "<value text>",
            "score": <confidence score>,
            "language": "<language code>"
          }
        ]
      },
      "schemaVersion": <integer schema version>
    }
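The nested tables structure above can be traversed with three loops: tables, then rows, then value objects. A sketch, assuming the JSON output has already been loaded into a dict; the sample values are illustrative:

```python
def iter_csv_values(doc):
    """Yield (row, column, Markdown text) for every value object
    in the tables array of a .csv JSON output."""
    for table in doc["tables"]:
        for r, row in enumerate(table["data"]):
            for c, value in enumerate(row):
                yield r, c, value["text"]

# Illustrative document that follows the outline above.
sample = {
    "tables": [
        {
            "tableName": "csv_table",
            "header": ["col_0", "col_1"],
            "data": [
                [
                    {"entities": [], "hash": "h0", "text": "Jane"},
                    {"entities": [], "hash": "h1", "text": "Doe"},
                ]
            ],
        }
    ]
}

print(list(iter_csv_values(sample)))  # [(0, 0, 'Jane'), (0, 1, 'Doe')]
```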
    {
      "tables": [   //Entry for each worksheet
        {
          "tableName": "<Name of the worksheet>",
          "header": [ //Columns that contain heading info (col_0, col_1, and so on)
            "<column identifier>"
          ],
          "data": [   //Entry for each row
            [   //Entry for each cell in the row
              {
                "entities": [   //Entry for each entity in the cell
                  {
                    "start": <start location>,
                    "end": <end location>,
                    "label": "<value type>",
                    "text": "<value text>",
                    "score": <confidence score>,
                    "language": "<language code>"
                  }
                ],
                "hash": "<hashed cell content>",
                "text": "<Markdown cell content>"
              }
            ]
          ]
        }
      ],
      "fileType": "<file type>",
      "content": {
        "text": "<Markdown file content>",
        "hash": "<hashed file content>",
        "entities": [   //Entry for each entity in the file
          {
            "start": <start location>,
            "end": <end location>,
            "label": "<value type>",
            "text": "<value text>",
            "score": <confidence score>,
            "language": "<language code>"
          }
        ]
      },
      "schemaVersion": <integer schema version>
    }
    {
      "footNotes": [   //Entry for each footnote
        {
          "entities": [   //Entry for each entity in the footnote
            {
              "start": <start location>,
              "end": <end location>,
              "pythonStart": <start location in Python>,
              "pythonEnd": <end location in Python>,
              "label": "<value type>",
              "text": "<value text>",
              "score": <confidence score>,
          "language": "<language code>",
          "exampleRedaction": null
            }
          ],
          "hash": "<hashed footnote content>",
          "text": "<Markdown footnote content>"
        }
      ],
      "endNotes": [   //Entry for each endnote
        {
          "entities": [   //Entry for each entity in the endnote
            {
              "start": <start location>,
              "end": <end location>,
              "label": "<value type>",
              "text": "<value text>",
              "score": <confidence score>,
              "language": "<language code>"
            }
          ],
          "hash": "<hashed endnote content>",
          "text": "<Markdown endnote content>"
        }
      ],
      "header": {
        "first": {
          "entities": [   //Entry for each entity in the first page header
            {
              "start": <start location>,
              "end": <end location>,
              "label": "<value type>",
              "text": "<value text>",
              "score": <confidence score>,
              "language": "<language code>"
            }
          ],
          "hash": "<hashed first page header content>",
          "text": "<Markdown first page header content>"
        },
        "even": {
          "entities": [   //Entry for each entity in the even page header
            {
              "start": <start location>,
              "end": <end location>,
              "label": "<value type>",
              "text": "<value text>",
              "score": <confidence score>,
              "language": "<language code>"
            }
          ],
          "hash": "<hashed even page header content>",
          "text": "<Markdown even page header content>"
        },
        "odd": {
          "entities": [   //Entry for each entity in the odd page header
            {
              "start": <start location>,
              "end": <end location>,
              "label": "<value type>",
              "text": "<value text>",
              "score": <confidence score>,
              "language": "<language code>"
            }
          ],
          "hash": "<hashed odd page header content>",
          "text": "<Markdown odd page header content>"
        }
      },
      "footer": {
        "first": {
          "entities": [   //Entry for each entity in the first page footer
            {
              "start": <start location>,
              "end": <end location>,
              "label": "<value type>",
              "text": "<value text>",
              "score": <confidence score>,
              "language": "<language code>"
            }
          ],
          "hash": "<hashed first page footer content>",
          "text": "<Markdown first page footer content>"
        },
        "even": {
          "entities": [   //Entry for each entity in the even page footer
            {
              "start": <start location>,
              "end": <end location>,
              "label": "<value type>",
              "text": "<value text>",
              "score": <confidence score>,
              "language": "<language code>"
            }
          ],
          "hash": "<hashed even page footer content>",
          "text": "<Markdown even page footer content>"
        },
        "odd": {
          "entities": [   //Entry for each entity in the odd page footer
            {
              "start": <start location>,
              "end": <end location>,
              "label": "<value type>",
              "text": "<value text>",
              "score": <confidence score>,
              "language": "<language code>"
            }
          ],
          "hash": "<hashed odd page footer content>",
          "text": "<Markdown odd page footer content>"
        }
      },
      "fileType": "<file type>",
      "content": {
        "text": "<Markdown file content>",
        "hash": "<hashed file content>",
        "entities": [   //Entry for each entity in the file
          {
            "start": <start location>,
            "end": <end location>,
            "label": "<value type>",
            "text": "<value text>",
            "score": <confidence score>,
            "language": "<language code>"
          }
        ]
      },
      "schemaVersion": <integer schema version>
    }
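The footNotes and endNotes arrays share the same shape, so they can be processed together. A sketch, assuming the .docx JSON output has been loaded into a dict:

```python
def note_texts(doc):
    """Markdown text of every footnote and endnote in a .docx JSON output."""
    notes = doc.get("footNotes", []) + doc.get("endNotes", [])
    return [note["text"] for note in notes]

# Illustrative document fragment that follows the outline above.
sample = {
    "footNotes": [{"entities": [], "hash": "h0", "text": "See appendix A."}],
    "endNotes": [],
}

print(note_texts(sample))  # ['See appendix A.']
```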
    {
      "pages": [   //Entry for each page in the file
        [   //Entry for each component on the page
          {
            "type": "<page component type>",
            "content": {
              "entities": [   //Entry for each entity in the component
                {
                  "start": <start location>,
                  "end": <end location>,
                  "label": "<value type>",
                  "text": "<value text>",
                  "score": <confidence score>,
                  "language": "<language code>"
                }
              ],
              "hash": "<hashed component content>",
              "text": "<Markdown component content>"
            }
          }
    ],
  ],
      "tables": [   //Entry for each table in the file
        [   //Entry for each row in the table
          [   //Entry for each cell in the row
            {
              "type": "<content type>",   //ColumnHeader or Content
              "content": {
                "entities": [  //Entry for each entity in the cell
                  {
                    "start": <start location>,
                    "end": <end location>,
                    "label": "<value type>",
                    "text": "<value text>",
                    "score": <confidence score>,
                    "language": "<language code>"
                  }
                ],
                "hash": "<hashed cell text>",
                "text": "<Markdown cell text>"
              }
            }
          ]
        ]
      ],
      "keyValuePairs": [   //Entry for each key-value pair in the file
        {
          "id": <incremented identifier>,
          "key": "<key text>",
          "value": {
            "entities": [  //Entry for each entity in the value
              {
                "start": <start location>,
                "end": <end location>,
                "label": "<value type>",
                "text": "<value text>",
                "score": <confidence score>,
                "language": "<language code>"
              }
            ],
            "hash": "<hashed value text>",
            "text": "<Markdown value text>"
          },
          "start": <start location of the key-value pair>,
          "end": <end location of the key-value pair>
        }
      ],
      "fileType": "<file type>",
      "content": {
        "text": "<Markdown file content>",
        "hash": "<hashed file content>",
    "entities": [   //Entry for each entity in the file
          {
            "start": <start location>,
            "end": <end location>,
            "label": "<value type>",
            "text": "<value text>",
            "score": <confidence score>,
            "language": "<language code>"
          }
        ]
      },
      "schemaVersion": <integer schema version>
    }
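The keyValuePairs array maps naturally onto a Python dict of key text to value text, which is often all a downstream consumer needs from a scanned form. A sketch, assuming the PDF JSON output has been loaded into a dict; the sample field is illustrative:

```python
def key_value_map(doc):
    """Collapse the keyValuePairs array into {key text: value text}."""
    return {pair["key"]: pair["value"]["text"]
            for pair in doc.get("keyValuePairs", [])}

# Illustrative document fragment that follows the outline above.
sample = {
    "keyValuePairs": [
        {
            "id": 1,
            "key": "Patient name",
            "value": {"entities": [], "hash": "h0", "text": "Jane Doe"},
            "start": 0,
            "end": 21,
        }
    ]
}

print(key_value_map(sample))  # {'Patient name': 'Jane Doe'}
```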
    {
      "messageId": "<email message identifier>",
      "inReplyToMessageId": <message that this message replied to>,
      "messageIdReferences": [<related email messages>],
      "senderAddress": {
        "address": "<sender email address>",
        "displayName": "<sender display name>"
      },
      "toAddresses": [  //Entry for each recipient in the To list
        {
          "address": "<recipient email address>",
          "displayName": "<recipient display name>"
        }
      ],
      "ccAddresses": [ //Entry for each recipient in the CC list
        {
          "address": "<recipient email address>",
          "displayName": "<recipient display name>"
        }
      ],
      "bccAddresses": [ //Entry for each recipient in the BCC list
        {
          "address": "<recipient email address>",
          "displayName": "<recipient display name>"
        }
      ],
      "sentDate": "<timestamp when the message was sent>",
      "subject": {
        "text": "<Markdown version of the subject line>",
        "hash": "<hashed version of the subject line>",
        "entities": [   //Entry for each entity in the subject line
          {
            "start": <start location>,
            "end": <end location>,
            "label": "<value type>",
            "text": "<value text>",
            "score": <confidence score>,
            "language": "<language code>"
          }
        ]
      },
      "plainTextBodyContent": {
        "text": "<Markdown version of the message body>",
        "hash": "<hashed version of the message body>",
        "entities": [ //Entry for each entity in the message body
          {
            "start": <start location>,
            "end": <end location>,
            "label": "<value type>",
            "text": "<value text>",
            "score": <confidence score>,
            "language": "<language code>"
          }
        ]
      },
      "attachments": [ //Entry for each attached file
        {
          "parentMessageId": "<the message that the file is attached to>",
          "contentId": "<identifier of the attachment>",
          "fileName": "<name of the attachment file>",
          "document": {<JSON for the attached file>},
          "wordCount": <number of words in the attachment>,
          "redactedWordCount": <number of words in the redacted attachment>
        }
      ],
      "fileType": "<file type>",
      "content": {
        "text": "<Markdown file content>",
        "hash": "<hashed file content>",
        "entities": [ //Entry for each entity in the file
          {
            "start": <start location>,
            "end": <end location>,
            "label": "<value type>",
            "text": "<value text>",
            "score": <confidence score>,
            "language": "<language code>"
          }
        ]
      },
      "schemaVersion": <integer schema version>
    }
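Because the To, CC, and BCC lists share one address shape, the full recipient set can be collected in a single pass. A sketch, assuming the email JSON output has been loaded into a dict; addresses are illustrative:

```python
def all_recipients(doc):
    """Collect every recipient address from the To, CC, and BCC lists."""
    lists = ("toAddresses", "ccAddresses", "bccAddresses")
    return [entry["address"] for name in lists for entry in doc.get(name, [])]

# Illustrative document fragment that follows the outline above.
sample = {
    "toAddresses": [{"address": "ada@example.com", "displayName": "Ada"}],
    "ccAddresses": [{"address": "bob@example.com", "displayName": "Bob"}],
    "bccAddresses": [],
}

print(all_recipients(sample))  # ['ada@example.com', 'bob@example.com']
```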
    {
      "fileType": "<file type>",
      "content": {
        "text": "<Markdown file content>",
        "hash": "<hashed file content>",
        "entities": [   //Entry for each entity in the file
          {
            "start": <start location>,
            "end": <end location>,
            "label": "<value type>",
            "text": "<value text>",
            "score": <confidence score>,
            "language": "<language code>"
          }
        ]
      },
  "htmlContent": {
    "innerTextEntities": [ //Entry for each entity in the content text
      {
        "start": <start location>,
        "end": <end location>,
        "label": "<value type>",
        "text": "<value text>",
        "score": <confidence score>,
        "language": "<language code>"
      }
    ],
    "attributeEntities": [ // Entry for each entity in the HTML attributes
      {
        "start": <start location>,
        "end": <end location>,
        "label": "<value type>",
        "text": "<value text>",
        "score": <confidence score>,
        "language": "<language code>"
      }
    ],
    "hash": "<hashed HTML content>",
    "text": "<HTML content text>"
  },
  "tables": [   //Entry for each table in the file
        [   //Entry for each row in the table
          [   //Entry for each cell in the row
            {
              "type": "<content type>",   //ColumnHeader or Content
              "content": {
                "entities": [  //Entry for each entity in the cell
                  {
                    "start": <start location>,
                    "end": <end location>,
                    "label": "<value type>",
                    "text": "<value text>",
                    "score": <confidence score>,
                    "language": "<language code>"
                  }
                ],
                "hash": "<hashed cell text>",
                "text": "<Markdown cell text>"
              }
            }
          ]
        ]
      ],
      "schemaVersion": <integer schema version>
    }
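Every outline above, regardless of file type, shares the top-level fileType, content, and schemaVersion elements, so a single helper can read the whole-file entities from any JSON output. A sketch; the entity label and schemaVersion values in the sample are illustrative:

```python
def collect_entities(doc):
    """Return (label, text) pairs from the whole-file entities array,
    which every Textual JSON output contains regardless of file type."""
    return [(e["label"], e["text"]) for e in doc["content"]["entities"]]

# Illustrative document that follows the common-elements outline.
sample = {
    "fileType": "Txt",
    "content": {
        "text": "My name is John.",
        "hash": "h0",
        "entities": [
            {"start": 11, "end": 14, "label": "NAME_GIVEN",
             "text": "John", "score": 0.99, "language": "en"}
        ],
    },
    "schemaVersion": 1,
}

print(collect_entities(sample))  # [('NAME_GIVEN', 'John')]
```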
    HIPAA Address generator in Structural
    Address generator in Structural
    Phone generator in Structural

    Enabling consistency with Tonic Structural

    If you also use Tonic Structural, then you can configure Textual to enable selected synthesized values to be consistent between the two applications.

    For example, a given source telephone number can produce the same replacement telephone number in both Structural and Textual.

To enable this consistency, set the Textual environment variable SOLAR_STATISTICS_SEED to a statistics seed value. A statistics seed is a signed 32-bit integer.

    The value must match a Structural statistics seed value, either:

    • The value of the Structural environment setting TONIC_STATISTICS_SEED.

    • A statistics seed configured for an individual Structural workspace.

    The current statistics seed value is displayed on the System Settings page.
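For example (the seed value here is arbitrary; what matters is that both applications use the same signed 32-bit integer):

```shell
# Textual: statistics seed (hypothetical value).
export SOLAR_STATISTICS_SEED=123456789

# Structural: the matching instance-wide seed. Alternatively, configure
# the same seed on an individual Structural workspace.
export TONIC_STATISTICS_SEED=123456789
```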