For each entity type in a dataset, you can configure additional values to detect, and values to exclude.
You might add values that Textual does not detect because, for example, they are specific to your organization or industry.
You might exclude a value because:
Textual labeled the value incorrectly.
You do not want to redact a specific value. For example, you might want to preserve known test values.
Note that for a pipeline that redacts files, you cannot exclude or add specific values.
In the entity types list, the add values and exclude values icons indicate whether added and excluded values are configured for the entity type.
When added or excluded values are configured, the corresponding icon is green.
When there are no configured values, the corresponding icon is black.
From the Custom Entity Detection panel, you configure both added and excluded values for entity types.
To display the panel, either:
Click the add values or exclude values icon for an entity type
In the word count panel, click Custom Entity Detection
The panel contains an Add to detection tab for added values, and an Exclude from detection tab for excluded values.
The entity type dropdown list at the top of the Custom Entity Detection panel indicates the entity type that you are configuring added and excluded values for.
If you display the panel from an add values or exclude values icon, then the initial selected entity type is the entity type for which you clicked the icon. To configure values for a different entity type, select the entity type from the list.
If you display the panel from the Custom Entity Detection option, then there is no default selection. You must select the entity type.
On the Add to detection tab, you configure the added values for the selected entity type.
Each value can be a specific word or phrase, or a regular expression to identify the values to add. Regular expressions must be C# compatible.
To add an added value:
Click the empty entry.
Type the value into the field.
To edit an added value:
Click the value.
Update the value text.
For each added value, you can test whether Textual correctly detects it.
To test a value:
From the Test Entry dropdown list, select the number for the value to test.
In the text field, type or paste content that contains a value or values that Textual should detect.
The Results field displays the text and highlights matching values.
To remove an added value, click its delete icon.
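Before you enter a regular expression as an added value, it can help to try the pattern locally. The following is a minimal sketch, not Textual's implementation. It uses Python's `re` module, which shares common syntax with C# regular expressions but is not identical to it, so still confirm the pattern in the Test Entry field. The function name `highlight_matches`, the `[[ ]]` markers, and the `PRJ-1234` project code format are hypothetical.

```python
import re


def highlight_matches(pattern: str, text: str) -> str:
    """Wrap every match of `pattern` in [[ ]] to imitate a highlighted result."""
    return re.sub(pattern, lambda m: f"[[{m.group(0)}]]", text)


# Hypothetical added value: an internal project code such as "PRJ-1234".
project_code = r"\bPRJ-\d{4}\b"
sample = "Tickets PRJ-1234 and PRJ-98 were escalated."

print(highlight_matches(project_code, sample))
# prints: Tickets [[PRJ-1234]] and PRJ-98 were escalated.
```

Only `PRJ-1234` is marked, because the pattern requires exactly four digits between word boundaries.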
On the Exclude from detection tab, you configure the excluded values for the selected entity type.
Each value can be either a specific word or phrase to exclude, or a regular expression to identify the values to exclude. The regular expression must be C# compatible.
To add an excluded value:
Click the empty entry.
Type the value into the field.
To edit an excluded value:
Click the value.
Update the value text.
For each excluded value, you can test whether Textual correctly detects it.
To test the value that you are currently editing:
From the Test Entry dropdown list, select the number for the value to test.
In the text field, type or paste content that contains a value or values to exclude.
The Results field displays the text and highlights matching values.
To remove an excluded value, click its delete icon.
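The effect of excluded values can be pictured as a filter over the values that were detected for an entity type. The sketch below is illustrative only. It assumes that an exclusion applies when a pattern matches the entire detected value, which might differ from how Textual matches. The function `apply_exclusions` and the sample values are hypothetical.

```python
import re


def apply_exclusions(detected: list[str], excluded_patterns: list[str]) -> list[str]:
    """Return the detected values that no exclusion pattern fully matches."""
    compiled = [re.compile(p) for p in excluded_patterns]
    return [
        value
        for value in detected
        if not any(c.fullmatch(value) for c in compiled)
    ]


# Hypothetical detections for a name entity type, including known test values.
detected_names = ["Jane Doe", "Test User", "Test Account 7"]

# One literal phrase and one regular expression.
exclusions = [r"Test User", r"Test Account \d+"]

kept = apply_exclusions(detected_names, exclusions)
print(kept)  # prints: ['Jane Doe']
```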
The new added and excluded values are not reflected in the entity types list until Textual runs a new scan.
When you save the changes, you can choose whether to run a new scan on the dataset files.
To save the changes and also start a scan, click Save and Scan Files.
To save the changes, but not run a scan, click Save Without Scanning Files. When you do not run the scan, then on the dataset details page, Textual displays a prompt to run a scan.
For some entity types, when you select the Synthesis option, you can configure additional options for how Tonic Textual generates the replacement values.
To display the available options, click Options.
Location values include the following types:
Location
Location Address
Location State
Location Zip
For each location type other than Location State, you can specify whether to use a realistic replacement value. For Location State, based on HIPAA guidelines, both the Synthesis option and the Off option pass through the value.
For location types that include zip codes, you can also specify how to generate the new zip code values.
By default, Textual replaces a location value with a realistic corresponding value. For example, "Main Street" might be replaced with "Fourth Avenue".
To instead scramble the values, uncheck Replace with realistic values.
By default, to generate a new zip code, Textual selects a real zip code that starts with the same three digits as the original zip code. For a low-population area, Textual instead selects a random zip code from the United States.
To instead replace the last two digits of the zip code with zeros, uncheck Replace zeroes for zip codes. For a low-population area, Textual instead replaces all of the digits in the zip code with zeros.
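The zero-replacement behavior for zip codes can be sketched as follows. This is an illustration under stated assumptions, not Textual's code. The function `zero_out_zip` is hypothetical, and the set of low-population prefixes is a placeholder, not the list that Textual uses.

```python
# Placeholder three-digit prefixes that stand in for low-population areas.
# Illustrative only; this is not Textual's actual list.
LOW_POPULATION_PREFIXES = frozenset({"036", "059"})


def zero_out_zip(zip_code: str, low_population_prefixes: frozenset = LOW_POPULATION_PREFIXES) -> str:
    """Keep the first three digits and zero the last two, or zero everything
    when the prefix belongs to a low-population area."""
    five_digits = zip_code[:5]
    if five_digits[:3] in low_population_prefixes:
        return "00000"
    return five_digits[:3] + "00"


print(zero_out_zip("30301"))  # prints: 30300
print(zero_out_zip("03601"))  # prints: 00000
```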
By default, when you select the Synthesis option for Date/Time and Date of Birth values, Textual shifts the datetime values to a value that occurs within 7 days before or after the original value.
To customize how Textual sets the new values, you can:
Set a different range within which Textual sets the new values
Indicate whether to scramble date values that Textual cannot parse
Add additional date formats for Textual to recognize
By default, Textual adjusts the dates to values that are within 7 days before or after the original date.
To change the range, in the # of Days To Shift +/- field, enter the number of days before and after the original date within which the replacement datetime value must occur. For example, if you enter 10, then the replacement datetime value must occur within 10 days before or after the original value.
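As a rough picture of what the shift range means, the sketch below moves a datetime by a random whole number of days within the configured range. It is a simplified assumption about the behavior: the function `shift_datetime` is hypothetical, and Textual might choose offsets differently, for example at a finer granularity than whole days.

```python
import random
from datetime import datetime, timedelta


def shift_datetime(value: datetime, max_days: int, rng: random.Random) -> datetime:
    """Shift `value` by a whole number of days in [-max_days, +max_days]."""
    offset_days = rng.randint(-max_days, max_days)
    return value + timedelta(days=offset_days)


original = datetime(2024, 1, 17, 15, 45)
shifted = shift_datetime(original, max_days=10, rng=random.Random())

# The replacement always falls within 10 days of the original value.
print(abs((shifted - original).days) <= 10)  # prints: True
```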
Textual can parse datetime values that use either one of the default supported datetime formats or a format that you add.
The Scramble Unrecognized Dates checkbox indicates how Textual should handle datetime values that it does not recognize.
By default, the checkbox is checked, and Textual scrambles those values.
To instead pass through the values without changing them, uncheck Scramble Unrecognized Dates.
By default, Textual recognizes datetime values that use one of the default supported datetime formats.
Under Additional Date Formats, you can add other datetime formats that you know are present in your data.
The formats must use a Noda Time LocalDateTime pattern.
To add a format, type the format in the field, then click +.
To remove a format, click its delete icon.
By default, Textual supports the following datetime formats.
By default, when you select the Synthesis option for Age values, Textual shifts the age value to a value that is within seven years before or after the original value. For age values that it cannot synthesize, it scrambles the value.
To configure the synthesis for Age values:
In the Range of Years +/- for the Shifted Age field, enter the number of years before and after the original value to use as the range for the synthesized value.
By default, Textual scrambles age values that it cannot parse. To instead pass through the value unchanged, uncheck Scramble Unrecognized Ages.
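The two age settings can be sketched together. This is a minimal illustration, not Textual's implementation. The function `shift_age` is hypothetical, "cannot parse" is reduced here to "is not a whole number", and the scramble step simply shuffles characters as a stand-in for whatever scrambling Textual performs.

```python
import random


def shift_age(raw: str, max_years: int, scramble_unrecognized: bool, rng: random.Random) -> str:
    """Shift a parsable age within +/- max_years; otherwise scramble or pass through."""
    try:
        age = int(raw.strip())
    except ValueError:
        if not scramble_unrecognized:
            return raw  # pass the value through unchanged
        chars = list(raw)
        rng.shuffle(chars)  # placeholder scramble: shuffle the characters
        return "".join(chars)
    shifted = age + rng.randint(-max_years, max_years)
    return str(max(shifted, 0))  # an age cannot be negative


rng = random.Random()
print(shift_age("40", 7, True, rng))      # a value from 33 through 47
print(shift_age("forty", 7, False, rng))  # prints: forty
```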
For each entity type, you choose how to handle the sensitive data values.
The available options are:
Synthesis - Indicates to replace the value with another realistic value. For example, the first name value Michael might be replaced with the value John. Textual does not synthesize any excluded values.
Redaction - This is the default option. For text files, Redaction indicates to replace the value with a placeholder that identifies the entity type. For example, the first name value Michael is replaced with the value PERSON. For PDF files and image files, Redaction indicates to cover the value with a black box. Textual does not redact any excluded values.
Off - Indicates to not make any changes to the values. For example, the first name value Michael remains Michael.
To select the handling option for an individual entity type, click the option for that type.
To select the same handling option for all of the entity types, from the Bulk Edit dropdown above the entity types list, select the option.
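The three handling options can be summarized in a short sketch that applies each option to a single text value. It reuses the Michael example from the option descriptions and is not Textual's implementation. The `Handling` enum, the `handle_value` function, and the lookup table that stands in for synthesis are all hypothetical.

```python
from enum import Enum


class Handling(Enum):
    SYNTHESIS = "Synthesis"
    REDACTION = "Redaction"
    OFF = "Off"


# Hypothetical stand-in for synthesis: a fixed table of realistic replacements.
REPLACEMENTS = {"Michael": "John"}


def handle_value(value: str, placeholder: str, handling: Handling) -> str:
    """Apply one handling option to a detected value from a text file."""
    if handling is Handling.OFF:
        return value  # leave the value unchanged
    if handling is Handling.REDACTION:
        return placeholder  # replace with the entity type placeholder
    return REPLACEMENTS.get(value, value)  # synthesize a realistic value


for option in Handling:
    print(option.value, "->", handle_value("Michael", "PERSON", option))
# prints:
# Synthesis -> John
# Redaction -> PERSON
# Off -> Michael
```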
Format | Example value |
---|---|
yyyy/M/d | 2024/1/17 |
yyyy-M-d | 2024-1-17 |
yyyyMMdd | 20240117 |
yyyy.M.d | 2024.1.17 |
yyyy, MMM d | 2024, Jan 17 |
yyyy-M | 2024-1 |
yyyy/M | 2024/1 |
d/M/yyyy | 17/1/2024 |
d-MMM-yyyy | 17-Jan-2024 |
dd-MMM-yy | 17-Jan-24 |
d-M-yyyy | 17-1-2024 |
d/MMM/yyyy | 17/Jan/2024 |
d MMMM yyyy | 17 January 2024 |
d MMM yyyy | 17 Jan 2024 |
d MMMM, yyyy | 17 January, 2024 |
ddd, d MMM yyyy | Wed, 17 Jan 2024 |
M/d/yyyy | 1/17/2024 |
M/d/yy | 1/17/24 |
M-d-yyyy | 1-17-2024 |
MMddyyyy | 01172024 |
MMMM d, yyyy | January 17, 2024 |
MMM d, ''yy | Jan 17, '24 |
MM-yyyy | 01-2024 |
MMMM, yyyy | January, 2024 |

Format | Example value |
---|---|
yyyy-M-d HH:mm | 2024-1-17 15:45 |
d-M-yyyy HH:mm | 17-1-2024 15:45 |
MM-dd-yy HH:mm | 01-17-24 15:45 |
d/M/yy HH:mm:ss | 17/1/24 15:45:30 |
d/M/yyyy HH:mm:ss | 17/1/2024 15:45:30 |
yyyy/M/d HH:mm:ss | 2024/1/17 15:45:30 |
yyyy-M-dTHH:mm:ss | 2024-1-17T15:45:30 |
yyyy/M/dTHH:mm:ss | 2024/1/17T15:45:30 |
yyyy-M-d HH:mm:ss'Z' | 2024-1-17 15:45:30Z |
yyyy-M-d'T'HH:mm:ss'Z' | 2024-1-17T15:45:30Z |
yyyy-M-d HH:mm:ss.fffffff | 2024-1-17 15:45:30.1234567 |
yyyy-M-dd HH:mm:ss.FFFFFF | 2024-1-17 15:45:30.123456 |
yyyy-M-dTHH:mm:ss.fff | 2024-1-17T15:45:30.123 |

Format | Example value |
---|---|
HH:mm | 15:45 |
HH:mm:ss | 15:45:30 |
HHmmss | 154530 |
hh:mm:ss tt | 03:45:30 PM |
HH:mm:ss'Z' | 15:45:30Z |