Selecting the handling option for entity types
For each entity type, you choose how to handle the detected values.
The available options are:
Synthesis - Replaces the value with another realistic value. For example, the first name value Michael might be replaced with the value John. Synthesized values are always consistent: a given entity value always has the same replacement value. For example, if the first name Michael appears multiple times in the text, it is always replaced with John. Textual does not synthesize any excluded values. For custom entity types, Textual scrambles the values.
Redaction - This is the default option.
For text files, Redaction tokenizes the value - it replaces the value with a token that identifies the entity type followed by a unique identifier. For example, the first name value Michael might be replaced with NAME_GIVEN_12m5s. The identifiers are consistent, which means that for a given original value, the replacement always has the same unique identifier. For example, the first name Michael might always be replaced with NAME_GIVEN_12m5s, while the first name Helen might always be replaced with NAME_GIVEN_9ha3m2.
For PDF files and image files, Redaction covers the value with a black box.
Textual does not redact any excluded values.
Off - Leaves the values unchanged. For example, the first name value Michael remains Michael.
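The three options can be illustrated with a small Python sketch. This is not Textual's implementation. The function name handle, the replacement pool, and the hash-based identifiers are all assumptions made for illustration. The sketch only shows the property described above: the same original value always produces the same replacement.

```python
import hashlib

# Hypothetical pool of replacement names (an assumption for this sketch;
# the document does not describe how Textual picks synthesized values).
FAKE_FIRST_NAMES = ["John", "Maria", "Ahmed", "Lena", "Carlos"]


def _digest(value: str) -> str:
    """Return a stable hex digest so that equal inputs give equal outputs."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def handle(value: str, entity_type: str, option: str = "Redaction") -> str:
    """Apply one handling option to a detected text value (illustrative only)."""
    if option == "Off":
        # Off: the value is left unchanged.
        return value
    if option == "Redaction":
        # Redaction (text files): entity type plus a consistent identifier.
        return f"{entity_type}_{_digest(value)[:6]}"
    if option == "Synthesis":
        # Synthesis: a realistic replacement, chosen deterministically.
        index = int(_digest(value), 16) % len(FAKE_FIRST_NAMES)
        return FAKE_FIRST_NAMES[index]
    raise ValueError(f"Unknown handling option: {option}")


if __name__ == "__main__":
    for option in ("Synthesis", "Redaction", "Off"):
        print(option, "->", handle("Michael", "NAME_GIVEN", option))
```

Because each replacement is derived only from the original value, calling handle twice with Michael returns the same token or the same synthesized name, which mirrors the consistency behavior described for Synthesis and Redaction.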
To select the handling option for an individual entity type, click the option for that type.
To apply the same handling option to all of the entity types in a dataset, select the option from the Bulk Edit dropdown above the data type list.
For a pipeline that generates synthesized files, on the Generator Config tab, use the Bulk Edit options at the top of the entity types list.
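Conceptually, a bulk edit sets one option for every entity type, and an individual selection then overrides a single type. The sketch below models that as a plain Python dictionary. The bulk_edit function and the dictionary layout are hypothetical and do not represent a Textual API. The entity type names other than NAME_GIVEN are illustrative assumptions.

```python
VALID_OPTIONS = {"Synthesis", "Redaction", "Off"}


def bulk_edit(entity_types: list[str], option: str) -> dict[str, str]:
    """Assign the same handling option to every entity type (illustrative only)."""
    if option not in VALID_OPTIONS:
        raise ValueError(f"Unknown handling option: {option}")
    return {entity_type: option for entity_type in entity_types}


# Bulk step: set every entity type to Synthesis.
config = bulk_edit(["NAME_GIVEN", "NAME_FAMILY", "EMAIL_ADDRESS"], "Synthesis")

# Individual step: override a single entity type afterwards.
config["EMAIL_ADDRESS"] = "Off"

if __name__ == "__main__":
    for entity_type, option in config.items():
        print(f"{entity_type}: {option}")
```

The order matters in this model: the bulk assignment comes first, and per-type choices made afterwards take precedence for those types.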