Configure entity type handling for redaction

Required dataset permission: Edit dataset settings

By default, when you:

Configure a dataset
Redact a string
Retrieve a redacted file

Textual does the following:

For string and file redaction, replaces detected values with tokens.
For LLM synthesis, generates realistic synthesized values.

When you make the request, you can:

Override the default behavior.
For individual files and text strings, specify custom entity types to include.

Specifying the handling option for entity types

For each entity type, you can choose to redact, synthesize, or ignore the value.

When you redact a value, Textual replaces the value with a token that consists of the entity type, for example, ORGANIZATION.
When you synthesize a value, Textual replaces the value with a different realistic value.
When you ignore a value, Textual passes through the original value.

To specify the handling option for entity types, you use the generator_config parameter.

generator_config={'<entity_type>':'<handling_option>'}

Where:

<entity_type> is the identifier of the entity type, for example, ORGANIZATION. For the list of built-in entity types that Textual scans for, go to Built-in entity types. For a custom entity type, the identifier is the entity type name in all caps, with spaces replaced by underscores and the prefix CUSTOM_ added. For example, the identifier for a custom entity type named My New Type is CUSTOM_MY_NEW_TYPE. To copy the identifier of a custom entity type, on the Custom Entity Types page, click its copy icon.

<handling_option> is the handling option to use for the specified entity type. The possible values are Redact, Synthesis, and Off.
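The naming rule for custom entity type identifiers can be written as a small Python helper. This is a minimal sketch, assuming that spaces are the only characters that need to be replaced. custom_entity_identifier is a hypothetical helper, not part of the Textual SDK; the copy icon on the Custom Entity Types page remains the authoritative source for an identifier.

```python
def custom_entity_identifier(name: str) -> str:
    """Build a custom entity type identifier from its display name.

    Hypothetical helper that applies the documented rule: all caps,
    spaces replaced with underscores, and a CUSTOM_ prefix. Assumes
    spaces are the only characters that need replacing.
    """
    return "CUSTOM_" + name.upper().replace(" ", "_")


print(custom_entity_identifier("My New Type"))  # prints CUSTOM_MY_NEW_TYPE
```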

For example, to synthesize organization values and ignore language values:

generator_config={'ORGANIZATION':'Synthesis', 'LANGUAGE':'Off'}
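Because a misspelled handling option is easy to miss, you might check generator_config on the client before you send a request. The following sketch compares each value against the exact spellings listed above. check_generator_config is a hypothetical helper; it does not confirm that the entity type identifiers exist in Textual, and it assumes the option names must match exactly as written.

```python
# Handling options listed for generator_config.
VALID_OPTIONS = {"Redact", "Synthesis", "Off"}


def check_generator_config(config: dict[str, str]) -> dict[str, str]:
    """Raise ValueError if an entity type maps to an unknown handling option.

    Hypothetical client-side check only; entity type identifiers are
    not validated.
    """
    for entity_type, option in config.items():
        if option not in VALID_OPTIONS:
            raise ValueError(
                f"{entity_type}: {option!r} is not one of {sorted(VALID_OPTIONS)}"
            )
    return config


# Synthesize organization values and ignore language values.
generator_config = check_generator_config(
    {"ORGANIZATION": "Synthesis", "LANGUAGE": "Off"}
)
```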

Specifying a default handling option

For string and file redaction, you can specify a default handling option to use for entity types that are not specified in generator_config.

To do this, you use the generator_default parameter.

The value of generator_default is Redact, Synthesis, or Off.
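The precedence between the two parameters can be modeled in a few lines: an explicit entry in generator_config applies to its entity type, and generator_default applies to every entity type that generator_config does not list. This is an illustrative sketch of that rule, not Textual's implementation; effective_option is a hypothetical helper, and a return value of None stands for "neither parameter was set".

```python
from typing import Optional


def effective_option(
    entity_type: str,
    generator_config: dict[str, str],
    generator_default: Optional[str] = None,
) -> Optional[str]:
    """Return the handling option that applies to entity_type.

    An explicit generator_config entry wins; otherwise generator_default
    applies. None means neither was provided, so Textual's own default
    behavior would apply.
    """
    return generator_config.get(entity_type, generator_default)


config = {"ORGANIZATION": "Synthesis", "LANGUAGE": "Off"}
print(effective_option("ORGANIZATION", config, "Redact"))  # prints Synthesis
print(effective_option("NAME_GIVEN", config, "Redact"))    # prints Redact
```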

Providing added and excluded values for entity types

You can also configure added and excluded values for each entity type.

Added values are values that Textual should detect for an entity type but does not. Excluded values are values that you do not want Textual to identify as that entity type.

To specify the added values, use label_allow_lists.
To specify the excluded values, use label_block_lists.

For each of these parameters, the value is a dictionary that maps each entity type to its added or excluded values. For each entity type, you provide the values as an array of regular expressions.

{'<entity_type>':['<regex>']}

The following example uses label_allow_lists to add values:

For NAME_GIVEN, adds the values There and Here.
For NAME_FAMILY, adds values that match the regular expression ([a-z]{2}).

label_allow_lists={
    'NAME_GIVEN':['There','Here'],
    'NAME_FAMILY':['([a-z]{2})']
}
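Because every added or excluded value is a regular expression, a pattern with a syntax error is worth catching before you send the request. The following sketch compiles each pattern with Python's re module. check_label_lists is a hypothetical helper, and Python's regular expression dialect might not match the engine that Textual uses, so treat this only as a check for obvious syntax errors. The label_block_lists entry for NAME_GIVEN is likewise a hypothetical example.

```python
import re


def check_label_lists(label_lists: dict[str, list[str]]) -> dict[str, list[str]]:
    """Compile every pattern so that invalid regular expressions fail early.

    Raises ValueError that names the entity type and the first pattern
    that Python's re module cannot compile.
    """
    for entity_type, patterns in label_lists.items():
        for pattern in patterns:
            try:
                re.compile(pattern)
            except re.error as exc:
                raise ValueError(
                    f"{entity_type}: invalid regex {pattern!r}: {exc}"
                ) from exc
    return label_lists


# Added values from the example above.
label_allow_lists = check_label_lists({
    "NAME_GIVEN": ["There", "Here"],
    "NAME_FAMILY": ["([a-z]{2})"],
})

# Hypothetical excluded value: do not identify "May" as a given name.
label_block_lists = check_label_lists({"NAME_GIVEN": ["May"]})
```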

Including custom entity types

When you redact a string or download a redacted file, you can provide a list of the identifiers of the custom entity types to include. Textual then scans for those entity types and handles them based on the configuration in generator_config.

custom_entities=["<entity type identifier>"]

For example:

custom_entities=["CUSTOM_COGNITIVE_ACCESS_KEY", "CUSTOM_PERSONAL_GRAVITY_INDEX"]
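When you include custom entity types, it can help to confirm that each identifier has the CUSTOM_ prefix and to see which ones have no explicit entry in generator_config, so that you know which ones fall back to the default handling. The following is a minimal sketch; check_custom_entities is a hypothetical helper, and the generator_config shown here is an illustrative assumption.

```python
def check_custom_entities(
    custom_entities: list[str], generator_config: dict[str, str]
) -> list[str]:
    """Return the custom entity identifiers that have no generator_config entry.

    Raises ValueError for any identifier that lacks the CUSTOM_ prefix.
    """
    for identifier in custom_entities:
        if not identifier.startswith("CUSTOM_"):
            raise ValueError(
                f"{identifier!r} does not look like a custom entity identifier"
            )
    return [i for i in custom_entities if i not in generator_config]


custom_entities = ["CUSTOM_COGNITIVE_ACCESS_KEY", "CUSTOM_PERSONAL_GRAVITY_INDEX"]
generator_config = {"CUSTOM_COGNITIVE_ACCESS_KEY": "Redact"}  # illustrative only
print(check_custom_entities(custom_entities, generator_config))
# prints ['CUSTOM_PERSONAL_GRAVITY_INDEX']
```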
