Configure entity type handling for redaction

By default, when you:

  • Configure a dataset

  • Redact a string

  • Retrieve a redacted file

Textual does the following:

  • For string and file redaction, replaces detected values with tokens.

  • For LLM synthesis, generates realistic synthesized values.

When you make the request, you can override the default behavior.

Specifying the handling option for entity types

For each entity type, you can choose to redact, synthesize, or ignore the value.

  • When you redact a value, Textual replaces the value with a token that consists of the entity type, such as ORGANIZATION.

  • When you synthesize a value, Textual replaces the value with a different realistic value.

  • When you ignore a value, Textual passes through the original value.
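The three handling options can be modeled as one small function. The following is a minimal Python sketch, not Textual's implementation. `apply_handling` and `fake_synthesize` are hypothetical names, the token is simplified to the bare entity type name, and the synthesized value comes from a fixed lookup rather than from Textual's generator.

```python
from typing import Callable

# Handling options as listed on this page.
REDACT, SYNTHESIS, OFF = "Redact", "Synthesis", "Off"


def apply_handling(value: str, entity_type: str, option: str,
                   synthesize: Callable[[str, str], str]) -> str:
    """Return what replaces one detected value under a handling option.

    Hypothetical model for illustration only. Real tokens and synthesized
    values are produced by Textual itself.
    """
    if option == REDACT:
        # Simplified token: just the entity type name.
        return entity_type
    if option == SYNTHESIS:
        # Delegate to a caller-supplied generator of replacement values.
        return synthesize(value, entity_type)
    if option == OFF:
        # Pass the original value through unchanged.
        return value
    raise ValueError(f"unknown handling option: {option!r}")


def fake_synthesize(value: str, entity_type: str) -> str:
    # Hypothetical stand-in for synthesis: a fixed replacement per entity type.
    replacements = {"ORGANIZATION": "Initech"}
    return replacements.get(entity_type, "synthetic-" + entity_type.lower())


print(apply_handling("Acme Corp", "ORGANIZATION", REDACT, fake_synthesize))     # ORGANIZATION
print(apply_handling("Acme Corp", "ORGANIZATION", SYNTHESIS, fake_synthesize))  # Initech
print(apply_handling("Acme Corp", "ORGANIZATION", OFF, fake_synthesize))        # Acme Corp
```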

To specify the handling option for entity types, you use the generator_config parameter.

generator_config={'<entity_type>':'<handling_option>'}

Where:

  • <entity_type> is the identifier of the entity type, such as ORGANIZATION. For the list of built-in entity types that Textual scans for, go to Entity types that Textual detects.

  • <handling_option> is the handling option to use for the specified entity type. The possible values are Redact, Synthesis, and Off.

For example, to synthesize organization values and ignore language values:

generator_config={'ORGANIZATION':'Synthesis', 'LANGUAGE':'Off'}
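Because a typo in a handling option is easy to miss, it can help to check generator_config before sending a request. The following Python sketch does that for the example above. `validate_generator_config` is a hypothetical client-side helper, not part of Textual, and it accepts only the three values listed on this page.

```python
from typing import Dict

# Handling options as listed on this page.
VALID_OPTIONS = {"Redact", "Synthesis", "Off"}


def validate_generator_config(config: Dict[str, str]) -> Dict[str, str]:
    """Check that every entity type maps to one of the listed handling options.

    Hypothetical helper for illustration; Textual performs its own validation.
    """
    for entity_type, option in config.items():
        if not isinstance(entity_type, str) or not entity_type:
            raise ValueError(f"entity type must be a non-empty string: {entity_type!r}")
        if option not in VALID_OPTIONS:
            raise ValueError(
                f"{entity_type}: {option!r} is not one of {sorted(VALID_OPTIONS)}")
    return config


generator_config = validate_generator_config(
    {"ORGANIZATION": "Synthesis", "LANGUAGE": "Off"})
print(generator_config)  # {'ORGANIZATION': 'Synthesis', 'LANGUAGE': 'Off'}
```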

Specifying a default handling option

For string and file redaction, you can specify a default handling option to use for entity types that are not specified in generator_config.

To do this, you use the generator_default parameter.

The value of generator_default can be Redact, Synthesis, or Off.
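The rule for combining the two parameters is that an explicit generator_config entry applies to its entity type, and generator_default applies to every other entity type. The following Python sketch models that rule. `resolve_handling` is a hypothetical name, and using Redact as its fallback mirrors the default described at the top of this page for string and file redaction.

```python
from typing import Dict


def resolve_handling(entity_type: str,
                     generator_config: Dict[str, str],
                     generator_default: str = "Redact") -> str:
    """Return the handling option that applies to one entity type.

    Hypothetical model: an explicit generator_config entry wins;
    otherwise generator_default applies.
    """
    return generator_config.get(entity_type, generator_default)


config = {"ORGANIZATION": "Synthesis", "LANGUAGE": "Off"}
print(resolve_handling("ORGANIZATION", config))                       # Synthesis
print(resolve_handling("NAME_GIVEN", config))                         # Redact
print(resolve_handling("NAME_GIVEN", config, generator_default="Off"))  # Off
```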

Providing added and excluded values for entity types

You can also configure added and excluded values for each entity type.

You add values that Textual does not detect for an entity type, but should. You exclude values that you do not want Textual to identify as that entity type.

  • To specify the added values, use label_allow_lists.

  • To specify the excluded values, use label_block_lists.

For each of these parameters, the value is a dictionary that maps each entity type to the values to add or exclude for that type. For each entity type, you provide an array of regular expressions that identify the values.

{'<entity_type>':['<regex>']}
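An invalid regular expression in label_allow_lists or label_block_lists is easier to catch locally than in a failed request. The following Python sketch checks the {'<entity_type>':['<regex>']} shape and compiles each pattern. `compile_label_lists` is a hypothetical helper, and it uses Python's re module, whose regex dialect may not match the one Textual uses.

```python
import re
from typing import Dict, List


def compile_label_lists(label_lists: Dict[str, List[str]]) -> Dict[str, List[re.Pattern]]:
    """Compile each entity type's regular expressions, failing early on bad input.

    Hypothetical helper: checks only the {'<entity_type>': ['<regex>', ...]} shape
    and that every pattern is a valid Python regular expression.
    """
    compiled = {}
    for entity_type, patterns in label_lists.items():
        if not isinstance(patterns, list):
            raise TypeError(f"{entity_type}: expected a list of regular expressions")
        # re.compile raises re.error for a malformed pattern.
        compiled[entity_type] = [re.compile(p) for p in patterns]
    return compiled


checked = compile_label_lists({"NAME_FAMILY": ["([a-z]{2})"]})
print(checked["NAME_FAMILY"][0].pattern)  # ([a-z]{2})
```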

The following example uses label_allow_lists to add values:

  • For NAME_GIVEN, adds the values There and Here.

  • For NAME_FAMILY, adds values that match the regular expression ([a-z]{2}).

label_allow_lists={
    'NAME_GIVEN':['There','Here'],
    'NAME_FAMILY':['([a-z]{2})']
}
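To see which strings the example patterns cover, the following Python sketch tests a few candidate values against the same label_allow_lists dictionary. It assumes whole-value, case-sensitive matching (re.fullmatch). This page does not say whether Textual anchors the patterns, so `allow_list_types` is a hypothetical illustration, not Textual's matching logic.

```python
import re
from typing import Dict, List

label_allow_lists = {
    'NAME_GIVEN': ['There', 'Here'],
    'NAME_FAMILY': ['([a-z]{2})'],
}


def allow_list_types(value: str, allow_lists: Dict[str, List[str]]) -> List[str]:
    """Return the entity types whose allow-list patterns match the whole value.

    Assumption: whole-value, case-sensitive matching with re.fullmatch.
    """
    return [entity_type
            for entity_type, patterns in allow_lists.items()
            if any(re.fullmatch(p, value) for p in patterns)]


for candidate in ("Here", "xy", "xyz", "there"):
    print(candidate, allow_list_types(candidate, label_allow_lists))
# Here ['NAME_GIVEN']
# xy ['NAME_FAMILY']
# xyz []
# there []
```

If Textual instead searched for the pattern anywhere in a value, "xyz" would also match ([a-z]{2}), so anchoring with ^ and $ is a reasonable precaution when you need an exact match.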
