You can use the Textual SDK to redact and synthesize values in individual files.
Before you perform these tasks, remember to instantiate the SDK client.
For a self-hosted instance, you can also configure the S3 bucket that stores the files. This is the same S3 bucket that stores files for uploaded file pipelines. For more information, go to "Setting the S3 bucket for file uploads and redactions". For an example of an IAM role that has the required permissions, go to #file-upload-example-iam-role.
To send an individual file to Textual, you use textual.start_file_redaction.
You first open the file so that Textual can read it, then make the call for Textual to read the file.
The response includes:
The file name
The identifier of the job that processed the file. You use this identifier to retrieve a transformed version of the file.
After you use textual.start_file_redaction to send the file to Textual, you use textual.download_redacted_file to retrieve a transformed version of the file.
To identify the file, you use the job identifier that you received from textual.start_file_redaction. You can also specify whether to redact, synthesize, or ignore specific entity types. By default, all of the values are redacted.
Before you make the call to download the file, you specify the path to download the file content to.
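The send and download steps above can be sketched as one helper. This is a minimal sketch, not a confirmed usage of the SDK: it assumes that textual is an already instantiated client, that start_file_redaction takes the open file and a file name and returns the job identifier, and that download_redacted_file returns the file content as bytes. The helper name redact_file is hypothetical.

```python
from pathlib import Path


def redact_file(textual, source_path, output_path):
    """Send one file for redaction and write the transformed copy to output_path.

    Assumptions (not confirmed signatures): start_file_redaction(file, file_name)
    returns the job identifier, and download_redacted_file(job_id) returns bytes.
    """
    source = Path(source_path)

    # Open the file in binary mode so that Textual can read any file type.
    with source.open("rb") as f:
        job_id = textual.start_file_redaction(f, source.name)

    # The job identifier tells Textual which processed file to return.
    redacted_bytes = textual.download_redacted_file(job_id)

    # Write the transformed content to the path chosen before the download.
    Path(output_path).write_bytes(redacted_bytes)
    return job_id
```

Returning the job identifier lets the caller download the same file again later, for example with different entity type handling.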
Textual uses datasets to produce files with sensitive values replaced.
To create a new dataset and then upload a file to it, use textual.create_dataset.
To add files to the dataset, use dataset.add_file.
Textual creates the dataset, scans the uploaded file, and redacts the detected values.
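A minimal sketch of the create-then-upload flow follows. It assumes that textual is an already instantiated client, that create_dataset takes the dataset name and returns a dataset object, and that add_file takes a file path followed by a file name. The argument order and the helper name create_dataset_with_file are assumptions.

```python
def create_dataset_with_file(textual, dataset_name, file_path, file_name):
    """Create a dataset, then upload a single file to it.

    Assumptions (not confirmed signatures): create_dataset(name) returns the
    dataset object, and add_file(file_path, file_name) uploads one file.
    """
    # Create the dataset first. The returned object is used for later calls.
    dataset = textual.create_dataset(dataset_name)

    # Upload the file. Textual then scans it and redacts the detected values.
    dataset.add_file(file_path, file_name)
    return dataset
```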
To change the configuration of a dataset, use dataset.edit.
You can use dataset.edit to change:
The name of the dataset
To get the current status of the files in the current dataset, use dataset.describe:
The response includes:
The name and identifier of the dataset
The number of files in the dataset
The number of files that are waiting to be processed (scanned and redacted)
The number of files that had errors during processing
For example:
To get a list of files that have a specific status, use the following:
The file list includes:
File identifier and name
Number of rows and columns
Processing status
For failed files, the error
When the file was uploaded
To delete a file from a dataset, use dataset.delete_file.
To get the redacted content in JSON format for a dataset, use dataset.fetch_all_json():
For example:
The response looks something like:
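One way to work with the fetched content is to parse it into Python objects. The sketch below assumes that dataset.fetch_all_json() returns a JSON-formatted string. If it already returns parsed data, the helper passes it through unchanged. The helper name load_redacted_content is hypothetical.

```python
import json


def load_redacted_content(dataset):
    """Fetch the redacted content of a dataset and return it as Python objects.

    Assumption: fetch_all_json() returns a JSON string. Anything that is not
    a string is returned as-is.
    """
    raw = dataset.fetch_all_json()
    if isinstance(raw, str):
        return json.loads(raw)
    return raw
```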
You can use the Tonic Textual SDK to redact individual strings, including:
Plain text strings
JSON content
XML content
For a text string, you can also request synthesized values from a large language model (LLM).
The redaction request can include the handling configuration for entity types.
The redaction response includes the redacted or synthesized content and details about the detected entity values.
To send a plain text string for redaction, use textual.redact:
For example:
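A minimal sketch of a plain text redaction call. It assumes that textual is an already instantiated client and that the response exposes the transformed string as redacted_text and the detected entities as de_identify_results. Both attribute names and the helper name redact_text are assumptions.

```python
def redact_text(textual, text):
    """Redact one plain text string.

    Returns the redacted string and the number of detected entity values.
    Assumption: the response has the attributes redacted_text and
    de_identify_results.
    """
    response = textual.redact(text)
    return response.redacted_text, len(response.de_identify_results)
```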
To send a JSON string for redaction, use textual.redact_json. You can send the JSON content as a JSON string or a Python dictionary.
redact_json ensures that only the values are redacted. It ignores the keys.
For example:
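The sketch below sends JSON content and then checks the property described above, that the keys are unchanged. It assumes that textual is an already instantiated client and that the response exposes the redacted JSON as a string in redacted_text. The helpers key_paths and redact_json_keeping_keys are hypothetical.

```python
import json


def key_paths(node, prefix=""):
    """Return the set of key paths in parsed JSON content, ignoring the values."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            paths |= key_paths(item, f"{prefix}[{index}]")
    return paths


def redact_json_keeping_keys(textual, payload):
    """Redact JSON given as a string or a dictionary and verify the keys.

    Assumption: response.redacted_text holds the redacted JSON as a string.
    """
    response = textual.redact_json(payload)
    original = json.loads(payload) if isinstance(payload, str) else payload
    redacted = json.loads(response.redacted_text)
    if key_paths(original) != key_paths(redacted):
        raise ValueError("The redacted JSON does not have the same keys")
    return redacted
```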
To send an XML string for redaction, use textual.redact_xml.
redact_xml ensures that only the values are redacted. It ignores the XML markup.
For example:
Produces the following output:
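As a hedged illustration of the same idea for XML, the sketch below sends an XML string and confirms that the element names are unchanged. It assumes that textual is an already instantiated client and that the response exposes the redacted XML in redacted_text. The helpers tag_sequence and redact_xml_keeping_markup are hypothetical.

```python
import xml.etree.ElementTree as ET


def tag_sequence(xml_text):
    """Return the element names of an XML document in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


def redact_xml_keeping_markup(textual, xml_text):
    """Redact an XML string and verify that the markup is unchanged.

    Assumption: response.redacted_text holds the redacted XML as a string.
    """
    response = textual.redact_xml(xml_text)
    if tag_sequence(xml_text) != tag_sequence(response.redacted_text):
        raise ValueError("The redacted XML does not have the same elements")
    return response.redacted_text
```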
You can also request synthesized values from a large language model (LLM).
When you use this process, Textual first identifies the sensitive values in the text. It then sends the value locations and redacted values to the LLM. For example, if Textual identifies a product name, it sends the location and the redacted value PRODUCT to the LLM. Textual does not send the original values to the LLM.
The LLM then generates realistic synthesized values of the appropriate value types.
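To make the privacy property concrete, the sketch below shows the general idea of replacing detected values with their entity type before any text leaves the system. It is an illustration only, not Textual's implementation. The function name mask_entities and the (start, end, label) tuple layout are hypothetical, and end offsets are assumed to be exclusive.

```python
def mask_entities(text, entities):
    """Replace each detected span with its entity type label.

    entities is a list of (start, end, label) tuples with exclusive end
    offsets. Spans are replaced from right to left so that earlier offsets
    stay valid while the string changes length.
    """
    masked = text
    for start, end, label in sorted(entities, reverse=True):
        masked = masked[:start] + label + masked[end:]
    return masked


text = "Try the new Acme Rocket today"
start = text.index("Acme Rocket")
entities = [(start, start + len("Acme Rocket"), "PRODUCT")]

# Only the masked text would be shared, so the original product name is absent.
masked_text = mask_entities(text, entities)
```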
To send text to an LLM, use textual.llm_synthesis:
For example:
The response provides the redacted or synthesized version of the string, and the list of detected entity values.
For each redacted item, the response includes:
The location of the value in the original text (start and end)
The location of the value in the redacted version of the string (new_start and new_end)
The entity type (label)
The original value (text)
The redacted or synthesized value (new_text). new_text is null in the following cases:
The entity type is ignored
The response is from llm_synthesis
A score to indicate confidence in the detection and redaction (score)
The detected language for the value (language)
For responses from textual.redact_json, the JSON path to the entity in the original document (json_path)
For responses from textual.redact_xml, the XPath to the entity in the original XML document (xml_path)
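The fields above can be cross-checked against the original and redacted strings. The following sketch uses a made-up entity record represented as a plain dictionary, and assumes that end offsets are exclusive, as in Python slicing. The sample values and the helper name check_entity_offsets are illustrative, not actual SDK output.

```python
def check_entity_offsets(original_text, redacted_text, entities):
    """Verify that each entity's offsets select the values that it reports."""
    for entity in entities:
        # start and end locate the original value in the original text.
        found = original_text[entity["start"]:entity["end"]]
        if found != entity["text"]:
            raise ValueError(f"Expected {entity['text']!r}, found {found!r}")

        # new_text is null (None) for ignored entity types and for
        # llm_synthesis responses, so only check it when it is present.
        if entity["new_text"] is not None:
            replaced = redacted_text[entity["new_start"]:entity["new_end"]]
            if replaced != entity["new_text"]:
                raise ValueError(f"Expected {entity['new_text']!r}, found {replaced!r}")
    return True


# Illustrative values only.
original = "My name is John."
redacted = "My name is NAME_GIVEN_x9Qp."
sample_entities = [
    {
        "start": 11, "end": 15,
        "new_start": 11, "new_end": 26,
        "label": "NAME_GIVEN",
        "text": "John",
        "new_text": "NAME_GIVEN_x9Qp",
        "score": 0.9,
        "language": "en",
    }
]
offsets_ok = check_entity_offsets(original, redacted, sample_entities)
```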
By default, when you:
Configure a dataset
Redact or synthesize a string
Retrieve a redacted file
Textual does the following:
For string and file redaction, redacts the detected sensitive values.
For LLM synthesis, generates realistic synthesized values.
When you make the request, you can override the default behavior.
For each entity type, you can choose to redact, synthesize, or ignore the value.
When you redact a value, Textual replaces the value with <entity type>_<generated identifier>. For example, ORGANIZATION_EPfC7XZUZ.
When you synthesize a value, Textual replaces the value with a different realistic value.
When you ignore a value, Textual passes through the original value.
To specify the handling option for entity types, you use the generator_config parameter.
Where:
<handling_option> is the handling option to use for the specified entity type. The possible values are Redact, Synthesis, and Off.
For example, to synthesize organization values, and ignore languages:
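A minimal sketch of such a configuration, written as a Python dictionary that maps each entity type to a handling option. The entity type identifier LANGUAGE is an assumption based on the description above, and validate_generator_config is a hypothetical helper that catches misspelled options before the request is sent.

```python
HANDLING_OPTIONS = ("Redact", "Synthesis", "Off")


def validate_generator_config(generator_config):
    """Check that every entity type is mapped to a known handling option."""
    for entity_type, option in generator_config.items():
        if option not in HANDLING_OPTIONS:
            raise ValueError(
                f"{entity_type}: {option!r} is not one of {HANDLING_OPTIONS}"
            )
    return generator_config


# Synthesize organization values and ignore languages.
# The entity type name LANGUAGE is assumed for illustration.
generator_config = validate_generator_config(
    {"ORGANIZATION": "Synthesis", "LANGUAGE": "Off"}
)

# The dictionary is then passed as the generator_config parameter of the
# request, for example a string redaction call on the SDK client.
```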
For string and file redaction, you can specify a default handling option to use for entity types that are not specified in generator_config.
To do this, you use the generator_default parameter.
generator_default can be either Redact, Synthesis, or Off.
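The way the two parameters combine can be sketched as a lookup with a fallback. This illustrates the described behavior and is not Textual's code. The function name handling_for is hypothetical, and the fallback of Redact when no default is given reflects the stated default for string and file redaction.

```python
def handling_for(entity_type, generator_config=None, generator_default="Redact"):
    """Return the handling option that applies to one entity type.

    An entry in generator_config takes precedence. Entity types without an
    entry fall back to generator_default.
    """
    return (generator_config or {}).get(entity_type, generator_default)


config = {"ORGANIZATION": "Synthesis"}

# ORGANIZATION has an explicit entry, so the default is not used.
organization_handling = handling_for("ORGANIZATION", config, generator_default="Off")

# NAME_GIVEN has no entry, so it uses the default.
name_handling = handling_for("NAME_GIVEN", config, generator_default="Off")
```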
You can also configure added and excluded values for each entity type.
You add values that Textual does not detect for an entity type, but should. You exclude values that you do not want Textual to identify for that entity type.
To specify the added values, use label_allow_lists.
To specify the excluded values, use label_block_lists.
For each of these parameters, the value maps each entity type to the values to add or exclude for that entity type. To specify the values, you provide an array of regular expressions.
The following example uses label_allow_lists to add values:
For NAME_GIVEN, adds the values There and Here.
For NAME_FAMILY, adds values that match the regular expression ([a-z]{2}).
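The configuration described above can be sketched as a dictionary of regular expression lists. The matching helper below is hypothetical and only shows how such patterns select values. It assumes that a pattern must match the whole value, which may differ from how Textual applies the expressions.

```python
import re

# Added values per entity type, written as regular expressions.
label_allow_lists = {
    "NAME_GIVEN": ["There", "Here"],
    "NAME_FAMILY": ["([a-z]{2})"],
}


def matching_entity_types(value, allow_lists):
    """Return the entity types whose patterns match the whole value.

    Whole-value matching is an assumption made for this illustration.
    """
    return [
        entity_type
        for entity_type, patterns in allow_lists.items()
        if any(re.fullmatch(pattern, value) for pattern in patterns)
    ]
```

With these lists, the value Here is added as NAME_GIVEN, and a two-letter lowercase value such as ab is added as NAME_FAMILY.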
In generator_config, <entity_type> is the identifier of the entity type. For example, ORGANIZATION. For the list of built-in entity types that Textual scans for, go to the Textual documentation on built-in entity types.
Create and manage datasets
Create, update, and get redacted files from a Textual dataset
Redact and synthesize individual strings
Send a plain text, JSON, or XML string for redaction
Redact and synthesize individual files
Send a file for redaction and retrieve the results
Configure entity type handling
Configure how Textual treats each type of entity in a dataset, redacted file, or redacted string