To send an individual .txt or .csv file to Textual, you use textual.start_file_redaction.
You first open the file so that Textual can read it, then make the call for Textual to read the file.
The response includes:
The file name
The identifier of the job that processed the file. You use this identifier to retrieve a transformed version of the file.
After you use textual.start_file_redaction to send the file to Textual, you use textual.download_redacted_file to retrieve a transformed version of the file.
To identify the file, you use the job identifier that you received from textual.start_file_redaction. You can also specify whether to redact, synthesize, or ignore specific entity types. By default, all of the values are redacted.
Before you make the call to download the file, you specify the path to download the file content to.
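The two calls described above can be sketched as a pair of small helpers. This is a minimal sketch, not the SDK's documented usage: it assumes that start_file_redaction accepts the raw file bytes followed by the file name, and that download_redacted_file returns the transformed content as bytes. The helper names send_file_for_redaction and save_redacted_file are hypothetical.

```python
from pathlib import Path


def send_file_for_redaction(textual, file_path):
    """Open a .txt or .csv file and submit it to Textual.

    Assumption: start_file_redaction takes the file bytes and then the
    file name. Check the SDK reference for the exact signature.
    """
    path = Path(file_path)
    with path.open("rb") as f:
        # The response is returned unchanged; it carries the file name
        # and the identifier of the job that processed the file.
        return textual.start_file_redaction(f.read(), path.name)


def save_redacted_file(textual, job_id, output_path):
    """Download the transformed file for a job and write it to output_path.

    Assumption: download_redacted_file returns the file content as bytes.
    """
    content = textual.download_redacted_file(job_id)
    target = Path(output_path)
    target.write_bytes(content)
    return target
```

Here, textual is an already instantiated SDK client, and job_id is the job identifier taken from the start_file_redaction response.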
Whenever you call the Textual API, you first instantiate the SDK client.
If the API key is configured as the value of the TONIC_TEXTUAL_API_KEY environment variable, then you do not need to provide the API key when you instantiate the SDK client.
If the API key is not configured as the value of TONIC_TEXTUAL_API_KEY, then you must include the API key in the request.
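The two cases above might be handled as follows. This is a sketch under assumptions: the import path tonic_textual.api and the class name TonicTextual depend on the installed SDK version, and the constructor is assumed to take the base URL followed by an optional API key. The helper name make_textual_client is hypothetical.

```python
import os


def make_textual_client(base_url, api_key=None):
    """Instantiate the SDK client, with or without an explicit API key."""
    # Assumption: import path and class name; both may differ by SDK version.
    from tonic_textual.api import TonicTextual

    if api_key is None:
        # No key was passed, so the SDK must find it in the environment.
        if not os.environ.get("TONIC_TEXTUAL_API_KEY"):
            raise ValueError(
                "Set TONIC_TEXTUAL_API_KEY or pass api_key explicitly."
            )
        return TonicTextual(base_url)

    # Assumption: the API key is the second positional argument.
    return TonicTextual(base_url, api_key)
```

Pass the URL of your Textual instance as base_url. Failing early when no key is available gives a clearer error than a rejected request later.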
To create a new dataset and then upload a file to it, use textual.create_dataset. To add files to the dataset, use dataset.upload_then_add_file.
For example:
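A minimal sketch of the two calls, assuming that create_dataset takes the dataset name and returns a dataset object, and that upload_then_add_file takes the path to the file followed by the name to give it. The helper name create_dataset_with_file is hypothetical.

```python
def create_dataset_with_file(textual, dataset_name, file_path, file_name):
    """Create a dataset and upload a single file to it."""
    # Assumption: create_dataset(name) returns the new dataset object.
    dataset = textual.create_dataset(dataset_name)
    # Assumption: upload_then_add_file(path, name) uploads the file
    # and attaches it to the dataset.
    dataset.upload_then_add_file(file_path, file_name)
    return dataset
```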
Textual creates the dataset, scans the uploaded file, and redacts the detected values.
To get the current status of the files in the current dataset, use dataset.describe.
The response includes:
The name and identifier of the dataset
The number of files in the dataset
The number of files that are waiting to be processed (scanned and redacted)
The number of files that had errors during processing
For example:
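A short sketch, assuming that describe takes no arguments and returns a printable summary. The helper name show_dataset_status is hypothetical.

```python
def show_dataset_status(dataset):
    """Print and return the status summary for a dataset."""
    # Assumption: describe() returns a printable summary that covers
    # the dataset name and identifier and the per-status file counts.
    status = dataset.describe()
    print(status)
    return status
```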
To get the redacted content in JSON format for a dataset, use textual.get_dataset.
For example:
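A sketch of the retrieval step. The text above names only textual.get_dataset, so the method that returns the JSON is an assumption here: fetch_all_json is used as the assumed name and should be checked against the generated API documentation. The helper name fetch_dataset_json is hypothetical.

```python
def fetch_dataset_json(textual, dataset_name):
    """Look up a dataset by name and return its redacted content as JSON."""
    # Assumption: get_dataset takes the dataset name.
    dataset = textual.get_dataset(dataset_name)
    # Assumption: fetch_all_json is the dataset method that returns the
    # redacted content in JSON format; it is not named in the text above.
    return dataset.fetch_all_json()
```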
The response looks something like:
To redact a specific text string and view the results, use textual.redact.
The response provides the redacted version of the string, and the list of redacted values. For each redacted item, the response includes:
The location of the value
The type of sensitive value
The original value
A score to indicate confidence in the detection and redaction
For example:
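A minimal sketch of the call, assuming that redact accepts the text string as its first argument and that the response object is printable. The helper name redact_and_show and the sample sentence in the usage note are hypothetical.

```python
def redact_and_show(textual, text):
    """Redact a string, print the response, and return it."""
    # Assumption: the string to redact is the first positional argument.
    response = textual.redact(text)
    # The response carries the redacted string and the list of
    # redacted values (location, type, original value, score).
    print(response)
    return response
```

For example, redact_and_show(textual, "My name is John, and I work at Acme.") would submit that sentence, where textual is an instantiated SDK client.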
By default, Textual redacts detected sensitive values. The redacted value is <value type>_<generated identifier>. For example, ORGANIZATION_EPfC7XZUZ.
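To make the placeholder format concrete, the following sketch splits a placeholder into its two parts. It assumes that the generated identifier never contains an underscore, so the split is made on the last underscore to allow value types that themselves contain underscores. The helper name split_placeholder is hypothetical and is not part of the SDK.

```python
def split_placeholder(placeholder):
    """Split '<value type>_<generated identifier>' into its two parts."""
    # Assumption: the generated identifier has no underscore, so the
    # last underscore separates the value type from the identifier.
    value_type, sep, identifier = placeholder.rpartition("_")
    if not sep or not value_type or not identifier:
        raise ValueError(f"Not a redaction placeholder: {placeholder!r}")
    return value_type, identifier


print(split_placeholder("ORGANIZATION_EPfC7XZUZ"))
# prints ('ORGANIZATION', 'EPfC7XZUZ')
```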
For each value type, you can instead choose to either synthesize or ignore the value.
When you synthesize a value, Textual replaces the value with a different realistic value.
When you ignore a value, Textual passes through the original value.
To specify the handling type for a value type, you use the generator_config parameter. The parameter maps each <value_type> to a <handling_type>.
Where:
<value_type> is the identifier of the type of value. For example, ORGANIZATION. For the list of built-in value types that Textual scans for, go to About entity types in Textual.
<handling_type> is the handling type to use for the specified value type. The possible values are Redact, Synthesis, and Off.
To specify a handling type to use for value types that are not specified in generator_config, you use the generator_default parameter. generator_default can be Redact, Synthesis, or Off.
The following example redacts a string and specifies that organization values are synthesized. In the response, the organization value is replaced with a realistic value instead of an ORGANIZATION placeholder value.
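A sketch of such a request, using the generator_config and generator_default parameters described above. It assumes that both are passed to textual.redact as keyword arguments and that generator_config is a dictionary. The helper name redact_with_handling, the up-front validation, and the sample sentence in the usage note are illustrative additions, not SDK behavior.

```python
HANDLING_TYPES = {"Redact", "Synthesis", "Off"}


def redact_with_handling(textual, text, generator_config,
                         generator_default="Redact"):
    """Redact a string with per-type handling overrides."""
    # Check the handling types locally so that a typo fails before
    # the request is sent.
    for value_type, handling in generator_config.items():
        if handling not in HANDLING_TYPES:
            raise ValueError(
                f"Unknown handling type {handling!r} for {value_type}"
            )
    if generator_default not in HANDLING_TYPES:
        raise ValueError(f"Unknown default handling {generator_default!r}")

    # Assumption: both parameters are keyword arguments of redact.
    return textual.redact(
        text,
        generator_config=generator_config,
        generator_default=generator_default,
    )
```

To synthesize organization values and redact everything else, you could call redact_with_handling(textual, "My name is John, and I work at Acme.", {"ORGANIZATION": "Synthesis"}).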
You can also request synthesized values from a large language model (LLM).
When you use this process, Textual first identifies the sensitive values in the text. It then sends the value locations and redacted values to the LLM. For example, if Textual identifies a product name, it sends the location and the redacted value PRODUCT to the LLM. Textual does not send the original values to the LLM.
The LLM then generates realistic synthesized values of the appropriate value types.
To send text to an LLM, use textual.llm_synthesis.
The response provides the text with the synthesized replacement values, followed by the list of synthesized values. For each value, the list includes:
Where the value is located in the string
The type of value
The original value
A score to indicate confidence in the detection and synthesis
Here is an example of a request to send a text string to an LLM, and the response with the updated string and value list:
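A minimal sketch of the request side only, assuming that llm_synthesis accepts the text string as its first argument and that the response object is printable. The helper name synthesize_with_llm is hypothetical.

```python
def synthesize_with_llm(textual, text):
    """Request LLM-generated replacement values for a string."""
    # Assumption: the string is the first positional argument.
    response = textual.llm_synthesis(text)
    # The response carries the updated string, followed by the list
    # of synthesized values.
    print(response)
    return response
```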
For details about all of the available Tonic Textual classes, go to the generated API documentation.
Instantiate the SDK client
Required for every call to the API.
Create and manage datasets
Redact and synthesize data in a set of files.
Redact and synthesize individual strings
Detect and transform values in specified text strings.
Redact and synthesize individual files
Work with files outside of a dataset.