Using the Textual API

For details about all of the available Tonic Textual classes, go to the generated API documentation.

Instantiate the SDK client

Whenever you call the Textual API, you first instantiate the SDK client.

If the API key is configured as the value of TONIC_TEXTUAL_API_KEY, then you do not need to provide the API key when you instantiate the SDK client.

If the API key is not configured as the value of TONIC_TEXTUAL_API_KEY, then you must include the API key in the request.
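The two cases above can be sketched as a small key-resolution helper. This is a minimal sketch, not part of the Textual SDK: `resolve_api_key` and its `environ` parameter are hypothetical names introduced for illustration. Only the `TONIC_TEXTUAL_API_KEY` variable name comes from the text above.

```python
import os

ENV_VAR = "TONIC_TEXTUAL_API_KEY"  # environment variable named in the text above


def resolve_api_key(explicit_key=None, environ=None):
    """Return the API key to use, preferring an explicitly passed key.

    Hypothetical helper for illustration; it is not an SDK function.
    """
    # Fall back to the real process environment unless a mapping is supplied.
    environ = os.environ if environ is None else environ
    key = explicit_key or environ.get(ENV_VAR)
    if not key:
        raise ValueError(f"Set {ENV_VAR} or pass the API key explicitly")
    return key
```

The resolved key is what you would pass to the client when the environment variable is not set.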

Create and manage datasets

Create and populate a dataset

To create a new dataset, use textual.create_dataset. To upload files and add them to the dataset, use dataset.upload_then_add_file.

dataset = textual.create_dataset('<dataset name>')
dataset.upload_then_add_file("<file path>")

For example:

dataset = textual.create_dataset('mydataset')
dataset.upload_then_add_file("patient_notes.txt")

Textual creates the dataset, scans the uploaded file, and redacts the detected values.

Get the current status of dataset files

To get the current status of the files in the current dataset, use dataset.describe:

dataset.describe()

The response includes:

The name and identifier of the dataset
The number of files in the dataset
The number of files that are waiting to be processed (scanned and redacted)
The number of files that had errors during processing

For example:

    Dataset: example [879d4c5d-792a-c009-a9a0-60d69be20206]
    Number of Files: 1
    Files that are waiting for processing: 
    Files that encountered errors while processing: 
    Number of Rows: 0
    Number of rows fetched: 0

Get redacted content for a dataset

To get the redacted content of a dataset in JSON format, use textual.get_dataset to retrieve the dataset, then call dataset.fetch_all_json:

dataset = textual.get_dataset('<dataset name>')
dataset.fetch_all_json()

For example:

dataset = textual.get_dataset('mydataset')
dataset.fetch_all_json()

The response looks something like:

'[["PERSON_Rz8NtJTPONTKgcB95i Portrait by PERSON_blatU6mAWFCQoSa5E, DATE_TIME_Rcl58 ...]'
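Because the response is a JSON string, you can parse it with the standard json module. The sketch below assumes, based only on the truncated sample above, that the string encodes a list of lists of redacted text. The `sample_response` value and the `iter_redacted_text` helper are hypothetical stand-ins, not SDK output or SDK functions.

```python
import json

# Hypothetical, complete stand-in for the truncated response shown above.
sample_response = '[["PERSON_abc123 Portrait by PERSON_def456, DATE_TIME_ghi789"]]'


def iter_redacted_text(response_json):
    """Yield each redacted string from a JSON-encoded list of lists.

    Assumes the nested-list shape suggested by the sample response.
    """
    for row in json.loads(response_json):
        for cell in row:
            yield cell


redacted_rows = list(iter_redacted_text(sample_response))
```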

Redact and synthesize individual strings

Redact a single text string

To redact a specific text string and view the results, use textual.redact:

redaction_response = textual.redact("""<text of the string>""")
redaction_response.describe()

The response provides the redacted version of the string, and the list of redacted values. For each redacted item, the response includes:

The location of the value
The type of sensitive value
The original value
A score to indicate confidence in the detection and redaction

For example:

redaction_response = textual.redact("""Contact Tonic AI with questions""")
redaction_response.describe()

    Contact ORGANIZATION_EPfC7XZUZ with questions

    {"start": 8, "end": 16, "label": "ORGANIZATION", "text": "Tonic AI", "score": 0.85}
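The start and end values in the example line up with Python slice positions in the original string. The sketch below checks this for the entity shown above. It parses the entity line as plain JSON and does not call the SDK. The `span_matches` helper is a hypothetical name, and treating `end` as exclusive is an assumption that happens to hold for this example.

```python
import json

original = "Contact Tonic AI with questions"

# One entity line copied from the example response above.
entity_line = (
    '{"start": 8, "end": 16, "label": "ORGANIZATION", '
    '"text": "Tonic AI", "score": 0.85}'
)


def span_matches(text, entity):
    """Return True when text[start:end] equals the entity's original value."""
    return text[entity["start"]:entity["end"]] == entity["text"]


entity = json.loads(entity_line)
matches = span_matches(original, entity)
```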

Redact a text string and specify the handling type

By default, Textual redacts detected sensitive values. The redacted value is <value type>_<generated identifier>. For example, ORGANIZATION_EPfC7XZUZ.

For each value type, you can instead choose to either synthesize or ignore the value.

When you synthesize a value, Textual replaces the value with a different realistic value.
When you ignore a value, Textual passes through the original value.

To specify the handling type for a value type, you use the generator_config parameter.

generator_config={'<value_type>':'<handling_type>'}

Where:

<value_type> is the identifier of the type of value. For example, ORGANIZATION. For the list of built-in value types that Textual scans for, go to About entity types in Textual.
<handling_type> is the handling type to use for the specified value type. The possible values are Redact, Synthesis, and Off.

To specify the handling type for value types that are not listed in generator_config, use the generator_default parameter. generator_default can be Redact, Synthesis, or Off.

The following example redacts a string and specifies that organization values are synthesized. In the response, the organization value is replaced with a realistic value instead of an ORGANIZATION placeholder value.

redaction_response = textual.redact(
    """Contact Tonic AI with questions""", 
    generator_config={'ORGANIZATION':'Synthesis'}
)

redaction_response.describe()

    Contact Live Torch Works with questions
    
    {"start": 8, "end": 16, "label": "ORGANIZATION", "text": "Tonic AI", "score": 0.85}
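To make the interaction between generator_config and generator_default concrete, here is a minimal sketch of how the handling type for a given value type could be resolved. It illustrates the rules described above and is not Textual's implementation. `resolve_handling` and `VALID_HANDLING` are hypothetical names.

```python
# Handling types listed in the text above.
VALID_HANDLING = {"Redact", "Synthesis", "Off"}


def resolve_handling(label, generator_config=None, generator_default="Redact"):
    """Return the handling type that applies to one value type.

    A per-type entry in generator_config wins; otherwise the default applies.
    "Redact" is used as the fallback because redaction is the stated default.
    """
    config = generator_config or {}
    handling = config.get(label, generator_default)
    if handling not in VALID_HANDLING:
        raise ValueError(f"Unknown handling type: {handling!r}")
    return handling
```

For example, with generator_config={'ORGANIZATION':'Synthesis'} and no generator_default, ORGANIZATION resolves to Synthesis and every other value type resolves to Redact.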

Use an LLM to generate synthesized values

You can also request synthesized values from a large language model (LLM).

When you use this process, Textual first identifies the sensitive values in the text. It then sends the value locations and redacted values to the LLM. For example, if Textual identifies a product name, it sends the location and the redacted value PRODUCT to the LLM. Textual does not send the original values to the LLM.

The LLM then generates realistic synthesized values of the appropriate value types.

To send text to an LLM, use textual.llm_synthesis:

raw_synthesis = textual.llm_synthesis("<text of the string>")

The response provides the text with the synthesized replacement values, followed by the list of synthesized values. For each value, the list includes:

Where the value is located in the string
The type of value
The original value
A score to indicate confidence in the detection and synthesis

Here is an example of a request to send a text string to an LLM, and the response with the updated string and value list:

raw_synthesis = textual.llm_synthesis("My name is John, and today I am demoing Textual, a software product created by Tonic")
raw_synthesis.describe()

    My name is John, and on Monday afternoon I am demoing Widget Pro, a software product created by Initech Enterprises.

    {"start": 11, "end": 15, "label": "NAME_GIVEN", "text": "John", "score": 0.9}
    {"start": 21, "end": 26, "label": "DATE_TIME", "text": "today", "score": 0.85}
    {"start": 40, "end": 47, "label": "PRODUCT", "text": "Textual", "score": 0.85}
    {"start": 79, "end": 84, "label": "ORGANIZATION", "text": "Tonic", "score": 0.85}
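As described earlier in this section, the LLM receives the value locations and redacted values, not the original values. The sketch below shows one way to picture that masked text, using the entity list from the example above. It illustrates the idea and is not how Textual builds its LLM request. `mask_with_labels` is a hypothetical helper.

```python
def mask_with_labels(text, entities):
    """Replace each detected span with its bare label.

    Spans are applied from right to left so that earlier offsets stay valid.
    """
    masked = text
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        masked = masked[:entity["start"]] + entity["label"] + masked[entity["end"]:]
    return masked


text = (
    "My name is John, and today I am demoing Textual, "
    "a software product created by Tonic"
)

# Locations and labels copied from the example response above.
entities = [
    {"start": 11, "end": 15, "label": "NAME_GIVEN"},
    {"start": 21, "end": 26, "label": "DATE_TIME"},
    {"start": 40, "end": 47, "label": "PRODUCT"},
    {"start": 79, "end": 84, "label": "ORGANIZATION"},
]

masked = mask_with_labels(text, entities)
# masked == "My name is NAME_GIVEN, and DATE_TIME I am demoing PRODUCT, "
#           "a software product created by ORGANIZATION"
```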

Redact and synthesize individual files

Send a file to Textual

To send an individual .txt or .csv file to Textual, you use textual.start_file_redaction.

You first open the file so that Textual can read it, then make the call for Textual to read the file.

with open("<path to the file>", "r") as f:
    j = textual.start_file_redaction(f, "<file name>")

The response includes:

The file name
The identifier of the job that processed the file. You use this identifier to retrieve a transformed version of the file.

Get the file with redacted or synthesized values

After you use textual.start_file_redaction to send the file to Textual, you use textual.download_redacted_file to retrieve a transformed version of the file.

To identify the file, you use the job identifier that you received from textual.start_file_redaction. You can also specify whether to redact, synthesize, or ignore specific entity types. By default, all of the values are redacted.

Before you make the call to download the file, you specify the path to download the file content to.

with open("<path to output location>", "wb") as fo:
    fo.write(textual.download_redacted_file(<job identifier>))
