
Redact individual strings

Last updated 1 month ago


Required global permission: Use the API to parse or redact a text string

Before you perform these tasks, remember to instantiate the SDK client.

You can use the Tonic Textual SDK to redact individual strings, including:

  • Plain text strings

  • JSON content

  • XML content

  • HTML content

For a text string, you can also request synthesized values from a large language model (LLM).

The redaction request can include the handling configuration for entity types.

The redaction response includes the redacted or synthesized content and details about the detected entity values.

Redact a plain text string

To send a plain text string for redaction, use textual.redact:

redaction_response = textual.redact("""<text of the string>""")
redaction_response.describe()

For example:

redaction_response = textual.redact("""Contact Tonic AI with questions""")
redaction_response.describe()

Contact ORGANIZATION_EPfC7XZUZ with questions
    
{"start": 8, "end": 16, "new_start": 8, "new_end": 30, "label": "ORGANIZATION", "text": "Tonic AI", "new_text": "[ORGANIZATION]", "score": 0.85, "language": "en"}

The redact call provides an option to record the request so that you can preview the results in the Textual application. For more information, go to Record and review redaction requests.
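The start and end offsets refer to positions in the original string, so you can reproduce a simple label-style redaction locally from the detected entities. The following minimal sketch assumes that you have the entity details as Python dictionaries with the keys shown above. replace_spans is a hypothetical helper, not part of the Textual SDK, and it does not reproduce Textual's own replacement values.

```python
from typing import Dict, List


def replace_spans(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with its label in square brackets.

    Hypothetical helper for illustration only; not part of the Textual SDK.
    """
    result = text
    # Work from the last span to the first so that earlier offsets stay valid
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        result = result[: entity["start"]] + f"[{entity['label']}]" + result[entity["end"]:]
    return result


entities = [{"start": 8, "end": 16, "label": "ORGANIZATION", "text": "Tonic AI"}]
print(replace_spans("Contact Tonic AI with questions", entities))
# Contact [ORGANIZATION] with questions
```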

Redact multiple plain text strings

To send multiple plain text strings for redaction, use textual.redact_bulk:

bulk_response = textual.redact_bulk([<list of strings>])

For example:

bulk_response = textual.redact_bulk(["Tonic.ai was founded in 2018", "John Smith is a person"])
bulk_response.describe()

[ORGANIZATION_5Ve7OH] was founded in [DATE_TIME_DnuC1]

{"start": 0, "end": 5, "new_start": 0, "new_end": 21, "label": "ORGANIZATION", "text": "Tonic", "score": 0.9, "language": "en", "new_text": "[ORGANIZATION]"}
{"start": 21, "end": 25, "new_start": 37, "new_end": 54, "label": "DATE_TIME", "text": "2018", "score": 0.9, "language": "en", "new_text": "[DATE_TIME]"}

[NAME_GIVEN_dySb5] [NAME_FAMILY_7w4Db3] is a person

{"start": 0, "end": 4, "new_start": 0, "new_end": 18, "label": "NAME_GIVEN", "text": "John", "score": 0.9, "language": "en", "new_text": "[NAME_GIVEN]"}
{"start": 5, "end": 10, "new_start": 19, "new_end": 39, "label": "NAME_FAMILY", "text": "Smith", "score": 0.9, "language": "en", "new_text": "[NAME_FAMILY]"}
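When you redact many strings at once, it can help to summarize which entity types were detected across the batch. The following minimal sketch assumes that you have collected the entity details for each string as JSON lines in the format that describe() prints above (abbreviated here to a few keys). It uses only the Python standard library and does not call the Textual SDK; count_labels is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Iterable, List


def count_labels(entity_lines_per_string: Iterable[List[str]]) -> Counter:
    """Count detected entity types across all strings in a bulk request."""
    counts: Counter = Counter()
    for entity_lines in entity_lines_per_string:
        for line in entity_lines:
            counts[json.loads(line)["label"]] += 1
    return counts


# Abbreviated entity details from the describe() output above, one list per string
batch = [
    [
        '{"start": 0, "end": 5, "label": "ORGANIZATION", "text": "Tonic"}',
        '{"start": 21, "end": 25, "label": "DATE_TIME", "text": "2018"}',
    ],
    [
        '{"start": 0, "end": 4, "label": "NAME_GIVEN", "text": "John"}',
        '{"start": 5, "end": 10, "label": "NAME_FAMILY", "text": "Smith"}',
    ],
]
print(count_labels(batch))
```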

Redact JSON content

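To send a JSON string for redaction, use textual.redact_json. You can send the JSON content as a JSON string or a Python dictionary.

json_redaction = textual.redact_json(<JSON string or Python dictionary>)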

redact_json ensures that only the values are redacted. It ignores the keys.

Basic JSON redaction example

Here is a basic example of a JSON redaction request:

import json

# Build the input as a Python dictionary (a JSON string also works)
d = {
    'person': {'first': 'John', 'last': 'OReilly'},
    'address': {'city': 'Memphis', 'state': 'TN', 'street': '847 Rocky Top', 'zip': 1234},
    'description': 'John is a man that lives in Memphis.  He is 37 years old and is married to Cynthia.',
}

json_redaction = textual.redact_json(d)

# redacted_text is a JSON string; parse it again to pretty-print it
print(json.dumps(json.loads(json_redaction.redacted_text), indent=2))

It produces the following JSON output:

{
  "person": {
    "first": "[NAME_GIVEN]",
    "last": "[NAME_FAMILY]"
  },
  "address": {
    "city": "[LOCATION_CITY]",
    "state": "[LOCATION_STATE]",
    "street": "[LOCATION_ADDRESS]",
    "zip": "[LOCATION_ZIP]"
  },
  "description": "[NAME_GIVEN] is a man that lives in [LOCATION_CITY].  He is [DATE_TIME] and is married to [NAME_GIVEN]."
}
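Because redact_json redacts only the values, the redacted document should have the same keys and nesting as the original. The following minimal sketch checks that property locally. same_structure is a hypothetical helper, not part of the Textual SDK, and the redacted JSON is an abbreviated copy of the output above rather than a live response.

```python
import json
from typing import Any


def same_structure(original: Any, redacted: Any) -> bool:
    """Return True if both values have the same keys and nesting.

    Only dictionary keys and list lengths are compared; leaf values are
    ignored, because those are what redaction is expected to change.
    """
    if isinstance(original, dict):
        return (
            isinstance(redacted, dict)
            and original.keys() == redacted.keys()
            and all(same_structure(original[k], redacted[k]) for k in original)
        )
    if isinstance(original, list):
        return (
            isinstance(redacted, list)
            and len(original) == len(redacted)
            and all(same_structure(o, r) for o, r in zip(original, redacted))
        )
    # Leaf value (string, number, and so on): any scalar replacement is acceptable
    return not isinstance(redacted, (dict, list))


original = {
    "person": {"first": "John", "last": "OReilly"},
    "address": {"city": "Memphis", "state": "TN", "street": "847 Rocky Top", "zip": 1234},
}
# Abbreviated redacted_text from the example above
redacted = json.loads(
    '{"person": {"first": "[NAME_GIVEN]", "last": "[NAME_FAMILY]"},'
    ' "address": {"city": "[LOCATION_CITY]", "state": "[LOCATION_STATE]",'
    ' "street": "[LOCATION_ADDRESS]", "zip": "[LOCATION_ZIP]"}}'
)
print(same_structure(original, redacted))  # True
```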

Specifying entity types for specific JSON paths

When you redact a JSON string, you can optionally assign specific entity types to selected JSON paths.

To do this, include the jsonpath_allow_lists parameter. Each entry consists of an entity type and a list of JSON paths for which to always use that entity type. Each JSON path must point to a simple string or numeric value.

jsonpath_allow_lists={'<entity type>': ['<JSON path>']}

The specified entity type overrides both the detected entity type and any added or excluded values.

In the following example, the value of the key1 node is always treated as a telephone number:

response = textual.redact_json('{"key1":"Ex123", "key2":"My name is Johnson"}', jsonpath_allow_lists={'PHONE_NUMBER':['$.key1']})

It produces the following redacted output:

{"key1":"[PHONE_NUMBER]","key2":"My name is [NAME_FAMILY]"}
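Because each JSON path must point to a simple string or numeric value, you might want to check your jsonpath_allow_lists entries before you send the request. The following minimal sketch does that locally. resolve_simple_path and check_allow_lists are hypothetical helpers, not part of the Textual SDK, and they support only simple dot-notation paths such as $.key1, not the full JSONPath syntax.

```python
import json
from typing import Any, Dict, List


def resolve_simple_path(document: Any, path: str) -> Any:
    """Resolve a dot-notation JSON path such as '$.key1' or '$.person.first'.

    Hypothetical helper: supports only '$' followed by dot-separated object
    keys (no wildcards, filters, or array indexes). Raises KeyError or
    TypeError if the path does not exist in the document.
    """
    if not path.startswith("$"):
        raise ValueError(f"JSON path must start with '$': {path}")
    node = document
    for key in (part for part in path[1:].split(".") if part):
        node = node[key]
    return node


def check_allow_lists(document: Any, allow_lists: Dict[str, List[str]]) -> None:
    """Raise ValueError if any path does not point to a simple string or number."""
    for entity_type, paths in allow_lists.items():
        for path in paths:
            value = resolve_simple_path(document, path)
            # bool is a subclass of int in Python, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, (str, int, float)):
                raise ValueError(f"{path} for {entity_type} does not point to a string or number")


doc = json.loads('{"key1":"Ex123", "key2":"My name is Johnson"}')
allow_lists = {"PHONE_NUMBER": ["$.key1"]}
check_allow_lists(doc, allow_lists)  # passes: $.key1 is a string
```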

Redact XML content

To send an XML string for redaction, use textual.redact_xml. redact_xml ensures that only the values are redacted. It ignores the XML markup.

For example:

xml_string = '''<?xml version="1.0" encoding="UTF-8"?>
    <!-- This XML document contains sample PII with namespaces and attributes -->
    <PersonInfo xmlns="http://www.example.com/default" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:contact="http://www.example.com/contact">
        <!-- Personal Information with an attribute containing PII -->
        <Name preferred="true" contact:userID="john.doe123">
            <FirstName>John</FirstName>
            <LastName>Doe</LastName>He was born in 1980.</Name>

        <contact:Details>
            <!-- Email stored in an attribute for demonstration -->
            <contact:Email address="john.doe@example.com"/>
            <contact:Phone type="mobile" number="555-6789"/>
        </contact:Details>

        <!-- SSN stored as an attribute -->
        <SSN value="987-65-4321" xsi:nil="false"/>
        <data>his name was John Doe</data>
    </PersonInfo>'''

response = textual.redact_xml(xml_string)

redacted_xml = response.redacted_text

This produces the following XML output:

<?xml version="1.0" encoding="UTF-8"?><!-- This XML document contains sample PII with namespaces and attributes -->\n<PersonInfo xmlns="http://www.example.com/default" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:contact="http://www.example.com/contact"><!-- Personal Information with an attribute containing PII --><Name preferred="true" contact:userID="[NAME_GIVEN]">[GENDER_IDENTIFIER] was born in [DOB].<FirstName>[NAME_GIVEN]</FirstName><LastName>[NAME_FAMILY]</LastName></Name><contact:Details><!-- Email stored in an attribute for demonstration --><contact:Email address="[EMAIL_ADDRESS]"></contact:Email><contact:Phone type="mobile" number="[PHONE_NUMBER]"></contact:Phone></contact:Details><!-- SSN stored as an attribute --><SSN value="[PHONE_NUMBER]" xsi:nil="false"></SSN><data>[GENDER_IDENTIFIER] name was [NAME_GIVEN] [NAME_FAMILY]</data></PersonInfo>
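The values in an XML document are the attribute values, the element text, and any text that follows a closing tag; the tags, attribute names, and namespaces are markup. The following minimal sketch uses Python's xml.etree.ElementTree module to list those values with a simple path for each, which can help you anticipate what redaction will touch. The path format is an assumption for illustration and is not necessarily the format of the xml_path values that Textual returns. ElementTree also drops comments by default, so comments are not listed.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def list_values(xml_text: str) -> List[Tuple[str, str]]:
    """Return (path, value) pairs for attribute values, element text, and tail text."""
    values: List[Tuple[str, str]] = []

    def walk(element: ET.Element, path: str) -> None:
        # Attribute values are data; attribute names are markup
        for name, value in element.attrib.items():
            values.append((f"{path}/@{name}", value))
        # Text directly inside the element, before any child element
        if element.text and element.text.strip():
            values.append((path, element.text.strip()))
        for child in element:
            child_path = f"{path}/{child.tag}"
            walk(child, child_path)
            # Text that follows the child's closing tag belongs to the parent
            if child.tail and child.tail.strip():
                values.append((f"{child_path}#tail", child.tail.strip()))

    root = ET.fromstring(xml_text)
    walk(root, f"/{root.tag}")
    return values


sample = (
    '<Person id="john.doe123">'
    "<FirstName>John</FirstName><LastName>Doe</LastName>He was born in 1980."
    '<Email address="john.doe@example.com"/>'
    "</Person>"
)
for path, value in list_values(sample):
    print(path, "=", value)
```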

Redact HTML content

To send an HTML string for redaction, use textual.redact_html. redact_html ensures that only the values are redacted. It ignores the HTML markup.

For example:

html_content = """
<!DOCTYPE html>
<html>
    <head>
        <title>John Doe</title>
    </head>
    <body>
        <h1>John Doe</h1>
        <p>John Doe is a person who lives in New York City.</p>
        <p>John Doe's phone number is 555-555-5555.</p>
    </body>
</html>
"""

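# Run the redact_html method. generator_config sets the handling option
# for the listed entity types; here, given and family names are synthesized.
redacted_html = textual.redact_html(html_content, generator_config={
    "NAME_GIVEN": "Synthesis",
    "NAME_FAMILY": "Synthesis"
})

print(redacted_html.redacted_text)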

This produces the following HTML output:

<!DOCTYPE html>
<html>
    <head>
        <title>Scott Roley</title>
    </head>
    <body>
        <h1>Scott Roley</h1>
        <p>Scott Roley is a person who lives in [LOCATION_CITY].</p>
        <p>Scott Roley's phone number is [PHONE_NUMBER].</p>
    </body>
</html>
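As with XML, redact_html changes only the text content and leaves the tags alone. The following minimal sketch uses Python's html.parser module to list the non-blank text nodes of the example input, which are the values that are candidates for redaction. It is a local illustration only; TextCollector is a hypothetical class and the sketch does not call the Textual SDK.

```python
from html.parser import HTMLParser
from typing import List


class TextCollector(HTMLParser):
    """Collect non-blank text nodes and ignore the markup around them."""

    def __init__(self) -> None:
        super().__init__()
        self.texts: List[str] = []

    def handle_data(self, data: str) -> None:
        # Whitespace between tags is layout, not a value
        if data.strip():
            self.texts.append(data.strip())


html_content = """
<!DOCTYPE html>
<html>
    <head><title>John Doe</title></head>
    <body>
        <h1>John Doe</h1>
        <p>John Doe is a person who lives in New York City.</p>
        <p>John Doe's phone number is 555-555-5555.</p>
    </body>
</html>
"""

collector = TextCollector()
collector.feed(html_content)
collector.close()
print(collector.texts)
```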

Using an LLM to generate synthesized values

You can also request synthesized values from a large language model (LLM).

When you use this process, Textual first identifies the sensitive values in the text. It then sends the value locations and redacted values to the LLM. For example, if Textual identifies a product name, it sends the location and the redacted value PRODUCT to the LLM. Textual does not send the original values to the LLM.

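The LLM then generates realistic synthesized values of the appropriate value types.

Before you can use textual.llm_synthesis, you must enable additional LLM processing. The additional processing sends the values and surrounding text to the LLM. For an overview of the LLM processing and how to enable it, go to Enabling and using additional LLM processing of detected entities.

To send text to an LLM, use textual.llm_synthesis:

raw_synthesis = textual.llm_synthesis("""<text of the string>""")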

For example:

raw_synthesis = textual.llm_synthesis("My name is John, and today I am demoing Textual, a software product created by Tonic")
raw_synthesis.describe()

My name is John, and on Monday afternoon I am demoing Widget Pro, a software product created by Initech Enterprises.
{"start": 11, "end": 15, "new_start": 11, "new_end": 15, "label": "NAME_GIVEN", "text": "John", "new_text": null, "score": 0.9, "language": "en"}
{"start": 21, "end": 26, "new_start": 21, "new_end": 40, "label": "DATE_TIME", "text": "today", "new_text": null, "score": 0.85, "language": "en"}
{"start": 40, "end": 47, "new_start": 54, "new_end": 64, "label": "PRODUCT", "text": "Textual", "new_text": null, "score": 0.85, "language": "en"}
{"start": 79, "end": 84, "new_start": 96, "new_end": 115, "label": "ORGANIZATION", "text": "Tonic", "new_text": null, "score": 0.85, "language": "en"}
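Because new_text is null in the llm_synthesis response, you can recover each synthesized value from the output text by using new_start and new_end. The following minimal sketch does that with the entity details from the example above, abbreviated to the fields it uses. synthesized_values is a hypothetical helper, not part of the Textual SDK.

```python
import json
from typing import Dict, List


def synthesized_values(synthesized_text: str, entity_lines: List[str]) -> Dict[str, str]:
    """Map each original value to the value that replaced it in the synthesized text.

    Uses new_start and new_end because new_text is null for llm_synthesis.
    If the same original value appears more than once, the last one wins.
    """
    mapping: Dict[str, str] = {}
    for line in entity_lines:
        entity = json.loads(line)
        mapping[entity["text"]] = synthesized_text[entity["new_start"]:entity["new_end"]]
    return mapping


synthesized = (
    "My name is John, and on Monday afternoon I am demoing Widget Pro, "
    "a software product created by Initech Enterprises."
)
entities = [
    '{"new_start": 11, "new_end": 15, "label": "NAME_GIVEN", "text": "John", "new_text": null}',
    '{"new_start": 21, "new_end": 40, "label": "DATE_TIME", "text": "today", "new_text": null}',
    '{"new_start": 54, "new_end": 64, "label": "PRODUCT", "text": "Textual", "new_text": null}',
    '{"new_start": 96, "new_end": 115, "label": "ORGANIZATION", "text": "Tonic", "new_text": null}',
]
for original, replacement in synthesized_values(synthesized, entities).items():
    print(f"{original} -> {replacement}")
```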

Format of the redaction and synthesis response

The response provides the redacted or synthesized version of the string and the list of detected entity values.

Contact ORGANIZATION_EPfC7XZUZ with questions
    
{"start": 8, "end": 16, "new_start": 8, "new_end": 30, "label": "ORGANIZATION", "text": "Tonic AI", "new_text": "[ORGANIZATION]", "score": 0.85, "language": "en"}

For each redacted item, the response includes:

  • The location of the value in the original text (start and end)

  • The location of the value in the redacted version of the string (new_start and new_end)

  • The entity type (label)

  • The original value (text)

  • The replacement value (new_text). new_text is null in the following cases:

    • The entity type is ignored

    • The response is from llm_synthesis

  • A score to indicate confidence in the detection and redaction (score)

  • The detected language for the value (language)

  • For responses from textual.redact_json, the JSON path to the entity in the original document (json_path)

  • For responses from textual.redact_xml, the XPath to the entity in the original XML document (xml_path)
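If you store or post-process these details, a small typed record makes the fields explicit. The following minimal sketch parses one line of the describe() output into a dataclass and uses the offsets to locate the value in both versions of the string. DetectedEntity is a hypothetical local class, not a type from the Textual SDK, and the sketch ignores any keys that it does not declare.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DetectedEntity:
    """One item from the list of detected entity values (local illustration)."""

    start: int
    end: int
    new_start: int
    new_end: int
    label: str
    text: str
    new_text: Optional[str]  # None (null) for ignored entity types and llm_synthesis
    score: float
    language: str
    json_path: Optional[str] = None  # only in textual.redact_json responses
    xml_path: Optional[str] = None  # only in textual.redact_xml responses

    @classmethod
    def from_json(cls, line: str) -> "DetectedEntity":
        data = json.loads(line)
        known = {f.name for f in fields(cls)}
        # Keep only the keys that this class declares
        return cls(**{key: value for key, value in data.items() if key in known})


original = "Contact Tonic AI with questions"
redacted = "Contact ORGANIZATION_EPfC7XZUZ with questions"
entity = DetectedEntity.from_json(
    '{"start": 8, "end": 16, "new_start": 8, "new_end": 30, "label": "ORGANIZATION", '
    '"text": "Tonic AI", "new_text": "[ORGANIZATION]", "score": 0.85, "language": "en"}'
)
# start/end index the original string; new_start/new_end index the redacted string
print(original[entity.start:entity.end])  # Tonic AI
print(redacted[entity.new_start:entity.new_end])  # ORGANIZATION_EPfC7XZUZ
```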
