Redact and synthesize individual strings

Before you perform these tasks, remember to instantiate the SDK client.

You can use the Tonic Textual SDK to redact individual strings, including:

  • Plain text strings

  • JSON content

  • XML content

For a text string, you can also request synthesized values from a large language model (LLM).

The redaction request can include the handling configuration for entity types.

The redaction response includes the redacted or synthesized content and details about the detected entity values.

Sending a string for redaction

Redact a plain text string

To send a plain text string for redaction, use textual.redact:

redaction_response = textual.redact("""<text of the string>""")
redaction_response.describe()

For example:

redaction_response = textual.redact("""Contact Tonic AI with questions""")
redaction_response.describe()

Contact ORGANIZATION_EPfC7XZUZ with questions
    
{"start": 8, "end": 16, "new_start": 8, "new_end": 30, "label": "ORGANIZATION", "text": "Tonic AI", "new_text": "[ORGANIZATION_EPfC7XZUZ]", "score": 0.85, "language": "en"}

Redact JSON content

To send a JSON string for redaction, use textual.redact_json. You can send the JSON content as a JSON string or a Python dictionary.

json_redaction = textual.redact_json(<JSON string or Python dictionary>)

redact_json ensures that only the values are redacted. It ignores the keys.

For example:

d=dict()
d['person']={'first':'John','last':'OReilly'}
d['address']={'city': 'Memphis', 'state':'TN', 'street': '847 Rocky Top', 'zip':1234}
d['description'] = 'John is a man that lives in Memphis.  He is 37 years old and is married to Cynthia'

json_redaction = textual.redact_json(d)

print(json.dumps(json.loads(json_redaction.redacted_text), indent=2))

{
"person": {
    "first": "[NAME_GIVEN_WpFV4]",
    "last": "[NAME_FAMILY_orTxwj3I]"
},
"address": {
    "city": "[LOCATION_CITY_UtpIl2tL]",
    "state": "[LOCATION_STATE_n24]",
    "street": "[LOCATION_ADDRESS_KwZ3MdDLSrzNhwB]",
    "zip": "[LOCATION_ZIP_L42eP19]"
},
"description": "[NAME_GIVEN_WpFV4] is a man that lives in [LOCATION_CITY_UtpIl2tL].  He is [DATE_TIME_LLr6L3gpNcOcl3] and is married to [NAME_GIVEN_yWfthDa6]"
}

Redact XML content

To send an XML string for redaction, use textual.redact_xml.

redact_xml ensures that only the values are redacted. It ignores the XML markup.

For example:

xml_string = '''<?xml version="1.0" encoding="UTF-8"?>
    <!-- This XML document contains sample PII with namespaces and attributes -->
    <PersonInfo xmlns="http://www.example.com/default" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:contact="http://www.example.com/contact">
        <!-- Personal Information with an attribute containing PII -->
        <Name preferred="true" contact:userID="john.doe123">
            <FirstName>John</FirstName>
            <LastName>Doe</LastName>He was born in 1980.</Name>

        <contact:Details>
            <!-- Email stored in an attribute for demonstration -->
            <contact:Email address="john.doe@example.com"/>
            <contact:Phone type="mobile" number="555-6789"/>
        </contact:Details>

        <!-- SSN stored as an attribute -->
        <SSN value="987-65-4321" xsi:nil="false"/>
        <data>his name was John Doe</data>
    </PersonInfo>'''

response = textual.redact_xml(xml_string)

redacted_xml = response.redacted_text

Produces the following output:

<?xml version="1.0" encoding="UTF-8"?><!-- This XML document contains sample PII with namespaces and attributes -->\n<PersonInfo xmlns="http://www.example.com/default" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:contact="http://www.example.com/contact"><!-- Personal Information with an attribute containing PII --><Name preferred="true" contact:userID="[NAME_GIVEN_NUhdshJf3SkI0]">[GENDER_IDENTIFIER_gh1] was born in [DOB_nHfb2].<FirstName>[NAME_GIVEN_HI1h7]</FirstName><LastName>[NAME_FAMILY_bKk1]</LastName></Name><contact:Details><!-- Email stored in an attribute for demonstration --><contact:Email address="[EMAIL_ADDRESS_DSlxAYEPw0XkIiADi0WbpW1]"></contact:Email><contact:Phone type="mobile" number="[PHONE_NUMBER_5LWjT19Ee]"></contact:Phone></contact:Details><!-- SSN stored as an attribute --><SSN value="[PHONE_NUMBER_4B2QKKwghix90]" xsi:nil="false"></SSN><data>[GENDER_IDENTIFIER_XN92] name was [NAME_GIVEN_HI1h7] [NAME_FAMILY_bKk1]</data></PersonInfo>

Using an LLM to generate synthesized values

You can also request synthesized values from a large language model (LLM).

When you use this process, Textual first identifies the sensitive values in the text. It then sends the value locations and redacted values to the LLM. For example, if Textual identifies a product name, it sends the location and the redacted value PRODUCT to the LLM. Textual does not send the original values to the LLM.

The LLM then generates realistic synthesized values of the appropriate value types.

To send text to an LLM, use textual.llm_synthesis:

raw_synthesis = textual.llm_synthesis("Text of the string")

For example:

raw_synthesis = textual.llm_synthesis("My name is John, and today I am demoing Textual, a software product created by Tonic")
raw_synthesis.describe()

My name is John, and on Monday afternoon I am demoing Widget Pro, a software product created by Initech Enterprises.
{"start": 11, "end": 15, "new_start": 11, "new_end": 15, "label": "NAME_GIVEN", "text": "John", "new_text": null, "score": 0.9, "language": "en"}
{"start": 21, "end": 26, "new_start": 21, "new_end": 40, "label": "DATE_TIME", "text": "today", "new_text": null, "score": 0.85, "language": "en"}
{"start": 40, "end": 47, "new_start": 54, "new_end": 64, "label": "PRODUCT", "text": "Textual", "new_text": null, "score": 0.85, "language": "en"}
{"start": 79, "end": 84, "new_start": 96, "new_end": 115, "label": "ORGANIZATION", "text": "Tonic", "new_text": null, "score": 0.85, "language": "en"}

Format of the redaction and synthesis response

The response provides the redacted or synthesized version of the string, and the list of detected entity values.

Contact ORGANIZATION_EPfC7XZUZ with questions
    
{"start": 8, "end": 16, "new_start": 8, "new_end": 30, "label": "ORGANIZATION", "text": "Tonic AI", "new_text": "[ORGANIZATION_EPfC7XZUZ]", "score": 0.85, "language": "en"}

For each redacted item, the response includes:

  • The location of the value in the original text (start and end)

  • The location of the value in the redacted version of the string (new_start and new_end)

  • The entity type (label)

  • The original value (text)

  • The redacted or synthesized value (new_text). new_text is null in the following cases:

    • The entity type is ignored

    • The response is from llm_synthesis

  • A score to indicate confidence in the detection and redaction (score)

  • The detected language for the value (language)

  • For responses from textual.redact_json, the JSON path to the entity in the original document (json_path)

  • For responses from textual.redact_xml, the Xpath to the entity in the original XML document (xml_path)

Last updated