The redaction response includes the redacted or synthesized content and details about the detected entity values.
Redact a plain text string
To send a plain text string for redaction, use textual.redact:
redaction_response = textual.redact("""<text of the string>""")
redaction_response.describe()
For example:
redaction_response = textual.redact("""Contact Tonic AI with questions""")
redaction_response.describe()

Contact ORGANIZATION_EPfC7XZUZ with questions
{"start":8,"end":16,"new_start":8,"new_end":30,"label":"ORGANIZATION","text":"Tonic AI","new_text":"[ORGANIZATION]","score":0.85,"language":"en"}
The redact call provides an option to record the request, to allow you to preview the results in the Textual application. For more information, go to Record and review redaction requests.
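Redact multiple plain text strings
To send multiple plain text strings for redaction, use textual.redact_bulk:
bulk_response = textual.redact_bulk([<List of strings>])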
For example:
bulk_response = textual.redact_bulk(["Tonic.ai was founded in 2018", "John Smith is a person"])
bulk_response.describe()

[ORGANIZATION_5Ve7OH] was founded in [DATE_TIME_DnuC1]
{"start":0,"end":5,"new_start":0,"new_end":21,"label":"ORGANIZATION","text":"Tonic","score":0.9,"language":"en","new_text":"[ORGANIZATION]"}
{"start":21,"end":25,"new_start":37,"new_end":54,"label":"DATE_TIME","text":"2018","score":0.9,"language":"en","new_text":"[DATE_TIME]"}
[NAME_GIVEN_dySb5] [NAME_FAMILY_7w4Db3] is a person
{"start":0,"end":4,"new_start":0,"new_end":18,"label":"NAME_GIVEN","text":"John","score":0.9,"language":"en","new_text":"[NAME_GIVEN]"}
{"start":5,"end":10,"new_start":19,"new_end":39,"label":"NAME_FAMILY","text":"Smith","score":0.9,"language":"en","new_text":"[NAME_FAMILY]"}
Redact JSON content
To send a JSON string for redaction, use textual.redact_json. You can send the JSON content as a JSON string or a Python dictionary.
json_redaction = textual.redact_json(<JSON string or Python dictionary>)
redact_json ensures that only the values are redacted. It ignores the keys.
Basic JSON redaction example
Here is a basic example of a JSON redaction request:
import json

d = dict()
d['person'] = {'first': 'John', 'last': 'OReilly'}
d['address'] = {'city': 'Memphis', 'state': 'TN', 'street': '847 Rocky Top', 'zip': 1234}
d['description'] = 'John is a man that lives in Memphis. He is 37 years old and is married to Cynthia.'

json_redaction = textual.redact_json(d)
print(json.dumps(json.loads(json_redaction.redacted_text), indent=2))
It produces the following JSON output:
{
  "person": {
    "first": "[NAME_GIVEN]",
    "last": "[NAME_FAMILY]"
  },
  "address": {
    "city": "[LOCATION_CITY]",
    "state": "[LOCATION_STATE]",
    "street": "[LOCATION_ADDRESS]",
    "zip": "[LOCATION_ZIP]"
  },
  "description": "[NAME_GIVEN] is a man that lives in [LOCATION_CITY]. He is [DATE_TIME] and is married to [NAME_GIVEN]."
}
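To illustrate what "only the values are redacted" means, here is a minimal sketch that walks a parsed JSON structure and applies a function to each leaf value while leaving the keys untouched. It is not how Textual is implemented. The names map_values and mask are hypothetical, and mask is only a stand-in for real entity detection.

```python
import json


def map_values(node, fn):
    """Apply fn to every leaf value; keys are never passed to fn."""
    if isinstance(node, dict):
        return {key: map_values(value, fn) for key, value in node.items()}
    if isinstance(node, list):
        return [map_values(item, fn) for item in node]
    return fn(node)


def mask(value):
    # Hypothetical stand-in for entity detection: mask every leaf value.
    return "[REDACTED]"


doc = {"person": {"first": "John", "last": "OReilly"}, "zip": 1234}
masked = map_values(doc, mask)
print(json.dumps(masked, indent=2))
```

The key names survive unchanged, and both string and numeric values are replaced, which mirrors the behavior shown in the output above.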
Specifying entity types for specific JSON paths
When you redact a JSON string, you can optionally assign specific entity types to selected JSON paths.
To do this, you include the jsonpath_allow_lists parameter. Each entry consists of an entity type and a list of JSON paths for which to always use that entity type. Each JSON path must point to a simple string or numeric value.
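As a sketch of the shape described here, the dictionary below maps each entity type to the JSON paths that should always use it. The specific paths and the helper resolve_simple_path are illustrative assumptions. The helper handles only plain `$.a.b` paths and exists solely to check the rule that each path must point to a simple string or numeric value.

```python
def resolve_simple_path(doc, path):
    """Resolve a plain '$.a.b' path (no wildcards or indexes) against a dict."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    node = doc
    for key in path[2:].split("."):
        node = node[key]
    return node


doc = {
    "person": {"first": "John", "last": "OReilly"},
    "address": {"city": "Memphis", "zip": 1234},
}

# Entity type -> JSON paths that should always be treated as that type.
jsonpath_allow_lists = {
    "NAME_GIVEN": ["$.person.first"],
    "NAME_FAMILY": ["$.person.last"],
    "LOCATION_ZIP": ["$.address.zip"],
}

# Check that every listed path points to a simple string or numeric value.
for entity_type, paths in jsonpath_allow_lists.items():
    for path in paths:
        value = resolve_simple_path(doc, path)
        # bool is a subclass of int, so exclude it explicitly.
        assert isinstance(value, (str, int, float)) and not isinstance(value, bool)

# With a configured client, the request would then look like this
# (assuming the parameter is passed as a keyword argument):
# json_redaction = textual.redact_json(doc, jsonpath_allow_lists=jsonpath_allow_lists)
```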
Redact XML content
To send XML content for redaction, use textual.redact_xml.
redact_xml ensures that only the values are redacted. It ignores the XML markup.
For example:
xml_string = '''<?xml version="1.0" encoding="UTF-8"?>
<!-- This XML document contains sample PII with namespaces and attributes -->
<PersonInfo xmlns="http://www.example.com/default"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xmlns:contact="http://www.example.com/contact">
    <!-- Personal Information with an attribute containing PII -->
    <Name preferred="true" contact:userID="john.doe123">
        <FirstName>John</FirstName>
        <LastName>Doe</LastName>He was born in 1980.</Name>
    <contact:Details>
        <!-- Email stored in an attribute for demonstration -->
        <contact:Email address="john.doe@example.com"/>
        <contact:Phone type="mobile" number="555-6789"/>
    </contact:Details>
    <!-- SSN stored as an attribute -->
    <SSN value="987-65-4321" xsi:nil="false"/>
    <data>his name was John Doe</data>
</PersonInfo>'''

response = textual.redact_xml(xml_string)
redacted_xml = response.redacted_text
It produces the following XML output:
<?xml version="1.0" encoding="UTF-8"?><!-- This XML document contains sample PII with namespaces and attributes -->\n<PersonInfo xmlns="http://www.example.com/default" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:contact="http://www.example.com/contact"><!-- Personal Information with an attribute containing PII --><Name preferred="true" contact:userID="[NAME_GIVEN]">[GENDER_IDENTIFIER] was born in [DOB].<FirstName>[NAME_GIVEN]</FirstName><LastName>[NAME_FAMILY]</LastName></Name><contact:Details><!-- Email stored in an attribute for demonstration --><contact:Email address="[EMAIL_ADDRESS]"></contact:Email><contact:Phone type="mobile" number="[PHONE_NUMBER]"></contact:Phone></contact:Details><!-- SSN stored as an attribute --><SSN value="[PHONE_NUMBER]" xsi:nil="false"></SSN><data>[GENDER_IDENTIFIER] name was [NAME_GIVEN] [NAME_FAMILY]</data></PersonInfo>
Redact HTML content
To send HTML content for redaction, use textual.redact_html.
redact_html ensures that only the values are redacted. It ignores the HTML markup.
For example:
html_content = """<!DOCTYPE html>
<html>
    <head>
        <title>John Doe</title>
    </head>
    <body>
        <h1>John Doe</h1>
        <p>John Doe is a person who lives in New York City.</p>
        <p>John Doe's phone number is 555-555-5555.</p>
    </body>
</html>"""

# Run the redact_html method
redacted_html = textual.redact_html(html_content, generator_config={
    "NAME_GIVEN": "Synthesis",
    "NAME_FAMILY": "Synthesis"
})
print(redacted_html.redacted_text)
It produces the following HTML output:
<!DOCTYPE html><html><head><title>Scott Roley</title></head><body><h1>Scott Roley</h1><p>Scott Roley is a person who lives in [LOCATION_CITY].</p><p>Scott Roley's phone number is [PHONE_NUMBER].</p></body></html>
Using an LLM to generate synthesized values
You can also request synthesized values from a large language model (LLM).
When you use this process, Textual first identifies the sensitive values in the text. It then sends the value locations and redacted values to the LLM. For example, if Textual identifies a product name, it sends the location and the redacted value PRODUCT to the LLM. Textual does not send the original values to the LLM.
The LLM then generates realistic synthesized values of the appropriate value types.
raw_synthesis = textual.llm_synthesis("Text of the string")
For example:
raw_synthesis = textual.llm_synthesis("My name is John, and today I am demoing Textual, a software product created by Tonic")
raw_synthesis.describe()

My name is John, and on Monday afternoon I am demoing Widget Pro, a software product created by Initech Enterprises.
{"start":11,"end":15,"new_start":11,"new_end":15,"label":"NAME_GIVEN","text":"John","new_text": null,"score":0.9,"language":"en"}
{"start":21,"end":26,"new_start":21,"new_end":40,"label":"DATE_TIME","text":"today","new_text": null,"score":0.85,"language":"en"}
{"start":40,"end":47,"new_start":54,"new_end":64,"label":"PRODUCT","text":"Textual","new_text": null,"score":0.85,"language":"en"}
{"start":79,"end":84,"new_start":96,"new_end":115,"label":"ORGANIZATION","text":"Tonic","new_text": null,"score":0.85,"language":"en"}
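The process described in this section sends only value locations and redacted labels to the LLM, never the original values. A rough illustration of that idea, using the spans from the example response, is sketched below. The function name mask_entities is hypothetical, and this is not Textual's actual implementation.

```python
def mask_entities(text, entities):
    """Replace each detected span with its entity label."""
    parts = []
    cursor = 0  # position in the original text
    for entity in sorted(entities, key=lambda e: e["start"]):
        parts.append(text[cursor:entity["start"]])
        parts.append(entity["label"])
        cursor = entity["end"]
    parts.append(text[cursor:])
    return "".join(parts)


text = ("My name is John, and today I am demoing Textual, "
        "a software product created by Tonic")
entities = [
    {"start": 11, "end": 15, "label": "NAME_GIVEN"},
    {"start": 21, "end": 26, "label": "DATE_TIME"},
    {"start": 40, "end": 47, "label": "PRODUCT"},
    {"start": 79, "end": 84, "label": "ORGANIZATION"},
]
masked = mask_entities(text, entities)
print(masked)
```

The masked string keeps the sentence structure, so a model can propose plausible replacements for each label without ever seeing "John", "Textual", or "Tonic".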
Format of the redaction and synthesis response
The response provides the redacted or synthesized version of the string, and the list of detected entity values.
Contact ORGANIZATION_EPfC7XZUZ with questions
{"start":8,"end":16,"new_start":8,"new_end":30,"label":"ORGANIZATION","text":"Tonic AI","new_text":"[ORGANIZATION]","score":0.85,"language":"en"}
For each redacted item, the response includes:
The location of the value in the original text (start and end)
The location of the value in the redacted version of the string (new_start and new_end)
The entity type (label)
The original value (text)
The replacement value (new_text). new_text is null when the entity type is ignored or when the response is from llm_synthesis.
A score to indicate confidence in the detection and redaction (score)
The detected language for the value (language)
For responses from textual.redact_json, the JSON path to the entity in the original document (json_path)
For responses from textual.redact_xml, the XPath to the entity in the original XML document (xml_path)
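To make the relationship between the two pairs of offsets concrete, the following sketch rebuilds a redacted string from the original text and recomputes new_start and new_end for each entity. The helper apply_replacements and its replacement key are hypothetical names used only for illustration. The input values come from the example response in this section.

```python
def apply_replacements(text, entities):
    """Return the rewritten text and each entity's offsets within it."""
    parts, located = [], []
    cursor = 0  # position in the original text
    shift = 0   # cumulative length difference introduced so far
    for entity in sorted(entities, key=lambda e: e["start"]):
        replacement = entity["replacement"]
        parts.append(text[cursor:entity["start"]])
        parts.append(replacement)
        new_start = entity["start"] + shift
        new_end = new_start + len(replacement)
        located.append({**entity, "new_start": new_start, "new_end": new_end})
        shift += len(replacement) - (entity["end"] - entity["start"])
        cursor = entity["end"]
    parts.append(text[cursor:])
    return "".join(parts), located


original = "Contact Tonic AI with questions"
entities = [{"start": 8, "end": 16, "label": "ORGANIZATION",
             "replacement": "ORGANIZATION_EPfC7XZUZ"}]
redacted, located = apply_replacements(original, entities)

print(original[8:16])  # start and end index the original text
print(redacted)
print(located[0]["new_start"], located[0]["new_end"])
```

Slicing the original text with start and end yields the detected value, and slicing the redacted text with new_start and new_end yields its replacement.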