bulk_response = textual.redact_bulk([<List of strings>])
For example:
bulk_response = textual.redact_bulk(["Tonic.ai was founded in 2018", "John Smith is a person"])
bulk_response.describe()

[ORGANIZATION_5Ve7OH] was founded in [DATE_TIME_DnuC1]
{"start": 0, "end": 5, "new_start": 0, "new_end": 21, "label": "ORGANIZATION", "text": "Tonic", "score": 0.9, "language": "en", "new_text": "[ORGANIZATION_5Ve7OH]"}
{"start": 21, "end": 25, "new_start": 37, "new_end": 54, "label": "DATE_TIME", "text": "2018", "score": 0.9, "language": "en", "new_text": "[DATE_TIME_DnuC1]"}
[NAME_GIVEN_dySb5] [NAME_FAMILY_7w4Db3] is a person
{"start": 0, "end": 4, "new_start": 0, "new_end": 18, "label": "NAME_GIVEN", "text": "John", "score": 0.9, "language": "en", "new_text": "[NAME_GIVEN_dySb5]"}
{"start": 5, "end": 10, "new_start": 19, "new_end": 39, "label": "NAME_FAMILY", "text": "Smith", "score": 0.9, "language": "en", "new_text": "[NAME_FAMILY_7w4Db3]"}
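The offsets in each record can be checked against the two strings. The following is a minimal sketch, not part of the Textual SDK: it assumes that each record is available as a plain Python dictionary with the keys shown in the output above, and rebuild_redacted is a hypothetical helper. It shows that start and end index into the original text, while new_start and new_end index into the redacted text.

```python
def rebuild_redacted(original, spans):
    """Rebuild the redacted string from the original text and its span records."""
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s["start"]):
        # Copy the untouched text that precedes this span, then the replacement.
        pieces.append(original[cursor:span["start"]])
        pieces.append(span["new_text"])
        cursor = span["end"]
    # Copy whatever follows the last span.
    pieces.append(original[cursor:])
    return "".join(pieces)


original = "John Smith is a person"
redacted = "[NAME_GIVEN_dySb5] [NAME_FAMILY_7w4Db3] is a person"

# Records copied from the example output, reduced to the offset fields.
spans = [
    {"start": 0, "end": 4, "new_start": 0, "new_end": 18,
     "text": "John", "new_text": "[NAME_GIVEN_dySb5]"},
    {"start": 5, "end": 10, "new_start": 19, "new_end": 39,
     "text": "Smith", "new_text": "[NAME_FAMILY_7w4Db3]"},
]

for span in spans:
    # start/end locate the value in the original text.
    assert original[span["start"]:span["end"]] == span["text"]
    # new_start/new_end locate the replacement in the redacted text.
    assert redacted[span["new_start"]:span["new_end"]] == span["new_text"]

assert rebuild_redacted(original, spans) == redacted
```

Keeping both sets of offsets makes it possible to map a position in the redacted text back to the original, which is useful when you highlight or audit replacements.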
Redact JSON content
To redact JSON content, use textual.redact_json. You can pass the content as either a JSON string or a Python dictionary.
json_redaction = textual.redact_json(<JSON string or Python dictionary>)
redact_json ensures that only the values are redacted. It ignores the keys.
Basic JSON redaction example
Here is a basic example of a JSON redaction request:
import json

d = dict()
d['person'] = {'first': 'John', 'last': 'OReilly'}
d['address'] = {'city': 'Memphis', 'state': 'TN', 'street': '847 Rocky Top', 'zip': 1234}
d['description'] = 'John is a man that lives in Memphis. He is 37 years old and is married to Cynthia'

json_redaction = textual.redact_json(d)

# redacted_text is a JSON string; parse it and pretty-print it
print(json.dumps(json.loads(json_redaction.redacted_text), indent=2))
It produces the following JSON output:
{
  "person": {
    "first": "[NAME_GIVEN_WpFV4]",
    "last": "[NAME_FAMILY_orTxwj3I]"
  },
  "address": {
    "city": "[LOCATION_CITY_UtpIl2tL]",
    "state": "[LOCATION_STATE_n24]",
    "street": "[LOCATION_ADDRESS_KwZ3MdDLSrzNhwB]",
    "zip": "[LOCATION_ZIP_L42eP19]"
  },
  "description": "[NAME_GIVEN_WpFV4] is a man that lives in [LOCATION_CITY_UtpIl2tL]. He is [DATE_TIME_LLr6L3gpNcOcl3] and is married to [NAME_GIVEN_yWfthDa6]"
}
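One way to sanity check that only the values changed is to compare the key structure of the input and the output. The sketch below is illustrative only: key_paths is a hypothetical helper, not an SDK function, and the data is a subset of the example above.

```python
import json


def key_paths(value, prefix="$"):
    """Return the set of paths to every key in a nested dict/list structure."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}"
            paths.add(path)
            paths |= key_paths(child, path)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            paths |= key_paths(child, f"{prefix}[{index}]")
    return paths


original = {
    "person": {"first": "John", "last": "OReilly"},
    "address": {"city": "Memphis", "state": "TN",
                "street": "847 Rocky Top", "zip": 1234},
}

# Redacted values copied from the example output.
redacted = json.loads("""{
  "person": {"first": "[NAME_GIVEN_WpFV4]", "last": "[NAME_FAMILY_orTxwj3I]"},
  "address": {"city": "[LOCATION_CITY_UtpIl2tL]", "state": "[LOCATION_STATE_n24]",
              "street": "[LOCATION_ADDRESS_KwZ3MdDLSrzNhwB]",
              "zip": "[LOCATION_ZIP_L42eP19]"}
}""")

# The keys are identical; only the values differ.
assert key_paths(original) == key_paths(redacted)
assert original["person"]["first"] != redacted["person"]["first"]
```

Note that the numeric zip value comes back as a string token in the example output, so downstream code that expects a number there might need to handle a string.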
Specifying entity types for specific JSON paths
When you redact a JSON string, you can optionally assign specific entity types to selected JSON paths.
To do this, you include the jsonpath_allow_lists parameter. Each entry consists of an entity type and a list of JSON paths for which to always use that entity type. Each JSON path must point to a simple string or numeric value.
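As a sketch of what such a parameter might look like, the following builds a mapping from entity type to JSON paths and checks the constraint described above, that every path points to a simple string or numeric value. The dictionary shape, the $.key.key path syntax, and the helpers resolve_path and check_allow_lists are assumptions made for illustration. Check the SDK reference for the exact format that redact_json accepts.

```python
def resolve_path(document, path):
    """Resolve a simple dotted path such as '$.person.first' (no wildcards or indexes)."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    value = document
    for key in path[2:].split("."):
        value = value[key]
    return value


def check_allow_lists(document, allow_lists):
    """Return the (entity type, path) pairs that do not point to a simple value."""
    invalid = []
    for entity_type, paths in allow_lists.items():
        for path in paths:
            value = resolve_path(document, path)
            # bool is a subclass of int in Python, so exclude it explicitly.
            is_simple = isinstance(value, (str, int, float)) and not isinstance(value, bool)
            if not is_simple:
                invalid.append((entity_type, path))
    return invalid


d = {
    "person": {"first": "John", "last": "OReilly"},
    "address": {"city": "Memphis", "state": "TN",
                "street": "847 Rocky Top", "zip": 1234},
}

# Assumed shape: entity type -> list of JSON paths that always use that type.
jsonpath_allow_lists = {
    "NAME_GIVEN": ["$.person.first"],
    "LOCATION_ZIP": ["$.address.zip"],
}

assert check_allow_lists(d, jsonpath_allow_lists) == []
# A path to a nested object is not a simple value, so it is reported.
assert check_allow_lists(d, {"NAME_GIVEN": ["$.person"]}) == [("NAME_GIVEN", "$.person")]

# With a configured client, the call would then look something like:
# json_redaction = textual.redact_json(d, jsonpath_allow_lists=jsonpath_allow_lists)
```

Validating the paths before the request makes a mistyped path fail locally instead of in the redaction call.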
redact_xml ensures that only the values are redacted. It ignores the XML markup.
For example:
xml_string = '''<?xml version="1.0" encoding="UTF-8"?>
<!-- This XML document contains sample PII with namespaces and attributes -->
<PersonInfo xmlns="http://www.example.com/default"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xmlns:contact="http://www.example.com/contact">
    <!-- Personal Information with an attribute containing PII -->
    <Name preferred="true" contact:userID="john.doe123">
        <FirstName>John</FirstName>
        <LastName>Doe</LastName>He was born in 1980.</Name>
    <contact:Details>
        <!-- Email stored in an attribute for demonstration -->
        <contact:Email address="john.doe@example.com"/>
        <contact:Phone type="mobile" number="555-6789"/>
    </contact:Details>
    <!-- SSN stored as an attribute -->
    <SSN value="987-65-4321" xsi:nil="false"/>
    <data>his name was John Doe</data>
</PersonInfo>'''

response = textual.redact_xml(xml_string)
redacted_xml = response.redacted_text
It produces the following XML output:
<?xml version="1.0" encoding="UTF-8"?><!-- This XML document contains sample PII with namespaces and attributes -->\n<PersonInfo xmlns="http://www.example.com/default" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:contact="http://www.example.com/contact"><!-- Personal Information with an attribute containing PII --><Name preferred="true" contact:userID="[NAME_GIVEN_NUhdshJf3SkI0]">[GENDER_IDENTIFIER_gh1] was born in [DOB_nHfb2].<FirstName>[NAME_GIVEN_HI1h7]</FirstName><LastName>[NAME_FAMILY_bKk1]</LastName></Name><contact:Details><!-- Email stored in an attribute for demonstration --><contact:Email address="[EMAIL_ADDRESS_DSlxAYEPw0XkIiADi0WbpW1]"></contact:Email><contact:Phone type="mobile" number="[PHONE_NUMBER_5LWjT19Ee]"></contact:Phone></contact:Details><!-- SSN stored as an attribute --><SSN value="[PHONE_NUMBER_4B2QKKwghix90]" xsi:nil="false"></SSN><data>[GENDER_IDENTIFIER_XN92] name was [NAME_GIVEN_HI1h7] [NAME_FAMILY_bKk1]</data></PersonInfo>
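To confirm that the markup itself is unchanged, you can compare the element and attribute names of the input and the output while ignoring every value. This is a minimal sketch that uses only the Python standard library. markup_skeleton is a hypothetical helper, and the two strings are a trimmed fragment of the example above.

```python
import xml.etree.ElementTree as ET


def markup_skeleton(xml_text):
    """List each element's qualified tag and attribute names, ignoring all values."""
    root = ET.fromstring(xml_text)
    return [(element.tag, sorted(element.attrib)) for element in root.iter()]


NAMESPACES = (
    'xmlns="http://www.example.com/default" '
    'xmlns:contact="http://www.example.com/contact"'
)

original = (
    f'<PersonInfo {NAMESPACES}><contact:Details>'
    '<contact:Email address="john.doe@example.com"/>'
    '<contact:Phone type="mobile" number="555-6789"/>'
    '</contact:Details></PersonInfo>'
)

# Attribute values copied from the example output.
redacted = (
    f'<PersonInfo {NAMESPACES}><contact:Details>'
    '<contact:Email address="[EMAIL_ADDRESS_DSlxAYEPw0XkIiADi0WbpW1]"></contact:Email>'
    '<contact:Phone type="mobile" number="[PHONE_NUMBER_5LWjT19Ee]"></contact:Phone>'
    '</contact:Details></PersonInfo>'
)

# Same elements and attribute names in the same order; only values differ.
assert markup_skeleton(original) == markup_skeleton(redacted)
assert original != redacted
```

The comparison treats a self-closing tag and an explicit open/close pair as the same element, which matches the difference between the sample input and output.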
redact_html ensures that only the values are redacted. It ignores the HTML markup.
For example:
html_content = """<!DOCTYPE html>
<html>
    <head>
        <title>John Doe</title>
    </head>
    <body>
        <h1>John Doe</h1>
        <p>John Doe is a person who lives in New York City.</p>
        <p>John Doe's phone number is 555-555-5555.</p>
    </body>
</html>"""

# Run the redact_html method, synthesizing given and family names
redacted_html = textual.redact_html(html_content, generator_config={
    "NAME_GIVEN": "Synthesis",
    "NAME_FAMILY": "Synthesis"
})

print(redacted_html.redacted_text)
It produces the following HTML output:
<!DOCTYPE html><html><head><title>Scott Roley</title></head><body><h1>Scott Roley</h1><p>Scott Roley is a person who lives in [LOCATION_CITY_HwTG541HnrMzfO7].</p><p>Scott Roley's phone number is [PHONE_NUMBER_apZd0xjh3Z3lf4].</p></body></html>
Using an LLM to generate synthesized values
You can also request synthesized values from a large language model (LLM).
When you use this process, Textual first identifies the sensitive values in the text. It then sends the value locations and redacted values to the LLM. For example, if Textual identifies a product name, it sends the location and the redacted value PRODUCT to the LLM. Textual does not send the original values to the LLM.
The LLM then generates realistic synthesized values of the appropriate value types.
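The data flow described above can be illustrated with a small sketch. This is not Textual's implementation or its actual request format: build_llm_input is a hypothetical helper, and the output structure (placeholder text plus a list of locations within that text) is an assumption. What it shows is that each original value is replaced by its entity type before anything is sent on.

```python
def build_llm_input(text, spans):
    """Replace each detected span with its entity label and record the label's location."""
    out = ""
    locations = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s["start"]):
        out += text[cursor:span["start"]]
        placeholder = span["label"]
        locations.append({
            "start": len(out),
            "end": len(out) + len(placeholder),
            "label": placeholder,
        })
        out += placeholder
        cursor = span["end"]
    out += text[cursor:]
    return {"text": out, "locations": locations}


text = "John Smith is a person"
# Detected spans, as in the bulk redaction example.
spans = [
    {"start": 0, "end": 4, "label": "NAME_GIVEN"},
    {"start": 5, "end": 10, "label": "NAME_FAMILY"},
]

llm_input = build_llm_input(text, spans)

# The original values never appear in what would be sent.
assert "John" not in llm_input["text"]
assert "Smith" not in llm_input["text"]
# Each recorded location points at its label in the placeholder text.
for location in llm_input["locations"]:
    assert llm_input["text"][location["start"]:location["end"]] == location["label"]
```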