# Redact individual strings

{% hint style="info" %}
**Required global permission:** Use the API to parse or redact a text string
{% endhint %}

Before you perform these tasks, remember to [instantiate the SDK client](https://docs.tonic.ai/textual/tonic-textual-api/textual-api-instantiate-sdk).

You can use the Tonic Textual SDK to redact individual strings, including:

* Plain text strings
* JSON content
* XML content
* HTML content

For a text string, you can also request synthesized values from a large language model (LLM).

The redaction request can include the [handling configuration for entity types](https://docs.tonic.ai/textual/tonic-textual-api/datasets-redaction/api-redaction-entity-type-handling).

The [redaction response](#api-redact-string-response-format) includes the redacted or synthesized content and details about the detected entity values.

## Redact a plain text string <a href="#textual-api-redact-string-plain-text" id="textual-api-redact-string-plain-text"></a>

To send a plain text string for redaction, use [`textual.redact`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/redact/api.html#tonic_textual.redact_api.TextualNer.redact):

```python
redaction_response = textual.redact("""<text of the string>""")
redaction_response.describe()
```

For example:

<pre class="language-python" data-overflow="wrap"><code class="lang-python">redaction_response = textual.redact("""Contact Tonic AI with questions""")
redaction_response.describe()

<strong>Contact ORGANIZATION_EPfC7XZUZ with questions
</strong>    
{"start": 8, "end": 16, "new_start": 8, "new_end": 30, "label": "ORGANIZATION", "text": "Tonic AI", "new_text": "[ORGANIZATION]", "score": 0.85, "language": "en"}
</code></pre>

The `redact` call provides an option to record the request, to allow you to preview the results in the Textual application. For more information, go to [redaction-requeset-record-review](https://docs.tonic.ai/textual/tonic-textual-api/datasets-redaction/redaction-requeset-record-review "mention").

## Redact multiple plain text strings <a href="#sdk-bulk-redact" id="sdk-bulk-redact"></a>

To send multiple plain text strings for redaction, use [`textual.redact_bulk`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/redact/api.html#tonic_textual.redact_api.TextualNer.redact_bulk):

{% code overflow="wrap" %}

```python
bulk_response = textual.redact_bulk([<List of strings>])
```

{% endcode %}

For example:

{% code overflow="wrap" %}

```python
bulk_response = textual.redact_bulk(["Tonic.ai was founded in 2018", "John Smith is a person"])
bulk_response.describe()

[ORGANIZATION_5Ve7OH] was founded in [DATE_TIME_DnuC1]

{"start": 0, "end": 5, "new_start": 0, "new_end": 21, "label": "ORGANIZATION", "text": "Tonic", "score": 0.9, "language": "en", "new_text": "[ORGANIZATION]"}
{"start": 21, "end": 25, "new_start": 37, "new_end": 54, "label": "DATE_TIME", "text": "2018", "score": 0.9, "language": "en", "new_text": "[DATE_TIME]"}

[NAME_GIVEN_dySb5] [NAME_FAMILY_7w4Db3] is a person

{"start": 0, "end": 4, "new_start": 0, "new_end": 18, "label": "NAME_GIVEN", "text": "John", "score": 0.9, "language": "en", "new_text": "[NAME_GIVEN]"}
{"start": 5, "end": 10, "new_start": 19, "new_end": 39, "label": "NAME_FAMILY", "text": "Smith", "score": 0.9, "language": "en", "new_text": "[NAME_FAMILY]"}
```

{% endcode %}

## Redact JSON content <a href="#api-redact-string-json" id="api-redact-string-json"></a>

To send a JSON string for redaction, use [`textual.redact_json`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/redact/api.html#tonic_textual.redact_api.TextualNer.redact_json). You can send the JSON content as a JSON string or a Python dictionary.

```python
json_redaction = textual.redact_json(<JSON string or Python dictionary>)
```

`redact_json` ensures that only the values are redacted. It ignores the keys.

### Basic JSON redaction example <a href="#sdk-redact-json-example" id="sdk-redact-json-example"></a>

Here is a basic example of a JSON redaction request:

{% code overflow="wrap" %}

```python
import json

# Build a sample document. Only the values are redacted, not the keys.
d = dict()
d['person'] = {'first': 'John', 'last': 'OReilly'}
d['address'] = {'city': 'Memphis', 'state': 'TN', 'street': '847 Rocky Top', 'zip': 1234}
d['description'] = 'John is a man that lives in Memphis.  He is 37 years old and is married to Cynthia.'

json_redaction = textual.redact_json(d)

# redacted_text is a JSON string. Parse it again to pretty-print it.
print(json.dumps(json.loads(json_redaction.redacted_text), indent=2))
```

{% endcode %}

It produces the following JSON output:

{% code overflow="wrap" %}

```json
{
"person": {
    "first": "[NAME_GIVEN]",
    "last": "[NAME_FAMILY]"
},
"address": {
    "city": "[LOCATION_CITY]",
    "state": "[LOCATION_STATE]",
    "street": "[LOCATION_ADDRESS]",
    "zip": "[LOCATION_ZIP]"
},
"description": "[NAME_GIVEN] is a man that lives in [LOCATION_CITY].  He is [DATE_TIME] and is married to [NAME_GIVEN]."
}
```

{% endcode %}
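To illustrate what "values only" means, here is a minimal conceptual sketch. It is not how Textual implements `redact_json`. The `redact_values` and `fake_redactor` names are hypothetical, and `fake_redactor` is a stand-in that masks a single hard-coded name instead of running entity detection.

```python
def redact_values(node, redact_value):
    """Apply redact_value to every leaf value. Keys are copied unchanged."""
    if isinstance(node, dict):
        return {key: redact_values(value, redact_value) for key, value in node.items()}
    if isinstance(node, list):
        return [redact_values(item, redact_value) for item in node]
    return redact_value(node)


def fake_redactor(value):
    # Hypothetical stand-in for entity detection: masks one known name.
    return str(value).replace("John", "[NAME_GIVEN]")


# The key "John" survives because only values are passed to the redactor.
doc = {"person": {"first": "John", "John": "a key named John"}}
result = redact_values(doc, fake_redactor)
print(result)
```

Because the walker never passes keys to the redactor, a key that happens to look like a sensitive value is left in place.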

### Specifying entity types for specific JSON paths <a href="#sdk-redact-json-path-allowlist" id="sdk-redact-json-path-allowlist"></a>

When you redact a JSON string, you can optionally assign specific entity types to selected JSON paths.

To do this, you include the `jsonpath_allow_lists` parameter. Each entry consists of an entity type and a list of JSON paths for which to always use that entity type. Each JSON path must point to a simple string or numeric value.

```python
jsonpath_allow_lists={'entity_type':['JSON Paths']}
```

The specified entity type overrides both the detected entity type and any added or excluded values.

In the following example, the value of the `key1` node is always treated as a telephone number:

{% code overflow="wrap" %}

```python
response = textual.redact_json('{"key1":"Ex123", "key2":"Johnson"}', jsonpath_allow_lists={'PHONE_NUMBER':['$.key1']})
```

{% endcode %}

It produces the following redacted output:

```json
{"key1":"[PHONE_NUMBER]","key2":"[NAME_FAMILY]"}
```
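Because each JSON path must point to a simple string or numeric value, it can help to check your paths before you send the request. The following is a minimal sketch under stated assumptions: `resolve_simple_path` and `check_allow_lists` are hypothetical helpers, not part of the SDK, and they only handle plain dotted paths such as `$.address.city` (no wildcards, filters, or array indexes).

```python
import json


def resolve_simple_path(document, path):
    """Resolve a dotted path such as '$.address.city' against parsed JSON."""
    if not path.startswith("$"):
        raise ValueError(f"Path must start with '$': {path}")
    node = document
    for key in path[1:].split("."):
        if key == "":
            continue  # skip the empty piece before the first dot
        node = node[key]
    return node


def check_allow_lists(document, allow_lists):
    """Return a list of problems found in a jsonpath_allow_lists-style dict."""
    problems = []
    for entity_type, paths in allow_lists.items():
        for path in paths:
            try:
                value = resolve_simple_path(document, path)
            except (KeyError, TypeError):
                problems.append(f"{entity_type}: {path} was not found")
                continue
            if not isinstance(value, (str, int, float)):
                problems.append(f"{entity_type}: {path} is not a simple value")
    return problems


doc = json.loads('{"key1": "Ex123", "key2": "Johnson", "nested": {"a": 1}}')
ok = check_allow_lists(doc, {"PHONE_NUMBER": ["$.key1"]})
bad = check_allow_lists(doc, {"PHONE_NUMBER": ["$.nested", "$.missing"]})
print(ok, bad)
```

An empty list means that every path resolved to a string or number in the sample document.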

## Redact XML content <a href="#api-redact-string-xml" id="api-redact-string-xml"></a>

To send an XML string for redaction, use [`textual.redact_xml`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/redact/api.html#tonic_textual.redact_api.TextualNer.redact_xml).

`redact_xml` ensures that only the values are redacted. It ignores the XML markup.

For example:

{% code overflow="wrap" %}

```python
xml_string = '''<?xml version="1.0" encoding="UTF-8"?>
    <!-- This XML document contains sample PII with namespaces and attributes -->
    <PersonInfo xmlns="http://www.example.com/default" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:contact="http://www.example.com/contact">
        <!-- Personal Information with an attribute containing PII -->
        <Name preferred="true" contact:userID="john.doe123">
            <FirstName>John</FirstName>
            <LastName>Doe</LastName>He was born in 1980.</Name>

        <contact:Details>
            <!-- Email stored in an attribute for demonstration -->
            <contact:Email address="john.doe@example.com"/>
            <contact:Phone type="mobile" number="555-6789"/>
        </contact:Details>

        <!-- SSN stored as an attribute -->
        <SSN value="987-65-4321" xsi:nil="false"/>
        <data>his name was John Doe</data>
    </PersonInfo>'''

response = textual.redact_xml(xml_string)

redacted_xml = response.redacted_text
```

{% endcode %}

It produces the following XML output:

{% code overflow="wrap" %}

```xml
<?xml version="1.0" encoding="UTF-8"?><!-- This XML document contains sample PII with namespaces and attributes -->\n<PersonInfo xmlns="http://www.example.com/default" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:contact="http://www.example.com/contact"><!-- Personal Information with an attribute containing PII --><Name preferred="true" contact:userID="[NAME_GIVEN]">[GENDER_IDENTIFIER] was born in [DOB].<FirstName>[NAME_GIVEN]</FirstName><LastName>[NAME_FAMILY]</LastName></Name><contact:Details><!-- Email stored in an attribute for demonstration --><contact:Email address="[EMAIL_ADDRESS]"></contact:Email><contact:Phone type="mobile" number="[PHONE_NUMBER]"></contact:Phone></contact:Details><!-- SSN stored as an attribute --><SSN value="[PHONE_NUMBER]" xsi:nil="false"></SSN><data>[GENDER_IDENTIFIER] name was [NAME_GIVEN] [NAME_FAMILY]</data></PersonInfo>
```

{% endcode %}
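One way to confirm that only values changed is to compare the element and attribute names of the original and redacted documents. The following is a minimal sketch that uses the Python standard library. `markup_outline` is a hypothetical helper, and the two short XML strings are illustrative samples, not actual Textual output.

```python
import xml.etree.ElementTree as ET


def markup_outline(xml_text):
    """Return (tag, sorted attribute names) for every element, in document order."""
    root = ET.fromstring(xml_text)
    return [(el.tag, sorted(el.attrib)) for el in root.iter()]


# Illustrative samples: the values differ, but the tags and attribute names match.
original = '<Person id="john.doe123"><FirstName>John</FirstName><LastName>Doe</LastName></Person>'
redacted = '<Person id="[NAME_GIVEN]"><FirstName>[NAME_GIVEN]</FirstName><LastName>[NAME_FAMILY]</LastName></Person>'

markup_preserved = markup_outline(original) == markup_outline(redacted)
print(markup_preserved)
```

The comparison ignores text content and attribute values, so it stays valid when the redacted values have different lengths.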

## Redact HTML content

To send an HTML string for redaction, use [`textual.redact_html`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/redact/api.html#tonic_textual.redact_api.TextualNer.redact_html).

`redact_html` ensures that only the values are redacted. It ignores the HTML markup.

For example:

```python
html_content = """
<!DOCTYPE html>
<html>
    <head>
        <title>John Doe</title>
    </head>
    <body>
        <h1>John Doe</h1>
        <p>John Doe is a person who lives in New York City.</p>
        <p>John Doe's phone number is 555-555-5555.</p>
    </body>
</html>
"""

# Run the redact_html method and synthesize given and family names
redacted_html = textual.redact_html(html_content, generator_config={
    "NAME_GIVEN": "Synthesis",
    "NAME_FAMILY": "Synthesis"
})

print(redacted_html.redacted_text)
```

It produces the following HTML output:

```html
<!DOCTYPE html>
<html>
    <head>
        <title>Scott Roley</title>
    </head>
    <body>
        <h1>Scott Roley</h1>
        <p>Scott Roley is a person who lives in [LOCATION_CITY].</p>
        <p>Scott Roley's phone number is [PHONE_NUMBER].</p>
    </body>
</html>
```

## Synthesis with Large Language Models

You can request synthesized values from a large language model (LLM) using two different approaches.

### ReplacementSynthesis

`ReplacementSynthesis` redacts sensitive values and uses the LLM to generate realistic replacements based on the surrounding context.

When you use this process, Textual first identifies the sensitive values in the text. It then sends the value locations and redacted values to the LLM. For example, if Textual identifies a product name, it sends the location and the redacted value `PRODUCT` to the LLM. Textual does not send the original values to the LLM.

The LLM then generates realistic synthesized values of the appropriate value types based on the context of the surrounding text.

**Example:**

{% code overflow="wrap" %}

```python
from tonic_textual.enums.pii_state import PiiState

sample_text = "My name is John, and today I am demoing Textual, a software product created by Tonic"

# Configure to synthesize organization entities
generator_config = {"ORGANIZATION": PiiState.ReplacementSynthesis}
generator_default = PiiState.Off

response = textual.redact(
    sample_text,
    generator_config=generator_config,
    generator_default=generator_default,
)
```

{% endcode %}

**Output:**

{% code overflow="wrap" %}

```
My name is John, and today I am demoing Textual, a software product created by Initech Enterprises.
```

{% endcode %}

### GroupingSynthesis

`GroupingSynthesis` groups related entities, generates new entity names, then uses the LLM to reproduce the original format of the value.

This approach is particularly useful when values have specific formats that need to be preserved. For example, if a name is spelled out using the phonetic alphabet (e.g., "B as in boy, O as in orange, B as in boy" for "Bob"), `GroupingSynthesis` will:

1. Identify the grouped entity ("Bob")
2. Generate a new entity name without using the LLM ("Tom")
3. Use the LLM to reproduce the same format ("T as in toy, O as in orange, M as in mark")

**Example:**

{% code overflow="wrap" %}

```python
from tonic_textual.enums.pii_state import PiiState

sample_text = "The caller spelled their name: B as in boy, O as in orange, B as in boy"

# Configure to use grouping synthesis for names
generator_config = {"NAME_GIVEN": PiiState.GroupingSynthesis}
generator_default = PiiState.Off

response = textual.redact(
    sample_text,
    generator_config=generator_config,
    generator_default=generator_default,
)
```

{% endcode %}

**Output:**

```
The caller spelled their name: T as in toy, O as in orange, M as in mark
```

### Configuration

Use the `generator_config` parameter to specify which entity types should use synthesis and which synthesis method to apply. Use `generator_default` to set the default behavior for entity types not explicitly configured.

For more information about configuring entity type handling, see [api-redaction-entity-type-handling](https://docs.tonic.ai/textual/tonic-textual-api/datasets-redaction/api-redaction-entity-type-handling "mention").
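The relationship between the two parameters can be pictured as a dictionary lookup with a fallback. The following is a minimal sketch of that idea only. `Handling` is a hypothetical stand-in enum for the handling options named on this page, and `effective_handling` is not an SDK function.

```python
from enum import Enum


class Handling(Enum):
    """Hypothetical stand-in for the handling options named on this page."""
    Off = "Off"
    ReplacementSynthesis = "ReplacementSynthesis"
    GroupingSynthesis = "GroupingSynthesis"


def effective_handling(label, generator_config, generator_default):
    # An explicit entry wins. Otherwise, the default applies.
    return generator_config.get(label, generator_default)


generator_config = {"ORGANIZATION": Handling.ReplacementSynthesis}
generator_default = Handling.Off

org = effective_handling("ORGANIZATION", generator_config, generator_default)
name = effective_handling("NAME_GIVEN", generator_config, generator_default)
print(org.name, name.name)
```

With this configuration, only organization values are synthesized. Every other entity type falls back to the default and is left unchanged.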

***

**Note:** Before you can use either synthesis method, you must enable additional LLM processing. The additional processing sends the values and surrounding text to the LLM. For an overview of the LLM processing and how to enable it, see the documentation about configuring the Solar.LLM container in [#llm-processing-configure-model](https://docs.tonic.ai/textual/textual-playground#llm-processing-configure-model "mention").

## Format of the redaction and synthesis response <a href="#api-redact-string-response-format" id="api-redact-string-response-format"></a>

The response provides the redacted or synthesized version of the string, and the list of detected entity values.

{% code overflow="wrap" %}

```python
Contact ORGANIZATION_EPfC7XZUZ with questions
    
{"start": 8, "end": 16, "new_start": 8, "new_end": 30, "label": "ORGANIZATION", "text": "Tonic AI", "new_text": "[ORGANIZATION]", "score": 0.85, "language": "en"}
```

{% endcode %}

For each redacted item, the response includes:

* The location of the value in the original text (`start` and `end`)
* The location of the value in the redacted version of the string (`new_start` and `new_end`)
* The entity type (`label`)
* The original value (`text`)
* The replacement value (`new_text`). `new_text` is `null` when the entity type is ignored.
* A score to indicate confidence in the detection and redaction (`score`)
* The detected language for the value (`language`)
* For responses from `textual.redact_json`, the JSON path to the entity in the original document (`json_path`)
* For responses from `textual.redact_xml`, the XPath to the entity in the original XML document (`xml_path`)
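
To show how the `start`, `end`, and `new_text` fields relate to the original string, here is a minimal sketch that rebuilds a redacted string from entity records. It assumes that the records are available as plain dictionaries with the fields listed above. `apply_replacements` is a hypothetical helper, not an SDK function, and the sample record is hand-written for illustration.

```python
def apply_replacements(original, entities):
    """Rebuild a redacted string from the original text and entity records."""
    parts = []
    cursor = 0
    for entity in sorted(entities, key=lambda e: e["start"]):
        # Copy the untouched text that precedes this entity.
        parts.append(original[cursor:entity["start"]])
        # A null new_text means the value was not replaced, so keep the original.
        if entity["new_text"] is None:
            parts.append(entity["text"])
        else:
            parts.append(entity["new_text"])
        cursor = entity["end"]
    parts.append(original[cursor:])
    return "".join(parts)


original = "Contact Tonic AI with questions"
entities = [
    {"start": 8, "end": 16, "label": "ORGANIZATION", "text": "Tonic AI",
     "new_text": "[ORGANIZATION]", "score": 0.85, "language": "en"},
]

rebuilt = apply_replacements(original, entities)
print(rebuilt)
```

The `end` offset is exclusive, so `original[start:end]` returns the detected value.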
