# Textual LangChain integration

The [Textual LangChain integration](https://github.com/TonicAI/langchain-textual) provides Textual tools that you can use to detect and de-identify sensitive data in text, JSON, HTML, and files.

You can replace entity values with realistic generated values, or with tokenized placeholders. You can also extract the list of detected entities.

You can drop the Textual tools into any LangChain chain or agent as standard tools.

## Installing the integration

```
pip install langchain-textual
```

## Providing a Textual API key

To use the Textual tools, you must provide a [Textual API key](https://docs.tonic.ai/textual/tonic-textual-api/textual-api-keys).

To set the API key as an environment variable value:

```
export TONIC_TEXTUAL_API_KEY="your-api-key"
```

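To provide the API key when you create a tool instance, use the `tonic_textual_api_key` parameter:

```
from langchain_textual import TonicTextualRedactText

tool = TonicTextualRedactText(tonic_textual_api_key="your-api-key")
```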
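If you want a script to fail early when no key is available, one option is to check for the key before you create any tools. The following is a minimal sketch that uses only the standard library. The helper name `resolve_api_key` is hypothetical and is not part of the integration, and the choice to prefer an explicitly passed key over the environment variable is an assumption of this sketch, not documented behavior.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return the explicit key if given, else TONIC_TEXTUAL_API_KEY from env.

    Hypothetical helper: the precedence used here is this sketch's own choice.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("TONIC_TEXTUAL_API_KEY")
    if not key:
        raise RuntimeError(
            "Set TONIC_TEXTUAL_API_KEY or pass tonic_textual_api_key."
        )
    return key
```

You can then pass the returned value as `tonic_textual_api_key` when you create a tool.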

## Calling a tool

To call a tool on Textual Cloud:

```
from langchain_textual import <tool name>

tool = <tool name>(<configuration parameters>)
tool.invoke(<tool target>)
```

To call a tool on a self-hosted instance of Textual, include your Textual instance URL in the configuration parameters:

```
tool = <tool name>(tonic_textual_base_url="https://textual.your-company.com")
```

## Available tools

<table><thead><tr><th valign="top">Tool</th><th width="177.4140625" valign="top">Input</th><th valign="top">Use to</th></tr></thead><tbody><tr><td valign="top"><code>TonicTextualRedactText</code></td><td valign="top">Plain text string</td><td valign="top">Synthesize or tokenize entities in raw text or the content of a <code>.txt</code> file.</td></tr><tr><td valign="top"><code>TonicTextualRedactJson</code></td><td valign="top">JSON string</td><td valign="top">Synthesize or tokenize entities in raw JSON or the content of a <code>.json</code> file.</td></tr><tr><td valign="top"><code>TonicTextualRedactHtml</code></td><td valign="top">HTML string</td><td valign="top">Synthesize or tokenize entities in raw HTML, or the content of an <code>.html</code> or <code>.htm</code> file.</td></tr><tr><td valign="top"><code>TonicTextualRedactFile</code></td><td valign="top">File path</td><td valign="top">Synthesize or tokenize entities in PDF, image (JPG, PNG), CSV, or TSV files.<br><br>For <code>.txt</code>, <code>.json</code>, <code>.htm</code>, or <code>.html</code> files, read the file content, then pass the content to the text, JSON, or HTML tool.</td></tr><tr><td valign="top"><code>TonicTextualExtractEntities</code></td><td valign="top">Plain text string</td><td valign="top">Return a list of detected entities.<br><br>For each entity, the list includes the type, value, location, and detection confidence.</td></tr><tr><td valign="top"><code>TonicTextualPiiTypes</code></td><td valign="top">None</td><td valign="top">List the supported entity types.<br><br>The list provides the type names to use in the configuration for the redaction tools.</td></tr></tbody></table>

### Text redaction

```
from langchain_textual import TonicTextualRedactText

tool = TonicTextualRedactText()
tool.invoke("My name is John Smith and my email is john@example.com.")
# "My name is [NAME_GIVEN_xxxx] [NAME_FAMILY_xxxx] and my email is [EMAIL_ADDRESS_xxxx]."
```

### JSON redaction

```
from langchain_textual import TonicTextualRedactJson

tool = TonicTextualRedactJson()
tool.invoke('{"name": "John Smith", "email": "john@example.com"}')
# '{"name": "[NAME_GIVEN_xxxx] [NAME_FAMILY_xxxx]", "email": "[EMAIL_ADDRESS_xxxx]"}'
```

### HTML redaction

```
from langchain_textual import TonicTextualRedactHtml

tool = TonicTextualRedactHtml()
tool.invoke("<p>Contact John Smith at john@example.com</p>")
# "<p>Contact [NAME_GIVEN_xxxx] [NAME_FAMILY_xxxx] at [EMAIL_ADDRESS_xxxx]</p>"
```

### File redaction (PDF, image, CSV, TSV)

```
from langchain_textual import TonicTextualRedactFile

tool = TonicTextualRedactFile()
tool.invoke({"file_path": "/path/to/scan.pdf"})
# "/path/to/scan_redacted.pdf"

tool.invoke({"file_path": "/path/to/photo.jpg", "output_path": "/tmp/redacted.jpg"})
# "/tmp/redacted.jpg"
```

For `.txt`, `.json`, and `.html`/`.htm` files, you do not use the file redaction tool. Instead, you read the file content, then pass the content to the text, JSON, or HTML redaction tool.
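The routing rule above can be sketched as a small helper that maps a file extension to the name of the tool to use. The helper name `tool_name_for` is hypothetical and is not part of the integration. The extension lists come from the table of available tools, and the case-insensitive matching is an assumption of this sketch.

```python
from pathlib import Path

# Extensions whose content you read and pass to a string-based tool.
CONTENT_TOOLS = {
    ".txt": "TonicTextualRedactText",
    ".json": "TonicTextualRedactJson",
    ".html": "TonicTextualRedactHtml",
    ".htm": "TonicTextualRedactHtml",
}

# Extensions that you pass to the file redaction tool as a file path.
FILE_TOOL_EXTENSIONS = {".pdf", ".jpg", ".png", ".csv", ".tsv"}


def tool_name_for(path: str) -> str:
    """Return the name of the tool to use for a file (hypothetical helper)."""
    # Assumption: extensions are matched without regard to case.
    suffix = Path(path).suffix.lower()
    if suffix in CONTENT_TOOLS:
        return CONTENT_TOOLS[suffix]
    if suffix in FILE_TOOL_EXTENSIONS:
        return "TonicTextualRedactFile"
    raise ValueError(f"No tool listed for extension {suffix!r}")
```

For a content-based tool, you still read the file yourself and pass the content string to `invoke`.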

### Get the list of detected entities

```
from langchain_textual import TonicTextualExtractEntities

tool = TonicTextualExtractEntities()
tool.invoke("My name is John Smith and my email is john@example.com.")
# '[{"label": "NAME_GIVEN", "text": "John", "start": 11, "end": 15, "score": 0.9}, ...]'
```

The tool returns a JSON array of detected entities. Each entity has the following fields:

* `label` - The entity type of the detected entity.
* `text` - The text of the detected entity.
* `start` - The position in the input text where the entity starts.
* `end` - The position in the input text where the entity ends.
* `score` - The confidence score. Indicates how confident Textual is in its detection.
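Because the result is JSON, you can parse it with the standard library and work with the fields directly. The following is a minimal sketch that keeps only the entities at or above a confidence threshold. The helper names, the threshold, and the sample output are illustrative: only the first sample entry comes from the example above, and the other entries and their scores are made up. The span check assumes that `end` is an exclusive offset, which is consistent with the `John` example but is not stated by the integration.

```python
import json

text = "My name is John Smith and my email is john@example.com."

# Hypothetical tool output for illustration only. The first entry matches the
# example above; the other entries and their scores are made up.
raw_output = json.dumps([
    {"label": "NAME_GIVEN", "text": "John", "start": 11, "end": 15, "score": 0.9},
    {"label": "NAME_FAMILY", "text": "Smith", "start": 16, "end": 21, "score": 0.9},
    {"label": "EMAIL_ADDRESS", "text": "john@example.com", "start": 38, "end": 54, "score": 0.6},
])


def confident_entities(raw_json: str, min_score: float) -> list[dict]:
    """Parse the JSON array and keep entities at or above min_score."""
    return [e for e in json.loads(raw_json) if e["score"] >= min_score]


def spans_match(source: str, entities: list[dict]) -> bool:
    """Compare each entity to source[start:end] (assumes exclusive end offsets)."""
    return all(source[e["start"]:e["end"]] == e["text"] for e in entities)


kept = confident_entities(raw_output, min_score=0.8)
print([e["label"] for e in kept])
```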

## Tool configuration options

All of the redaction tools provide the same configuration options to determine how to de-identify entities of specific types.

### Getting the list of available entity types

When you configure entity type handling, you must provide the entity type names.

To get a list of all of the supported entity type names, use `TonicTextualPiiTypes`:

```
from langchain_textual import TonicTextualPiiTypes

TonicTextualPiiTypes().invoke("")
# "NUMERIC_VALUE, LANGUAGE, MONEY, ..., EMAIL_ADDRESS, NAME_GIVEN, NAME_FAMILY, ..."
```
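The example output above is a single comma-separated string. If the tool returns that format, you can split it into individual names before you build a configuration. The following is a minimal sketch. The helper name `parse_type_names` is hypothetical, and the sample string is a shortened stand-in for the real output that contains only names shown in the example.

```python
def parse_type_names(raw: str) -> list[str]:
    """Split a comma-separated string of type names into a clean list."""
    return [name.strip() for name in raw.split(",") if name.strip()]


# Shortened stand-in for the tool output, for illustration only.
sample = "NUMERIC_VALUE, LANGUAGE, MONEY, EMAIL_ADDRESS, NAME_GIVEN, NAME_FAMILY"

names = parse_type_names(sample)

# Build a per-type mapping that synthesizes every name-related type.
name_config = {n: "Synthesis" for n in names if n.startswith("NAME_")}
print(name_config)
```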

### Available handling options

The available entity type handling options are:

* `Redaction` - The default option. Replaces the entity value with a placeholder that consists of the entity type name followed by an identifier that is unique to each value. For example, replaces `John` with `[NAME_GIVEN_1234]`.
* `Synthesis` - Replaces the entity value with a realistic generated value. For example, replaces `John` with `Michael`.
* `Off` - Does not replace the entity value. The value is kept as is in the output.
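A misspelled option name is easy to miss, so it can help to check a mapping of entity types to handling options before you pass it to a tool. The following is a minimal sketch. The helper name `invalid_entries` is hypothetical, and the exact, case-sensitive comparison is an assumption: this page only shows the capitalized spellings, and the integration might accept other forms.

```python
# The three handling options listed above.
VALID_OPTIONS = {"Redaction", "Synthesis", "Off"}


def invalid_entries(type_options: dict) -> dict:
    """Return the entries whose handling option is not one of the listed names.

    Assumption: option names must match the listed spellings exactly.
    """
    return {
        entity_type: option
        for entity_type, option in type_options.items()
        if option not in VALID_OPTIONS
    }


problems = invalid_entries({"NAME_GIVEN": "Synthesis", "EMAIL_ADDRESS": "redact"})
print(problems)
```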

### Specifying the default handling option

To specify the default handling option to use for all entity types, use the `generator_default` parameter.

In the following example, the default handling option is set to `Synthesis`. All entities are replaced with realistic generated values.

```
from langchain_textual import TonicTextualRedactText

tool = TonicTextualRedactText(generator_default="Synthesis")
tool.invoke("Contact Jane Doe at jane.doe@example.com.")
# "Contact Maria Chen at maria.chen@gmail.com."
```

### Providing handling options for specific entity types

To provide handling options for specific entity types, use the `generator_config` parameter.

Within `generator_config`, add an entry for each entity type that maps the type name to the handling option:

{% code overflow="wrap" %}

```
"<type name>": "<handling option>"
```

{% endcode %}

In the following example, the default handling option is `Off`. First and last names are replaced with realistic generated values, and email addresses are redacted:

```
from langchain_textual import TonicTextualRedactText

tool = TonicTextualRedactText(
    generator_default="Off",
    generator_config={
        "NAME_GIVEN": "Synthesis",
        "NAME_FAMILY": "Synthesis",
        "EMAIL_ADDRESS": "Redaction",
    },
)
tool.invoke("Contact Jane Doe at jane.doe@example.com.")
# "Contact Maria Chen at [EMAIL_ADDRESS_xxxx]."
```
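The way the two parameters combine can be modeled as a lookup: a type that has an entry in `generator_config` uses that option, and every other type uses `generator_default`. The following is a minimal sketch of that rule as this page describes it. It is not the integration's implementation, and the helper name `effective_option` is hypothetical.

```python
def effective_option(entity_type, generator_default="Redaction", generator_config=None):
    """Return the handling option that applies to entity_type.

    Models the rule described on this page: a per-type entry wins,
    otherwise the default applies. Not the integration's actual code.
    """
    return (generator_config or {}).get(entity_type, generator_default)


# The same settings as the example above.
config = {
    "NAME_GIVEN": "Synthesis",
    "NAME_FAMILY": "Synthesis",
    "EMAIL_ADDRESS": "Redaction",
}

for entity_type in ("NAME_GIVEN", "EMAIL_ADDRESS", "MONEY"):
    print(entity_type, effective_option(entity_type, "Off", config))
```

In this model, `MONEY` has no entry of its own, so it uses the `Off` default and is left unchanged.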
