Textual LangChain integration

The Textual LangChain integration provides Textual tools that you can use to detect and de-identify sensitive data in text, JSON, HTML, and files.

You can replace entity values with realistic generated values, or with tokenized placeholders. You can also extract the list of detected entities.

You can drop the Textual tools into any LangChain chain or agent as standard tools.

Installing the integration

pip install langchain-textual

Providing a Textual API key

To use the Textual tools, you must provide a Textual API key.

To set the API key as an environment variable:

export TONIC_TEXTUAL_API_KEY="your-api-key"

To provide the API key when you create a tool:

tool = TonicTextualRedactText(tonic_textual_api_key="your-api-key")
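Both approaches can coexist in application code. The following sketch is built around a hypothetical helper named `textual_tool_kwargs`. It passes an explicit key through as `tonic_textual_api_key`, and otherwise only checks that `TONIC_TEXTUAL_API_KEY` is set, so that a missing key fails early with a clear message. It assumes, as described above, that the tools read the environment variable themselves when no key is passed.

```python
import os


def textual_tool_kwargs(api_key=None):
    """Hypothetical helper: build keyword arguments for a Textual tool.

    An explicit key is passed through as tonic_textual_api_key. Otherwise the
    tool is assumed to read TONIC_TEXTUAL_API_KEY itself, so this only checks
    that the variable is set and returns no extra arguments.
    """
    if api_key:
        return {"tonic_textual_api_key": api_key}
    if os.environ.get("TONIC_TEXTUAL_API_KEY"):
        return {}
    raise RuntimeError(
        "No Textual API key: pass api_key or set TONIC_TEXTUAL_API_KEY"
    )
```

For example, `TonicTextualRedactText(**textual_tool_kwargs())` relies on the environment variable, while `TonicTextualRedactText(**textual_tool_kwargs("your-api-key"))` passes the key explicitly.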

Calling a tool

To call a tool on Textual Cloud:

from langchain_textual import <tool name>

tool = <tool name>(<configuration parameters>)
tool.invoke(<tool target>)

To call a tool on a self-hosted instance of Textual, the configuration parameters must include your Textual instance URL:
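The exact name of the instance URL parameter is not given on this page. The following sketch stores it in a constant, `BASE_URL_PARAM`, whose value `tonic_textual_base_url` is a hypothetical placeholder; replace it with the name from the langchain-textual reference before use. The helper `self_hosted_kwargs` is also hypothetical. It rejects values that are not absolute http(s) URLs and strips a trailing slash.

```python
from urllib.parse import urlparse

# HYPOTHETICAL parameter name: replace it with the instance URL parameter
# documented for langchain-textual.
BASE_URL_PARAM = "tonic_textual_base_url"


def self_hosted_kwargs(instance_url, api_key):
    """Build keyword arguments for a tool that targets a self-hosted instance."""
    parsed = urlparse(instance_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected an absolute http(s) URL, got {instance_url!r}")
    return {
        "tonic_textual_api_key": api_key,
        # Strip a trailing slash so the base URL has one consistent form.
        BASE_URL_PARAM: instance_url.rstrip("/"),
    }
```

Pass the result to the tool constructor, for example `TonicTextualRedactText(**self_hosted_kwargs("https://textual.example.com", "your-api-key"))`, where the host name is a placeholder.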

Available tools

  • TonicTextualRedactText - Input: a plain text string. Use to synthesize or tokenize entities in raw text or in the content of a .txt file.

  • TonicTextualRedactJson - Input: a JSON string. Use to synthesize or tokenize entities in raw JSON or in the content of a .json file.

  • TonicTextualRedactHtml - Input: an HTML string. Use to synthesize or tokenize entities in raw HTML or in the content of an .html or .htm file.

  • TonicTextualRedactFile - Input: a file path. Use to synthesize or tokenize entities in PDF, image (JPG, PNG), CSV, or TSV files. For .txt, .json, .htm, or .html files, read the file content and pass it to the text, JSON, or HTML tool instead.

  • TonicTextualExtractEntities - Input: a plain text string. Use to return a list of detected entities, with the type, value, location, and detection confidence of each entity.

  • TonicTextualPiiTypes - Input: none. Use to list the supported entity types. Provides the type names to use in the configuration for the redaction tools.

Text redaction
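A common pattern in a LangChain application is to de-identify user text before it reaches a model. The following sketch applies a text redaction tool to the content of each chat message. It uses plain dictionaries with `role` and `content` keys rather than LangChain message classes, and it assumes that `tool.invoke` takes a plain text string and returns the redacted text as a string, as `TonicTextualRedactText` is described to do. The function name `redact_messages` is hypothetical.

```python
def redact_messages(tool, messages):
    """Return copies of chat messages with their content de-identified.

    Assumes tool.invoke(text) takes a plain text string and returns the
    redacted text as a string, as TonicTextualRedactText is described to do.
    """
    redacted = []
    for message in messages:
        content = message["content"]
        # Pass empty content through without calling the tool.
        new_content = tool.invoke(content) if content else content
        # Copy the message so the caller's original data is not modified.
        redacted.append({**message, "content": new_content})
    return redacted
```

With the integration installed, call `redact_messages(TonicTextualRedactText(), messages)`. Empty messages are passed through without a call to Textual.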

JSON redaction
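TonicTextualRedactJson takes a JSON string, so Python data must be serialized first. This sketch, built around a hypothetical helper `redact_json_document`, serializes with `json.dumps`, invokes the tool, and parses the result with `json.loads`. It assumes that the tool returns the redacted JSON as a string.

```python
import json


def redact_json_document(tool, document):
    """Redact a JSON-serializable Python object and return the parsed result.

    Assumes tool.invoke takes a JSON string and returns the redacted JSON as a
    string, as TonicTextualRedactJson is described to do.
    """
    payload = json.dumps(document)
    redacted_payload = tool.invoke(payload)
    return json.loads(redacted_payload)
```

With the integration installed, call `redact_json_document(TonicTextualRedactJson(), {"name": "John"})`.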

HTML redaction
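If you expect HTML redaction to change only the text inside the markup, you can check that the tag structure survives. The following sketch uses the standard library `html.parser` module to compare the sequence of start and end tags before and after redaction. The helper names are hypothetical, and the check deliberately ignores attribute values, which may legitimately change if they contain sensitive data. It assumes that `tool.invoke` takes and returns an HTML string, as `TonicTextualRedactHtml` is described to do.

```python
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Collect start and end tag names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(("start", tag))

    def handle_endtag(self, tag):
        self.tags.append(("end", tag))


def tag_sequence(html):
    """Return the ordered (kind, tag name) pairs found in an HTML string."""
    parser = _TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


def redact_html_checked(tool, html):
    """Redact HTML with tool and verify that the tag sequence is unchanged."""
    redacted = tool.invoke(html)
    if tag_sequence(redacted) != tag_sequence(html):
        raise ValueError("Redacted HTML has a different tag structure")
    return redacted
```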

File redaction (PDF, image, CSV, TSV)

For .txt, .json, and .html/.htm files, you do not use the file redaction tool. Instead, you read the file content, then pass the content to the text, JSON, or HTML redaction tool.
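The routing rule above can be expressed as a small dispatch function. The following sketch, using a hypothetical `plan_redaction` helper, returns the name of the tool to use and the input to pass to it: the file content for .txt, .json, .html, and .htm files, and the file path for PDF, JPG, PNG, CSV, and TSV files. Reading text files as UTF-8 is an assumption; adjust the encoding for your data.

```python
from pathlib import Path

# Text-based formats: read the content and pass it to the matching string tool.
CONTENT_TOOLS = {
    ".txt": "TonicTextualRedactText",
    ".json": "TonicTextualRedactJson",
    ".html": "TonicTextualRedactHtml",
    ".htm": "TonicTextualRedactHtml",
}
# Formats handled by passing the file path to the file redaction tool.
FILE_TOOL_EXTENSIONS = {".pdf", ".jpg", ".png", ".csv", ".tsv"}


def plan_redaction(path):
    """Return (tool_name, tool_input) for the file at path."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix in CONTENT_TOOLS:
        return CONTENT_TOOLS[suffix], path.read_text(encoding="utf-8")
    if suffix in FILE_TOOL_EXTENSIONS:
        return "TonicTextualRedactFile", str(path)
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

To run the plan, import `langchain_textual`, look up the tool class by name, and invoke it, for example `tool_name, tool_input = plan_redaction("notes.txt")` followed by `getattr(langchain_textual, tool_name)().invoke(tool_input)`.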

Get the list of detected entities

TonicTextualExtractEntities returns a JSON array of detected entities. Each entity has the following fields:

  • label - The entity type of the detected entity.

  • text - The detected text, that is, the entity value.

  • start - The position in the input text where the entity starts.

  • end - The position in the input text where the entity ends.

  • score - The confidence score, which indicates how confident Textual is in the detection.
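The following sketch groups detected values by entity type and drops detections below a confidence threshold. It also checks each detection against the input text, which assumes that `start` and `end` are character offsets with `end` exclusive, so that `source_text[start:end]` equals `text`. It accepts either the JSON string or an already parsed list. The function name `summarize_entities` is hypothetical.

```python
import json
from collections import defaultdict


def summarize_entities(entities, source_text, min_score=0.0):
    """Group detected entity values by label, keeping scores >= min_score.

    entities may be the JSON array string or an already parsed list. Assumes
    start and end are character offsets into source_text, with end exclusive.
    """
    if isinstance(entities, str):
        entities = json.loads(entities)
    grouped = defaultdict(set)
    for entity in entities:
        if entity["score"] < min_score:
            continue
        # Sanity check: the offsets should point at the reported text.
        if source_text[entity["start"]:entity["end"]] != entity["text"]:
            raise ValueError(f"Offsets do not match text: {entity!r}")
        grouped[entity["label"]].add(entity["text"])
    # Sorted lists give a stable, JSON-friendly result.
    return {label: sorted(values) for label, values in grouped.items()}
```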

Tool configuration options

All of the redaction tools provide the same configuration options to determine how to de-identify entities of specific types.

Getting the list of available entity types

When you configure entity type handling, you must provide the entity type names.

To get a list of all of the supported entity type names, use TonicTextualPiiTypes:
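Because a misspelled entity type name is easy to miss, you might validate your configuration against the supported names. The following sketch, with a hypothetical `check_entity_types` helper, takes the supported names as any iterable of strings. How you extract those names from the `TonicTextualPiiTypes` output depends on that output's format, which is not described here.

```python
def check_entity_types(generator_config, supported_types):
    """Raise ValueError if generator_config uses unsupported entity type names."""
    unknown = sorted(set(generator_config) - set(supported_types))
    if unknown:
        raise ValueError(f"Unknown entity types: {', '.join(unknown)}")
```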

Available handling options

The available entity type handling options are:

  • Redaction - The default option when you do not specify otherwise. Replaces the entity value with the entity type name followed by an identifier that is unique to each distinct value. For example, John is replaced with NAME_GIVEN_1234.

  • Synthesis - Replaces the entity value with a realistic generated value. For example, John is replaced with Michael.

  • Off - Leaves the entity value unchanged in the output.
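To make the difference between the options concrete, the following toy sketch applies them locally to a string. It illustrates the behavior described above and is not Textual's implementation: detection is replaced by a fixed list of (entity type, value) pairs, identifiers are sequential numbers rather than Textual's format, and synthetic values come from a mapping that you supply. The function name `apply_handling` is hypothetical.

```python
HANDLING_OPTIONS = ("Redaction", "Synthesis", "Off")


def apply_handling(text, entities, option, synthetic=None):
    """Toy, local illustration of the three handling options.

    entities: (entity_type, value) pairs standing in for Textual's detection.
    synthetic: maps each original value to a generated value for Synthesis.
    This is not Textual's implementation or identifier format.
    """
    if option not in HANDLING_OPTIONS:
        raise ValueError(f"Unknown handling option: {option!r}")
    if option == "Off":
        return text  # Keep every value as is.
    identifiers = {}
    for entity_type, value in entities:
        if option == "Redaction":
            # One identifier per unique value, reused for every occurrence.
            if value not in identifiers:
                identifiers[value] = f"{entity_type}_{len(identifiers) + 1}"
            replacement = identifiers[value]
        else:
            replacement = synthetic[value]
        text = text.replace(value, replacement)
    return text
```

For example, `apply_handling("John met Anna. John left.", [("NAME_GIVEN", "John"), ("NAME_GIVEN", "Anna")], "Redaction")` returns `"NAME_GIVEN_1 met NAME_GIVEN_2. NAME_GIVEN_1 left."`. Both occurrences of John receive the same identifier.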

Specifying the default handling option

To specify the default handling option to use for all entity types, use the generator_default parameter.

In the following example, the default handling option is set to Synthesis. All entities are replaced with realistic generated values.
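As a sketch, assuming that handling options are passed as the strings Redaction, Synthesis, and Off, the keyword arguments for this example can be kept in a dictionary and unpacked into any of the redaction tools, for example `TonicTextualRedactText(**synthesize_all)`. The variable name is illustrative.

```python
# Keyword arguments for a redaction tool that synthesizes every detected entity.
# Assumption: handling options are passed as plain strings.
synthesize_all = {"generator_default": "Synthesis"}
```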

Providing handling options for specific entity types

To provide handling options for specific entity types, use the generator_config parameter.

Within generator_config, for each entity type, provide the entity type name as the key and the handling option to use as the value.

In the following example, the default handling option is Off. First and last names are replaced with realistic generated values, and email addresses are redacted:
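A sketch of the configuration for this example follows. It assumes that handling options are passed as strings and that NAME_GIVEN, NAME_FAMILY, and EMAIL_ADDRESS are the type names for first names, last names, and email addresses; confirm the names with TonicTextualPiiTypes. The hypothetical `handling_kwargs` helper also rejects any handling option that is not one of the three documented values.

```python
HANDLING_OPTIONS = {"Redaction", "Synthesis", "Off"}


def handling_kwargs(default="Redaction", per_type=None):
    """Build generator_default and generator_config keyword arguments.

    per_type maps entity type names to handling options. Every option is
    checked against the three documented values before it is returned.
    """
    per_type = dict(per_type or {})
    for name, option in [("generator_default", default), *per_type.items()]:
        if option not in HANDLING_OPTIONS:
            raise ValueError(f"Invalid handling option for {name}: {option!r}")
    return {"generator_default": default, "generator_config": per_type}


# Default Off; synthesize first and last names; redact email addresses.
kwargs = handling_kwargs(
    default="Off",
    per_type={
        "NAME_GIVEN": "Synthesis",
        "NAME_FAMILY": "Synthesis",
        "EMAIL_ADDRESS": "Redaction",
    },
)
```

Unpack the result into a redaction tool, for example `TonicTextualRedactText(**kwargs)`.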
