The Textual Haystack integration provides Haystack components that call Textual detection and redaction functions. You add those components to your Haystack pipeline.
To use the same key for every call, set the TONIC_TEXTUAL_API_KEY environment variable to your API key:
export TONIC_TEXTUAL_API_KEY="your-api-key"
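Alternatively, you can provide the API key when you create a component:
from haystack.utils.auth import Secret
from haystack_integrations.components.tonic_textual import (
    TonicTextualEntityExtractor,
)
# Pass the key explicitly instead of reading it from the environment
extractor = TonicTextualEntityExtractor(
    api_key=Secret.from_token("your-api-key")
)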
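In your own application code, it can help to fail early when no key is available. The following is a minimal sketch and is not part of the integration: resolve_textual_api_key is a hypothetical helper that prefers an explicitly passed key and otherwise reads TONIC_TEXTUAL_API_KEY from the environment. The precedence shown is an assumption about how you might want your own code to behave, not a statement about how the components resolve the key internally.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "TONIC_TEXTUAL_API_KEY"


def resolve_textual_api_key(
    explicit_key: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the key to use, or raise if none is configured (hypothetical helper)."""
    # Allow a custom mapping so that the helper is easy to test
    env = os.environ if environ is None else environ
    # Assumption: an explicitly passed key wins over the environment variable
    key = explicit_key or env.get(ENV_VAR, "")
    if not key.strip():
        raise RuntimeError(
            f"No Textual API key found. Set {ENV_VAR} or pass a key explicitly."
        )
    return key
```

The returned string can then be wrapped with Secret.from_token(...) and passed as api_key when you create a component.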
Providing the URL for a self-hosted instance
If you call a self-hosted instance of Textual, then the call must include the URL of that Textual instance.
Textual Haystack components
The Textual Haystack integration includes the following components:
TonicTextualEntityExtractor - Extracts entities from provided content. The results include the entity type, entity value, location within the text, and the detection confidence score.
TonicTextualDocumentCleaner - Replaces entities in provided content. You can optionally specify how to replace values for different entity types.
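Because each extraction result carries a confidence score, a downstream step can keep only the detections that it trusts. The sketch below is illustrative and does not call Textual. DetectedEntity is a hypothetical stand-in for the annotation objects: the entity, text, and score attribute names follow the extraction example in this topic, while start and end are assumed names for the location within the text.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DetectedEntity:
    """Hypothetical stand-in for one extraction result."""
    entity: str   # entity type, for example "NAME_GIVEN"
    text: str     # the detected value
    start: int    # assumed name: start offset in the source text
    end: int      # assumed name: end offset (exclusive)
    score: float  # detection confidence score


def filter_by_confidence(
    entities: Iterable[DetectedEntity], min_score: float = 0.9
) -> List[DetectedEntity]:
    """Keep only the entities whose score is at least min_score."""
    return [e for e in entities if e.score >= min_score]


sample_text = "My name is John Smith"
sample = [
    DetectedEntity("NAME_GIVEN", "John", 11, 15, 0.90),
    DetectedEntity("NAME_FAMILY", "Smith", 16, 21, 0.62),
]

trusted = filter_by_confidence(sample, min_score=0.8)
for e in trusted:
    # The offsets locate the value inside the original text
    print(f"{e.entity}: {sample_text[e.start:e.end]} ({e.score:.2f})")
# NAME_GIVEN: John (0.90)
```

The scores in the sample list are made up for illustration. A suitable threshold depends on your data.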
Using the entity extraction component
To use the entity extraction component, you provide the content for the component to analyze.
For example:
from haystack.dataclasses import Document
from haystack_integrations.components.tonic_textual import (
    TonicTextualEntityExtractor,
)
extractor = TonicTextualEntityExtractor()
result = extractor.run(
    documents=[Document(content="My name is John Smith and my email is [email protected]")]
)
# Print each entity that was detected in the first document
for entity in TonicTextualEntityExtractor.get_stored_annotations(result["documents"][0]):
    print(f"{entity.entity}: {entity.text} (confidence: {entity.score:.2f})")
# NAME_GIVEN: John (confidence: 0.90)
# NAME_FAMILY: Smith (confidence: 0.90)
# EMAIL_ADDRESS: [email protected] (confidence: 0.95)
Document cleaning
To use the document cleaning component, you provide the content for the component to redact.
For example:
from haystack.dataclasses import Document
from haystack_integrations.components.tonic_textual import (
    TonicTextualDocumentCleaner,
)
# Synthesize PII with realistic fakes
cleaner = TonicTextualDocumentCleaner(generator_default="Synthesis")
result = cleaner.run(
    documents=[Document(content="Contact John Smith at [email protected]")]
)
print(result["documents"][0].content)
# Example output: Contact Maria Chen at [email protected]
You can optionally specify how to replace each entity type. The available options are:
Redaction - Replaces each value with the entity type name plus a unique identifier.
Synthesis - Replaces each value with a realistic generated value.
Off - Does not replace the value.
generator_default sets the option to use for all entity types. generator_config specifies the option to use for individual entity types.
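To make the interaction between generator_default and generator_config concrete, here is a small stand-alone sketch. It does not call Textual and is not how Textual implements replacement. The apply_replacements function, the [TYPE_n] token format, and the fixed FAKE_VALUES table are all hypothetical. The sketch also assumes that generator_config maps an entity type name to an option name, and that an entry in generator_config takes precedence over generator_default for that entity type.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for synthesized values; Textual generates its own.
FAKE_VALUES: Dict[str, str] = {"NAME_GIVEN": "Maria", "NAME_FAMILY": "Chen"}

# Each span is (entity_type, start, end) with an exclusive end offset.
Span = Tuple[str, int, int]


def apply_replacements(
    text: str,
    spans: List[Span],
    generator_default: str = "Redaction",
    generator_config: Optional[Dict[str, str]] = None,
) -> str:
    """Replace each span according to its option (illustrative only)."""
    config = generator_config or {}
    pieces: List[str] = []
    cursor = 0
    ordered = sorted(spans, key=lambda span: span[1])
    for counter, (entity_type, start, end) in enumerate(ordered, start=1):
        # Assumption: a per-type entry overrides the default option
        mode = config.get(entity_type, generator_default)
        original = text[start:end]
        if mode == "Redaction":
            # Entity type name plus an identifier (made-up token format)
            replacement = f"[{entity_type}_{counter}]"
        elif mode == "Synthesis":
            # Fall back to the original value when the table has no fake
            replacement = FAKE_VALUES.get(entity_type, original)
        elif mode == "Off":
            replacement = original
        else:
            raise ValueError(f"Unknown option: {mode!r}")
        pieces.append(text[cursor:start])
        pieces.append(replacement)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Contact John Smith today"
spans = [("NAME_GIVEN", 8, 12), ("NAME_FAMILY", 13, 18)]

redacted = apply_replacements(text, spans)
print(redacted)
# Contact [NAME_GIVEN_1] [NAME_FAMILY_2] today

mixed = apply_replacements(
    text,
    spans,
    generator_default="Synthesis",
    generator_config={"NAME_FAMILY": "Off"},
)
print(mixed)
# Contact Maria Smith today
```

In the second call, the default option synthesizes the given name, and the per-type entry leaves the family name untouched.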