# Textual Haystack

The [Textual Haystack](https://github.com/TonicAI/textual-haystack) integration provides [Haystack](https://haystack.deepset.ai/) components that call Textual entity detection and redaction from within Haystack. You can add these components to a Haystack pipeline.

## Installing and configuring the integration

To install the integration, run:

```
pip install textual-haystack
```

### Providing a Textual API key

The calls to Textual require a [Textual API key](/textual/tonic-textual-api/textual-api-keys.md).

To use the same key for every call, set the `TONIC_TEXTUAL_API_KEY` environment variable to the API key:

```
export TONIC_TEXTUAL_API_KEY="your-api-key"
```
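To catch a missing key before any component is created, a script can check the environment variable up front. The following is a minimal sketch that uses only the Python standard library. The helper name `require_textual_api_key` is hypothetical and is not part of the integration.

```python
import os


def require_textual_api_key(env_var: str = "TONIC_TEXTUAL_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper for illustration; not part of textual-haystack.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"{env_var} is not set. Export it before you run the pipeline."
        )
    return api_key
```

Calling the helper at the start of a script surfaces a configuration problem immediately, instead of at the first call to Textual.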

Alternatively, you can provide the API key when you create a component:

```
from haystack.utils.auth import Secret
from haystack_integrations.components.tonic_textual import (
    TonicTextualEntityExtractor,
)

extractor = TonicTextualEntityExtractor(
    api_key=Secret.from_token("your-api-key")
)
```

### Providing the URL for a self-hosted instance

If you use a self-hosted instance of Textual, provide the URL of the instance as `base_url` when you create the component:

```
from haystack_integrations.components.tonic_textual import TonicTextualEntityExtractor

extractor = TonicTextualEntityExtractor(
    base_url="https://textual.your-company.com"
)
```

## Textual Haystack components

The Textual Haystack integration includes the following components:

<table><thead><tr><th width="318.7578125" valign="top">Component</th><th valign="top">Description</th></tr></thead><tbody><tr><td valign="top"><code>TonicTextualEntityExtractor</code></td><td valign="top">Extracts entities from provided content.<br><br>The results include the entity type, entity value, location within the text, and the detection confidence score.</td></tr><tr><td valign="top"><code>TonicTextualDocumentCleaner</code></td><td valign="top">Replaces entities in provided content.<br><br>You can optionally specify how to replace values for different entity types.</td></tr></tbody></table>

## Using the entity extraction component

To use the entity extraction component, pass the documents to analyze to its `run` method. To read the detected entities from a returned document, call `get_stored_annotations`.

For example:

```
from haystack.dataclasses import Document
from haystack_integrations.components.tonic_textual import (
    TonicTextualEntityExtractor,
)

extractor = TonicTextualEntityExtractor()
result = extractor.run(
    documents=[Document(content="My name is John Smith and my email is john@example.com")]
)

for entity in TonicTextualEntityExtractor.get_stored_annotations(result["documents"][0]):
    print(f"{entity.entity}: {entity.text} (confidence: {entity.score:.2f})")
# Example output:
# NAME_GIVEN: John (confidence: 0.90)
# NAME_FAMILY: Smith (confidence: 0.90)
# EMAIL_ADDRESS: john@example.com (confidence: 0.95)
```
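After you retrieve the annotations, ordinary Python can filter or group them. The sketch below assumes only the three attributes used in the example above (`entity`, `text`, and `score`). The `Annotation` dataclass is a hypothetical stand-in so that the snippet runs without a Textual API key. The `group_by_type` helper and the `0.92` threshold are also illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical stand-in for an annotation returned by the extractor."""
    entity: str
    text: str
    score: float


def group_by_type(annotations, min_score=0.0):
    """Map each entity type to the values detected at or above min_score."""
    grouped = defaultdict(list)
    for annotation in annotations:
        if annotation.score >= min_score:
            grouped[annotation.entity].append(annotation.text)
    return dict(grouped)


# Sample values copied from the example output above.
annotations = [
    Annotation("NAME_GIVEN", "John", 0.90),
    Annotation("NAME_FAMILY", "Smith", 0.90),
    Annotation("EMAIL_ADDRESS", "john@example.com", 0.95),
]

print(group_by_type(annotations, min_score=0.92))
# {'EMAIL_ADDRESS': ['john@example.com']}
```

With real results, the list returned by `TonicTextualEntityExtractor.get_stored_annotations(...)` would take the place of the sample list, provided that the annotations expose the same three attributes.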

## Using the document cleaning component

To use the document cleaning component, pass the documents to redact to its `run` method. The returned documents contain the redacted content.

For example:

```
from haystack.dataclasses import Document
from haystack_integrations.components.tonic_textual import (
    TonicTextualDocumentCleaner,
)

# Replace detected PII with realistic synthesized values
cleaner = TonicTextualDocumentCleaner(generator_default="Synthesis")
result = cleaner.run(
    documents=[Document(content="Contact John Smith at john@example.com")]
)
print(result["documents"][0].content)
# Example output:
# Contact Maria Chen at maria.chen@gmail.com
```

You can optionally specify how to handle each entity type. The available options are:

* `Redaction` - Replaces each value with the entity type name plus a unique identifier.
* `Synthesis` - Replaces each value with a realistic generated value.
* `Off` - Does not replace the value.

`generator_default` sets the option to use for all entity types. `generator_config` maps individual entity types to the option to use for that type.

For example, to synthesize names, redact email addresses, and leave all other entity types unchanged:

```
from haystack_integrations.components.tonic_textual import (
    TonicTextualDocumentCleaner,
)

cleaner = TonicTextualDocumentCleaner(
    generator_default="Off",
    generator_config={
        "NAME_GIVEN": "Synthesis",
        "NAME_FAMILY": "Synthesis",
        "EMAIL_ADDRESS": "Redaction",
    },
)
```
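When many entity types share the same handling, the `generator_config` dictionary can be built programmatically. The sketch below is plain Python. `build_generator_config` is a hypothetical helper, not part of the integration, and it validates only against the three option names listed above.

```python
VALID_OPTIONS = {"Redaction", "Synthesis", "Off"}


def build_generator_config(**entity_types_by_option):
    """Build a generator_config dict from option -> list of entity types.

    Hypothetical helper for illustration; not part of textual-haystack.
    """
    config = {}
    for option, entity_types in entity_types_by_option.items():
        if option not in VALID_OPTIONS:
            raise ValueError(f"Unknown replacement option: {option!r}")
        for entity_type in entity_types:
            if entity_type in config:
                raise ValueError(f"{entity_type} is assigned more than once")
            config[entity_type] = option
    return config


generator_config = build_generator_config(
    Synthesis=["NAME_GIVEN", "NAME_FAMILY"],
    Redaction=["EMAIL_ADDRESS"],
)
print(generator_config)
# {'NAME_GIVEN': 'Synthesis', 'NAME_FAMILY': 'Synthesis', 'EMAIL_ADDRESS': 'Redaction'}
```

The resulting dictionary can then be passed as `generator_config=generator_config` when you create `TonicTextualDocumentCleaner`.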

