# Textual Haystack

The [Textual Haystack](https://github.com/TonicAI/textual-haystack) integration provides [Haystack](https://haystack.deepset.ai/) components that you use to call Textual detection and redaction functions from within Haystack. You can add those components to your Haystack pipeline.

## Installing and configuring the integration

To install the integration, run:

```
pip install textual-haystack
```

### Providing a Textual API key

The calls to Textual require a [Textual API key](https://docs.tonic.ai/textual/tonic-textual-api/textual-api-keys).

To use the same key for every call, set the API key as the value of the `TONIC_TEXTUAL_API_KEY` environment variable:

```
export TONIC_TEXTUAL_API_KEY="your-api-key"
```
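If the pipeline starts from a script, it can help to confirm that the variable is set before you build any components. The following is a minimal sketch that uses only the Python standard library. The helper name `require_textual_api_key` is hypothetical and is not part of the integration.

```python
import os

ENV_VAR = "TONIC_TEXTUAL_API_KEY"


def require_textual_api_key(environ=None):
    """Return the API key from the environment, or raise if it is missing.

    Hypothetical helper for illustration; not part of textual-haystack.
    """
    # Fall back to the real process environment when no mapping is passed.
    env = os.environ if environ is None else environ
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set or is empty")
    return key
```

Calling `require_textual_api_key()` at startup reports a missing key immediately, instead of on the first request to Textual.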

Alternatively, you can provide the API key when you create a component:

```
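from haystack.utils.auth import Secret
from haystack_integrations.components.tonic_textual import TonicTextualEntityExtractor

# Pass the key explicitly instead of reading TONIC_TEXTUAL_API_KEY
extractor = TonicTextualEntityExtractor(
    api_key=Secret.from_token("your-api-key")
)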
```

### Providing the URL for a self-hosted instance

If you use a self-hosted instance of Textual, then provide the URL of that instance as `base_url` when you create the component:

```
from haystack_integrations.components.tonic_textual import TonicTextualEntityExtractor

extractor = TonicTextualEntityExtractor(
    base_url="https://textual.your-company.com"
)
```

## Textual Haystack components

The Textual Haystack integration includes the following components:

<table><thead><tr><th width="318.7578125" valign="top">Component</th><th valign="top">Description</th></tr></thead><tbody><tr><td valign="top"><code>TonicTextualEntityExtractor</code></td><td valign="top">Extracts entities from provided content.<br><br>The results include the entity type, entity value, location within the text, and the detection confidence score.</td></tr><tr><td valign="top"><code>TonicTextualDocumentCleaner</code></td><td valign="top">Replaces entities in provided content.<br><br>You can optionally specify how to replace values for different entity types.</td></tr></tbody></table>

## Using the entity extraction component

To use the entity extraction component, pass the documents to analyze to the component's `run` method. To read the detected entities from a returned document, call `get_stored_annotations`.

For example:

```
from haystack.dataclasses import Document
from haystack_integrations.components.tonic_textual import (
    TonicTextualEntityExtractor,
)

extractor = TonicTextualEntityExtractor()
result = extractor.run(
    documents=[Document(content="My name is John Smith and my email is john@example.com")]
)

for entity in TonicTextualEntityExtractor.get_stored_annotations(result["documents"][0]):
    print(f"{entity.entity}: {entity.text} (confidence: {entity.score:.2f})")
# NAME_GIVEN: John (confidence: 0.90)
# NAME_FAMILY: Smith (confidence: 0.90)
# EMAIL_ADDRESS: john@example.com (confidence: 0.95)
```
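After extraction, a common next step is to group the detected values by entity type and discard low-confidence detections. The following sketch does that with only the standard library. The `Annotation` dataclass is a stand-in that mirrors the `entity`, `text`, and `score` fields shown above. It is not the integration's own class, and the `group_by_type` helper and `min_score` parameter are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Annotation:
    """Stand-in for an extracted entity; mirrors the fields used above."""

    entity: str
    text: str
    score: float


def group_by_type(annotations, min_score=0.0):
    """Map each entity type to its detected values, skipping low-confidence hits."""
    grouped = defaultdict(list)
    for ann in annotations:
        if ann.score >= min_score:
            grouped[ann.entity].append(ann.text)
    return dict(grouped)


sample = [
    Annotation("NAME_GIVEN", "John", 0.90),
    Annotation("NAME_FAMILY", "Smith", 0.90),
    Annotation("EMAIL_ADDRESS", "john@example.com", 0.95),
]
print(group_by_type(sample, min_score=0.92))
# {'EMAIL_ADDRESS': ['john@example.com']}
```

In a real pipeline, you would pass the result of `get_stored_annotations` in place of `sample`, assuming that the returned objects expose the same three attributes.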

## Using the document cleaning component

To use the document cleaning component, pass the documents to redact to the component's `run` method. The returned documents contain the content with the detected entities replaced.

For example:

```
from haystack.dataclasses import Document
from haystack_integrations.components.tonic_textual import (
    TonicTextualDocumentCleaner,
)

# Replace detected PII with realistic synthetic values
cleaner = TonicTextualDocumentCleaner(generator_default="Synthesis")
result = cleaner.run(
    documents=[Document(content="Contact John Smith at john@example.com")]
)
print(result["documents"][0].content)
# "Contact Maria Chen at maria.chen@gmail.com"
```

For each entity type, you can optionally select one of the following replacement options:

* `Redaction` - Replaces each value with the entity type name plus a unique identifier.
* `Synthesis` - Replaces each value with a realistic generated value.
* `Off` - Does not replace the value.

`generator_default` sets the option to use for every entity type that is not listed in `generator_config`. `generator_config` overrides that default for individual entity types.

For example:

```
cleaner = TonicTextualDocumentCleaner(
    generator_default="Off",
    generator_config={
        "NAME_GIVEN": "Synthesis",
        "NAME_FAMILY": "Synthesis",
        "EMAIL_ADDRESS": "Redaction",
    },
)
```
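To make the interaction between the two settings concrete, the following sketch resolves the option that applies to a given entity type. It illustrates the semantics described above and is not the integration's implementation. The `resolve_option` helper is hypothetical, and `PHONE_NUMBER` is used only as an example of a type that is not listed in the configuration.

```python
VALID_OPTIONS = {"Redaction", "Synthesis", "Off"}


def resolve_option(entity_type, generator_default, generator_config=None):
    """Return the replacement option that applies to one entity type.

    Hypothetical helper: a per-type entry wins, otherwise the default applies.
    """
    config = generator_config or {}
    option = config.get(entity_type, generator_default)
    if option not in VALID_OPTIONS:
        raise ValueError(f"Unknown option {option!r} for {entity_type}")
    return option


config = {
    "NAME_GIVEN": "Synthesis",
    "NAME_FAMILY": "Synthesis",
    "EMAIL_ADDRESS": "Redaction",
}
print(resolve_option("EMAIL_ADDRESS", "Off", config))  # Redaction
print(resolve_option("PHONE_NUMBER", "Off", config))   # Off
```

With the configuration shown, names are synthesized, email addresses are redacted, and every other entity type is left unchanged.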
