# Enabling dataset text search

On a self-hosted instance, to use the [dataset text search](https://docs.tonic.ai/textual/dataset-configure-redaction/textual-datasets-review-results/dataset-text-search), you must set up a search provider.

You can use either:

* [Apache Lucene](https://lucene.apache.org/)
* [OpenSearch](https://opensearch.org/)

## Identifying the search provider to use

To identify the search provider to use, set the environment variable `DATASET_TEXT_SEARCH_PROVIDER`.

The available values are:

* `lucene` - The default value. Uses Apache Lucene as the search provider.
* `opensearch` - Uses OpenSearch as the search provider. OpenSearch provides a more robust search index. You must use OpenSearch if the Textual API and worker are on different hosts.
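
As a minimal sketch, the following sets the provider from a shell. How environment variables actually reach the Textual API and worker depends on your deployment (for example, a Docker Compose or Kubernetes configuration), so treat the `export` form here as an illustration only.

```shell
# Select OpenSearch instead of the default (lucene).
# Set the same value for both the Textual API and the worker.
export DATASET_TEXT_SEARCH_PROVIDER="opensearch"
```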

## Configuring Lucene <a href="#lucene" id="lucene"></a>

If you use Lucene as your search provider:

1. Mount a shared folder as a volume for both the API and the worker.\
   \
   The folder must be accessible to both the API and the Textual worker.
2. Configure the environment variable `DATASET_TEXT_SEARCH_LUCENE_INDEX_PATH` to point to that folder.\
   \
   The value must be the absolute path to the folder.
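
The steps above can be sketched as follows. The mount point `/mnt/textual-search-index` is a hypothetical example, not a path that Textual requires. The `case` statement is only an optional sanity check that the value is an absolute path.

```shell
# Hypothetical mount point. Replace it with the folder that you mounted
# as a volume for both the API and the worker.
export DATASET_TEXT_SEARCH_PROVIDER="lucene"
export DATASET_TEXT_SEARCH_LUCENE_INDEX_PATH="/mnt/textual-search-index"

# Optional sanity check: the value must be an absolute path.
case "$DATASET_TEXT_SEARCH_LUCENE_INDEX_PATH" in
  /*) echo "index path is absolute" ;;
  *)  echo "index path must be absolute" >&2 ;;
esac
```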

## Configuring OpenSearch <a href="#opensearch" id="opensearch"></a>

To use OpenSearch as your search provider, configure the following.

### **Connection to your OpenSearch instance**

To configure the connection, set the following environment variable:

* `DATASET_TEXT_SEARCH_OPENSEARCH_URL` - Points to your OpenSearch instance.

If you enabled authentication for your OpenSearch cluster, set the following environment variables:

* `DATASET_TEXT_SEARCH_OPENSEARCH_USERNAME` - Username for OpenSearch authentication.
* `DATASET_TEXT_SEARCH_OPENSEARCH_PASSWORD` - Password for OpenSearch authentication.

Optionally, for testing, you can disable SSL certificate validation. To do this, set `DATASET_TEXT_SEARCH_OPENSEARCH_DISABLE_CERTIFICATE_VALIDATION` to `true`.
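
A minimal sketch of the connection settings might look like the following. The host name, port, user name, and password are placeholders, and the URL format (scheme plus port) is an assumption. Use the address and credentials of your own OpenSearch instance.

```shell
export DATASET_TEXT_SEARCH_PROVIDER="opensearch"

# Placeholder URL. Point this at your own OpenSearch instance.
export DATASET_TEXT_SEARCH_OPENSEARCH_URL="https://opensearch.example.internal:9200"

# Only needed if authentication is enabled on the cluster (placeholder values).
export DATASET_TEXT_SEARCH_OPENSEARCH_USERNAME="textual_search"
export DATASET_TEXT_SEARCH_OPENSEARCH_PASSWORD="change-me"

# For testing only: uncomment to skip SSL certificate validation.
# export DATASET_TEXT_SEARCH_OPENSEARCH_DISABLE_CERTIFICATE_VALIDATION="true"
```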

### **Setting a custom index name**

By default, the name of the OpenSearch index for the dataset text search is `textual_text_search`.

To use a different name for the index, set the index name as the value of `DATASET_TEXT_SEARCH_OPENSEARCH_INDEX_PREFIX`.
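
For example, assuming a hypothetical index name of `my_textual_search`:

```shell
# Hypothetical value. Replaces the default index name (textual_text_search).
export DATASET_TEXT_SEARCH_OPENSEARCH_INDEX_PREFIX="my_textual_search"
```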

### **Setting the number of shards and replicas**

By default, Textual uses:

* 5 shards
* 1 replica

You can adjust these numbers as needed for your instance.

To configure the number of shards and replicas, set the following environment variables:

* `DATASET_TEXT_SEARCH_OPENSEARCH_NUM_SHARDS`
* `DATASET_TEXT_SEARCH_OPENSEARCH_NUM_REPLICAS`
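
As a sketch, the following overrides both defaults. The values `3` and `2` are illustrative only, not sizing recommendations. Choose numbers that fit your own OpenSearch cluster.

```shell
# Illustrative values. The defaults are 5 shards and 1 replica.
export DATASET_TEXT_SEARCH_OPENSEARCH_NUM_SHARDS="3"
export DATASET_TEXT_SEARCH_OPENSEARCH_NUM_REPLICAS="2"
```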

