# Enabling dataset text search

On a self-hosted instance, you must set up a search provider before you can use [dataset text search](https://docs.tonic.ai/textual/dataset-configure-redaction/textual-datasets-review-results/dataset-text-search).

You can use either:

* [Apache Lucene](https://lucene.apache.org/)
* [OpenSearch](https://opensearch.org/)

## Identifying the search provider to use

To specify which search provider to use, set the `DATASET_TEXT_SEARCH_PROVIDER` environment variable.

The available values are:

* `lucene` - (Default) Use Apache Lucene as the search provider.
* `opensearch` - Use OpenSearch as the search provider. OpenSearch provides a more robust search index. You must use OpenSearch if the Textual API and worker run on different hosts.
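If you script your deployment, you can check the provider setting before you start Textual. The following Python function is a minimal, hypothetical pre-flight check and is not part of Textual. It applies the two rules above: `lucene` is the default, and OpenSearch is required when the API and worker are on different hosts. The function name and the `same_host` parameter are illustrative assumptions.

```python
from typing import Mapping

# The two values documented for DATASET_TEXT_SEARCH_PROVIDER.
VALID_PROVIDERS = {"lucene", "opensearch"}


def resolve_search_provider(env: Mapping[str, str], same_host: bool) -> str:
    """Hypothetical pre-flight check; not part of Textual.

    Returns the provider that the environment selects, or raises ValueError
    if the value is unknown or conflicts with the deployment topology.
    """
    # Lucene is the default when the variable is unset.
    provider = env.get("DATASET_TEXT_SEARCH_PROVIDER", "lucene")
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"Unsupported DATASET_TEXT_SEARCH_PROVIDER: {provider!r}")
    # The documentation requires OpenSearch when the API and worker are on different hosts.
    if provider == "lucene" and not same_host:
        raise ValueError("Use OpenSearch when the API and worker run on different hosts")
    return provider
```

For example, `resolve_search_provider(os.environ, same_host=True)` returns `"lucene"` when the variable is unset.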

## Configuring Lucene <a href="#lucene" id="lucene"></a>

If you use Lucene as your search provider:

1. Mount a shared folder as a volume for both the Textual API and the Textual worker.\
   \
   Both the API and the worker must be able to access the folder.
2. Set the environment variable `DATASET_TEXT_SEARCH_LUCENE_INDEX_PATH` to the path of that folder.\
   \
   The value must be the absolute path to the folder.
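The following Python function is a minimal, hypothetical sketch, not part of Textual. It checks the Lucene setting before startup. It assumes that the API and worker run in Linux containers, so it uses `posixpath.isabs` to apply POSIX path rules regardless of the machine that runs the check. The example path `/mnt/textual-search-index` is illustrative only.

```python
import posixpath
from typing import Mapping


def lucene_index_path(env: Mapping[str, str]) -> str:
    """Hypothetical check that DATASET_TEXT_SEARCH_LUCENE_INDEX_PATH is set and absolute."""
    path = env.get("DATASET_TEXT_SEARCH_LUCENE_INDEX_PATH", "")
    if not path:
        raise ValueError("DATASET_TEXT_SEARCH_LUCENE_INDEX_PATH is not set")
    # The documentation requires an absolute path, such as /mnt/textual-search-index.
    # Assumption: the containers use POSIX paths.
    if not posixpath.isabs(path):
        raise ValueError(f"Index path must be absolute, got {path!r}")
    return path
```

The same value must resolve to the shared mounted folder in both the API container and the worker container.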

## Configuring OpenSearch <a href="#opensearch" id="opensearch"></a>

To use OpenSearch as your search provider, configure the following.

### Connection to your OpenSearch instance

To configure the connection, set the following environment variable:

* `DATASET_TEXT_SEARCH_OPENSEARCH_URL` - Points to your OpenSearch instance.

If you enabled authentication for your OpenSearch cluster, set the following environment variables:

* `DATASET_TEXT_SEARCH_OPENSEARCH_USERNAME` - Username for OpenSearch authentication.
* `DATASET_TEXT_SEARCH_OPENSEARCH_PASSWORD` - Password for OpenSearch authentication.

Optionally, for testing only, you can disable SSL certificate validation. To do so, set `DATASET_TEXT_SEARCH_OPENSEARCH_DISABLE_CERTIFICATE_VALIDATION` to `true`.
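The following Python code is a minimal, hypothetical sketch, not Textual's implementation. It shows how the connection variables above fit together. The `OpenSearchSettings` class and `opensearch_settings` function are illustrative names. The sketch assumes that the username and password are set together or not at all, and that only the exact value `true` disables certificate validation.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass
class OpenSearchSettings:
    url: str
    auth: Optional[Tuple[str, str]]  # None when the cluster has no authentication
    verify_certs: bool


def opensearch_settings(env: Mapping[str, str]) -> OpenSearchSettings:
    """Hypothetical helper that collects the documented connection variables."""
    url = env.get("DATASET_TEXT_SEARCH_OPENSEARCH_URL")
    if not url:
        raise ValueError("DATASET_TEXT_SEARCH_OPENSEARCH_URL is required")
    user = env.get("DATASET_TEXT_SEARCH_OPENSEARCH_USERNAME")
    password = env.get("DATASET_TEXT_SEARCH_OPENSEARCH_PASSWORD")
    # Assumption: the username and password are only meaningful together.
    if bool(user) != bool(password):
        raise ValueError("Set both USERNAME and PASSWORD, or neither")
    auth = (user, password) if user else None
    # Assumption: only the exact value "true" disables certificate validation.
    disable = env.get("DATASET_TEXT_SEARCH_OPENSEARCH_DISABLE_CERTIFICATE_VALIDATION", "false")
    return OpenSearchSettings(url=url, auth=auth, verify_certs=disable != "true")
```

To test connectivity yourself with the `opensearch-py` client, you could pass these values to its `OpenSearch(hosts=[...], http_auth=..., verify_certs=...)` constructor.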

### Setting a custom index name

By default, the name of the OpenSearch index for the dataset text search is `textual_text_search`.

To use a different index name, set `DATASET_TEXT_SEARCH_OPENSEARCH_INDEX_PREFIX` to the name that you want to use.

### Setting the number of shards and replicas

By default, Textual uses:

* 5 shards
* 1 replica

You can adjust these numbers as needed for your instance.

To configure the number of shards and replicas, set the following environment variables:

* `DATASET_TEXT_SEARCH_OPENSEARCH_NUM_SHARDS` - Number of primary shards for the index.
* `DATASET_TEXT_SEARCH_OPENSEARCH_NUM_REPLICAS` - Number of replicas for each primary shard.
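The following Python function is a minimal, hypothetical sketch. It shows how the documented defaults (index `textual_text_search`, 5 shards, 1 replica) and the override variables map to the settings body of an OpenSearch create-index request (`PUT /<index>`). The sketch treats `DATASET_TEXT_SEARCH_OPENSEARCH_INDEX_PREFIX` as the full index name, as described above. Textual's actual index naming and request are not confirmed here.

```python
from typing import Any, Dict, Mapping, Tuple

# Defaults documented for the dataset text search index.
DEFAULT_INDEX = "textual_text_search"
DEFAULT_SHARDS = 5
DEFAULT_REPLICAS = 1


def index_request(env: Mapping[str, str]) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: return (index name, create-index body)."""
    name = env.get("DATASET_TEXT_SEARCH_OPENSEARCH_INDEX_PREFIX", DEFAULT_INDEX)
    shards = int(env.get("DATASET_TEXT_SEARCH_OPENSEARCH_NUM_SHARDS", DEFAULT_SHARDS))
    replicas = int(env.get("DATASET_TEXT_SEARCH_OPENSEARCH_NUM_REPLICAS", DEFAULT_REPLICAS))
    # OpenSearch needs at least one primary shard; zero replicas is allowed.
    if shards < 1 or replicas < 0:
        raise ValueError("Need at least 1 shard and a non-negative replica count")
    body = {"settings": {"index": {"number_of_shards": shards, "number_of_replicas": replicas}}}
    return name, body
```

With no variables set, the function returns the index name `textual_text_search` and a body that requests 5 primary shards with 1 replica each.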
