Enabling dataset text search

On a self-hosted instance, you must set up a search provider before you can use the dataset text search.

You can use either Apache Lucene or OpenSearch.

Identifying the search provider to use

To identify the search provider to use, set the environment variable DATASET_TEXT_SEARCH_PROVIDER.

The available values are:

  • lucene - The default value. Uses Apache Lucene as the search provider.

  • opensearch - Uses OpenSearch as the search provider. OpenSearch provides a more robust search index. You must use OpenSearch if the Textual API and worker are on different hosts.
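
As a pre-deployment check, the provider selection can be sketched in Python. This is only an illustration: resolve_search_provider is a hypothetical helper, not part of Textual, and it assumes the value must match one of the two listed strings exactly.

```python
import os

# The two values listed above for DATASET_TEXT_SEARCH_PROVIDER.
VALID_PROVIDERS = {"lucene", "opensearch"}


def resolve_search_provider(env=None):
    """Return the configured provider, falling back to the default (lucene).

    Hypothetical helper for illustration; exact-match comparison is an assumption.
    """
    env = os.environ if env is None else env
    value = env.get("DATASET_TEXT_SEARCH_PROVIDER", "lucene")
    if value not in VALID_PROVIDERS:
        raise ValueError(
            "DATASET_TEXT_SEARCH_PROVIDER must be one of "
            f"{sorted(VALID_PROVIDERS)}, got {value!r}"
        )
    return value
```

Calling resolve_search_provider() with no arguments reads the current process environment. Passing a dictionary instead makes the check easy to run against a planned configuration.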

Configuring Lucene

If you use Lucene as your search provider:

  1. Mount a shared folder as a volume for both the API and the worker.

  2. Configure the environment variable DATASET_TEXT_SEARCH_LUCENE_INDEX_PATH to point to that folder.
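
A quick way to confirm the two steps above is to run a small check inside both the API and the worker containers. The sketch below assumes that the index folder must be readable and writable by the process. check_lucene_index_path is a hypothetical helper, not a Textual command.

```python
import os


def check_lucene_index_path(env=None):
    """Return a problem description, or None if the index folder looks usable.

    Hypothetical helper; the read/write requirement is an assumption.
    """
    env = os.environ if env is None else env
    path = env.get("DATASET_TEXT_SEARCH_LUCENE_INDEX_PATH")
    if not path:
        return "DATASET_TEXT_SEARCH_LUCENE_INDEX_PATH is not set"
    if not os.path.isdir(path):
        return f"{path} is not a directory (is the volume mounted?)"
    if not os.access(path, os.R_OK | os.W_OK):
        return f"{path} is not readable and writable by this process"
    return None
```

If the function returns None in both containers, the variable is set and the folder is accessible in each. It does not prove that both containers see the same underlying volume.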

Configuring OpenSearch

To use OpenSearch as your search provider, configure the following.

Connection to your OpenSearch instance

To configure the connection, set the following environment variable:

  • DATASET_TEXT_SEARCH_OPENSEARCH_URL - Points to your OpenSearch instance.

If you enabled authentication for your OpenSearch cluster, set the following environment variables:

  • DATASET_TEXT_SEARCH_OPENSEARCH_USERNAME - Username for OpenSearch authentication.

  • DATASET_TEXT_SEARCH_OPENSEARCH_PASSWORD - Password for OpenSearch authentication.

Optionally, for testing only, you can disable SSL certificate validation. To do this, set DATASET_TEXT_SEARCH_OPENSEARCH_DISABLE_CERTIFICATE_VALIDATION to true.
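
The connection variables above can be gathered and sanity-checked before you start Textual. The following is a minimal sketch, not Textual's own logic: opensearch_settings is a hypothetical helper, and both the rule that username and password must be set together and the case-insensitive reading of true are assumptions.

```python
import os

PREFIX = "DATASET_TEXT_SEARCH_OPENSEARCH_"


def opensearch_settings(env=None):
    """Collect the OpenSearch connection settings from environment variables."""
    env = os.environ if env is None else env
    url = env.get(PREFIX + "URL")
    if not url:
        raise ValueError(PREFIX + "URL is required")

    username = env.get(PREFIX + "USERNAME")
    password = env.get(PREFIX + "PASSWORD")
    # Assumption: credentials are only meaningful as a pair.
    if bool(username) != bool(password):
        raise ValueError("Set both the username and the password, or neither")

    # Assumption: any casing of "true" disables validation.
    disable = env.get(PREFIX + "DISABLE_CERTIFICATE_VALIDATION", "")
    return {
        "url": url,
        "auth": (username, password) if username else None,
        "verify_certificates": disable.strip().lower() != "true",
    }
```

For example, passing a dictionary that contains only the URL variable returns a settings dictionary with auth set to None and verify_certificates set to True.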

Setting a custom index name

By default, the name of the OpenSearch index for the dataset text search is textual_text_search.

To use a different index name, set DATASET_TEXT_SEARCH_OPENSEARCH_INDEX_PREFIX to that name.
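
OpenSearch itself restricts index names: letters must be lowercase, the name cannot begin with an underscore or hyphen, and it cannot contain spaces, commas, or certain special characters. The sketch below checks a candidate name against those rules before you set the variable. The helper names are hypothetical, and the check treats the value as the complete index name, which is an assumption given that the variable name ends in PREFIX.

```python
import os
import re

DEFAULT_INDEX_NAME = "textual_text_search"

# Characters that OpenSearch does not allow in index names:
# whitespace, comma, and : " * + / \ | ? # > <
_FORBIDDEN = re.compile(r'[\s,:"*+/\\|?#><]')


def index_name_problems(name):
    """Return a list of reasons why OpenSearch would reject this index name."""
    problems = []
    if name != name.lower():
        problems.append("contains uppercase letters")
    if name.startswith(("_", "-")):
        problems.append("begins with an underscore or hyphen")
    if _FORBIDDEN.search(name):
        problems.append("contains a space, comma, or forbidden character")
    return problems


def resolve_index_name(env=None):
    """Return the configured index name, or the default if none is set."""
    env = os.environ if env is None else env
    name = env.get("DATASET_TEXT_SEARCH_OPENSEARCH_INDEX_PREFIX") or DEFAULT_INDEX_NAME
    problems = index_name_problems(name)
    if problems:
        raise ValueError(f"Invalid index name {name!r}: " + "; ".join(problems))
    return name
```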

Setting the number of shards and replicas

By default, Textual uses:

  • 5 shards

  • 1 replica

You can adjust these numbers as needed for your instance.

To configure the number of shards and replicas, set the following environment variables:

  • DATASET_TEXT_SEARCH_OPENSEARCH_NUM_SHARDS

  • DATASET_TEXT_SEARCH_OPENSEARCH_NUM_REPLICAS
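
When you choose these values, it can help to see how many shard copies the cluster has to hold. In OpenSearch, each replica is a full copy of every primary shard, so the total is shards × (1 + replicas). The sketch below reads the two variables with the defaults listed above. shard_plan is a hypothetical helper, and the minimum values it enforces are assumptions.

```python
import os


def shard_plan(env=None):
    """Compute the total number of shard copies from the configured values."""
    env = os.environ if env is None else env
    shards = int(env.get("DATASET_TEXT_SEARCH_OPENSEARCH_NUM_SHARDS", "5"))
    replicas = int(env.get("DATASET_TEXT_SEARCH_OPENSEARCH_NUM_REPLICAS", "1"))
    if shards < 1:
        raise ValueError("The number of shards must be at least 1")
    if replicas < 0:
        raise ValueError("The number of replicas cannot be negative")
    return {
        "primary_shards": shards,
        "replicas_per_shard": replicas,
        # Each replica duplicates every primary shard.
        "total_shard_copies": shards * (1 + replicas),
    }
```

With the defaults, the index has 10 shard copies in total. OpenSearch does not place a replica on the same node as its primary shard, so on a single-node cluster any replicas remain unassigned.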
