Enabling dataset text search
On a self-hosted instance, you must set up a search provider before you can use dataset text search.
You can use either Apache Lucene or OpenSearch.
Identifying the search provider to use
To specify the search provider, set the environment variable DATASET_TEXT_SEARCH_PROVIDER.
The available values are:
lucene - Use Apache Lucene as the search provider. This is the default value.
opensearch - Use OpenSearch as the search provider. OpenSearch provides a more robust search index. You must use OpenSearch if the Textual API and worker are on different hosts.
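As an illustration of how the provider setting and the host rule fit together, here is a minimal pre-deployment check. It is a sketch, not Textual's own code: the helper names resolve_search_provider and check_provider_for_hosts are hypothetical, and the only rules it encodes are the ones stated above (lucene is the default, and OpenSearch is required when the API and worker are on different hosts).

```python
import os

# Values documented for DATASET_TEXT_SEARCH_PROVIDER.
VALID_PROVIDERS = {"lucene", "opensearch"}


def resolve_search_provider(env=None):
    """Return the configured provider, defaulting to "lucene" when unset.

    Hypothetical helper for checking a deployment's environment.
    """
    env = os.environ if env is None else env
    provider = env.get("DATASET_TEXT_SEARCH_PROVIDER", "lucene")
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"Unsupported search provider: {provider!r}")
    return provider


def check_provider_for_hosts(provider, api_host, worker_host):
    """Reject Lucene when the Textual API and worker run on different hosts."""
    if provider == "lucene" and api_host != worker_host:
        raise ValueError(
            "The API and worker are on different hosts; use opensearch instead"
        )


if __name__ == "__main__":
    print(resolve_search_provider({}))
    print(resolve_search_provider({"DATASET_TEXT_SEARCH_PROVIDER": "opensearch"}))
    check_provider_for_hosts("opensearch", "api-host", "worker-host")
```

The check only compares host names that you pass in; how you determine those hosts depends on your deployment.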
Configuring Lucene
If you use Lucene as your search provider:
Mount a shared folder as a volume for both the API and the worker.
Set the environment variable DATASET_TEXT_SEARCH_LUCENE_INDEX_PATH to the path of that folder.
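The two Lucene steps can be sketched as a container configuration fragment in which the API and the worker mount the same host folder and point DATASET_TEXT_SEARCH_LUCENE_INDEX_PATH at it. This assumes a container-based deployment; the service names textual-api and textual-worker and both folder paths are hypothetical placeholders, and mounting the folder at the same container path in both services is a simplifying choice, not a documented requirement.

```python
import copy
import json

# Hypothetical service names; substitute the names used in your deployment.
API_SERVICE = "textual-api"
WORKER_SERVICE = "textual-worker"


def lucene_service_config(host_folder, container_path):
    """Return volume and environment settings for the API and the worker.

    Both services mount the same host folder at the same container path, and
    DATASET_TEXT_SEARCH_LUCENE_INDEX_PATH points at that path in each service.
    """
    shared = {
        "volumes": [f"{host_folder}:{container_path}"],
        "environment": {
            "DATASET_TEXT_SEARCH_PROVIDER": "lucene",
            "DATASET_TEXT_SEARCH_LUCENE_INDEX_PATH": container_path,
        },
    }
    # Deep copies so that editing one service's settings does not change the other.
    return {API_SERVICE: copy.deepcopy(shared), WORKER_SERVICE: copy.deepcopy(shared)}


if __name__ == "__main__":
    # Hypothetical paths for the shared index folder.
    config = lucene_service_config("/srv/textual/search-index", "/textual/search-index")
    print(json.dumps(config, indent=2))
```

Because both services read and write the same index files, the folder must be reachable from both, which is why Lucene only works when the API and worker share a host.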
Configuring OpenSearch
To use OpenSearch as your search provider, configure the following.
Connection to your OpenSearch instance
To configure the connection, set the following environment variable:
DATASET_TEXT_SEARCH_OPENSEARCH_URL - The URL of your OpenSearch instance.
If you enabled authentication for your OpenSearch cluster, set the following environment variables:
DATASET_TEXT_SEARCH_OPENSEARCH_USERNAME - Username for OpenSearch authentication.
DATASET_TEXT_SEARCH_OPENSEARCH_PASSWORD - Password for OpenSearch authentication.
Optionally, for testing, you can disable SSL certificate validation. To do so, set DATASET_TEXT_SEARCH_OPENSEARCH_DISABLE_CERTIFICATE_VALIDATION to true.
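The connection settings above can be collected with a small helper like the one below, a sketch that only assembles the documented variable names. The function name opensearch_env, the example URL, and the credentials are hypothetical; the requirement to set the username and password together is an assumption of this sketch.

```python
import json


def opensearch_env(url, username=None, password=None, disable_cert_validation=False):
    """Build the OpenSearch connection variables for the Textual API and worker."""
    env = {
        "DATASET_TEXT_SEARCH_PROVIDER": "opensearch",
        "DATASET_TEXT_SEARCH_OPENSEARCH_URL": url,
    }
    # Credentials are only needed when the cluster has authentication enabled.
    if username is not None or password is not None:
        if not username or not password:
            raise ValueError("Set both the username and the password, or neither")
        env["DATASET_TEXT_SEARCH_OPENSEARCH_USERNAME"] = username
        env["DATASET_TEXT_SEARCH_OPENSEARCH_PASSWORD"] = password
    # Testing only: turns off SSL certificate validation.
    if disable_cert_validation:
        env["DATASET_TEXT_SEARCH_OPENSEARCH_DISABLE_CERTIFICATE_VALIDATION"] = "true"
    return env


if __name__ == "__main__":
    # Hypothetical URL and credentials for a test cluster.
    env = opensearch_env(
        "https://opensearch.example.internal:9200",
        username="textual",
        password="change-me",
        disable_cert_validation=True,
    )
    print(json.dumps(env, indent=2))
```

In production, leave disable_cert_validation at its default so that the certificate-validation variable is never set.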
Setting a custom index name
By default, the name of the OpenSearch index for the dataset text search is textual_text_search.
To use a different name for the index, set the index name as the value of DATASET_TEXT_SEARCH_OPENSEARCH_INDEX_PREFIX.
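To confirm which index name is in effect, the sketch below resolves the name from the environment (treating the value of DATASET_TEXT_SEARCH_OPENSEARCH_INDEX_PREFIX as the full index name, as described above) and asks OpenSearch whether that index exists through its HEAD /<index> endpoint, which returns 200 if the index exists and 404 if it does not. The helper names are hypothetical, falling back to the default for an empty value is an assumption, and index_exists validates certificates, so it will fail against a cluster with a self-signed certificate.

```python
import base64
import urllib.error
import urllib.request

# Documented default index name.
DEFAULT_INDEX_NAME = "textual_text_search"


def effective_index_name(env):
    """Return the configured index name, or the default if unset or empty."""
    return env.get("DATASET_TEXT_SEARCH_OPENSEARCH_INDEX_PREFIX") or DEFAULT_INDEX_NAME


def index_exists(base_url, index, username=None, password=None, timeout=10):
    """Ask OpenSearch whether an index exists (HEAD /<index>: 200 = yes, 404 = no)."""
    request = urllib.request.Request(f"{base_url.rstrip('/')}/{index}", method="HEAD")
    if username is not None:
        # HTTP basic authentication, for clusters with authentication enabled.
        token = base64.b64encode(f"{username}:{password or ''}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


if __name__ == "__main__":
    print(effective_index_name({}))
    print(effective_index_name({"DATASET_TEXT_SEARCH_OPENSEARCH_INDEX_PREFIX": "team_a_search"}))
```

index_exists is not called in the example because it needs a reachable cluster; call it with your own DATASET_TEXT_SEARCH_OPENSEARCH_URL value and credentials.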
Setting the number of shards and replicas
By default, Textual uses:
5 shards
1 replica
You can adjust these numbers as needed for your instance.
To configure the number of shards and replicas, set the following environment variables:
DATASET_TEXT_SEARCH_OPENSEARCH_NUM_SHARDS
DATASET_TEXT_SEARCH_OPENSEARCH_NUM_REPLICAS
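The two variables correspond to the standard OpenSearch index settings number_of_shards and number_of_replicas. The sketch below shows how they could map onto an OpenSearch create-index request body, using the documented defaults of 5 shards and 1 replica. It assumes, without confirmation from this page, that Textual applies the values this way; the helper name index_settings is hypothetical.

```python
import json

# Documented defaults.
DEFAULT_NUM_SHARDS = 5
DEFAULT_NUM_REPLICAS = 1


def index_settings(env):
    """Build an OpenSearch create-index body from the shard and replica variables."""
    shards = int(env.get("DATASET_TEXT_SEARCH_OPENSEARCH_NUM_SHARDS", DEFAULT_NUM_SHARDS))
    replicas = int(env.get("DATASET_TEXT_SEARCH_OPENSEARCH_NUM_REPLICAS", DEFAULT_NUM_REPLICAS))
    if shards < 1:
        raise ValueError("An index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("The number of replicas cannot be negative")
    return {"settings": {"index": {"number_of_shards": shards, "number_of_replicas": replicas}}}


if __name__ == "__main__":
    print(json.dumps(index_settings({}), indent=2))
    print(json.dumps(index_settings({
        "DATASET_TEXT_SEARCH_OPENSEARCH_NUM_SHARDS": "3",
        "DATASET_TEXT_SEARCH_OPENSEARCH_NUM_REPLICAS": "0",
    })))
```

In OpenSearch, number_of_shards is fixed when an index is created, while number_of_replicas can be changed on an existing index. On a single-node cluster, replica shards cannot be allocated, so a replica count of 0 may be appropriate there.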