Configuring Textual
Configuring Tonic Textual environment variables
On a self-hosted instance of Textual, much of the configuration takes the form of environment variables.
Docker
Add the variable to .env in the format:
SETTING_NAME=value
After you update .env, to restart Textual and complete the update, run:
$ docker-compose down
$ docker-compose pull && docker-compose up -d
Kubernetes
In values.yaml, add the environment variable to the appropriate env section of the Helm chart. For example:
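A minimal sketch of what this might look like. The section name and the variable shown here are illustrative; the exact location of the env list depends on your Helm chart:

```
# Illustrative only — place the variable in the env section for the
# relevant Textual service in your chart's values.yaml
env:
  - name: TEXTUAL_ML_WORKERS
    value: "2"
```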
After you update the yaml file, to restart the service and complete the update, run:
$ helm upgrade <name_of_release> -n <namespace_name> <path-to-helm-chart>
The helm upgrade command above is always safe to use when you pin specific version numbers. However, if you use the latest tag, it can result in Textual containers that run different versions.
Configuring the number of textual-ml workers
The TEXTUAL_ML_WORKERS environment variable specifies the number of workers to use within the textual-ml container. The default value is 1.
Having multiple workers allows for parallelization of inferences with NER models. When you deploy Textual with Kubernetes on GPUs, parallelization allows the textual-ml container to fully utilize the GPU.
We recommend 6GB of GPU RAM for each worker.
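As a worked example of the 6GB-per-worker guideline: a GPU with 24GB of RAM supports up to four workers (24 / 6 = 4). The value below is an illustration based on that arithmetic, not a Textual default:

```
# .env — illustrative value only
# A 24GB GPU supports up to 24 / 6 = 4 workers
TEXTUAL_ML_WORKERS=4
```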
Enabling LLM synthesis on the Playground page
On the Playground page, the LLM synthesis option uses a large language model (LLM) to generate synthesized replacement values for the detected entities in the text.
This option requires an OpenAI key.
Before you can use this option on your self-hosted Textual instance, you must provide an OpenAI key as the value of the environment variable SOLAR_OPENAI_KEY.
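For a Docker deployment, this might look like the following entry in .env. The value shown is a placeholder for your own OpenAI API key:

```
SOLAR_OPENAI_KEY=<your-openai-api-key>
```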
Enabling PDF scanning
To scan PDF files, Tonic Textual performs OCR on PDFs through Azure Cognitive Services. To enable PDF scanning on your self-hosted instance of Textual, you must provide the Azure Document Intelligence key and endpoint for your Azure account.
Docker
In .env, uncomment and provide values for the following settings:
SOLAR_AZURE_DOC_INTELLIGENCE_KEY=#<FILL IN>
SOLAR_AZURE_DOC_INTELLIGENCE_ENDPOINT=#<FILL IN>
Kubernetes
In values.yaml, uncomment and provide values for the following settings:
azureDocIntelligenceKey: <key>
azureDocIntelligenceEndpoint: <endpoint-url>
Choosing an auxiliary model
To improve overall inference, you can configure Textual to use an auxiliary NER model.
An auxiliary model detects the following types:
DATE_TIME
EVENT
LANGUAGE
LAW
LOCATION
MONEY
NRP
NUMERIC_VALUE
ORGANIZATION
PERSON
PRODUCT
WORK_OF_ART
By default, Textual uses spaCy's en_core_web_trf model. The en_core_web_lg and en_core_web_sm models allow for faster throughput, but with some drop in accuracy for the types listed above.
You can also disable the auxiliary model.
On a self-hosted Textual instance, you configure the auxiliary model as the value of the environment variable SOLAR_AUX_MODEL. The available values are:
en_core_web_trf - This is the default value.
en_core_web_lg
en_core_web_sm
none - Indicates to not use an auxiliary model.
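For example, to trade some accuracy for faster throughput on a Docker deployment, you might set the following in .env:

```
# Use the smaller spaCy model for faster inference
SOLAR_AUX_MODEL=en_core_web_lg
```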