Configuring Textual

Configuring Tonic Textual environment variables

On a self-hosted instance of Textual, much of the configuration takes the form of environment variables.

Docker

Add the variable to .env in the format:

SETTING_NAME=value
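
For example, to set the number of machine learning workers (described later on this page):

TEXTUAL_ML_WORKERS=2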

After you update .env, run the following commands to restart Textual and complete the update:

$ docker-compose down

$ docker-compose pull && docker-compose up -d

Kubernetes

In values.yaml, add the environment variable to the appropriate env section of the Helm chart. For example:

env: {
  "TEXTUAL_ML_WORKERS": "2"
}

After you update values.yaml, run the following command to restart the service and complete the update:

$ helm upgrade <name_of_release> -n <namespace_name> <path-to-helm-chart>

The above helm upgrade command is always safe to use when you specify exact version numbers. However, if you use the latest tag, the upgrade might leave the Textual containers running different versions.
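
To avoid version skew, pin a specific version. A minimal sketch of what that might look like in values.yaml (the image tag key name here is hypothetical; check the Textual Helm chart for the actual setting):

# Hypothetical key name - pins all Textual images to one released version
textualImageTag: "<specific-version>"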

Configuring the number of textual-ml workers

The TEXTUAL_ML_WORKERS environment variable specifies the number of workers to use within the textual-ml container. The default value is 1.

Multiple workers allow Textual to run NER model inference in parallel.

When you deploy Textual with Kubernetes on GPUs, parallelization allows the textual-ml container to fully utilize the GPU.

We recommend 6 GB of GPU RAM for each worker.
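
For example, a GPU with 24 GB of RAM can support up to 4 workers at 6 GB each. A minimal sketch of that configuration (the worker count of 4 is illustrative):

# .env (Docker) - 4 workers fit in 24 GB of GPU RAM at 6 GB each
TEXTUAL_ML_WORKERS=4

# values.yaml (Kubernetes)
env: {
  "TEXTUAL_ML_WORKERS": "4"
}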

Enabling LLM synthesis on the Playground page

On the Playground page, the LLM synthesis option uses a large language model (LLM) to generate synthesized replacement values for the detected entities in the text.

This option requires an OpenAI API key.

Before you can use this option on your self-hosted Textual instance, you must provide the key as the value of the environment variable SOLAR_OPENAI_KEY.
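
For example, in .env on Docker, or in the env section of values.yaml on Kubernetes (the value shown is a placeholder for your own key):

SOLAR_OPENAI_KEY=<your-openai-api-key>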

Enabling PDF scanning

To scan PDF files, Tonic Textual uses optical character recognition (OCR) from Azure Cognitive Services. To enable PDF scanning on your self-hosted instance of Textual, you must provide the Azure Document Intelligence key and endpoint for your Azure account.

Docker

In .env, uncomment and provide values for the following settings:

SOLAR_AZURE_DOC_INTELLIGENCE_KEY=<FILL IN>

SOLAR_AZURE_DOC_INTELLIGENCE_ENDPOINT=<FILL IN>

Kubernetes

In values.yaml, uncomment and provide values for the following settings:

azureDocIntelligenceKey: <key>

azureDocIntelligenceEndpoint: <endpoint-url>

Choosing an auxiliary model

To improve overall inference, you can configure Textual to use an auxiliary NER model.

An auxiliary model detects the following types:

  • DATE_TIME

  • EVENT

  • LANGUAGE

  • LAW

  • LOCATION

  • MONEY

  • NRP

  • NUMERIC_VALUE

  • ORGANIZATION

  • PERSON

  • PRODUCT

  • WORK_OF_ART

By default, Textual uses spaCy's en_core_web_trf model. The en_core_web_lg and en_core_web_sm models provide faster throughput, but with some drop in accuracy for the types listed above.

You can also disable the auxiliary model.

On a self-hosted Textual instance, you configure the auxiliary model as the value of the environment variable SOLAR_AUX_MODEL. The available values are:

  • en_core_web_trf - This is the default value.

  • en_core_web_lg

  • en_core_web_sm

  • none - Do not use an auxiliary model.
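
For example, to trade some accuracy for faster throughput, switch to the en_core_web_lg model. In .env on Docker:

SOLAR_AUX_MODEL=en_core_web_lg

Or in the env section of values.yaml on Kubernetes:

env: {
  "SOLAR_AUX_MODEL": "en_core_web_lg"
}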
