Language support in Textual

Tonic Textual supports languages in addition to English. Textual automatically detects the language and applies the correct model.

On self-hosted instances, you configure whether to support multiple languages, and can optionally provide auxiliary language models.

Supported languages

Textual can detect values in the following languages:


Afrikaans

Albanian

Amharic

Arabic

Armenian

Assamese

Azerbaijani

Basque

Belarusian

Bengali

Bengali Romanized

Bosnian

Breton

Bulgarian

Burmese

Burmese (alternative)

Catalan

Chinese (Simplified)

Chinese (Traditional)

Croatian

Czech

Danish

Dutch

English

Esperanto

Estonian

Filipino

Finnish

French

Galician

Irish

Georgian

German

Greek

Gujarati

Hausa

Hebrew

Hindi

Hindi Romanized

Hungarian

Icelandic

Indonesian

Italian

Japanese

Javanese

Kannada

Kazakh

Khmer

Korean

Kurdish (Kurmanji)

Kyrgyz

Lao

Latin

Latvian

Lithuanian

Macedonian

Malagasy

Malay

Malayalam

Marathi

Mongolian

Nepali

Norwegian

Oriya

Oromo

Pashto

Persian

Polish

Portuguese

Punjabi

Romanian

Russian

Sanskrit

Scottish Gaelic

Serbian

Sinhala

Sindhi

Slovak

Slovenian

Somali

Spanish

Sundanese

Swahili

Swedish

Tamil

Tamil Romanized

Telugu

Telugu Romanized

Thai

Turkish

Ukrainian

Urdu

Urdu Romanized

Uyghur

Uzbek

Vietnamese

Welsh

Western Frisian

Xhosa

Yiddish

Self-hosted instances

On a self-hosted instance, you configure whether Textual supports multiple languages. When you enable multi-language support, you can also restrict Textual to using only its multi-language model whenever it detects non-English content.

You can also optionally provide auxiliary language models.

Enabling multi-language support

To enable support for languages other than English, set the environment variable TEXTUAL_MULTI_LINGUAL=true.

This setting applies to the machine learning container.
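As a pre-flight check before you restart the machine learning container, you can confirm that the flag is actually set in the environment file that you pass to that container. The following is a minimal sketch under stated assumptions. The parser handles only simple KEY=VALUE lines and is not the Docker Compose parser. It also assumes that only the string "true" (in any case) enables the feature, because the exact parsing rules that Textual applies are not documented here. The env file contents are hypothetical.

```python
from typing import Dict


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Simplified sketch: does not handle `export`, escapes, or multi-line values.
    """
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def is_enabled(env: Dict[str, str], name: str) -> bool:
    # Assumption: only "true" (case-insensitive) counts as enabled.
    return env.get(name, "").lower() == "true"


# Hypothetical environment file for the machine learning container.
ml_env = parse_env_file("""
# machine learning container settings
TEXTUAL_MULTI_LINGUAL=true
""")
print(is_enabled(ml_env, "TEXTUAL_MULTI_LINGUAL"))  # True
```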

Using only the multi-language model for non-English content

When TEXTUAL_MULTI_LINGUAL=true, by default Textual runs both its English model and its multi-language model whenever it detects any non-English content.

To use only the multi-language model for non-English content, and skip the English model, set the environment variable TEXTUAL_MULTI_LINGUAL_XLM_ONLY=true. This can improve the precision of detections in non-English text.
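The model-selection rules for TEXTUAL_MULTI_LINGUAL and TEXTUAL_MULTI_LINGUAL_XLM_ONLY can be summarized as a small decision function. This is an illustrative sketch of the behavior described in this section, not Textual's implementation. The model labels are hypothetical. Two cases are assumptions because this section does not state them: content detected as English uses only the English model, and TEXTUAL_MULTI_LINGUAL_XLM_ONLY has no effect unless TEXTUAL_MULTI_LINGUAL is also true.

```python
from typing import List

# Hypothetical labels for the two models.
ENGLISH_MODEL = "english"
MULTI_LANGUAGE_MODEL = "multi-language"


def models_for(content_is_english: bool, multi_lingual: bool, xlm_only: bool) -> List[str]:
    """Return the models that would run for a piece of content.

    multi_lingual mirrors TEXTUAL_MULTI_LINGUAL and xlm_only mirrors
    TEXTUAL_MULTI_LINGUAL_XLM_ONLY.
    """
    # Assumption: with multi-language support disabled, or for English
    # content, only the English model runs.
    if not multi_lingual or content_is_english:
        return [ENGLISH_MODEL]
    # Non-English content with XLM_ONLY set: skip the English model.
    if xlm_only:
        return [MULTI_LANGUAGE_MODEL]
    # Documented default for non-English content: run both models.
    return [ENGLISH_MODEL, MULTI_LANGUAGE_MODEL]


print(models_for(content_is_english=False, multi_lingual=True, xlm_only=False))
# ['english', 'multi-language']
print(models_for(content_is_english=False, multi_lingual=True, xlm_only=True))
# ['multi-language']
```

Running both models by default favors recall, because anything the English model catches is still detected. Restricting non-English content to the multi-language model trades some of that coverage for fewer false positives.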

Providing auxiliary language model assets

You can provide additional language model assets for Textual to use.

By default, Textual looks for model assets in the /usr/bin/textual/language_models directory of the machine learning container. The default Helm and Docker Compose configurations include a volume mount for this directory.

To use a different location, set the environment variable TEXTUAL_LANGUAGE_MODEL_DIRECTORY. If you change the location, you must also update your volume mounts to match.
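A quick way to catch a mismatched volume mount is to resolve the model directory the same way this section describes, then check that it exists inside the machine learning container. The following is a minimal sketch. It assumes that an unset or blank TEXTUAL_LANGUAGE_MODEL_DIRECTORY falls back to the default location. The function names are hypothetical and are not part of Textual.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Documented default location for auxiliary language model assets.
DEFAULT_MODEL_DIR = "/usr/bin/textual/language_models"


def resolve_model_dir(environ: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory to search for auxiliary model assets.

    Uses TEXTUAL_LANGUAGE_MODEL_DIRECTORY when it is set and non-blank
    (assumption), otherwise the documented default.
    """
    env = os.environ if environ is None else environ
    override = env.get("TEXTUAL_LANGUAGE_MODEL_DIRECTORY", "").strip()
    return Path(override or DEFAULT_MODEL_DIR)


def model_dir_ready(environ: Optional[Mapping[str, str]] = None) -> bool:
    """True if the resolved directory exists, for example because the volume is mounted."""
    return resolve_model_dir(environ).is_dir()


if __name__ == "__main__":
    # Run inside the machine learning container.
    path = resolve_model_dir()
    if model_dir_ready():
        print(f"Found model directory: {path}")
    else:
        print(f"{path} not found; check the volume mount for the machine learning container")
```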

For help with installing model assets, contact Tonic.ai support.
