Configuring processing and parallelism

The following environment variables control job and file processing.

Configuring the number of jobs to run concurrently

The number of jobs that each worker can run concurrently affects the number of Textual workers that you need. The more jobs that each worker can run at the same time, the fewer workers you need.

Textual provides the following environment variables to control the number of jobs that each Textual worker can run at the same time:

  • A global setting to control the total number of jobs across all job types.

  • Individual settings to control the number of concurrent jobs for specific job types.

Configuring the global limit for concurrent jobs

The environment variable SOLAR_MAX_CONCURRENT_WORKER_JOBS controls the number of jobs that can run concurrently across all of the job types.

The default value is 16.

Configuring the limits for specific job types

The following environment variables control the number of jobs that can run concurrently for specific types of jobs.

  • SOLAR_MAX_CONCURRENT_JOBS_DEIDENTIFY_FILE
    Default value: 8
    Includes the following actions:

    • Upload a file to a dataset

    • Rescan a dataset file that fails to process

    • Rescan a dataset file after a custom entity type is created or edited

    • Rescan a dataset file after an update to the dataset configuration

    • Textual uploads a cloud storage file for processing

  • SOLAR_MAX_CONCURRENT_JOBS_DEIDENTIFY_UNATTACHED_FILE
    Default value: 8
    Includes the following actions:

    • Upload a file to the Home page preview tool

    • In the SDK, make a call to redact.start_file_redaction

  • SOLAR_MAX_CONCURRENT_JOBS_PARSE_FILES
    Default value: 8
    Includes the following actions:

    • Run parsing jobs

  • SOLAR_MAX_CONCURRENT_JOBS_AUDIO_TRANSCRIPTION
    Default value: 8
    Includes the following actions:

    • Transcribe and redact an audio file

  • SOLAR_MAX_CONCURRENT_JOBS_PROCESS_EXTERNAL_FILES
    Default value: 8
    Includes the following actions:

    • Synchronize cloud storage files for a dataset

  • SOLAR_MAX_CONCURRENT_JOBS_GENERATE_EXTERNAL_FILES
    Default value: 8
    Includes the following actions:

    • Generate output files for a cloud storage dataset

  • SOLAR_MAX_CONCURRENT_JOBS_FILE_ANNOTATION
    Default value: 5
    Includes the following actions:

    • For a model-based custom entity type, identify the entity type values in the test data.

  • SOLAR_MAX_CONCURRENT_JOBS_GENERATE_NEW_GUIDELINES
    Default value: 2
    Includes the following actions:

    • For a model-based custom entity type, generate recommendations to improve the guidelines.

  • SOLAR_MAX_CONCURRENT_JOBS_TRAINING_FILE_ANALYSIS
    Default value: 4
    Includes the following actions:

    • For a model-based custom entity type, analyze the training files.

  • SOLAR_MAX_CONCURRENT_JOBS_TRAINING_FILE_ANNOTATION
    Default value: 4
    Includes the following actions:

    • For a model-based custom entity type, use the selected guidelines version to identify the entity values in the training data.
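
The limits above can be collected and checked before a worker is deployed. The following Python sketch is illustrative only. It copies the documented variable names and default values, reads any overrides from a mapping such as the process environment, and shows one plausible reading of how the global and per-type limits combine. The helper names resolve_limits and can_start are hypothetical, and the admission rule (a job starts only while both its job type limit and the global limit have room) is an assumption, not a description of the Textual scheduler.

```python
import os

GLOBAL_VAR = "SOLAR_MAX_CONCURRENT_WORKER_JOBS"

# Variable names and default values copied from the documentation above.
DEFAULTS = {
    GLOBAL_VAR: 16,
    "SOLAR_MAX_CONCURRENT_JOBS_DEIDENTIFY_FILE": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_DEIDENTIFY_UNATTACHED_FILE": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_PARSE_FILES": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_AUDIO_TRANSCRIPTION": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_PROCESS_EXTERNAL_FILES": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_GENERATE_EXTERNAL_FILES": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_FILE_ANNOTATION": 5,
    "SOLAR_MAX_CONCURRENT_JOBS_GENERATE_NEW_GUIDELINES": 2,
    "SOLAR_MAX_CONCURRENT_JOBS_TRAINING_FILE_ANALYSIS": 4,
    "SOLAR_MAX_CONCURRENT_JOBS_TRAINING_FILE_ANNOTATION": 4,
}


def resolve_limits(env=None):
    """Return every limit as an int, preferring values found in env."""
    env = os.environ if env is None else env
    return {
        name: int(env[name]) if name in env else default
        for name, default in DEFAULTS.items()
    }


def can_start(job_var, running, limits):
    """Assumed rule: the per-type limit and the global limit must both have room."""
    total_running = sum(running.values())
    return (
        running.get(job_var, 0) < limits[job_var]
        and total_running < limits[GLOBAL_VAR]
    )


# Example: lower the parsing limit to 2, with 2 parsing jobs already running.
PARSE = "SOLAR_MAX_CONCURRENT_JOBS_PARSE_FILES"
AUDIO = "SOLAR_MAX_CONCURRENT_JOBS_AUDIO_TRANSCRIPTION"
limits = resolve_limits({PARSE: "2"})
running = {PARSE: 2}

print(can_start(PARSE, running, limits))  # False: the parsing limit is reached
print(can_start(AUDIO, running, limits))  # True: both limits still have room
```

Passing an explicit mapping instead of reading os.environ directly keeps the example reproducible. In a real deployment, the values come from the worker's environment.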

Configuring the size of the datetime generator cache

To optimize processing when it generates datetime values, Textual stores the redacted datetime values in a cache.

To change the cache size, configure the environment variable SOLAR_DATETIME_GENERATOR_CACHE_CAPACITY.

The default value is 100000, meaning that the cache holds up to 100,000 values.

Increasing the cache size can speed up processing, but it also uses more RAM.
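
The tradeoff between cache capacity and repeated work can be illustrated with a generic bounded cache. The following Python sketch does not describe how Textual implements its cache. It uses functools.lru_cache as a stand-in, and the make_generator helper, the least-recently-used eviction policy, and the fixed three-day shift are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from functools import lru_cache


def make_generator(capacity):
    """Build a hypothetical datetime replacer backed by a cache of `capacity` entries."""

    @lru_cache(maxsize=capacity)
    def replace(value):
        # Stand-in for real generation: shift the parsed value by a fixed offset.
        return (datetime.fromisoformat(value) + timedelta(days=3)).isoformat()

    return replace


# The first value appears again at the end of the input.
values = ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-01"]

small = make_generator(2)  # too small to keep the first value until it repeats
large = make_generator(4)  # large enough to keep every distinct value

for v in values:
    small(v)
    large(v)

print(small.cache_info().hits, large.cache_info().hits)  # 0 1
```

The larger cache answers the repeated value from memory, while the smaller cache has already evicted it and must compute it again. Each cached entry also occupies memory, which is why a larger capacity uses more RAM.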

Configuring the number of PDF pages to redact simultaneously

When Textual redacts PDF files so that a user can preview or download the output, the following environment variable determines the number of pages that it processes simultaneously:

SOLAR_PDF_PAGE_REDACTION_PARALLELISM

The default value is 4, meaning that Textual processes 4 pages at a time.
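
As a rough illustration of what a page-level parallelism limit means, the following Python sketch processes a 10-page document with a thread pool whose size comes from SOLAR_PDF_PAGE_REDACTION_PARALLELISM. This is not the Textual implementation. The redact_page function is a hypothetical placeholder, and the use of a thread pool is an assumption chosen only to show that no more than the configured number of pages are in progress at once.

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Read the documented variable, falling back to the documented default of 4.
parallelism = int(os.environ.get("SOLAR_PDF_PAGE_REDACTION_PARALLELISM", "4"))

# Tracks how many pages are in progress, to observe the peak concurrency.
state = {"active": 0, "peak": 0}
lock = threading.Lock()


def redact_page(page_number):
    """Hypothetical stand-in for redacting a single PDF page."""
    with lock:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
    time.sleep(0.01)  # simulate the work of redacting the page
    with lock:
        state["active"] -= 1
    return f"page {page_number} redacted"


# The pool never runs more than `parallelism` pages at the same time.
with ThreadPoolExecutor(max_workers=parallelism) as pool:
    results = list(pool.map(redact_page, range(1, 11)))

print(len(results), state["peak"] <= parallelism)
```

A higher value lets more pages proceed at once, at the cost of more CPU and memory in use at the same time.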

Configuring the number of PDF files to plan simultaneously

When Textual plans the redaction of PDF files for a user to preview or download, the following environment variable determines the number of files that it plans simultaneously:

SOLAR_PDF_DOC_PLAN_PARALLELISM

The default value is 3, meaning that Textual plans 3 PDF files at a time.

Configuring how often to purge cached PDF pages

When it redacts PDF files, Textual stores the redacted PDF pages in a cache.

The following environment variable determines how often Textual purges the cache of PDF pages:

PURGE_REDACTED_PAGES_IN_HOURS

The default value is 12, meaning that Textual purges the redacted PDF pages cache every 12 hours.
