Configuring processing and parallelism
The following environment variables control job and file processing.
Configuring the number of jobs to run concurrently
The number of jobs that can run concurrently affects the number of Textual workers that you need. The more jobs that each worker can run concurrently, the fewer workers you need.
Textual provides a set of environment variables to control the number of jobs that each Textual worker can run at the same time:
- A global setting to control the total number of jobs across all job types.
- Individual settings to control the number of concurrent jobs for specific job types.
Configuring the global limit for concurrent jobs
The environment variable SOLAR_MAX_CONCURRENT_WORKER_JOBS controls the number of jobs that can run concurrently across all of the job types.
The default value is 16.
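As a rough illustration of how the per-worker limit relates to the number of workers, the following sketch estimates a worker count for a target level of concurrency. It assumes that the global limit is the only binding constraint and that jobs spread evenly across workers; neither assumption comes from the Textual documentation, and `workers_needed` is a hypothetical helper, not part of Textual.

```python
import math

# Documented default for SOLAR_MAX_CONCURRENT_WORKER_JOBS.
DEFAULT_MAX_CONCURRENT_WORKER_JOBS = 16


def workers_needed(target_concurrent_jobs: int,
                   max_jobs_per_worker: int = DEFAULT_MAX_CONCURRENT_WORKER_JOBS) -> int:
    """Estimate how many workers are needed to run the target number of jobs at once.

    Assumption (not from the Textual docs): jobs are spread evenly and the
    global per-worker limit is the only constraint.
    """
    if target_concurrent_jobs < 0:
        raise ValueError("target_concurrent_jobs must not be negative")
    if max_jobs_per_worker < 1:
        raise ValueError("max_jobs_per_worker must be at least 1")
    # Round up: a partially filled worker still counts as a whole worker.
    return math.ceil(target_concurrent_jobs / max_jobs_per_worker)
```

Under these assumptions, sustaining 40 simultaneous jobs with the default limit of 16 would call for 3 workers, and lowering the limit to 8 would raise that to 5.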
Configuring the limits for specific job types
The following environment variables control the number of jobs that can run concurrently for specific types of jobs.
SOLAR_MAX_CONCURRENT_JOBS_DEIDENTIFY_FILE
Default value: 8
Includes the following actions:
- Upload a file to a dataset
- Rescan a dataset file that fails to process
- Rescan a dataset file after a custom entity type is created or edited
- Rescan a dataset file after an update to the dataset configuration
- Textual uploads a cloud storage file for processing
SOLAR_MAX_CONCURRENT_JOBS_DEIDENTIFY_UNATTACHED_FILE
Default value: 8
Includes the following actions:
- Upload a file to the Home page preview tool
- In the SDK, make a call to redact.start_file_redaction
SOLAR_MAX_CONCURRENT_JOBS_PARSE_FILES
Default value: 8
Includes the following actions:
- Run parsing jobs
SOLAR_MAX_CONCURRENT_JOBS_AUDIO_TRANSCRIPTION
Default value: 8
Includes the following actions:
- Transcribe and redact an audio file
SOLAR_MAX_CONCURRENT_JOBS_PROCESS_EXTERNAL_FILES
Default value: 8
Includes the following actions:
- Synchronize cloud storage files for a dataset
SOLAR_MAX_CONCURRENT_JOBS_GENERATE_EXTERNAL_FILES
Default value: 8
Includes the following actions:
- Generate output files for a cloud storage dataset
SOLAR_MAX_CONCURRENT_JOBS_FILE_ANNOTATION
Default value: 5
Includes the following actions:
- For a model-based custom entity type, identify the entity type values in the test data.
SOLAR_MAX_CONCURRENT_JOBS_GENERATE_NEW_GUIDELINES
Default value: 2
Includes the following actions:
- For a model-based custom entity type, generate recommendations to improve the guidelines.
SOLAR_MAX_CONCURRENT_JOBS_TRAINING_FILE_ANALYSIS
Default value: 4
Includes the following actions:
- For a model-based custom entity type, analyze the training files.
SOLAR_MAX_CONCURRENT_JOBS_TRAINING_FILE_ANNOTATION
Default value: 4
Includes the following actions:
- For a model-based custom entity type, use the selected guidelines version to identify the entity values in the training data.
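Before you deploy, it can help to check which limits a worker would end up with. The following sketch reads the global and per-type variables from an environment mapping and falls back to the documented defaults. The helper names `read_limit` and `resolve_limits` are hypothetical, and the validation rules (blank values treated as unset, values below 1 rejected) are assumptions made for this check, not a description of how Textual itself parses the variables.

```python
import os

GLOBAL_LIMIT_VAR = "SOLAR_MAX_CONCURRENT_WORKER_JOBS"
GLOBAL_LIMIT_DEFAULT = 16

# Documented defaults for the per-job-type limits.
JOB_TYPE_DEFAULTS = {
    "SOLAR_MAX_CONCURRENT_JOBS_DEIDENTIFY_FILE": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_DEIDENTIFY_UNATTACHED_FILE": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_PARSE_FILES": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_AUDIO_TRANSCRIPTION": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_PROCESS_EXTERNAL_FILES": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_GENERATE_EXTERNAL_FILES": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_FILE_ANNOTATION": 5,
    "SOLAR_MAX_CONCURRENT_JOBS_GENERATE_NEW_GUIDELINES": 2,
    "SOLAR_MAX_CONCURRENT_JOBS_TRAINING_FILE_ANALYSIS": 4,
    "SOLAR_MAX_CONCURRENT_JOBS_TRAINING_FILE_ANNOTATION": 4,
}


def read_limit(name, default, env=None):
    """Return the integer value of `name`, or `default` when it is unset or blank."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)  # raises ValueError if the value is not an integer
    if value < 1:
        # Assumption for this check only: a limit below 1 is treated as a mistake.
        raise ValueError(f"{name} must be at least 1, got {value}")
    return value


def resolve_limits(env=None):
    """Collect the global limit and every per-type limit into one dictionary."""
    limits = {GLOBAL_LIMIT_VAR: read_limit(GLOBAL_LIMIT_VAR, GLOBAL_LIMIT_DEFAULT, env)}
    for name, default in JOB_TYPE_DEFAULTS.items():
        limits[name] = read_limit(name, default, env)
    return limits
```

Calling `resolve_limits()` with no arguments inspects the current process environment. Passing a dictionary instead makes it possible to check a planned configuration without exporting the variables first.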
Configuring the size of the datetime generator cache
To optimize processing, Textual stores the redacted datetime values that it generates in a cache.
To change the cache size, configure the environment variable SOLAR_DATETIME_GENERATOR_CACHE_CAPACITY.
The default value is 100000, meaning that the cache can hold up to 100,000 values.
Note that while increasing the size of the cache can speed up processing, it also uses more RAM.
Configuring the number of PDF pages to redact simultaneously
When Textual redacts PDF files so that a user can preview or download the output, the following environment variable determines the number of pages that it processes simultaneously:
SOLAR_PDF_PAGE_REDACTION_PARALLELISM
The default value is 4, meaning that Textual processes 4 pages at a time.
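To picture what the parallelism value means for a given document, the following sketch splits a page count into groups of at most that many pages. It assumes that pages are handled in successive fixed-size groups, which is a simplification for illustration; the source only states how many pages are processed at a time, and `page_batches` is a hypothetical helper.

```python
# Documented default for SOLAR_PDF_PAGE_REDACTION_PARALLELISM.
DEFAULT_PDF_PAGE_REDACTION_PARALLELISM = 4


def page_batches(page_count: int,
                 parallelism: int = DEFAULT_PDF_PAGE_REDACTION_PARALLELISM):
    """Return (first_page, last_page) tuples, 1-indexed and inclusive.

    Assumption (illustrative only): pages are processed in successive groups
    of at most `parallelism` pages.
    """
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return [
        (start, min(start + parallelism - 1, page_count))
        for start in range(1, page_count + 1, parallelism)
    ]
```

Under this assumption, a 10-page PDF with the default value of 4 yields three groups: pages 1 to 4, 5 to 8, and 9 to 10.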
Configuring the number of PDF files to plan simultaneously
When Textual plans the redaction of PDF files for a user to preview or download, the following environment variable determines the number of files that it plans simultaneously:
SOLAR_PDF_DOC_PLAN_PARALLELISM
The default value is 3, meaning that Textual plans 3 PDF files at a time.
Configuring how often to purge cached PDF pages
When it redacts PDF files, Textual stores the redacted PDF pages in a cache.
The following environment variable determines how often Textual purges the cache of PDF pages:
PURGE_REDACTED_PAGES_IN_HOURS
The default value is 12, meaning that Textual purges the redacted PDF pages cache every 12 hours.
