# Configuring processing and parallelism

The following environment variables control job and file processing.

## Configuring the number of jobs to run concurrently <a href="#config-concurrent-jobs" id="config-concurrent-jobs"></a>

The number of jobs that each worker can run concurrently affects the [number of Textual workers](https://docs.tonic.ai/textual/textual-install-administer/configuring-textual/general-instance-and-processing-settings/textual-config-ml-worker-count) that you need. The more jobs that each worker can run concurrently, the fewer workers you need.

Textual provides the following environment variables to control the number of jobs that each Textual worker can run at the same time:

* A global setting to control the total number of jobs across all job types.
* Individual settings to control the number of concurrent jobs for specific job types.

### Configuring the global limit for concurrent jobs <a href="#concurrent-jobs-global-limit" id="concurrent-jobs-global-limit"></a>

The environment variable `SOLAR_MAX_CONCURRENT_WORKER_JOBS` controls the number of jobs that can run concurrently across all of the job types.

The default value is 16.
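
To illustrate how this limit relates to the number of workers, here is a rough sizing sketch. It is not a Textual API. It assumes a hypothetical peak number of simultaneous jobs that you supply, and it ignores the per-type limits and the CPU and memory available to each worker.

```python
import math


def workers_needed(peak_concurrent_jobs: int, max_jobs_per_worker: int = 16) -> int:
    """Estimate how many workers are needed to run a peak job load.

    max_jobs_per_worker mirrors SOLAR_MAX_CONCURRENT_WORKER_JOBS (default 16).
    peak_concurrent_jobs is a hypothetical figure, not something Textual reports.
    """
    if peak_concurrent_jobs < 0 or max_jobs_per_worker < 1:
        raise ValueError("peak must be >= 0 and the per-worker limit must be >= 1")
    return math.ceil(peak_concurrent_jobs / max_jobs_per_worker)


# A hypothetical peak of 40 simultaneous jobs:
print(workers_needed(40))      # with the default limit of 16
print(workers_needed(40, 32))  # with the limit raised to 32
```

Under these assumptions, the example load needs 3 workers at the default limit and 2 workers when the limit is 32.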

### Configuring the limits for specific job types <a href="#concurrent-jobs-limit-by-type" id="concurrent-jobs-limit-by-type"></a>

The following environment variables control the number of jobs that can run concurrently for specific types of jobs.

* `SOLAR_MAX_CONCURRENT_JOBS_DEIDENTIFY_FILE` \
  \
  Default value: 8\
  \
  Includes the following actions:
  * Upload a file to a dataset
  * Rescan a dataset file that fails to process
  * Rescan a dataset file after a custom entity type is created or edited
  * Rescan a dataset file after an update to the dataset configuration
  * Textual uploads a cloud storage file for processing
* `SOLAR_MAX_CONCURRENT_JOBS_DEIDENTIFY_UNATTACHED_FILE` \
  \
  Default value: 8\
  \
  Includes the following actions:
  * Upload a file to the **Home** page preview tool
  * In the SDK, make a call to `redact.start_file_redaction`
* `SOLAR_MAX_CONCURRENT_JOBS_PARSE_FILES` \
  \
  Default value: 8\
  \
  Includes the following actions:
  * Run parsing jobs
* `SOLAR_MAX_CONCURRENT_JOBS_AUDIO_TRANSCRIPTION` \
  \
  Default value: 8\
  \
  Includes the following actions:
  * Transcribe and redact an audio file
* `SOLAR_MAX_CONCURRENT_JOBS_PROCESS_EXTERNAL_FILES` \
  \
  Default value: 8\
  \
  Includes the following actions:
  * Synchronize cloud storage files for a dataset
* `SOLAR_MAX_CONCURRENT_JOBS_GENERATE_EXTERNAL_FILES` \
  \
  Default value: 8\
  \
  Includes the following actions:
  * Generate output files for a cloud storage dataset
* `SOLAR_MAX_CONCURRENT_JOBS_MANUAL_REDACTION_PROCESS_FILE` \
  \
  Default value: 8\
  \
  Includes the following actions:
  * Scan a file in a guided redaction project
* `SOLAR_MAX_CONCURRENT_JOBS_FILE_ANNOTATION` \
  \
  Default value: 5\
  \
  Includes the following actions:
  * For a model-based custom entity type, identify the entity type values in the test data
* `SOLAR_MAX_CONCURRENT_JOBS_GENERATE_NEW_GUIDELINES` \
  \
  Default value: 2\
  \
  Includes the following actions:
  * For a model-based custom entity type, generate recommendations to improve the guidelines
* `SOLAR_MAX_CONCURRENT_JOBS_TRAINING_FILE_ANALYSIS` \
  \
  Default value: 4\
  \
  Includes the following actions:
  * For a model-based custom entity type, analyze the training files
* `SOLAR_MAX_CONCURRENT_JOBS_TRAINING_FILE_ANNOTATION` \
  \
  Default value: 4\
  \
  Includes the following actions:
  * For a model-based custom entity type, use the selected guidelines version to identify the entity values in the training data
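
The defaults above can be collected into a small helper that resolves the effective limits from an environment and flags per-type values that exceed the global limit. This is a sketch, not part of Textual. The function names are hypothetical, and the check assumes that the global limit caps the total across all job types, so a per-type value above it would likely never be reached.

```python
import os
from typing import Mapping

GLOBAL_VAR = "SOLAR_MAX_CONCURRENT_WORKER_JOBS"
GLOBAL_DEFAULT = 16

# Documented defaults for the per-type limits.
PER_TYPE_DEFAULTS = {
    "SOLAR_MAX_CONCURRENT_JOBS_DEIDENTIFY_FILE": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_DEIDENTIFY_UNATTACHED_FILE": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_PARSE_FILES": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_AUDIO_TRANSCRIPTION": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_PROCESS_EXTERNAL_FILES": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_GENERATE_EXTERNAL_FILES": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_MANUAL_REDACTION_PROCESS_FILE": 8,
    "SOLAR_MAX_CONCURRENT_JOBS_FILE_ANNOTATION": 5,
    "SOLAR_MAX_CONCURRENT_JOBS_GENERATE_NEW_GUIDELINES": 2,
    "SOLAR_MAX_CONCURRENT_JOBS_TRAINING_FILE_ANALYSIS": 4,
    "SOLAR_MAX_CONCURRENT_JOBS_TRAINING_FILE_ANNOTATION": 4,
}


def effective_limits(env: Mapping[str, str]) -> dict[str, int]:
    """Return the limit for every variable, using the default when it is unset."""
    defaults = {GLOBAL_VAR: GLOBAL_DEFAULT, **PER_TYPE_DEFAULTS}
    return {name: int(env.get(name, default)) for name, default in defaults.items()}


def limits_above_global(limits: Mapping[str, int]) -> list[str]:
    """List the per-type variables whose value exceeds the global limit."""
    return [name for name in PER_TYPE_DEFAULTS if limits[name] > limits[GLOBAL_VAR]]


# Check the environment of the current process.
limits = effective_limits(os.environ)
for name in limits_above_global(limits):
    print(f"{name}={limits[name]} exceeds {GLOBAL_VAR}={limits[GLOBAL_VAR]}")
```

With all of the defaults in place, nothing is flagged, because every per-type default is below the global default of 16.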

## Configuring the size of the datetime generator cache <a href="#config-datetime-cache" id="config-datetime-cache"></a>

To speed up processing when it generates datetime values, Textual stores the redacted datetime values in a cache.

To change the cache size, configure the environment variable `SOLAR_DATETIME_GENERATOR_CACHE_CAPACITY`.

The default value is 100000, meaning that the cache holds up to 100,000 values.

Increasing the size of the cache can speed up processing, but it also uses more RAM.
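
To get a feel for the memory trade-off, the following back-of-the-envelope sketch multiplies the cache capacity by an average entry size. The 200 bytes per entry is purely an assumed figure for illustration. The actual size of a cached value in Textual is not documented here.

```python
def estimated_cache_bytes(capacity: int = 100_000, bytes_per_entry: int = 200) -> int:
    """Rough upper bound on cache memory: capacity times average entry size.

    capacity mirrors SOLAR_DATETIME_GENERATOR_CACHE_CAPACITY (default 100000).
    bytes_per_entry is an assumed figure, not a measured or documented one.
    """
    return capacity * bytes_per_entry


# Compare the default capacity with a tenfold increase.
for capacity in (100_000, 1_000_000):
    mib = estimated_cache_bytes(capacity) / (1024 * 1024)
    print(f"{capacity:>9,} entries: about {mib:.1f} MiB")
```

The estimate scales linearly, so a tenfold increase in capacity implies roughly ten times the memory for the cache.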

## Configuring the number of PDF pages to redact simultaneously <a href="#config-pdf-page-parallelism" id="config-pdf-page-parallelism"></a>

When Textual redacts PDF files so that a user can preview or download the output, the following environment variable determines the number of pages that it processes simultaneously:

`SOLAR_PDF_PAGE_REDACTION_PARALLELISM`

The default value is 4, meaning that Textual processes 4 pages at a time.
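
As a general illustration of what a page parallelism setting means, the sketch below processes pages with a bounded thread pool whose size is read from the same variable. The use of threads, the `process_pages` helper, and the stand-in `str.upper` "redaction" are all assumptions for the example. Textual's internal implementation is not described here.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

Page = TypeVar("Page")
Result = TypeVar("Result")


def process_pages(
    pages: Iterable[Page],
    redact_page: Callable[[Page], Result],
    parallelism: int,
) -> List[Result]:
    """Apply redact_page to every page, running at most `parallelism` at once.

    Results are returned in the same order as the input pages.
    """
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(redact_page, pages))


# Read the setting, falling back to the documented default of 4.
parallelism = int(os.environ.get("SOLAR_PDF_PAGE_REDACTION_PARALLELISM", "4"))

# Hypothetical 10-page document; str.upper stands in for real redaction work.
pages = [f"page {n}" for n in range(1, 11)]
redacted = process_pages(pages, str.upper, parallelism)
print(redacted[0], "...", redacted[-1])
```

A higher value lets more pages be in flight at once, at the cost of more CPU and memory in use at the same time.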

## Configuring the number of PDF files to plan simultaneously <a href="#config-pdf-plan-parallelism" id="config-pdf-plan-parallelism"></a>

When Textual plans the redaction of PDF files for a user to preview or download, the following environment variable determines the number of files that it plans simultaneously:

`SOLAR_PDF_DOC_PLAN_PARALLELISM`

The default value is 3, meaning that Textual plans 3 PDF files at a time.

## Configuring how often to purge cached PDF pages <a href="#config-pdf-page-cache-purge" id="config-pdf-page-cache-purge"></a>

When it redacts PDF files, Textual stores the redacted PDF pages in a cache.

The following environment variable determines how often Textual purges the cache of PDF pages:

`PURGE_REDACTED_PAGES_IN_HOURS`

The default value is 12, meaning that Textual purges the redacted PDF pages cache every 12 hours.
