Managing Tonic performance

Performance considerations and tuning
During Tonic data generation, performance bottlenecks typically come from one of the following sources:
  • Network IO. Specifically, the bandwidth of the network that connects Tonic to the database instances.
  • Disk IO. The read and write speed of the disks that the databases run on.
  • Tonic server and workspace configuration. Tonic performs several complex data computations and transformations. Depending on your workspace configuration, these tasks can take a long time.
In most cases, slow data generation is caused by disk IO or network IO.

Network IO

When possible, ensure that there is a fast network connection between Tonic and each source and destination database.
We recommend that you install Tonic on or near the hardware that runs your database instances.
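To see why bandwidth matters, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second) and ignores protocol overhead, compression, and database-side limits, so treat the results as lower bounds rather than predictions. The 500 GB data size is an arbitrary example.

```python
def transfer_hours(data_gb: float, bandwidth_mbps: float) -> float:
    """Lower-bound time, in hours, to move data_gb once across a link.

    Assumes decimal units and ignores protocol overhead and compression.
    """
    bits = data_gb * 1e9 * 8                   # gigabytes -> bits
    seconds = bits / (bandwidth_mbps * 1e6)    # megabits per second -> bits per second
    return seconds / 3600


# Compare three link speeds for the same (arbitrary) 500 GB copy.
for mbps in (100, 1_000, 10_000):
    print(f"{mbps:>6} Mbps: {transfer_hours(500, mbps):.2f} h")
# 100 Mbps: 11.11 h, 1000 Mbps: 1.11 h, 10000 Mbps: 0.11 h
```

Each tenfold increase in bandwidth cuts the lower bound tenfold, which is why placing Tonic close to the databases usually pays off before any server-side tuning.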

Disk IO

Disk IO is normally limited by the database hardware.
If you run in a public cloud, you can configure your database instances to use faster disks.
For SQL Server, you can also increase the write speed of your destination database. For details, see SQL Server.

Reducing data loads

To reduce the required disk and network IO, you can copy less data from the source to the destination.
In some cases, you don't need the data from every table, or from specific columns within a table. Or you might be happy with the data that is already in the destination, and so you don't need to copy it again from the source.
Here are some tips to reduce the data load:
  • Put large tables that contain unneeded data into Truncate mode. In Truncate mode, Tonic does not copy any of the table data to the destination database.
    For example, audit or transaction tables might not be needed for typical QA testing.
  • Avoid copying over large columns such as varchar(max), blob, XML, and JSON columns.
    If you do not need the data in a column, then to reduce the required IO, do one of the following:
    • If the column is nullable, apply a NULL generator.
    • Apply a Constant generator.
  • For subsequent generation runs from the same source database:
    • For large tables that have not changed, use Preserve Destination mode. In Preserve Destination mode, Tonic does not copy the table over, but instead uses the existing data in the destination database.
    • For large tables that have very few changes, use Incremental mode. In Incremental mode, Tonic only copies over the changes that occurred since the previous generation.
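The tips above amount to a simple decision procedure. The following Python sketch encodes it so that the order of the checks is explicit. It is not part of Tonic: `TableProfile` and both functions are hypothetical, you would have to supply the flags yourself (for example, from table sizes and your knowledge of how the data changes), and the sketch does not check whether your data connector supports a given table mode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableProfile:
    """Hypothetical facts that you gather about one table before a run."""
    name: str
    is_large: bool
    needed_for_testing: bool
    prior_run_from_same_source: bool  # destination already holds a previous generation
    changed_since_last_run: bool
    only_few_changes: bool


def suggest_table_mode(t: TableProfile) -> Optional[str]:
    """Apply the data-load tips in order. None means keep the current mode."""
    if not t.is_large:
        return None  # the tips target large tables, where the IO savings matter
    if not t.needed_for_testing:
        return "Truncate"  # copy no rows at all
    if t.prior_run_from_same_source:
        if not t.changed_since_last_run:
            return "Preserve Destination"  # reuse the rows already in the destination
        if t.only_few_changes:
            return "Incremental"  # copy only the changes since the previous run
    return None


def suggest_column_generator(needed: bool, nullable: bool) -> Optional[str]:
    """For a large column (varchar(max), blob, XML, JSON). None means keep it."""
    if needed:
        return None
    return "NULL" if nullable else "Constant"


audit = TableProfile("audit_log", True, False, True, True, False)
print(suggest_table_mode(audit))  # Truncate
```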

Configuring parallel processing

If the Tonic server is the bottleneck, you can improve performance by tuning the following settings, which control parallel processing.
You apply these settings as environment variables in your tonic_worker container. For more information on setting environment variables, see Setting environment variables.
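As an illustration, if you start the worker container with the Docker CLI, each setting becomes an `-e NAME=VALUE` flag on `docker run`. The following Python sketch builds those flags. `EXAMPLE_WORKER_SETTING` is a hypothetical placeholder, not a real Tonic variable name, and the image name is also a placeholder. If you deploy with Docker Compose or Kubernetes, you set the same names in the container's `environment` or `env` section instead.

```python
import shlex
from typing import Dict, List


def docker_env_flags(settings: Dict[str, int]) -> List[str]:
    """Turn {name: value} into ["-e", "NAME=VALUE", ...] for `docker run`."""
    flags: List[str] = []
    for name, value in sorted(settings.items()):
        # Parallelism settings are counts, so reject negative values early.
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
        flags += ["-e", f"{name}={value}"]
    return flags


# Hypothetical setting name and image; substitute the real ones for your deployment.
cmd = ["docker", "run", *docker_env_flags({"EXAMPLE_WORKER_SETTING": 2}), "example/worker-image"]
print(shlex.join(cmd))
```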

Variables that are not data connector-specific

The following settings are not limited to specific data connectors:
  • The number of constraints that a worker can apply in parallel during a job.
  • The number of threads to devote to performing the data transformations.
    Certain Tonic configurations can introduce CPU bottlenecks. This typically occurs when you configure composite generators such as JSON Mask or XML Mask with a large number of paths.
    If your workspace has a very high number of generators, or a large number of JSON Mask, XML Mask, Integer Primary Key, or Alphanumeric Primary Key generators, then you should increase this value to at least 2.
  • The number of tables that Tonic operates on at the same time.
  • For subsetting, the number of subsetting steps that a worker processes in parallel during a subsetting job. See Enabling parallel processing for subsetting.
    If your Tonic server has enough CPU, and your source and destination databases are not fully utilized, then we recommend that you increase this value to 2. Depending on your hardware, you can increase it further.
  • The number of threads to devote to writing rows to the output database.
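Each of these settings caps how many units of work (constraints, tables, subsetting steps, or threads) run at once. The following Python sketch is a hypothetical illustration of that kind of cap using a bounded thread pool, not Tonic's implementation: `run_tables` and the sleep that stands in for reading, transforming, and writing a table are assumptions. It reports the peak number of tables in flight, which never exceeds `parallelism`.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_tables(tables: List[str], parallelism: int, work_seconds: float = 0.05) -> int:
    """Process tables with at most `parallelism` in flight; return the observed peak."""
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def process(table: str) -> str:
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        try:
            time.sleep(work_seconds)  # stand-in for reading, transforming, and writing
        finally:
            with lock:
                in_flight -= 1
        return table

    # max_workers is the cap: extra tables wait until a worker is free.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        list(pool.map(process, tables))
    return peak


print(run_tables([f"table_{i}" for i in range(6)], parallelism=2))
```

Because every table in flight issues its own reads and writes, raising the cap shifts load onto the source and destination databases. That is why increases only help when the server has spare CPU and the databases are not already fully utilized.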

Data connector-specific variables

The following settings apply to specific data connectors:
  • Google BigQuery only. The number of read threads for Google BigQuery. Default: 2.
  • MySQL and PostgreSQL only. At the end of the data generation run, the number of indexes to restore concurrently in the destination database. Default: 1.
  • MySQL only. The number of tables that a worker can copy in parallel during a job. Default: 1.
  • Oracle only, and only on Oracle Enterprise Edition databases. The maximum number of active execution processes that Data Pump uses. Default: 0.
  • PostgreSQL only. The number of paginated result sets that are read concurrently during a job.
  • PostgreSQL only. A comma-separated list of tables to read in parallel by primary key ranges.
  • MySQL and SQL Server only. The number of table partitions that are read concurrently during a job.
  • PostgreSQL only. The number of ranges to read in parallel.
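Reading a table by primary key ranges is a common way to parallelize reads: split the key span into contiguous ranges and read each range over its own connection. This section does not describe how Tonic chooses its ranges, so the following Python sketch only illustrates the general technique. The helper names, the `public.orders` and `public.events` tables, the `id` key column, and the key bounds are all hypothetical, and the `%s` placeholders follow the style used by PostgreSQL drivers such as psycopg. Ranges of equal key width hold similar row counts only when keys are dense and evenly distributed; with gaps or skew, some ranges finish much later than others.

```python
from typing import List, Tuple


def parse_table_list(value: str) -> List[str]:
    """Split a comma-separated setting value into table names, skipping blanks."""
    return [name.strip() for name in value.split(",") if name.strip()]


def split_pk_ranges(min_pk: int, max_pk: int, num_ranges: int) -> List[Tuple[int, int]]:
    """Split the inclusive key span [min_pk, max_pk] into contiguous inclusive ranges.

    Range widths differ by at most one key. Fewer ranges are returned if the
    span holds fewer keys than num_ranges.
    """
    if num_ranges < 1:
        raise ValueError("num_ranges must be at least 1")
    if max_pk < min_pk:
        return []
    total = max_pk - min_pk + 1
    count = min(num_ranges, total)
    base, extra = divmod(total, count)
    ranges = []
    start = min_pk
    for i in range(count):
        # The first `extra` ranges get one additional key each.
        end = start + base + (1 if i < extra else 0) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_queries(table: str, pk: str, ranges: List[Tuple[int, int]]) -> List[Tuple[str, Tuple[int, int]]]:
    """One parameterized SELECT per range; each could run on its own connection."""
    # Names are interpolated here for brevity; quote identifiers properly in real code.
    sql = f"SELECT * FROM {table} WHERE {pk} BETWEEN %s AND %s"
    return [(sql, r) for r in ranges]


for table in parse_table_list("public.orders, public.events"):
    for sql, params in range_queries(table, "id", split_pk_ranges(1, 10, 3)):
        print(sql, params)
```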