Performance

Performance considerations and tuning

Considerations

There are typically three possible bottlenecks in the performance of a data generation job in Tonic. They are:

1) Network IO. Specifically, the bandwidth capacity of the network which connects Tonic to the database instances.

2) Disk IO. The read and write throughput of the disks backing the source and destination databases.

3) Tonic Server and workspace configuration. Tonic performs a variety of complex data computations and transformations, which, depending on your workspace selections, can take a while to complete.

However, disk IO and network IO are most often the culprits for slow data generation times.

Network IO

When possible, ensure that Tonic has a fast network connection to both database instances. It is always advisable to install Tonic on, or near, the hardware running your database instances.

Disk IO

Disk IO is normally limited by the database hardware itself. If you are running in a public cloud, you typically have options to configure faster disks. Also, see our SQL Server page for steps you can take specifically for SQL Server to increase write speeds on your destination database.

Reducing data loads

To reduce the amount of disk and network IO required, you can also copy less data from the source to the destination. In some cases you don't need the data in every table (or in specific columns within a table); in other cases the data that already exists in the destination is sufficient, and you don't need to copy it again from the source. A few tips:

1) Put large tables into Truncate mode, which tells Tonic not to transfer their data. This can be especially useful for audit or transaction tables, which are sometimes not needed for typical QA testing.

2) Avoid copying large columns such as varchar(max), blob, XML, and JSON columns. If the data is not needed for your purpose, apply a NULL generator (when the column is nullable) or a Constant generator to these columns to reduce the amount of IO required.

3) Put large tables into Preserve mode. In Preserve mode, a table is not copied from the source because it already exists in the destination. If you need to redo a run using the same source data and a large table has not changed, put it into Preserve mode; Tonic will not copy it over and will instead use the data that already exists in the destination database.

Tonic Server

Certain Tonic configurations can, however, introduce CPU bottlenecks on the Tonic server. This typically occurs when using specific Tonic generators, such as the JSON Mask or XML Mask generators with a large number of paths selected. If you believe the Tonic server itself is the bottleneck, there are several settings that can be tuned to improve performance. These settings should all be applied as environment variables on your tonic_worker container.

TONIC_TABLE_PARALLELISM (defaults to 1)

This variable controls how many tables Tonic operates on at once. If your Tonic server has enough CPU and your source and target databases are not fully utilized, we advise increasing this variable to 2, or possibly higher depending on your hardware.

Note: This variable must be kept at 1 if your source database is being modified by other database connections during the data generation job. If you increase it to 2 or higher in that situation, your output database will likely not maintain referential integrity.

TONIC_WRITE_PARALLELISM (defaults to 2)

The number of threads to devote to writing rows to the output database.

TONIC_PROCESS_PARALLELISM (defaults to 1)

The number of threads to devote to performing the necessary data transformations. If your workspace has a very high number of generators, or a large number of JSON Mask, XML Mask, Integer Primary Key, or Alphanumeric Primary Key generators, you should likely increase this value to at least 2.

TONIC_INDEX_RESTORATION_PARALLELISM (defaults to 1)

The number of indexes to restore concurrently in the destination database at the end of the data generation run. Note that this variable only affects MySQL databases.
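
How you set these variables depends on how you deploy Tonic. The sketch below is only an illustration, assuming a Docker Compose deployment: the service layout, the image placeholder, and the specific values are assumptions, not recommendations, and should be adjusted to your own hardware and workspace.

services:
  tonic_worker:
    image: <your-tonic-worker-image>            # placeholder; use your actual Tonic worker image
    environment:
      TONIC_TABLE_PARALLELISM: "2"              # illustrative; keep at 1 if the source is modified during generation
      TONIC_WRITE_PARALLELISM: "4"              # illustrative; default is 2
      TONIC_PROCESS_PARALLELISM: "2"            # illustrative; default is 1
      TONIC_INDEX_RESTORATION_PARALLELISM: "2"  # illustrative; only affects MySQL destinations

After changing these values, recreate the tonic_worker container so that the new environment variables take effect.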