Consistency
Consistency is an option for some generators that, when turned on, maps the same input value to the same output value across an entire database. For example, if consistency is turned on for a name generator, it always maps the same input name (for example, Albert Einstein) to the same output name (for example, Richard Feynman).
Consistency can also be maintained across multiple databases of varying types.

Why use consistency?

The primary reasons for using consistency are to:
  • Enable joining on columns that do not have explicit database constraints, such as foreign keys, in the schema. This is often seen with values such as email addresses. With consistency, you can completely anonymize an email address and still use it in a join.
  • Preserve the approximate cardinality of a column. For example, a city column contains 50 different cities. To randomize this column but still have about 50 cities, you can use consistency. Because consistency does not guarantee uniqueness, two different input cities can map to the same output city, so the cardinality might decrease. However, it never increases. If unique 1-to-1 mappings are required, use a Key generator.
  • Match duplicated data across 1 or more databases. For example, you have a user database that contains a username in both a column and a JSON blob, and another database that contains their website activity, identified by the same username values. To anonymize the username, but still have the username be the same in all locations/databases, use consistency.
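
The first two reasons can be illustrated with a small sketch. It assumes that a consistent generator behaves like a keyed hash of the input value, which is an assumption for illustration and not Tonic's documented implementation. The helpers `mask_email`, `mask_city`, and `join`, the seed, and the sample rows are all hypothetical.

```python
import hashlib
import hmac

SEED = 12345  # hypothetical seed: the same seed always yields the same mapping


def _digest(value: str) -> bytes:
    """Keyed hash of the input: identical inputs always give identical digests."""
    key = SEED.to_bytes(4, "big", signed=True)
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()


def mask_email(email: str) -> str:
    """Hypothetical consistent email generator."""
    return f"user_{_digest(email).hex()[:12]}@example.com"


FAKE_CITIES = ["Avalon", "Brookfield", "Cedar Falls", "Dunmore", "Elmwood"]


def mask_city(city: str) -> str:
    """Hypothetical consistent city generator that draws from a small pool."""
    index = int.from_bytes(_digest(city)[:8], "big") % len(FAKE_CITIES)
    return FAKE_CITIES[index]


def join(users, orders):
    """Inner join of (email, name) rows with (email, amount) rows on email."""
    names_by_email = {}
    for email, name in users:
        names_by_email.setdefault(email, []).append(name)
    return sorted(
        (name, amount)
        for email, amount in orders
        for name in names_by_email.get(email, [])
    )


users = [("alice@corp.com", "Alice"), ("bob@corp.com", "Bob")]
orders = [("alice@corp.com", 30), ("bob@corp.com", 12), ("alice@corp.com", 7)]

masked_users = [(mask_email(e), n) for e, n in users]
masked_orders = [(mask_email(e), a) for e, a in orders]

# Every email is replaced, yet the join still pairs the same rows.
print(join(users, orders) == join(masked_users, masked_orders))

cities = ["Boston", "Denver", "Austin", "Boston", "Seattle", "Denver"]
masked_cities = [mask_city(c) for c in cities]
# Distinct inputs can collide, so the distinct count can shrink but never grow.
print(len(set(cities)), len(set(masked_cities)))
```

Because each output depends only on the seed and the input value, the same email in two tables (or two databases) becomes the same masked email, which is what keeps the join intact.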

Enabling consistency

To enable consistency, on the generator configuration panel, toggle the Consistency switch.
Not all generators support consistency.
Address Generator with Consistency Switch
Consistency is a function of both the data type and the value.
For example, a numeric field contains the value 123, and a string (varchar) field contains the value "123". Both fields have consistent generators applied. Because the data types differ, the output is not consistent between the two fields.
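
One way this behavior can arise is if the generator hashes the data type together with the value. The sketch below assumes such a keyed-hash design; `consistent_token` and the seed are hypothetical and do not describe Tonic's internals.

```python
import hashlib
import hmac

SEED = 12345  # hypothetical seed for this sketch


def consistent_token(value) -> str:
    """Hypothetical consistent generator whose hash input includes the value's type."""
    # Prefixing the type name means 123 (int) and "123" (str) hash differently.
    message = f"{type(value).__name__}:{value!r}".encode("utf-8")
    key = SEED.to_bytes(4, "big", signed=True)
    return hmac.new(key, message, hashlib.sha256).hexdigest()[:16]


numeric_out = consistent_token(123)
string_out = consistent_token("123")
print(numeric_out == consistent_token(123))  # same type and value: always equal
print(numeric_out == string_out)             # same digits, different type: differ
```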

Consistency example

To demonstrate the effect of consistency on the output, we'll use a column that contains a first name, and that uses the Name generator.
Here is the sample input and output when consistency is not enabled:
Sample input and output for a first name field with consistency disabled for the Name generator
In this sample data, the first name Melissa appears twice, but is mapped to Walton the first time and Linn the second time.
Here is the sample input and output when consistency is enabled:
Sample input and output for a first name field with consistency enabled for the Name generator
In this case, the first name Melissa is mapped to Rosella both times.
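
The difference between the two cases can be sketched as an independent random draw per row versus a draw keyed on the input value. The name pool, the seed, and both functions are hypothetical, so the exact outputs will not match the sample above; only the pattern (repeated inputs matching or not) carries over.

```python
import hashlib
import hmac
import random

FAKE_NAMES = ["Avery", "Blake", "Casey", "Drew", "Emery", "Finley"]  # hypothetical pool
SEED = 12345  # hypothetical seed


def name_not_consistent(rng: random.Random, _name: str) -> str:
    """Each call draws independently, so a repeated input can get a different output."""
    return rng.choice(FAKE_NAMES)


def name_consistent(name: str) -> str:
    """Output depends only on the seed and the input, so repeats always match."""
    key = SEED.to_bytes(4, "big", signed=True)
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).digest()
    return FAKE_NAMES[int.from_bytes(digest[:8], "big") % len(FAKE_NAMES)]


first_names = ["Melissa", "John", "Melissa"]
rng = random.Random()
print([name_not_consistent(rng, n) for n in first_names])  # Melissa may differ
print([name_consistent(n) for n in first_names])           # Melissa always matches
```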

Consistency considerations

Consistency does not imply uniqueness

A consistent generator ensures that the same input value always produces the same output value.
It does not guarantee that two different input values produce two different output values.
Consistent generators are not 1:1 mappings.
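
A quick way to see why consistency cannot imply uniqueness: if a consistent generator draws from an output pool smaller than the set of distinct inputs, some inputs must share an output (the pigeonhole principle). The three-value pool and `consistent_color` below are hypothetical and exaggerate the effect on purpose.

```python
import hashlib
import hmac

SEED = 12345
OUTPUT_POOL = ["Red", "Green", "Blue"]  # hypothetical, deliberately tiny


def consistent_color(value: str) -> str:
    """Hypothetical consistent generator: same input, same output, collisions allowed."""
    key = SEED.to_bytes(4, "big", signed=True)
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return OUTPUT_POOL[int.from_bytes(digest[:8], "big") % len(OUTPUT_POOL)]


inputs = ["alpha", "bravo", "charlie", "delta"]
outputs = {v: consistent_color(v) for v in inputs}
print(outputs)
# Four distinct inputs but only three possible outputs, so at least two collide.
print(len(set(outputs.values())) < len(inputs))
```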

Consistency is across an entire database

Any column, regardless of which table it resides in, is consistent with any other column that uses the same consistent generator.
However, by default, consistency is not guaranteed between data generation runs, even if the run is on the same database.

Enabling consistency across runs or multiple databases

To ensure consistency across data generation runs, add the following system environment variable to the tonic_worker and tonic_webserver containers:
TONIC_STATISTICS_SEED: <ANY 32 BIT SIGNED INTEGER>
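
The effect of a fixed seed can be modeled with a short sketch. Tonic reads TONIC_STATISTICS_SEED itself; the functions `run_seed` and `consistent_value` below are hypothetical and only model the assumption that, without the variable, each run picks its own random seed, while a fixed 32-bit signed seed reproduces the same mapping in every run and every database.

```python
import hashlib
import hmac
import os
import secrets

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def run_seed() -> int:
    """Hypothetical: use TONIC_STATISTICS_SEED if set, else a fresh random seed per run."""
    raw = os.environ.get("TONIC_STATISTICS_SEED")
    if raw is None:
        # Random seed: outputs are consistent within this run only.
        return secrets.randbelow(2**32) + INT32_MIN
    seed = int(raw)
    if not INT32_MIN <= seed <= INT32_MAX:
        raise ValueError("TONIC_STATISTICS_SEED must be a 32-bit signed integer")
    return seed


def consistent_value(seed: int, value: str) -> str:
    """Hypothetical consistent generator keyed on the run's seed."""
    key = seed.to_bytes(4, "big", signed=True)
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


os.environ["TONIC_STATISTICS_SEED"] = "424242"
run1 = consistent_value(run_seed(), "Albert Einstein")  # first generation run
run2 = consistent_value(run_seed(), "Albert Einstein")  # a later run
print(run1 == run2)
```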

Generators that can be made self-consistent

The following generators can be made consistent to themselves. This means that the same input value in the column always produces the same output value.

Generators that can be made self-consistent or consistent to other columns

The following generators can be made consistent either to themselves or to other columns.
When a column is consistent to another column, the output value is based on the value in the other column. For example, a column contains a job title. You make the assigned generator consistent to the username column. Every row that has the username User1 in the input database has the same job title in the destination database.
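
The job title example can be sketched by keying the generator on the username instead of on the job title itself. The title pool, the seed, and `job_title_from` are hypothetical; the sketch only shows that rows sharing a username get the same output even when their original job titles differ.

```python
import hashlib
import hmac

SEED = 12345
JOB_TITLES = ["Analyst", "Engineer", "Designer", "Manager", "Technician"]  # hypothetical


def job_title_from(username: str) -> str:
    """Output depends on the username column, not on the original job title."""
    key = SEED.to_bytes(4, "big", signed=True)
    digest = hmac.new(key, username.encode("utf-8"), hashlib.sha256).digest()
    return JOB_TITLES[int.from_bytes(digest[:8], "big") % len(JOB_TITLES)]


rows = [
    {"username": "User1", "job_title": "Accountant"},
    {"username": "User2", "job_title": "Accountant"},
    {"username": "User1", "job_title": "Chef"},
]
masked = [{**row, "job_title": job_title_from(row["username"])} for row in rows]
for row in masked:
    print(row)  # both User1 rows share a job title
```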