Consistency is an option for some generators. When it is turned on, the same input maps to the same output across the entire database. For example, if consistency is turned on for a name generator, the generator always maps the same input name (e.g. Albert Einstein) to the same output name (e.g. Richard Feynman).
The primary reasons for using consistency are to:
Enable joining on columns that have explicit database constraints in the schema. This is often seen with things like email addresses. With consistency, you can completely anonymize an email address and still use it in a join.
Preserve the cardinality of a column. For example, if a city column contains 50 different cities and you want to randomize the column but still have 50 cities, you can turn on consistency.
Match duplicated data across one or more databases. For example, you might have a user database that stores a username both in a column and in a JSON blob, and another database that stores the same user's website activity. To anonymize the username but keep it identical in all locations, use consistency.
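The join case can be illustrated with a small sketch. This is not Tonic's implementation: `pseudonymize_email`, `SECRET_KEY`, and the sample tables are hypothetical, and a keyed hash is used here only as one simple way to get a mapping in which the same input always yields the same output. The point is that a join on the anonymized column pairs up the same rows as a join on the original column.

```python
import hashlib
import hmac

# Hypothetical key; stands in for whatever internal state a real tool keeps.
SECRET_KEY = b"example-key"


def pseudonymize_email(email):
    """Map an email address to a fake one; equal inputs give equal outputs."""
    digest = hmac.new(SECRET_KEY, email.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.com"


# Two tables that share an email column but have no declared constraint between them.
users = [
    {"id": 1, "email": "albert@example.org"},
    {"id": 2, "email": "marie@example.org"},
]
events = [
    {"email": "albert@example.org", "page": "/home"},
    {"email": "marie@example.org", "page": "/pricing"},
    {"email": "albert@example.org", "page": "/docs"},
]

# Anonymize the email column in both tables with the same consistent mapping.
anon_users = [{**u, "email": pseudonymize_email(u["email"])} for u in users]
anon_events = [{**e, "email": pseudonymize_email(e["email"])} for e in events]


def join_on_email(user_rows, event_rows):
    """Return sorted (user id, page) pairs produced by joining on email."""
    id_by_email = {u["email"]: u["id"] for u in user_rows}
    return sorted((id_by_email[e["email"]], e["page"]) for e in event_rows)


original_join = join_on_email(users, events)
anonymized_join = join_on_email(anon_users, anon_events)
print(original_join == anonymized_join)
```

If the two tables were anonymized with independent random values instead, the lookup in `join_on_email` would fail, because the email in `anon_events` would no longer match any row in `anon_users`.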
To enable consistency, toggle the 'Consistency' switch when you add a generator to a column. Note that not all generators support consistency.
The first image shows an address generator being used in a non-consistent fashion. Notice that the city of Atlanta is initially mapped to San Diego, but future occurrences of Atlanta are mapped to different (random) cities such as Grand Junction, Long Beach, and Phoenix.
In the second image we use an address generator in a consistent manner. Now, the city of Atlanta is consistently mapped to San Diego.
A consistent generator ensures that the same input always gives the same output. It does not guarantee that two different inputs will yield different outputs. In other words, consistent generators are not 1:1 mappings.
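A short sketch makes the distinction between "consistent" and "1:1" concrete. It assumes a hypothetical generator, `consistent_city`, that picks a replacement from a small fixed list by hashing the input; the list and the hashing scheme are illustrative only and are not how any particular generator works. Because there are more inputs than possible outputs, at least two inputs must share an output.

```python
import hashlib

# Hypothetical pool of replacement values.
CITIES = ["San Diego", "Grand Junction", "Long Beach", "Phoenix"]


def consistent_city(city):
    """Pick a replacement city that depends only on the input value."""
    digest = hashlib.sha256(city.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(CITIES)
    return CITIES[index]


inputs = ["Atlanta", "Boston", "Chicago", "Denver", "El Paso", "Fresno"]
mapping = {city: consistent_city(city) for city in inputs}
distinct_outputs = set(mapping.values())

# Same input, same output: repeated calls agree.
print(consistent_city("Atlanta") == consistent_city("Atlanta"))

# Not 1:1: six inputs cannot map to six different outputs drawn from four cities.
print(len(distinct_outputs) < len(inputs))
```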
Any column, regardless of which table it resides in, is consistent with every other column that uses the same consistent generator. However, by default, consistency is not guaranteed between data generation runs, whether on the same database or on different databases. To enable consistency across databases and across runs, see the next section.
To ensure consistency across data generation runs, add an environment variable to the tonic_worker and tonic_webserver containers. The environment variable to add is:
TONIC_STATISTICS_SEED: <ANY 32 BIT SIGNED INTEGER>
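The seed value must fit in a 32-bit signed integer, that is, between -2147483648 and 2147483647. The sketch below shows one way to pick and validate such a value, and illustrates in general terms why a fixed seed makes separate runs agree. The helpers `new_seed`, `is_valid_seed`, and `run_generation` are hypothetical, and the sketch makes no claim about how the seed is used internally.

```python
import random
import secrets

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1

# Hypothetical pool of replacement names.
FAKE_NAMES = ["Richard Feynman", "Niels Bohr", "Lise Meitner", "Paul Dirac"]


def new_seed():
    """Pick a random value in the 32-bit signed integer range."""
    return secrets.randbelow(2**32) + INT32_MIN


def is_valid_seed(value):
    """Check that a candidate seed fits in a 32-bit signed integer."""
    return INT32_MIN <= value <= INT32_MAX


def run_generation(seed, names):
    """Simulate one generation run: each output depends only on (seed, input)."""
    return {name: random.Random(f"{seed}:{name}").choice(FAKE_NAMES) for name in names}


seed = 12345  # example value; any 32-bit signed integer works
names = ["Albert Einstein", "Marie Curie"]

first_run = run_generation(seed, names)
second_run = run_generation(seed, names)
print(first_run == second_run)       # runs that share a seed produce the same mapping
print(is_valid_seed(new_seed()))     # freshly generated seeds are always in range
```

Without a shared seed, each run would start from different random state, so the same input could map to a different output from one run to the next.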
The following generators have the option to be made consistent: