Generator reference

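This reference describes each generator that Tonic supports.
For each generator, a table indicates whether the generator supports consistency, linking, differential privacy, and data-free operation; whether it is allowed for primary key and unique columns; whether it uses format-preserving encryption (FPE); its privacy ranking; and its generator ID for the API.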

Address

Generates a random address-like string.
You can indicate which part of an address the column contains. For example, the column might contain only the street address or the city, or it might contain the full address.
Consistency
Yes, can be made self-consistent or consistent with another column.
Linking
Yes, can be linked.
Differential privacy
Yes, if consistency is not enabled.
Data-free
Yes, if consistency is not enabled.
Allowed for primary keys
No
Allowed for unique columns
No
Uses format-preserving encryption (FPE)
No
Privacy ranking
  • 1 if not consistent
  • 4 if consistent
Generator ID (for the API)
To configure the generator:
  1. From the Link To dropdown list, select the columns to link this column to. You can link columns that use the Address generator to mask one of the following address components:
    • City
    • City State
    • Country
    • Country Code
    • State
    • State Abbreviation
    • Zip Code
    • Latitude
    • Longitude
    Note that when linked to another address column, a country or country code is always the United States.
  2. From the address component dropdown list, select the address component that this column contains. The available options are:
    • Building Number
    • Cardinal Direction (North, South, East, West)
    • City
    • City Prefix (Examples: North, South, East, West, Port, New)
    • City Suffix (Examples: land, ville, furt, town)
    • City with State (Example: Spokane, Washington)
    • City with State Abbr (Example: Houston, TX)
    • Country (Examples: Spain, Canada)
    • Country Code (Uses the 2-character country code. Examples: ES, CA)
    • County
    • Direction (Examples: North, Northeast, Southwest, East)
    • Full Address
    • Latitude (Examples: 33.51, 41.32)
    • Longitude (Examples: -84.05, -74.21)
    • Ordinal Direction (Examples: Northeast, Southwest)
    • Secondary Address (Examples: Apt 123, Suite 530)
    • State (Examples: Alabama, Wisconsin)
    • State Abbr (Examples: AL, WI)
    • Street Address (Example: 123 Main Street)
    • Street Name (Examples: Broad, Elm)
    • Street Suffix (Examples: Way, Hill, Drive)
    • US Address
    • US Address with Country
    • Zip Code (Example: 12345)
  3. Toggle the Consistency setting to indicate whether to make the column consistent. By default, consistency is disabled.
  4. If consistency is enabled, then by default, the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.
    When the Address generator is consistent with itself, the same value in the source database is always mapped to the same destination value. For example, for a column that contains a state name, Alabama is always mapped to Illinois.
    When the Address generator is consistent with another column, the same value in the other column always results in the same destination value for the address column. For example, if the address column is consistent with a name column, then every instance of John Smith in the name column in the source database has the same address value in the destination database.
  5. If Tonic data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
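The two consistency modes in the steps above can be pictured as a keyed, deterministic lookup. The following is a minimal sketch of that idea only, not Tonic's implementation. The `consistent_pick` function, the `STATES` pool, and the `demo-secret` key are hypothetical names introduced for illustration.

```python
import hashlib
import hmac

# Hypothetical replacement pool. A real generator draws from a far larger set.
STATES = ["Alabama", "Illinois", "Ohio", "Texas", "Wisconsin"]


def consistent_pick(value: str, pool: list, secret: bytes = b"demo-secret") -> str:
    """Map the same input to the same pool entry every time (assumed approach)."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    return pool[int.from_bytes(digest[:8], "big") % len(pool)]


# Self-consistent: the source state value itself drives the choice.
masked_state = consistent_pick("Alabama", STATES)

# Consistent with another column: the value in the name column drives the
# choice, so every row for "John Smith" receives the same address value.
masked_for_name = consistent_pick("John Smith", STATES)
```

Because the result depends only on the input value and the key, repeated runs over the same source data produce the same destination values.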

Spark supported address parts

For the Address generator, Spark workspaces (Amazon EMR, Databricks, and self-managed Spark clusters) only support the following address parts:
  • Building Number
  • City
  • Country
  • Country Code
  • Full Address
  • Latitude
  • Longitude
  • State
  • State Abbr
  • Street Address
  • Street Name
  • Street Suffix
  • US Address
  • US Address with Country
  • Zip Code

AI Synthesizer

Within a table, the AI synthesizer uses the columns that are assigned the AI Synthesizer to train a model and generate the synthetic data.
It uses deep neural networks for high-fidelity data mimicking.
By default, the AI Synthesizer is not available. To enable the AI Synthesizer, in the Tonic web server container, set the environment setting TONIC_NN_GENERATOR_ENABLED to true. For details, go to Configuring environment settings.
The privacy ranking is 3.
For details, go to Using the AI Synthesizer.

Algebraic

The Algebraic generator identifies the algebraic relationship between three or more numeric values and generates new values to match. At least one of the values must be a non-integer.
If a relationship cannot be found, then the generator defaults to the Categorical generator.
This generator can be linked with other Algebraic generators.
Consistency
No, cannot be made consistent.
Linking
Yes, can be linked.
Differential privacy
No
Data-free
No
Allowed for primary keys
No
Allowed for unique columns
No
Uses format-preserving encryption (FPE)
No
Privacy ranking
3
Generator ID (for the API)
To configure the generator, from the Link To dropdown list, select the columns to link this column to. You can select other columns that are assigned the Algebraic generator.
You must select at least three columns.
The column values must be numeric. At least one of the columns must contain a value other than an integer.
If Tonic data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Alphanumeric String Key

Generates unique alphanumeric strings of the same length as the input. For example, for the source value ABC123, the output value is a six-character alphanumeric string such as D24N05.
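As a rough illustration of the length-preserving, unique output, the sketch below draws random strings and rejects duplicates. This is an assumption made for the example. The generator itself uses format-preserving encryption, which this sketch does not implement. The `same_length_keys` function and its alphabet are hypothetical.

```python
import random
import string

# Assumed output alphabet: uppercase letters and digits.
ALPHABET = string.ascii_uppercase + string.digits


def same_length_keys(values, seed=0):
    """Return one unique replacement per input value, preserving each length.

    Assumes the inputs are non-empty and that enough distinct strings of
    each length exist; otherwise the retry loop would not terminate.
    """
    rng = random.Random(seed)
    used = set()
    result = []
    for value in values:
        while True:
            candidate = "".join(rng.choice(ALPHABET) for _ in value)
            if candidate not in used:
                used.add(candidate)
                result.append(candidate)
                break
    return result


keys = same_length_keys(["ABC123", "XYZ789", "Q1"])
```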
Consistency
Yes, can be made self-consistent.
Linking
No, cannot be linked.
Differential privacy
No
Data-free
No
Allowed for primary keys
Yes
Allowed for unique columns
Yes
Uses format-preserving encryption (FPE)
Yes
Privacy ranking
  • 3 if not consistent
  • 4 if consistent
Generator ID (for the API)
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.
If Tonic data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Array Character Scramble

A version of the Character Scramble generator that can be used for array values.
This generator replaces letters with random other letters, and numbers with random other numbers. Punctuation and whitespace are preserved.
For example, for the following array value:
["ABC.123", 3, "last week"]
The output might be something like:
["KFR.860", 7, "sdrw mwoc"]
This generator securely masks letters and numbers. There is no way to recover the original data.
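The array behavior can be sketched as the scramble applied to each element in turn. This is a minimal illustration under stated assumptions, not Tonic's implementation: `scramble_text` and `scramble_array` are hypothetical names, only ASCII letters and digits are replaced, and integers are scrambled through their string form.

```python
import random
import string


def scramble_text(text, rng):
    """Swap each ASCII letter or digit for a random one of the same kind."""
    pools = (string.ascii_uppercase, string.ascii_lowercase, string.digits)
    out = []
    for ch in text:
        pool = next((p for p in pools if ch in p), None)
        out.append(rng.choice(pool) if pool else ch)  # other characters pass through
    return "".join(out)


def scramble_array(values, rng=None):
    """Scramble string and integer elements; leave other element types as-is."""
    rng = rng or random.SystemRandom()
    masked = []
    for value in values:
        if isinstance(value, str):
            masked.append(scramble_text(value, rng))
        elif isinstance(value, int) and not isinstance(value, bool):
            masked.append(int(scramble_text(str(value), rng)))
        else:
            masked.append(value)
    return masked


masked = scramble_array(["ABC.123", 3, "last week"])
```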
Consistency
Yes, can be made self-consistent.
Linking
No, cannot be linked.
Differential privacy
No
Data-free
No
Allowed for primary keys
No
Allowed for unique columns
No
Uses format-preserving encryption (FPE)
No
Privacy ranking
  • 3 if not consistent
  • 4 if consistent
Generator ID (for the API)
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.
If Tonic data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Array JSON Mask

A version of the JSON Mask generator that can be used for array values.
Runs a selected generator on values that match a user-specified JSONPath.
Consistency
Determined by the specified sub-generators.
Linking
Determined by the specified sub-generators.
Differential privacy
Determined by the specified sub-generators.
Data-free
Determined by the specified sub-generators.
Allowed for primary keys
No
Allowed for unique columns
No
Uses format-preserving encryption (FPE)
No
Privacy ranking
5
Generator ID (for the API)
To configure the generator:
  1. To assign a generator to a path expression:
    1. Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell JSON field contains a sample value from the source database. You can use the previous and next icons to page through different values.
    2. In the Path Expression field, type the JSONPath expression to identify the value to apply the generator to. To populate a path expression, you can also click a value in the Cell JSON field. Matched JSON Values shows the result from the value in Cell JSON.
    3. By default, the selected generator is applied to any value that matches the expression. To limit the types of values to apply the generator to, from the Type Filter, specify the applicable types. You can select Any, or you can select any combination of String, Number, and Null.
    4. From the Generator Configuration dropdown list, select the generator to apply to the path expression. You cannot select another composite generator.
    5. Configure the selected generator. You cannot configure the selected generator to be consistent with another column.
    6. To save the configuration and immediately add a generator for another path expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
  2. From the Sub-Generators list:
    1. To edit a generator assignment, click the edit icon.
    2. To remove a generator assignment, click the delete icon.
    3. To move a generator assignment up or down in the list, click the up or down arrow.
  3. If Tonic data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Array Regex Mask

A version of the Regex Mask generator that can be used for array values.
Uses regular expressions to parse strings and replace specified substrings with the output of specified generators. The parts of the string to replace are specified inside unnamed top-level capture groups.
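The capture-group replacement can be sketched with Python's `re` module. This is an illustration only, not Tonic's implementation. The `mask_groups` function and the stand-in generators are hypothetical. The sketch assumes that the pattern contains only top-level (non-nested) capture groups and that one generator is supplied per group.

```python
import re


def mask_groups(pattern, text, generators, replace_all=True):
    """Replace the text of each capture group with its generator's output."""
    out = []
    pos = 0
    for match in re.finditer(pattern, text):
        for index, generate in enumerate(generators, start=1):
            start, end = match.span(index)
            if start == -1:  # this group did not take part in the match
                continue
            out.append(text[pos:start])  # keep the text between groups
            out.append(generate(match.group(index)))
            pos = end
        if not replace_all:
            break  # only the first occurrence of the pattern is masked
    out.append(text[pos:])
    return "".join(out)


# Stand-in generators: redact group 1, uppercase group 2.
generators = [lambda s: "x" * len(s), str.upper]
values = ["jane@example.com", 7, "cc: bob@test.com"]
masked = [
    mask_groups(r"(\w+)@(\w+)\.com", item, generators) if isinstance(item, str) else item
    for item in values
]
# masked == ["xxxx@EXAMPLE.com", 7, "cc: xxx@TEST.com"]
```

Text outside the capture groups, such as the `@` and `.com` in the example, is left unchanged.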
Consistency
Determined by the selected sub-generators.
Linking
Determined by the selected sub-generators.
Differential privacy
Determined by the selected sub-generators.
Data-free
Determined by the selected sub-generators.
Allowed for primary keys
No
Allowed for unique columns
No
Uses format-preserving encryption (FPE)
No
Privacy ranking
5
Generator ID (for the API)
To configure the generator:
  1. To add a regular expression:
    1. Click Add Regex. On the configuration panel, Cell Value shows a sample value from the source database. You can use the previous and next options to navigate through the values.
    2. By default, Replace all matches is enabled. To only match the first occurrence of a pattern, toggle Replace all matches to the off position.
    3. In the Pattern field, enter a regular expression. If the expression is valid, then Tonic displays the capture groups for the expression.
    4. For each capture group, to select and configure the generator to apply, click the selected generator. You cannot select another composite generator.
    5. To save the configuration and immediately add another regular expression, click Save and Add Another. To save the configuration and close the configuration panel, click Save.
  2. From the Regexes list:
    1. To edit a regex, click the edit icon.
    2. To remove a regex, click the delete icon.

ASCII Key

Generates unique alphanumeric strings based on any printable ASCII characters. The length of the source string is not preserved. You can choose to exclude lowercase letters from the generated values.
Consistency
Yes, can be made self-consistent.
Linking
No, cannot be linked.
Differential privacy
No
Data-free
No
Allowed for primary keys
Yes
Allowed for unique columns
Yes
Uses format-preserving encryption (FPE)
Yes
Privacy ranking
  • 3 if not consistent
  • 4 if consistent
Generator ID (for the API)
To configure the generator:
  1. To exclude lowercase letters from the generated values, toggle Exclude Lowercase Alphabet to the on position.
  2. Toggle the Consistency setting to indicate whether to make the generator consistent. By default, the generator is not consistent.
  3. If Tonic data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Business Name

Generates a random company name-like string.
Consistency
Yes, can be made self-consistent or consistent with another column.
Linking
No, cannot be linked.
Differential privacy
Yes, if consistency is not enabled.
Data-free
Yes, if consistency is not enabled.
Allowed for primary keys
No
Allowed for unique columns
No
Uses format-preserving encryption (FPE)
No
Privacy ranking
  • 1 if not consistent
  • 4 if consistent
Generator ID (for the API)
To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.
By default, the generator is not consistent.
If consistency is enabled, then by default it is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.
When the generator is consistent with itself, then a given source value is always mapped to the same destination value. For example, My Business is always mapped to New Business.
When the generator is consistent with another column, then a given source value in that other column always results in the same destination value for the company name column. For example, if the company name column is consistent with a name column, then every instance of John Smith in the name column in the source database has the same company name in the destination database.
If Tonic data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Categorical

The Categorical generator shuffles the existing values within a field while maintaining the overall frequency of the values. It disassociates the values from other pieces of data. Note that NULL is considered a separate value.
For example, a column contains the values Small, Medium, and Large. Small appears 3 times, Medium appears 4 times, and Large appears 5 times. In the output data, each value still appears the same number of times, but the values are shuffled to different rows.
This generator is optimized for categories with fewer than 10,000 unique values. If your underlying data has more unique values (for example, your field is populated by freeform text entry), we recommend that you use the Character Scramble or Custom Categorical generator.
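The shuffle described above can be sketched in a few lines. This is a minimal illustration of frequency-preserving shuffling, not Tonic's implementation, and it does not cover linking or differential privacy. The `shuffle_column` function is a hypothetical name.

```python
import random
from collections import Counter


def shuffle_column(values, seed=None):
    """Return the same values in a random row order."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


# None stands in for NULL, which is treated as its own category.
sizes = ["Small"] * 3 + ["Medium"] * 4 + ["Large"] * 5 + [None]
masked = shuffle_column(sizes, seed=42)

# Each value appears exactly as often as before; only the rows differ.
assert Counter(masked) == Counter(sizes)
```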
Consistency
No, cannot be made consistent.
Linking
Yes, can be linked.
Differential privacy
Configurable
Data-free
No
Allowed for primary keys
No
Allowed for unique columns
No
Uses format-preserving encryption (FPE)
No
Privacy ranking
  • 2 if differential privacy enabled
  • 3 if differential privacy not enabled
Generator ID (for the API)
To configure the generator:
  1. 1.
  1. From the Link To dropdown list, select the columns to link to the current column. You can select from other columns that use the Categorical generator.
  2. Toggle the Differential Privacy setting to indicate whether to make the output data differentially private. By default, differential privacy is disabled.
  3. If Tonic data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Character Scramble

This generator replaces letters with random other letters and numbers with random other numbers. Punctuation, whitespace, and mathematical symbols are preserved.
For example, for the following input string:
ABC.123 123-456-789 Go!
The output would be something like:
PRX.804 296-915-378 Ab!
This generator securely masks letters and numbers. There is no way to recover the original data.
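A minimal sketch of this kind of scramble follows. It is an illustration under stated assumptions, not Tonic's implementation: the `character_scramble` function is a hypothetical name, and only ASCII letters and digits are replaced.

```python
import random
import string


def character_scramble(text, rng=None):
    """Replace letters with random letters and digits with random digits."""
    rng = rng or random.SystemRandom()
    out = []
    for ch in text:
        if ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.digits:
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)  # punctuation, whitespace, and symbols pass through
    return "".join(out)


masked = character_scramble("ABC.123 123-456-789 Go!")
```

Because each replacement character is drawn independently at random, the output carries no information that maps back to the original letters and digits. Only the length, case pattern, and punctuation survive.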
Consistency
Yes, can be made self-consistent.
Linking
No, cannot be linked.
Differential privacy
No
Data-free
No
Allowed for primary keys
No
Allowed for unique columns
No
Uses format-preserving encryption (FPE)
No
Privacy ranking
  • 3 if not consistent
  • 4 if consistent
Generator ID (for the API)
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.
If Tonic data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Character Substitution

Performs a random character replacement that preserves formatting (spaces, capitalization, and punctuation).
Characters are replaced with other characters from within the same Unicode Block.
For example, for the following input string:
Miami Store #162
The output would be something like:
Vgkjg Gmlvf #681
Note that for a numeric column, when a generated number starts with a 0, the starting 0 is removed. This could result in matching output values in different columns. For example, one column is changed to 113 and the other to 0113, which also becomes 113.
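The fixed character-to-character mapping can be sketched as a translation table. This is an illustration only, not Tonic's implementation. The `build_substitution` function and the `demo-key` seed are hypothetical, and the sketch covers only ASCII letters and digits rather than full Unicode blocks.

```python
import random
import string


def build_substitution(seed):
    """Build a fixed one-to-one mapping within each character class."""
    rng = random.Random(seed)
    mapping = {}
    for alphabet in (string.ascii_uppercase, string.ascii_lowercase, string.digits):
        shuffled = list(alphabet)
        rng.shuffle(shuffled)
        mapping.update(zip(alphabet, shuffled))
    return str.maketrans(mapping)


table = build_substitution("demo-key")
store_a = "Miami Store #162".translate(table)
store_b = "Miami Store #162".translate(table)

# The same source text always yields the same output, which is what
# allows a join between two text columns to survive masking.
assert store_a == store_b
```

Because the mapping is one-to-one, different source strings stay different after substitution. The exception noted above arises only when a numeric column drops a leading zero.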
Consistency
This generator is implicitly self-consistent. You do not specify whether the generator is consistent. Every occurrence of a character always maps to the same substitute character. Because of this, it can be used to preserve a join between two text columns, such as a join on a name or email.
Linking
No, cannot be linked.
Differential privacy
No
Data-free
No
Allowed for primary keys
No
Allowed for unique columns
Yes
Uses format-preserving encryption (FPE)
No
Privacy ranking
4
Generator ID (for the API)
If Tonic data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Company Name

This generator is deprecated. Use the Business Name generator instead.
Generates a random company name-like string.
Consistency
Yes, can be made self-consistent or consistent with another column.
Linking
No, cannot be linked.
Differential privacy
Yes, if consistency is not enabled.
Data-free
Yes, if consistency is not enabled.
Allowed for primary keys
No
Allowed for unique columns
No
Uses format-preserving encryption (FPE)
No
Privacy ranking
  • 1 if not consistent
  • 4 if consistent
Generator ID (for the API)
To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.
By default, the generator is not consistent.
If consistency is enabled, then by default it is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.
When the generator is consistent with itself, then a given source value is always mapped to the same destination value. For example, My Company is always mapped to New Company.
When the generator is consistent with another column, then a given source value in that other column always results in the same destination value for the company name column. For example, if the company name column is consistent with a name column, then every instance of John Smith in the name column in the source database has the same company name in the destination database.
If Tonic data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Conditional

Applies different generators to the value conditionally based on any value in the table.
For example, a Users table contains Name, Username, and Role columns. For the Username column, you can use a conditional generator to indicate that if the value of Role is something other than Test, then use the Character Scramble generator for the Username value. For Test users, the username is not masked.
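The Users example can be sketched as a list of options that are checked in order, with a default when none applies. This is an illustration only, not Tonic's implementation. The `apply_conditional` function is hypothetical, `redact` is a simple stand-in for the Character Scramble generator, and the first-match-wins ordering is an assumption.

```python
def redact(value):
    """Stand-in for a masking generator such as Character Scramble."""
    return "*" * len(value)


def passthrough(value):
    """Default option: leave the value unchanged."""
    return value


def apply_conditional(row, column, options, default):
    """Run the generator of the first option whose conditions all hold."""
    for conditions, generator in options:
        if all(check(row) for check in conditions):
            return generator(row[column])
    return default(row[column])


users = [
    {"Name": "Ana Diaz", "Username": "adiaz", "Role": "Admin"},
    {"Name": "QA Bot", "Username": "qa_bot", "Role": "Test"},
]
# One option: mask Username when Role is something other than Test.
options = [([lambda row: row["Role"] != "Test"], redact)]
masked = [apply_conditional(user, "Username", options, passthrough) for user in users]
# masked == ["*****", "qa_bot"]
```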
Consistency
Determined by the selected generators.
Linking
Determined by the selected generators.
Differential privacy
Determined by the selected generators.
Data-free
Determined by the selected generators.
Allowed for primary keys
No
Allowed for unique columns
Yes
Uses format-preserving encryption (FPE)
No
Privacy ranking
  • If a fallback generator is selected, then the lower of 5 and the privacy ranking of the fallback generator.
  • 5 if no fallback generator is selected.
Generator ID (for the API)
The generator consists of a list of options. Each option includes the required conditions and the generator to use if those conditions are met.
The generator always contains a Default option. The Default option is used if the value does not meet any of the conditions. To configure the Default option:
  1. From the Default dropdown list, select the generator to use by default.
  2. Configure the selected generator.
To add a condition option:
  1. Click + Conditional Generator.
  2. To add a condition:
    1. Click + Condition.
    2. From the column list, select the column for which to check the value.
    3. Select the comparison type.
    4. Enter the column value to check for.
    To remove a condition, click the delete icon for the condition.
  3. From the Generator dropdown list, select the generator to run on the current column if the conditions are met. You cannot select another composite generator.
  4. Choose the configuration options for the selected generator.
To view details for and edit a condition option, click the expand icon for that option.
To remove a condition option, click the delete icon for the option.

Constant

Uses a single value to mask all of the values in the column.
For example, you can replace every value in a string column with the value String1. Or you can replace every value in a numeric column with the value 12345.
Consistency
No, cannot be made consistent.
Linking
No, cannot be linked.
Differential privacy
Yes
Data-free
Yes
Allowed for primary keys
No
Allowed for unique columns
No
Uses format-preserving encryption (FPE)
No
Privacy ranking
1
Generator ID (for the API)
To configure the generator, in the Constant Value field, provide the value to use.
The value must be compatible with the field type. For example, you cannot provide a string value for an integer column.
If Tonic data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Continuous

Generates a continuous distribution to fit the underlying data.
This generator can be linked to other Continuous generators to create multivariate distributions and can be partitioned by other columns.
Consistency
No, cannot be made consistent.
Linking
Yes, can be linked.
Differential privacy
Configurable
Data-free
No
Allowed for primary keys
No
Allowed for unique columns
No
Uses format-preserving encryption (FPE)
No
Privacy ranking
  • 2 if differential privacy enabled
  • 3 if differential privacy not enabled
Generator ID (for the API)
To configure the generator:
  1. From the Link To dropdown list, select the other Continuous generator columns to link to. The linking creates a multivariate distribution.
  2. From the Partition By dropdown list, select one or more columns to use to partition the data. The selected columns must have the generator set to either Passthrough or Categorical. For more information about partitioning and how it works, go to Partitioning a column.
  3. Toggle the Differential Privacy setting to indicate whether to make the output data differentially private. By default, the generator is not differentially private.
  4. If Tonic data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Cross Table Sum

Links columns in two tables. The value of this column is the sum of the values in a column in another table.
This generator does not provide a preview. The sums are not computed until the other table is generated.
For example, a Customers table contains a Total_Sales column. The Transactions table uses a foreign key Customer_ID column to identify the customer who made the transaction, and an Amount column that contains the amount of the sale. The Customer_ID value in the Transactions table is a value from the ID primary key column in the Customers table.
You assign the Cross Table Sum generator to the Total_Sales column. In the generator configuration, you indicate that the value is the sum of the Amount values for the Customer_ID value that matches the primary key ID value for the current row.
For the Customers row for ID 123, the Total_Sales column contains the sum of the Amount column for Transactions rows where Customer_ID is 123.
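The Customers and Transactions example can be sketched as a grouped sum. This is an illustration only, not Tonic's implementation. The `cross_table_sum` function and the sample rows are hypothetical, and assigning 0 to customers that have no transactions is an assumption.

```python
from collections import defaultdict


def cross_table_sum(current_rows, primary_key, foreign_rows, foreign_key, sum_over, target):
    """Set target in each current row to the sum of matching foreign rows."""
    totals = defaultdict(float)
    for row in foreign_rows:
        totals[row[foreign_key]] += row[sum_over]
    for row in current_rows:
        # Assumption: a key with no matching foreign rows gets a total of 0.
        row[target] = totals.get(row[primary_key], 0.0)
    return current_rows


customers = [{"ID": 123, "Total_Sales": None}, {"ID": 456, "Total_Sales": None}]
transactions = [
    {"Customer_ID": 123, "Amount": 20.0},
    {"Customer_ID": 123, "Amount": 5.5},
    {"Customer_ID": 789, "Amount": 9.0},
]
cross_table_sum(customers, "ID", transactions, "Customer_ID", "Amount", "Total_Sales")
# customers[0]["Total_Sales"] == 25.5 and customers[1]["Total_Sales"] == 0.0
```

The function parameters mirror the Foreign Table, Foreign Key, Sum Over, and Primary Key settings in the generator configuration.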
Consistency
No, cannot be made consistent.
Linking
No, cannot be linked.
Differential privacy
No
Data-free
No
Allowed for primary keys
No
Allowed for unique columns
No
Uses format-preserving encryption (FPE)
No
Privacy ranking
3
Generator ID (for the API)
To configure the generator:
  1. From the Foreign Table dropdown list, select the table that contains the column for which to sum the values.
  2. From the Foreign Key dropdown list, select the foreign key. The foreign key identifies the row from the current table that is referred to in the foreign table.
  3. From the Sum Over dropdown list, select the column for which to sum the values.
  4. From the Primary Key dropdown list, select the primary key for the current table.
  5. If Tonic data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

CSV Mask

Masks text columns by parsing the values as rows whose columns are delimited by a specified character.
You can assign specific generators to specific indexes. You can also use the generator that is assigned to a specific index as the default. This applies the generator to every index that does not have an assigned generator.
The output value maintains the quotes around the index values.
For example, a column contains the following value:
"first","second","third"
You assign the Character Scramble generator to index 0 and assign Passthrough to index 2. You select index 0 as the index to use for the default generator.
In the output, the first and second values are masked by the Character Scramble generator. The third value is not masked. The output looks something like:
"wmcop","xjorsl","third"
Consistency
Determined by the selected sub-generators.
Linking
Determined by the selected sub-generators.
Differential privacy
Determined by the selected sub-generators.
Data-free
Determined by the selected sub-generators.
Allowed for primary keys
No
Allowed for unique columns
No
Uses format-preserving encryption (FPE)
No
Privacy ranking
5
Generator ID (for the API)
To configure the generator:
  1. In the Delimiter field, type the delimiter that is used as a separator for the value. For example, for the value "first","second","third", the delimiter is a comma.
  2. You can configure a generator for any or all of the indexes. To add a sub-generator for an index:
    1. Under Sub-Generators, click Add Generator. On the add generator dialog, the Cell CSV field contains a sample value from the source data. You can use the navigation icons to page through the values.
    2. In the CSV Index field, type the index to assign a generator to. The index numbers start with 0. You cannot use an index that already has an assigned generator. Matched CSV values shows the value at that index for the current sample column value.
    3. Under Generator Configuration, from the Select a Generator dropdown list, select the generator to use for the selected index. You cannot select another composite generator. To remove the selection, click the delete icon.
    4. Configure the selected generator. You cannot configure the selected generator to be consistent with another column.
    5. To save the configuration and immediately add a generator for another index, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
  3. From the Sub-Generators list:
    1. To edit a generator assignment, click the edit icon.
    2. To remove a generator assignment, click the delete icon.
    3. To move a generator assignment up or down in the list, click the up or down arrow.
  4. After you configure a generator for at least one index, the Default Link dropdown list is displayed. From the Default Link dropdown list, select the index to use to determine how to mask values for indexes that do not have an assigned generator. For example, you assign the Character Scramble generator to index 2. If you set Default Link to 2, then all indexes that do not have an assigned generator use the Character Scramble generator.
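The per-index assignment and the Default Link fallback can be sketched with Python's `csv` module. This is an illustration only, not Tonic's implementation. The `csv_mask` function is hypothetical, the string reversal is a stand-in for a masking generator, and quoting every output field is an assumption that matches the quoted example value.

```python
import csv
import io


def csv_mask(value, generators, default_index=None, delimiter=","):
    """Apply generators by index; unassigned indexes use the default link."""
    fields = next(csv.reader([value], delimiter=delimiter))
    fallback = generators.get(default_index)
    masked = []
    for index, field in enumerate(fields):
        generate = generators.get(index, fallback)
        masked.append(generate(field) if generate else field)
    buffer = io.StringIO()
    writer = csv.writer(
        buffer, delimiter=delimiter, quoting=csv.QUOTE_ALL, lineterminator=""
    )
    writer.writerow(masked)  # quotes are written around every field
    return buffer.getvalue()


generators = {
    0: lambda s: s[::-1],  # stand-in for a masking generator
    2: lambda s: s,        # Passthrough
}
result = csv_mask('"first","second","third"', generators, default_index=0)
# result == '"tsrif","dnoces","third"'
```

Index 1 has no assigned generator, so it uses the generator from index 0, which is the default link.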

Custom Categorical

A version of the Categorical generator that selects from values that you provide instead of shuffling the original values.
Consistency
Yes, can be made self-consistent or consistent with another column.
Linking
Yes, can be linked.
Differential privacy
Yes, if consistency is not enabled.
Data-free
Yes, if consistency is not enabled.
Allowed for primary keys
No
Allowed for unique columns
No