Generator reference

Here are the details for the supported generators in Tonic Structural.

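The table for each generator indicates whether the generator supports consistency, linking, differential privacy, and data-free operation; whether it is allowed for primary keys and unique columns; whether it uses format-preserving encryption (FPE); its privacy ranking; and its generator ID for the API.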

Address

Generates a random address-like string.

You can indicate which part of an address the column contains. For example, the column might contain only the street address or the city, or it might contain the full address.

Consistency

Yes, can be made self-consistent or consistent with another column.

Linking

Yes, can be linked.

Differential privacy

Yes, if consistency is not enabled.

Data-free

Yes, if consistency is not enabled.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 1 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator:

  1. From the Link To dropdown list, select the columns to link this column to. You can link columns that use the Address generator to mask one of the following address components:

    • City

    • City State

    • Country

    • Country Code

    • State

    • State Abbreviation

    • Zip Code

    • Latitude

    • Longitude

    Note that when linked to another address column, a country or country code is always the United States.

  2. From the address component dropdown list, select the address component that this column contains. The available options are:

    • Building Number

    • Cardinal Direction (North, South, East, West)

    • City

    • City Prefix (Examples: North, South, East, West, Port, New)

    • City Suffix (Examples: land, ville, furt, town)

    • City with State (Example: Spokane, Washington)

    • City with State Abbr (Example: Houston, TX)

    • Country (Examples: Spain, Canada)

    • Country Code (Uses the 2-character country code. Examples: ES, CA)

    • County

    • Direction (Examples: North, Northeast, Southwest, East)

    • Full Address

    • Latitude (Examples: 33.51, 41.32)

    • Longitude (Examples: -84.05, -74.21)

    • Ordinal Direction (Examples: Northeast, Southwest)

    • Secondary Address (Examples: Apt 123, Suite 530)

    • State (Examples: Alabama, Wisconsin)

    • State Abbr (Examples: AL, WI)

    • Street Address (Example: 123 Main Street)

    • Street Name (Examples: Broad, Elm)

    • Street Suffix (Examples: Way, Hill, Drive)

    • US Address

    • US Address with Country

    • Zip Code (Example: 12345)

  3. Toggle the Consistency setting to indicate whether to make the column consistent. By default, consistency is disabled.

  4. If consistency is enabled, then by default, the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.

    When the Address generator is consistent with itself, then the same value in the source database is always mapped to the same destination value. For example, for a column that contains a state name, Alabama is always mapped to Illinois.

    When the Address generator is consistent with another column, then the same value in the other column always results in the same destination value for the address column. For example, if the address column is consistent with a name column, then every instance of John Smith in the name column in the source database has the same address value in the destination database.

  5. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
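The two consistency modes described in step 4 can be pictured with a short Python sketch. This is not how Structural implements consistency. The `consistent_pick` helper, the secret, and the pool of state names are all hypothetical. The sketch derives the output from a keyed hash of whichever value drives consistency, so the same driving value always produces the same result.

```python
import hashlib
import hmac

# Hypothetical replacement pool and secret; not Structural internals.
STATES = ["Illinois", "Ohio", "Texas", "Vermont", "Oregon"]
SECRET = b"example-secret"

def consistent_pick(driving_value: str, pool: list[str]) -> str:
    """Deterministically choose a pool entry from a keyed hash of the driving value."""
    digest = hmac.new(SECRET, driving_value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return pool[index]

# Self-consistent: the column's own source value drives the choice.
masked_a = consistent_pick("Alabama", STATES)
masked_b = consistent_pick("Alabama", STATES)

# Consistent with another column: the other column's value drives the choice.
address_for_john_1 = consistent_pick("John Smith", STATES)
address_for_john_2 = consistent_pick("John Smith", STATES)
```

In both cases the repeated calls return the same value, which is the property that the Consistency setting describes.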

Spark supported address parts

For the Address generator, Spark workspaces (Amazon EMR, Databricks, and self-managed Spark clusters) support only the following address parts:

  • Building Number

  • City

  • Country

  • Country Code

  • Full Address

  • Latitude

  • Longitude

  • State

  • State Abbr

  • Street Address

  • Street Name

  • Street Suffix

  • US Address

  • US Address with Country

  • Zip Code

AI Synthesizer

Within a table, the AI Synthesizer uses the columns that are assigned the AI Synthesizer generator to train a model and generate the synthetic data.

It uses deep neural networks for high-fidelity data mimicking.

By default, the AI Synthesizer is not available. To enable the AI Synthesizer, in the Structural web server container, set the environment setting TONIC_NN_GENERATOR_ENABLED to true. For details, go to Configuring environment settings.

The privacy ranking is 3.

For details, go to Using the AI Synthesizer.

Algebraic

The Algebraic generator identifies the algebraic relationship between three or more numeric values and generates new values to match. At least one of the values must be a non-integer.

If a relationship cannot be found, then the generator defaults to the Categorical generator.

This generator can be linked with other Algebraic generators.

Consistency

No, cannot be made consistent.

Linking

Yes, can be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

3

Generator ID (for the API)

To configure the generator, from the Link To dropdown list, select the columns to link this column to. You can select other columns that are assigned the Algebraic generator.

You must select at least three columns.

The column values must be numeric. At least one of the columns must contain a value other than an integer.

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Alphanumeric String Key

Generates unique alphanumeric strings of the same length as the input. For example, for the origin value ABC123, the output value is a six-character alphanumeric string such as D24N05.

Consistency

Yes, can be made self-consistent.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

Yes

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

Yes

Privacy ranking

  • 3 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.

By default, the generator is not consistent.

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Array Character Scramble

A version of the Character Scramble generator that can be used for array values.

This generator replaces letters with random other letters, and numbers with random other numbers. Punctuation and whitespace are preserved.

For example, for the following array value:

["ABC.123", 3, "last week"]

The output might be something like:

["KFR.860", 7, "sdrw mwoc"]

This generator securely masks letters and numbers. There is no way to recover the original data.

Consistency

Yes, can be made self-consistent.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 3 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.

By default, the generator is not consistent.

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Array JSON Mask

This is a composite generator.

A version of the JSON Mask generator that can be used for array values.

Runs a selected generator on values that match a user-specified JSONPath.

Consistency

Determined by the specified sub-generators.

Linking

Determined by the specified sub-generators.

Differential privacy

Determined by the specified sub-generators.

Data-free

Determined by the specified sub-generators.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

5

Generator ID (for the API)

To configure the generator:

  1. To assign a generator to a path expression:

    1. Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell JSON field contains a sample value from the source database. You can use the previous and next icons to page through different values.

    2. In the Path Expression field, type the JSONPath expression to identify the value to apply the generator to. To populate a path expression, you can also click a value in the Cell JSON field. Matched JSON Values shows the result from the value in Cell JSON.

    3. By default, the selected generator is applied to any value that matches the expression. To limit the types of values to apply the generator to, from the Type Filter, specify the applicable types. You can select Any, or you can select any combination of String, Number, and Null.

    4. From the Generator Configuration dropdown list, select the generator to apply to the path expression. You cannot select another composite generator.

    5. Configure the selected generator. You cannot configure the selected generator to be consistent with another column.

    6. To save the configuration and immediately add a generator for another path expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.

  2. From the Sub-Generators list:

    1. To edit a generator assignment, click the edit icon.

    2. To remove a generator assignment, click the delete icon.

    3. To move a generator assignment up or down in the list, click the up or down arrow.

  3. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Array Regex Mask

This is a composite generator.

A version of the Regex Mask generator that can be used for array values.

Uses regular expressions to parse strings and replace specified substrings with the output of specified generators. The parts of the string to replace are specified inside unnamed top-level capture groups.
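As a rough illustration of capture-group masking, the following Python sketch replaces the text inside each capture group with the output of a stand-in generator. The `mask_groups` function and the two stand-in generators are hypothetical and are not part of Structural. The sketch assumes that the pattern contains only unnamed top-level capture groups, with one generator per group.

```python
import re
from typing import Callable, Sequence

def mask_groups(pattern: str, value: str,
                generators: Sequence[Callable[[str], str]],
                replace_all: bool = True) -> str:
    """Replace the text of each capture group with its generator's output."""
    out = []
    cursor = 0
    for match in re.finditer(pattern, value):
        for group_number, generate in enumerate(generators, start=1):
            start, end = match.span(group_number)
            if start == -1:  # this group did not participate in the match
                continue
            out.append(value[cursor:start])   # text outside the group is kept
            out.append(generate(match.group(group_number)))
            cursor = end
        if not replace_all:  # only mask the first occurrence of the pattern
            break
    out.append(value[cursor:])
    return "".join(out)

# Hypothetical stand-in generators.
def redact_digits(text: str) -> str:
    return "0" * len(text)

def constant_user(text: str) -> str:
    return "user"

# Apply the same pattern to every string element of an array value.
values = ["id-1234:alice", "id-98:bob"]
masked = [mask_groups(r"id-(\d+):(\w+)", v, [redact_digits, constant_user])
          for v in values]
```

Text that falls outside the capture groups, such as the `id-` prefix and the colon, passes through unchanged.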

Consistency

Determined by the selected sub-generators.

Linking

Determined by the selected sub-generators.

Differential privacy

Determined by the selected sub-generators.

Data-free

Determined by the selected sub-generators.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

5

Generator ID (for the API)

To configure the generator:

  1. To add a regular expression:

    1. Click Add Regex. On the configuration panel, Cell Value shows a sample value from the source database. You can use the previous and next options to navigate through the values.

    2. By default, Replace all matches is enabled. To only match the first occurrence of a pattern, toggle Replace all matches to the off position.

    3. In the Pattern field, enter a regular expression. If the expression is valid, then Structural displays the capture groups for the expression.

    4. For each capture group, to select and configure the generator to apply, click the selected generator. You cannot select another composite generator.

    5. To save the configuration and immediately add another regular expression, click Save and Add Another. To save the configuration and close the panel, click Save.

  2. From the Regexes list:

    1. To edit a regex, click the edit icon.

    2. To remove a regex, click the delete icon.

ASCII Key

Generates unique alphanumeric strings based on any printable ASCII characters. The length of the source string is not preserved. You can choose to exclude lowercase letters from the generated values.

Consistency

Yes, can be made self-consistent.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

Yes

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

Yes

Privacy ranking

  • 3 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator:

  1. To exclude lowercase letters from the generated values, toggle Exclude Lowercase Alphabet to the on position.

  2. Toggle the Consistency setting to indicate whether to make the generator consistent. By default, the generator is not consistent.

  3. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Business Name

Generates a random company name-like string.

Consistency

Yes, can be made self-consistent or consistent with another column.

Linking

No, cannot be linked.

Differential privacy

Yes, if consistency is not enabled.

Data-free

Yes, if consistency is not enabled.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 1 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.

By default, the generator is not consistent.

If consistency is enabled, then by default it is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.

When the generator is consistent with itself, then a given source value is always mapped to the same destination value. For example, My Business is always mapped to New Business.

When the generator is consistent with another column, then a given source value in that other column always results in the same destination value for the company name column. For example, if the company name column is consistent with a name column, then every instance of John Smith in the name column in the source database has the same company name in the destination database.

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Categorical

The Categorical generator shuffles the existing values within a field while maintaining the overall frequency of the values. It disassociates the values from other pieces of data. Note that NULL is considered a separate value.

For example, a column contains the values Small, Medium, and Large. Small appears 3 times, Medium appears 4 times, and Large appears 5 times. In the output data, each value still appears the same number of times, but the values are shuffled to different rows.

This generator is optimized for categories with fewer than 10,000 unique values. If your underlying data has more unique values (for example, your field is populated by freeform text entry), we recommend that you use the Character Scramble or Custom Categorical generator.
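The shuffling behavior can be sketched in a few lines of Python. This is only an illustration of frequency-preserving shuffling. It does not model differential privacy or linking, and the `shuffle_column` function and its `seed` parameter are hypothetical.

```python
import random
from collections import Counter

def shuffle_column(values, seed=None):
    """Return the same values in a random row order (frequencies are unchanged)."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled

# Small x3, Medium x4, Large x5, plus one NULL, which counts as its own value.
sizes = ["Small"] * 3 + ["Medium"] * 4 + ["Large"] * 5 + [None]
masked_sizes = shuffle_column(sizes, seed=7)

# The per-value counts before and after the shuffle are identical.
same_frequencies = Counter(masked_sizes) == Counter(sizes)
```

Because the values are only moved between rows, a column that is shuffled on its own no longer lines up with the other columns in the same row.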

Consistency

No, cannot be made consistent.

Linking

Yes, can be linked.

Differential privacy

Configurable

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 2 if differential privacy enabled

  • 3 if differential privacy not enabled

Generator ID (for the API)

To configure the generator:

  1. From the Link To dropdown list, select the columns to link to the current column. You can select from other columns that use the Categorical generator.

  2. Toggle the Differential Privacy setting to indicate whether to make the output data differentially private. By default, differential privacy is disabled.

  3. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Character Scramble

This generator replaces letters with random other letters and numbers with random other numbers. Punctuation, whitespace, and mathematical symbols are preserved.

For example, for the following input string:

ABC.123 123-456-789 Go!

The output would be something like:

PRX.804 296-915-378 Ab!

This generator securely masks letters and numbers. There is no way to recover the original data.
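The following Python sketch shows the general idea for ASCII text. It is an illustration only, not Structural's implementation. The `character_scramble` function is hypothetical, and the sketch assumes that letter case is kept, as in the example above.

```python
import random
import string

def _other(ch, alphabet, rng):
    """Pick a random character from the alphabet that differs from ch."""
    return rng.choice([c for c in alphabet if c != ch])

def character_scramble(text, rng=None):
    """Replace letters with other letters and digits with other digits."""
    rng = rng or random.Random()
    out = []
    for ch in text:
        if ch in string.ascii_uppercase:
            out.append(_other(ch, string.ascii_uppercase, rng))
        elif ch in string.ascii_lowercase:
            out.append(_other(ch, string.ascii_lowercase, rng))
        elif ch in string.digits:
            out.append(_other(ch, string.digits, rng))
        else:
            out.append(ch)  # punctuation, whitespace, and symbols are preserved
    return "".join(out)

scrambled = character_scramble("ABC.123 123-456-789 Go!")
```

Each replacement is drawn independently, so the same source character can map to different output characters, even within one value.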

Character Scramble is similar to Character Substitution, with a couple of key differences. While you can enable consistency for the entire value, Character Scramble does not always replace the same source character with the same destination character. Because there is no guarantee of unique values, you cannot use Character Scramble on unique columns. Character Substitution, however, does always map the same source character to the same destination character. Character Substitution is always consistent, which makes it less secure than Character Scramble. You can use Character Substitution on unique columns.

Consistency

Yes, can be made self-consistent.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 3 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.

By default, the generator is not consistent.

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Character Substitution

Performs a random character replacement that preserves formatting (spaces, capitalization, and punctuation).

Characters are replaced with other characters from within the same Unicode Block. A given source character is always mapped to the same destination character. For example, M might always map to V.

For example, for the following input string:

Miami Store #162

The output would be something like:

Vgkjg Gmlvf #681

Note that for a numeric column, when a generated number starts with a 0, the starting 0 is removed. This could result in matching output values in different columns. For example, one column is changed to 113 and the other to 0113, which also becomes 113.

Character Substitution is similar to Character Scramble, with a couple of key differences. Because Character Substitution always maps the same source character to the same destination character, it is always consistent. It also can be used for unique columns. In Character Scramble, the character mapping is random, which makes Character Scramble slightly more secure. However, Character Scramble cannot be used for unique columns.
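To contrast with Character Scramble, here is a minimal Python sketch of a fixed character mapping. It is an assumption-laden illustration. It covers only ASCII letters and digits rather than Unicode blocks, and `build_substitution_table`, `character_substitute`, and the seed are hypothetical.

```python
import random
import string

def build_substitution_table(seed: int) -> dict:
    """Build one fixed mapping per character class by shuffling each alphabet."""
    rng = random.Random(seed)
    table = {}
    for alphabet in (string.ascii_uppercase, string.ascii_lowercase, string.digits):
        shuffled = list(alphabet)
        rng.shuffle(shuffled)
        table.update(zip(alphabet, shuffled))
    return table

def character_substitute(text: str, table: dict) -> str:
    """Map each character through the table; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)

TABLE = build_substitution_table(seed=42)
first = character_substitute("Miami Store #162", TABLE)
second = character_substitute("Miami Store #162", TABLE)
```

Because each alphabet is permuted, no two source characters share a destination character. Distinct inputs therefore produce distinct outputs, which is why a mapping of this kind is compatible with unique columns.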

Consistency

This generator is implicitly self-consistent. You do not specify whether the generator is consistent. Every occurrence of a character always maps to the same substitute character. Because of this, it can be used to preserve a join between two text columns, such as a join on a name or email.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

No

Privacy ranking

4

Generator ID (for the API)

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Company Name

This generator is deprecated. Use the Business Name generator instead.

Generates a random company name-like string.

Consistency

Yes, can be made self-consistent or consistent with another column.

Linking

No, cannot be linked.

Differential privacy

Yes, if consistency is not enabled.

Data-free

Yes, if consistency is not enabled.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 1 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.

By default, the generator is not consistent.

If consistency is enabled, then by default it is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.

When the generator is consistent with itself, then a given source value is always mapped to the same destination value. For example, My Company is always mapped to New Company.

When the generator is consistent with another column, then a given source value in that other column always results in the same destination value for the company name column. For example, if the company name column is consistent with a name column, then every instance of John Smith in the name column in the source database has the same company name in the destination database.

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Conditional

This is a composite generator.

Applies different generators to the value conditionally based on any value in the table.

For example, a Users table contains Name, Username, and Role columns. For the Username column, you can use a conditional generator to indicate that if the value of Role is something other than Test, then use the Character Scramble generator for the Username value. For Test users, the name is not masked.
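The Users table example can be sketched in Python as follows. The `apply_conditional` function and the stand-in generators are hypothetical. The sketch also assumes that options are checked in order and that the first option whose conditions all hold is used, which might not match how Structural evaluates options.

```python
def apply_conditional(row, column, options, default):
    """Run the first option whose conditions all hold; otherwise use the default."""
    for conditions, generator in options:
        if all(condition(row) for condition in conditions):
            return generator(row[column])
    return default(row[column])

def passthrough(value):
    return value

def stand_in_mask(value):
    # Placeholder for a real generator such as Character Scramble.
    return "*" * len(value)

# One option: mask the value when Role is something other than Test.
options = [([lambda row: row["Role"] != "Test"], stand_in_mask)]

users = [
    {"Name": "Ann", "Username": "ann01", "Role": "Admin"},
    {"Name": "QA", "Username": "qa-bot", "Role": "Test"},
]
masked_usernames = [
    apply_conditional(user, "Username", options, passthrough) for user in users
]
```

Here the Test user falls through to the default, which leaves the username unmasked.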

Consistency

Determined by the selected generators.

Linking

Determined by the selected generators.

Differential privacy

Determined by the selected generators.

Data-free

Determined by the selected generators.

Allowed for primary keys

No

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • If a fallback generator is selected, then the lower of 5 and the privacy ranking of the fallback generator.

  • 5 if no fallback generator is selected

Generator ID (for the API)

The generator consists of a list of options. Each option includes the required conditions and the generator to use if those conditions are met.

The generator always contains a Default option. The Default option is used if the value does not meet any of the conditions. To configure the Default option:

  1. From the Default dropdown list, select the generator to use by default.

  2. Configure the selected generator.

To add a condition option:

  1. Click + Conditional Generator.

  2. To add a condition:

    1. Click + Condition.

    2. From the column list, select the column for which to check the value.

    3. Select the comparison type.

    4. Enter the column value to check for.

    To remove a condition, click the delete icon for the condition.

  3. From the Generator dropdown list, select the generator to run on the current column if the conditions are met. You cannot select another composite generator.

  4. Choose the configuration options for the selected generator.

To view details for and edit a condition option, click the expand icon for that option.

To remove a condition option, click the delete icon for the option.

Constant

Uses a single value to mask all of the values in the column.

For example, you can replace every value in a string column with the value String1. Or you can replace every value in a numeric column with the value 12345.

Consistency

No, cannot be made consistent.

Linking

No, cannot be linked.

Differential privacy

Yes

Data-free

Yes

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

1

Generator ID (for the API)

To configure the generator, in the Constant Value field, provide the value to use.

The value must be compatible with the field type. For example, you cannot provide a string value for an integer column.

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Continuous

Generates a continuous distribution to fit the underlying data.

This generator can be linked to other Continuous generators to create multivariate distributions and can be partitioned by other columns.

Consistency

No, cannot be made consistent.

Linking

Yes, can be linked.

Differential privacy

Configurable

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 2 if differential privacy enabled

  • 3 if differential privacy not enabled

Generator ID (for the API)

To configure the generator:

  1. From the Link To dropdown list, select the other Continuous generator columns to link to. The linking creates a multivariate distribution.

  2. From the Partition By dropdown list, select one or more columns to use to partition the data. The selected columns must have the generator set to either Passthrough or Categorical. For more information about partitioning and how it works, go to Partitioning a column.

  3. Toggle the Differential Privacy setting to indicate whether to make the output data differentially private. By default, the generator is not differentially private.

  4. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Cross Table Sum

Links columns in two tables. The value of this column is the sum of the values in a column in another table.

This generator does not provide a preview. The sums are not computed until the other table is generated.

For example, a Customers table contains a Total_Sales column. The Transactions table uses a foreign key Customer_ID column to identify the customer who made the transaction, and an Amount column that contains the amount of the sale. The Customer_ID value in the Transactions table is a value from the ID primary key column in the Customers table.

You assign the Cross Table Sum generator to the Total_Sales column. In the generator configuration, you indicate that the value is the sum of the Amount values for the Customer_ID value that matches the primary key ID value for the current row.

For the Customers row for ID 123, the Total_Sales column contains the sum of the Amount column for Transactions rows where Customer_ID is 123.
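The Customers and Transactions example can be expressed as a small Python sketch. The in-memory rows and the `cross_table_sum` function are hypothetical and only illustrate the calculation. The function parameters loosely mirror the Foreign Table, Foreign Key, Sum Over, and Primary Key settings.

```python
from collections import defaultdict

customers = [
    {"ID": 123, "Total_Sales": None},
    {"ID": 456, "Total_Sales": None},
]
transactions = [
    {"Customer_ID": 123, "Amount": 20.0},
    {"Customer_ID": 123, "Amount": 5.5},
    {"Customer_ID": 456, "Amount": 10.0},
]

def cross_table_sum(primary_rows, primary_key, target,
                    foreign_rows, foreign_key, sum_over):
    """Set target in each primary row to the sum of sum_over for matching foreign rows."""
    totals = defaultdict(float)
    for row in foreign_rows:
        totals[row[foreign_key]] += row[sum_over]
    for row in primary_rows:
        row[target] = totals[row[primary_key]]
    return primary_rows

cross_table_sum(customers, "ID", "Total_Sales",
                transactions, "Customer_ID", "Amount")
```

The totals can only be computed after the foreign rows exist, which is consistent with the generator not offering a preview.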

Consistency

No, cannot be made consistent.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

3

Generator ID (for the API)

To configure the generator:

  1. From the Foreign Table dropdown list, select the table that contains the column for which to sum the values.

  2. From the Foreign Key dropdown list, select the foreign key. The foreign key identifies the row from the current table that is referred to in the foreign table.

  3. From the Sum Over dropdown list, select the column for which to sum the values.

  4. From the Primary Key dropdown list, select the primary key for the current table.

  5. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

CSV Mask

This is a composite generator.

Masks text columns by parsing the values as rows whose columns are delimited by a specified character.

You can assign specific generators to specific indexes. You can also use the generator that is assigned to a specific index as the default. This applies the generator to every index that does not have an assigned generator.

The output value maintains the quotes around the index values.

For example, a column contains the following value:

"first","second","third"

You assign the Character Scramble generator to index 0 and assign Passthrough to index 2. You select index 0 as the index to use for the default generator.

In the output, the first and second values are masked by the Character Scramble generator. The third value is not masked. The output looks something like:

"wmcop", "xjorsl", "third"

Consistency

Determined by the selected sub-generators.

Linking

Determined by the selected sub-generators.

Differential privacy

Determined by the selected sub-generators.

Data-free

Determined by the selected sub-generators.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

5

Generator ID (for the API)

To configure the generator:

  1. In the Delimiter field, type the delimiter that is used as a separator for the value. For example, for the value "first","second","third", the delimiter is a comma.

  2. You can configure a generator for any or all of the indexes. To add a sub-generator for an index:

    1. Under Sub-Generators, click Add Generator. On the add generator dialog, the Cell CSV field contains a sample value from the source data. You can use the navigation icons to page through the values.

    2. In the CSV Index field, type the index to assign a generator to. The index numbers start with 0. You cannot use an index that already has an assigned generator. Matched CSV values shows the value at that index for the current sample column value.

    3. Under Generator Configuration, from the Select a Generator dropdown list, select the generator to use for the selected index. You cannot select another composite generator. To remove the selection, click the delete icon.

    4. Configure the selected generator. You cannot configure the selected generator to be consistent with another column.

    5. To save the configuration and immediately add a generator for another index, click Save and Add Another. To save the configuration and close the add generator panel, click Save.

  3. From the Sub-Generators list:

    1. To edit a generator assignment, click the edit icon.

    2. To remove a generator assignment, click the delete icon.

    3. To move a generator assignment up or down in the list, click the up or down arrow.

  4. After you configure a generator for at least one index, the Default Link dropdown list is displayed. From the Default Link dropdown list, select the index to use to determine how to mask values for indexes that do not have an assigned generator. For example, you assign the Character Scramble generator to index 2. If you set Default Link to 2, then all indexes that do not have an assigned generator use the Character Scramble generator.
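The index assignments and the Default Link behavior described above can be sketched in Python. The `csv_mask` function and the stand-in generators are hypothetical. The sketch uses a plain split, so it assumes that the delimiter never appears inside a quoted value.

```python
def csv_mask(value, generators, default_index=None, delimiter=","):
    """Mask each delimited field with its assigned generator, keeping any quotes."""
    fields = value.split(delimiter)
    # Indexes without an assigned generator borrow the one at default_index.
    fallback = generators.get(default_index)
    masked = []
    for index, field in enumerate(fields):
        generator = generators.get(index, fallback)
        quoted = len(field) >= 2 and field[0] == field[-1] == '"'
        inner = field[1:-1] if quoted else field
        if generator is not None:
            inner = generator(inner)
        masked.append(f'"{inner}"' if quoted else inner)
    return delimiter.join(masked)

def stand_in_scramble(text):
    # Placeholder for a real generator such as Character Scramble.
    return text[::-1]

def passthrough(text):
    return text

result = csv_mask('"first","second","third"',
                  {0: stand_in_scramble, 2: passthrough},
                  default_index=0)
```

Index 1 has no generator of its own, so it uses the generator from index 0. Index 2 keeps its Passthrough assignment.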

Custom Categorical

A version of the Categorical generator that selects from values that you provide instead of shuffling the original values.

Consistency

Yes, can be made self-consistent or consistent with another column.

Linking

Yes, can be linked.

Differential privacy

Yes, if consistency is not enabled.

Data-free

Yes, if consistency is not enabled.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 1 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator:

  1. From the Link To dropdown list, select the columns to link this column to. You can only select other columns that use the Custom Categorical generator.

  2. In the Custom Categories text area, enter the list of values that the generator can choose from. Put each value on a separate line. To add a NULL value to the list, use the keyword {NULL}.

  3. Toggle the Consistency setting to indicate whether to make the column consistent. By default, consistency is disabled.

  4. If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.

    When a generator is self-consistent, then a given value in the source database is always mapped to the same value in the destination database.

    When a generator is consistent with another column, then a given source value in that column always results in the same value for the current column in the destination database. For example, a department column is consistent with a username column. For each instance of User1 in the source database, the value in the department column is the same.

  5. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
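As an illustration of step 2, the following Python sketch parses a Custom Categories entry and draws values from it. The `parse_categories` and `custom_categorical` functions are hypothetical. Skipping blank lines and drawing each value uniformly at random are assumptions, not documented Structural behavior.

```python
import random

def parse_categories(text):
    """Read one category per line; the keyword {NULL} becomes None."""
    categories = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry:  # assumption: blank lines are ignored
            continue
        categories.append(None if entry == "{NULL}" else entry)
    return categories

def custom_categorical(row_count, categories, rng=None):
    """Pick a provided category at random for each row (no consistency)."""
    rng = rng or random.Random()
    return [rng.choice(categories) for _ in range(row_count)]

categories = parse_categories("Sales\nEngineering\n{NULL}\n")
column = custom_categorical(5, categories, random.Random(1))
```

The output never depends on the source values, which matches the generator being data-free when consistency is not enabled.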

Date Truncation

Truncates a date value or a timestamp to a specific part.

For a date or a timestamp, you can truncate to the year, month, or day.

For a timestamp, you can also truncate to the hour, minute, or second.

Consistency

No, cannot be made consistent.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

5

Generator ID (for the API)

To configure the generator:

  1. From the dropdown list, select the part of the date or timestamp to truncate to.

    For both date and timestamp values, you can truncate to the year, month, or day. When you select one of these options, the time portion of a timestamp is set to 00:00:00. For the date, the values below the selected truncation value are set to 01. For example, when you truncate to month, the day value is set to 01, and the time portion is set to 00:00:00.

    For a timestamp value, you can also truncate to the hour, minute, or second. The date values remain the same as the original data. The time values below the selected truncation value are set to 00. For example, when you truncate to minute, the seconds value is set to 00.

  2. Toggle the Birth Date option. When you enable Birth Date, the generator shifts dates that are more than 90 years before the generation date to the date exactly 90 years before the generation date. For example, a generation occurs on January 1, 2023. Any date that occurs before January 1, 1933 is changed to January 1, 1933.

    This is mostly intended for birthdate values, to group birthdates for everyone who is older than 89 into a single year. This is used to comply with HIPAA Safe Harbor.

  3. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Here are examples of date and time values and how the selected truncation affects the output:

Option | Date value | Timestamp value
Original value | 2021-12-20 | 2021-12-20 13:42:55
Truncate to year | 2021-01-01 | 2021-01-01 00:00:00
Truncate to month | 2021-12-01 | 2021-12-01 00:00:00
Truncate to day | 2021-12-20 | 2021-12-20 00:00:00
Truncate to hour | Not applicable | 2021-12-20 13:00:00
Truncate to minute | Not applicable | 2021-12-20 13:42:00
Truncate to second | Not applicable | 2021-12-20 13:42:55
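The truncation rules and the Birth Date option can be sketched in Python as follows. This is a minimal illustration of the documented behavior, not Structural's code. The function names are hypothetical, and the handling of a February 29 generation date is an assumption.

```python
from datetime import date, datetime

# Parts in order from coarsest to finest.
_PARTS = ["year", "month", "day", "hour", "minute", "second"]


def truncate_timestamp(value: datetime, part: str) -> datetime:
    """Reset every field below the selected part (dates to 01, times to 00)."""
    if part not in _PARTS:
        raise ValueError(f"unsupported part: {part}")
    keep = _PARTS.index(part)
    return datetime(
        value.year,
        value.month if keep >= 1 else 1,
        value.day if keep >= 2 else 1,
        value.hour if keep >= 3 else 0,
        value.minute if keep >= 4 else 0,
        value.second if keep >= 5 else 0,  # fractional seconds are dropped
    )


def truncate_date(value: date, part: str) -> date:
    """Dates can only be truncated to the year, month, or day."""
    if part not in ("year", "month", "day"):
        raise ValueError(f"not applicable to dates: {part}")
    midnight = datetime(value.year, value.month, value.day)
    return truncate_timestamp(midnight, part).date()


def cap_birth_date(value: date, generation_date: date) -> date:
    """Shift dates more than 90 years before the generation date to the cutoff."""
    try:
        cutoff = generation_date.replace(year=generation_date.year - 90)
    except ValueError:
        # Assumption: a February 29 generation date falls back to February 28.
        cutoff = generation_date.replace(year=generation_date.year - 90, day=28)
    return max(value, cutoff)


original = datetime(2021, 12, 20, 13, 42, 55)
by_month = truncate_timestamp(original, "month")
by_hour = truncate_timestamp(original, "hour")
```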

Email

This generator scrambles the characters in an email address. It preserves formatting and keeps the @ and . characters.

For example, for the following input value:

johndoe@company.com

The output value would be something like:

brwomse@xorwxlt.slt

By default, the generator scrambles the domain. You can configure the generator to not mask specific domains. You can also specify a domain to use for all of the output email addresses.

For example, if you configure the generator to not scramble the domain company.com, then the output for johndoe@company.com would look something like:

brwomse@company.com

This generator securely masks letters and numbers. There is no way to recover the original data.

Consistency

Yes, can be made self-consistent.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 3 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator:

  1. In the Email Domain field, enter a domain to use for all of the output values. For example, use @mycompany.com for all of the generated values. The generator scrambles the content before the @.

  2. In the Excluded Email Domains field, enter a comma-separated list of domains for which email addresses are not masked in the output values. This allows you, for example, to maintain internal or testing email addresses that are not considered sensitive.

  3. Toggle the Replace invalid emails setting to indicate whether to replace an invalid email address with a generated valid email address. By default, invalid email addresses are not replaced. In the replacement values, the username is generated. If you specify a value for Email Domain, then the email addresses use that domain. Otherwise, the domain is generated.

  4. Toggle the Consistency setting to indicate whether to make the column self-consistent. By default, consistency is disabled.
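As a rough illustration of how these settings interact, the following Python sketch scrambles letters and digits while keeping the @ and . characters in place. The function names are hypothetical, and the random character replacement stands in for whatever scrambling Structural actually uses. For an excluded domain, the sketch follows the earlier worked example, where the username is still scrambled and only the domain is kept. That reading is an assumption.

```python
import random
import string


def scramble(text: str, rng: random.Random) -> str:
    """Replace letters with random letters and digits with random digits."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.digits:
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)  # keeps @, ., and other punctuation as-is
    return "".join(out)


def mask_email(email, rng, email_domain=None, excluded_domains=()):
    """Scramble an email address while preserving its format."""
    username, at_sign, domain = email.rpartition("@")
    if not at_sign:
        return scramble(email, rng)  # no @ found, so scramble the whole value
    excluded = {d.strip().lower() for d in excluded_domains}
    if email_domain:
        new_domain = email_domain.lstrip("@")  # same domain for every output
    elif domain.lower() in excluded:
        new_domain = domain
    else:
        new_domain = scramble(domain, rng)
    return scramble(username, rng) + "@" + new_domain


rng = random.Random(0)
masked = mask_email("johndoe@company.com", rng)
kept_domain = mask_email("johndoe@company.com", rng, excluded_domains=["company.com"])
fixed_domain = mask_email("johndoe@company.com", rng, email_domain="@mycompany.com")
```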

Event Timestamps

Generates timestamps fitting an event distribution. The source timestamp must include a date. It cannot be a time-only value.

Link columns to create a sequence of events across multiple columns. This generator can be partitioned by other columns.

Consistency

No, cannot be made consistent.

Linking

Yes, can be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

3

Generator ID (for the API)

To configure the generator:

  1. From the Link To dropdown list, select the other Event Timestamps generator columns to link this column to. Linking creates a sequence across multiple columns.

  2. From the Partition dropdown list, select one or more columns to use to partition the data. The selected columns must have their generator set to either Passthrough or Categorical. For more information about partitioning and how it works, go to Partitioning a column.

  3. The Options list displays the current column and linked columns. Use the Up and Down buttons to configure the column sequence.

  4. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

File Name

This generator scrambles characters while preserving formatting and keeping the file extension intact.

For example, for the following input value:

DataSummary1.pdf

The output value would look something like:

RsnoPwcsrtv5.pdf

This generator securely masks letters and numbers. There is no way to recover the original data.

Consistency

Yes, can be made self-consistent.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 3 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.

By default, the generator is not consistent.

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Find and Replace

This generator replaces all instances of the find string with the replace string.

For example, you can indicate to replace all instances of abc with 123.

Consistency

No, cannot be made consistent.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

5

Generator ID (for the API)

To configure the generator:

  1. In the Find field, type the string to look for in the source column value. To use a regular expression to identify the source value, check the Use Regex checkbox. If you use a regular expression, use backslash ( \ ) as the escape character.

  2. In the Replace field, type the string to replace the matching string with.

  3. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
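The difference between a literal find string and a regular expression can be sketched in Python as follows. The function name is hypothetical, and Python's re module is used only for illustration. The regular expression dialect that Structural uses might differ in its details.

```python
import re


def find_and_replace(value: str, find: str, replace: str, use_regex: bool = False) -> str:
    """Replace every occurrence of find in value with replace."""
    if use_regex:
        # In a regular expression, a backslash escapes special characters.
        return re.sub(find, replace, value)
    return value.replace(find, replace)


literal = find_and_replace("abc-abc", "abc", "123")
escaped_dot = find_and_replace("a.c abc", r"a\.c", "X", use_regex=True)
unescaped_dot = find_and_replace("a.c abc", "a.c", "X", use_regex=True)
```

In the last call, the unescaped dot matches any character, so both a.c and abc are replaced.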

FNR

The FNR generator transforms Norwegian national identity numbers. In Norwegian, the term for national identity number abbreviates to FNR.

The first six digits of an FNR reflect the person's birthdate. You can choose to preserve the birthdates from the source values in the destination values. If you do not preserve the source values, the destination values are still within the same date range as the source values.

Another digit in an FNR indicates whether the person is male or female. You can specify whether to preserve in the generated value the gender indicated in the source value.

The last digits in an FNR are a checksum value. In the destination value, the last digits are random and are not a checksum.

Consistency

Yes, can be made self-consistent or consistent with another column.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 3 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator:

  1. To preserve the gender from the source value in the destination value, toggle Preserve Gender to the on position.

  2. To preserve the birthdate from the source value in the destination value, toggle Preserve Birthdate to the on position.

  3. Toggle the Consistency setting to indicate whether to make the generator consistent. By default, consistency is disabled.

  4. If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. When a generator is self-consistent, then a given value in the source database is always mapped to the same value in the destination database. When a generator is consistent with another column, then a given value for that other column in the source database results in the same value in the destination database. For example, if the FNR column is consistent with a Name column, then every instance of John Smith in the source database results in the same FNR in the destination database.

  5. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Geo

This generator can be used to mask columns of latitude and longitude.

The Geo generator divides the globe into grids that are approximately 4.9 x 4.9 km. It then counts the number of points within each grid.

During data generation, each (latitude, longitude) pair is mapped to its grid.

  • If the grid contains a sufficient number of points to preserve privacy, then the generator returns a randomly chosen point in that grid.

  • If the grid does not contain enough points to preserve privacy, then the generator returns a random coordinate from the nearest grid that contains enough points.
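The grid logic can be sketched in Python as follows. This is a simplified illustration, not Structural's implementation. The cell size is expressed in degrees (about 4.9 km from north to south, with an east-west width that shrinks toward the poles), the privacy threshold MIN_POINTS is a made-up value, and the nearest cell is found by comparing cell indexes instead of real-world distance.

```python
import math
import random
from collections import Counter

CELL_DEG = 0.044  # assumption: roughly 4.9 km of latitude per cell
MIN_POINTS = 5    # hypothetical threshold for "enough points"


def cell_of(lat: float, lon: float):
    """Map a coordinate pair to the index of the grid cell that contains it."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def build_counts(points) -> Counter:
    """Count the number of source points in each grid cell."""
    return Counter(cell_of(lat, lon) for lat, lon in points)


def random_point_in(cell, rng: random.Random):
    """Return a random coordinate pair inside the given cell."""
    row, col = cell
    return ((row + rng.random()) * CELL_DEG, (col + rng.random()) * CELL_DEG)


def mask_point(lat: float, lon: float, counts: Counter, rng: random.Random):
    """Return a random point from the same cell, or from the nearest dense cell."""
    cell = cell_of(lat, lon)
    if counts[cell] < MIN_POINTS:
        dense = [c for c, n in counts.items() if n >= MIN_POINTS]
        if not dense:
            raise ValueError("no grid cell contains enough points")
        cell = min(dense, key=lambda c: (c[0] - cell[0]) ** 2 + (c[1] - cell[1]) ** 2)
    return random_point_in(cell, rng)


points = [(40.0 + i * 0.001, -74.0 + i * 0.001) for i in range(6)] + [(10.0, 10.0)]
counts = build_counts(points)
masked_lat, masked_lon = mask_point(10.0, 10.0, counts, random.Random(0))
```

In this example, the isolated point at (10.0, 10.0) is replaced with a point from the only cell that contains enough points.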

Consistency

No, cannot be made consistent.

Linking

Yes, can be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

No

Privacy ranking

3

Generator ID (for the API)

To configure the generator:

  1. From the Link To dropdown list, select the column to link to this one. You typically assign the Geo generator to both the latitude and longitude columns, then link those columns.

  2. From the value type dropdown, select whether this column contains a latitude value or a longitude value.

  3. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

HIPAA Address

This generator can be used to generate cities, states, and zip codes that follow HIPAA guidelines for safe harbor.

Zip Codes

How the HIPAA Address generator handles zip codes is based on whether the Replace zeros in truncated Zip Code toggle in the generator configuration is off or on.

By default, the setting is off. In this case, the last two digits of the zip code in the column are replaced with zeros, unless the zip code is a low population area as designated by the current census. For a low population area, all of the digits in the zip code are replaced with zeros.

If the setting is on, then the generator selects a real zip code that starts with the same three digits as the original zip code. For a low population area, if a state is linked, then the generator selects a random zip code from within that state. Otherwise the generator selects a random zip code from the United States.
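The two zip code modes can be sketched in Python as follows. The function name and parameters are hypothetical. The low population prefixes and the zip code lists in the example are made-up placeholders, not census data. In practice, the fallback list would contain the zip codes of the linked state or, if no state is linked, of the United States.

```python
import random


def mask_zip(
    zip_code,
    low_population_prefixes,
    replace_zeros=False,
    real_zip_codes=(),
    fallback_zip_codes=(),
    rng=None,
):
    """Mask a 5-digit zip code based on its 3-digit prefix."""
    rng = rng or random.Random()
    prefix = zip_code[:3]
    is_low_population = prefix in low_population_prefixes

    if not replace_zeros:
        # Setting off: zero out the last two digits, or every digit for a
        # low population area.
        return "00000" if is_low_population else prefix + "00"

    if is_low_population:
        # Setting on, low population: pick from the state or country list.
        candidates = list(fallback_zip_codes)
    else:
        # Setting on: pick a real zip code that shares the 3-digit prefix.
        candidates = [z for z in real_zip_codes if z.startswith(prefix)]
    if not candidates:
        raise ValueError("no candidate zip codes available")
    return rng.choice(candidates)


LOW_POPULATION = {"999"}                 # placeholder prefix
REAL_ZIPS = ["30303", "30309", "10001"]  # placeholder zip code list

truncated = mask_zip("30305", LOW_POPULATION)
zeroed = mask_zip("99912", LOW_POPULATION)
replaced = mask_zip("30305", LOW_POPULATION, True, REAL_ZIPS, rng=random.Random(0))
```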

Cities

When a zip code column is not linked, a random city is chosen in the United States. When a zip code is already added to the link, a city is chosen at random that has at least some overlap with the zip code.

If the original zip code is designated as a low population area, and a State column is linked, then a random city within the state is chosen. If a State column is not linked, then a random city within the United States is chosen.

For example, if the original city and zip code were (Atlanta, 30305), the zip code would be replaced with 30300. Many cities contain zip codes that begin with 303, such as Atlanta, Decatur, Chamblee, Hapeville, Dunwoody, and College Park. One of these cities is chosen at random, so that the final value is, for example, (Chamblee, 30300).

States

HIPAA guidelines allow for information at the state level to be kept. Therefore, these values are passed through.

Latitude and longitude (GPS) coordinates

GPS coordinates are randomly generated based on the most specific linked HIPAA address component that is available, in the following order:

  1. If a zip code is linked, and its 3-digit zip code prefix is not designated a low population area, then a random point within the same 3-digit zip code prefix is generated. If the prefix is designated a low population area, then the linked state is used.

  2. If a state is available and a zip code and city are not, or the zip code or city are in a 3-digit zip code prefix that is designated a low population area, then a random GPS coordinate is generated somewhere within the state.

  3. If no zip code, city, or state is linked, or one or more of them were provided, but there was a problem generating a random GPS coordinate within the linked areas, then a GPS coordinate is generated at a random location within the United States.

Note: If the city component of the HIPAA address is linked with latitude and/or longitude, the GPS coordinate components are randomly generated independently of the city.

Other address parts

All other address parts are generated randomly and hence their value is not influenced at all by the underlying value in the column.

Consistency

Yes, can be made self-consistent.

Linking

Yes, can be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 3 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator:

  1. From the Link To dropdown list, select the other columns to link to. You can only select columns that are also assigned the HIPAA Address generator.

  2. From the address part dropdown list, select the type of address value that is in the column.

  3. Toggle the Replace zeros in truncated Zip Code setting to indicate how to generate zip codes. If the setting is off, then the last two digits are replaced with zeros. For low population areas, the entire zip code is populated with zeros. If the setting is on, then a real zip code is selected that starts with the first three digits of the original zip code. For low population areas, if a state is linked, a random zip code from the state is used. Otherwise, a random zip code from the United States is used.

  4. Toggle the Consistency setting to indicate whether to make the column self-consistent. By default, consistency is disabled.

  5. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Spark supported address parts

For the HIPAA Address generator, Spark workspaces (Amazon EMR, Databricks, and self-managed Spark clusters) only support the following address parts:

  • City

  • City with State

  • City with State Abbr

  • State

  • State Abbr

  • US Address

  • US Address with Country

  • Zip Code

The Address generator provides support for additional address parts in Spark workspaces.

Hostname

Generates random host names, based on the English language.

Consistency

Yes, can be made self-consistent or consistent with another column.

Linking

No, cannot be linked.

Differential privacy

Yes, if consistency is not enabled.

Data-free

Yes, if consistency is not enabled.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 1 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.

By default, the generator is not consistent.

If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from Consistent to, select the column.

When the generator is consistent with itself, then a given value in the source database is mapped to the same value in the destination database. For example, Host123 in the source database always produces MyHostABC in the destination database.

When the generator is consistent with another column, then a given source value in the other column results in the same host name value in the destination database. For example, a host name column is consistent with a department column. Every instance of Sales in the source data is given the same host name in the destination database.

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

HStore Mask

This is a composite generator.

Runs selected generators on specified key values in an HStore column in a PostgreSQL database. HStore columns contain a set of key-value pairs.

Consistency

Determined by the selected sub-generators.

Linking

Determined by the selected sub-generators.

Differential privacy

Determined by the selected sub-generators.

Data-free

Determined by the selected sub-generators.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

5

Generator ID (for the API)

To configure the generator:

  1. To assign a generator to a key:

    1. Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell HStore field contains a sample value from the source database. You can use the previous and next icons to page through different values.

    2. Under Enter a key, enter the name of a key from the column value. For example, for the column value: "pages"=>"446", "title"=>"The Iliad", "category"=>"mythology" To apply a generator to the title, you would enter title as the key. Matched HStore Values shows the result from the value in Cell HStore.

    3. From the Generator Configuration dropdown list, select the generator to apply to the key value. You cannot select another composite generator.

    4. Configure the selected generator. You cannot configure the selected generator to be consistent with another column.

    5. To save the configuration and immediately add a generator for another key, click Save and Add Another. To save the configuration and close the add generator panel, click Save.

  2. From the Sub-Generators list:

    1. To edit a generator assignment, click the edit icon.

    2. To remove a generator assignment, click the delete icon.

    3. To move a generator assignment up or down in the list, click the up or down arrow.
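The idea of applying a sub-generator to a single key can be sketched in Python as follows. This is a simplified illustration with a hypothetical function name. It only handles double-quoted keys and values, and it does not cover unquoted tokens, NULL values, or re-escaping of generated values.

```python
import re

# Matches one "key"=>"value" pair, allowing backslash escapes inside quotes.
_PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"\s*=>\s*"((?:[^"\\]|\\.)*)"')


def mask_hstore(value: str, generators: dict) -> str:
    """Apply generators[key] to the value of each matching key."""

    def replace_pair(match):
        key, val = match.group(1), match.group(2)
        generator = generators.get(key)
        if generator is None:
            return match.group(0)  # no sub-generator assigned, keep the pair
        return f'"{key}"=>"{generator(val)}"'

    return _PAIR.sub(replace_pair, value)


source = '"pages"=>"446", "title"=>"The Iliad", "category"=>"mythology"'
masked = mask_hstore(source, {"title": lambda val: "REDACTED"})
```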

HTML Mask

This is a composite generator.

Masks text columns by parsing the contents as HTML, and applying sub-generators to specified path expressions.

If applying a sub-generator fails because of an error, the generator selected as the fallback generator is applied instead.

Path expressions are defined using the XPath syntax.

For example, for the following HTML:

<html>
<body>
  <div class="container">
    <h1>Title</h1>
    <p>Paragraph content</p>
    <ul>
      <li>Item 1</li>
      <li>Item 2</li>
      <li>Item 3</li>
    </ul>
  </div>
</body>
</html>

To get the value of h1, the expression is //h1/text().

To get the value of the first list item, the expression is //ul/li[1]/text().
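To experiment with these expressions outside of Structural, the following Python sketch uses the standard library's xml.etree.ElementTree module. Two caveats apply. ElementTree supports only a subset of XPath and does not support text(), so the sketch selects the element and reads its text property instead. It also only parses markup that is well-formed XML, which the sample above is but real-world HTML often is not. The apply_to_path function is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE_HTML = """<html>
<body>
  <div class="container">
    <h1>Title</h1>
    <p>Paragraph content</p>
    <ul>
      <li>Item 1</li>
      <li>Item 2</li>
      <li>Item 3</li>
    </ul>
  </div>
</body>
</html>"""


def apply_to_path(markup: str, path: str, generator) -> str:
    """Apply generator to the text of every element that matches path."""
    root = ET.fromstring(markup)
    for element in root.findall(path):
        if element.text is not None:
            element.text = generator(element.text)
    return ET.tostring(root, encoding="unicode")


root = ET.fromstring(SAMPLE_HTML)
title = root.find(".//h1").text             # counterpart of //h1/text()
first_item = root.find(".//ul/li[1]").text  # counterpart of //ul/li[1]/text()
masked = apply_to_path(SAMPLE_HTML, ".//ul/li[1]", lambda text: "*" * len(text))
```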

Consistency

Determined by the selected sub-generators.

Linking

Determined by the selected sub-generators.

Differential privacy

Determined by the selected sub-generators.

Data-free

Determined by the selected sub-generators.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

5

Generator ID (for the API)

To configure the generator:

  1. To assign a generator to a path expression:

    1. Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell HTML field contains a sample value from the source database. You can use the previous and next icons to page through different values.

    2. In the Path Expression field, type the path expression to identify the value to apply the generator to. Matched HTML Values shows the result from the value in Cell HTML.

    3. From the Generator Configuration dropdown list, select the generator to apply to the path expression. You cannot select another composite generator.

    4. Configure the selected generator. You cannot configure the selected generator to be consistent with another column.

    5. To save the configuration and immediately add a generator for another path expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.

  2. From the Sub-Generators list:

    1. To edit a generator assignment, click the edit icon.

    2. To remove a generator assignment, click the delete icon.

    3. To move a generator assignment up or down in the list, click the up or down arrow.

  3. From the Fallback Generator dropdown list, select the generator to use if the assigned generator for a path expression fails. The options are:

Integer Key

Generates unique integer values. By default, the generated values are within the range of the column’s data type.

You can also specify a range for the generated values. The source values must be within that range.

This generator cannot be used to transform negative numbers.

Consistency

Yes, can be made self-consistent.

Linking

No, cannot be linked.

Differential privacy

Yes, if consistency is not enabled.

Data-free

Yes, if consistency is not enabled.

Allowed for primary keys

Yes

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

Yes

Privacy ranking

  • 1 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator:

  1. In the Minimum field, enter the minimum value to use for an output value. The minimum value cannot be larger than any of the values in the source data.

  2. In the Maximum field, enter the maximum value to use for an output value. The maximum value cannot be smaller than any of the values in the source data.

  3. Toggle the Consistency setting to indicate whether to make the column self-consistent. By default, consistency is disabled.

  4. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

IP Address

Generates a random IP address formatted string.

Consistency

Yes, can be made self-consistent or consistent with another column.

Linking

No, cannot be linked.

Differential privacy

Yes, if consistency is not enabled.

Data-free

Yes, if consistency is not enabled.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 1 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator:

  1. In the Percent IPv4 field, type the percentage of output values that are IPv4 addresses. For example, if you set this to 60, then 60% of the generated IP addresses are IPv4 addresses, and 40% of the generated IP addresses are IPv6 addresses. If you set this to 100, then all of the generated IP addresses are IPv4 addresses. If you set this to 0, then all of the generated IP addresses are IPv6 addresses.

  2. Toggle the Consistency setting to indicate whether to make the column consistent. By default, consistency is disabled.

  3. If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. When a generator is self-consistent, then a given value in the source database is always mapped to the same value in the destination database. When a generator is consistent with another column, then a given source value in that column always results in the same IP address value in the destination database. For example, an IP address column is consistent with a username column. For each instance of User1 in the source database, the value in the IP address column is the same.

  4. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
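The effect of the Percent IPv4 setting can be sketched in Python with the standard ipaddress module. The function name is hypothetical, and the sketch draws addresses uniformly from the full address space, which might not match how Structural generates them.

```python
import ipaddress
import random


def random_ip(percent_ipv4: float, rng: random.Random) -> str:
    """Return an IPv4 address with the given probability, otherwise IPv6."""
    if rng.random() * 100 < percent_ipv4:
        return str(ipaddress.IPv4Address(rng.getrandbits(32)))
    return str(ipaddress.IPv6Address(rng.getrandbits(128)))


rng = random.Random(0)
mixed = [random_ip(60, rng) for _ in range(10)]
only_v4 = [random_ip(100, rng) for _ in range(10)]
only_v6 = [random_ip(0, rng) for _ in range(10)]
```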

JSON Mask

This is a composite generator.

Runs a selected generator on values that match a user-specified JSONPath.

If an error occurs, the selected fallback generator is used for the entirety of the JSON value.

Sub-generators are applied sequentially, from the sub-generator at the top of the list to the sub-generator at the bottom of the list.

If multiple JSONPath expressions point to the same key, the most recently added generator takes priority.

JSON paths can also contain regular expressions and comparison logic, which allows the configured sub-generators to be applied only when there are properties that satisfy the query.

For example, a column contains this JSON:

[ { "file_name": "foo.txt", "b": 10 }, ... ]

The following JSON path only applies to array elements that contain a file_name key for which the value ends in .txt:

$.[?(@.file_name =~ /^.*\.txt$/)]

A JSON path can also be used to point to a key name recursively. For example, a column contains this JSON:

{
  "first_name": "John",
  "last_name": "Smith",
  "children": [
    {
      "first_name": "Mary",
      "last_name": "Jones",
      "children": [
        {
          "first_name": "Ann",
          "last_name": "Jones"
        }
      ]
    }
  ]
}

The following JSON path applies to all properties for which the key is first_name:

$..first_name
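The effect of a recursive path such as $..first_name can be sketched in plain Python as a walk over the parsed JSON. The function name is hypothetical. The sketch only covers the "match this key at any depth" case and is not a JSONPath implementation.

```python
import json


def mask_key_recursively(node, key, generator):
    """Apply generator to every value stored under key, at any depth."""
    if isinstance(node, dict):
        return {
            k: generator(v) if k == key else mask_key_recursively(v, key, generator)
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [mask_key_recursively(item, key, generator) for item in node]
    return node  # strings, numbers, booleans, and null are returned unchanged


document = json.loads("""
{
  "first_name": "John",
  "last_name": "Smith",
  "children": [
    {
      "first_name": "Mary",
      "last_name": "Jones",
      "children": [
        { "first_name": "Ann", "last_name": "Jones" }
      ]
    }
  ]
}
""")

masked = mask_key_recursively(document, "first_name", lambda value: "MASKED")
```

All three first_name values are replaced, and the last_name values are left as they are.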

Consistency

Determined by the selected sub-generators.

Linking

Determined by the selected sub-generators.

Differential privacy

Determined by the selected sub-generators.

Data-free

Determined by the selected sub-generators.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

5

Generator ID (for the API)

To configure the generator:

  1. To assign a generator to a path expression:

    1. Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell JSON field contains a sample value from the source database. You can use the previous and next icons to page through different values.

    2. In the Path Expression field, type the path expression to identify the value to apply the generator to. To create a path expression, you can also click the value in Cell JSON that you want the expression to point to. Matched JSON Values shows the result from the value in Cell JSON.

    3. By default, the selected generator is applied to any value that matches the expression. To limit the types of values to apply the generator to, from the Type Filter, specify the applicable types. You can select Any, or you can select any combination of String, Number, Boolean, and Null.

    4. From the Generator Configuration dropdown list, select the generator to apply to the path expression. You cannot select another composite generator.

    5. Configure the selected generator. You cannot configure the selected generator to be consistent with another column.

    6. To save the configuration and immediately add a generator for another path expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.

  2. From the Sub-Generators list:

    1. To edit a generator assignment, click the edit icon.

    2. To remove a generator assignment, click the delete icon.

    3. To move a generator assignment up or down in the list, click the up or down arrow.

  3. From the Fallback Generator dropdown list, select the generator to use if the assigned generator for a path expression fails. The options are:

MAC Address

Generates a random MAC address formatted string.

Consistency

Yes, can be made self-consistent.

Linking

No, cannot be linked.

Differential privacy

Yes, if consistency is not enabled.

Data-free

Yes, if consistency is not enabled.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

Yes

Privacy ranking

  • 1 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator:

  1. In the Bytes Preserved field, enter the number of bytes to preserve in the generated address.

  2. Toggle the Consistency setting to indicate whether to make the column self-consistent. By default, consistency is disabled.

  3. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
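The Bytes Preserved setting can be sketched in Python as follows. The sketch assumes that the preserved bytes are the leading bytes of the address. The steps above do not state which bytes are kept, so treat that as an assumption. The function name is hypothetical.

```python
import random


def mask_mac(mac: str, bytes_preserved: int, rng: random.Random) -> str:
    """Keep the first bytes_preserved bytes and randomize the rest."""
    separator = ":" if ":" in mac else "-"
    octets = mac.split(separator)
    if len(octets) != 6:
        raise ValueError("expected a 6-byte MAC address")
    if not 0 <= bytes_preserved <= 6:
        raise ValueError("bytes_preserved must be between 0 and 6")
    kept = octets[:bytes_preserved]
    generated = [f"{rng.getrandbits(8):02x}" for _ in octets[bytes_preserved:]]
    return separator.join(kept + generated)


masked = mask_mac("00:1A:2B:3C:4D:5E", 3, random.Random(1))
```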

Mongo ObjectId Key

Generates unique object identifiers.

Can be assigned to text columns that contain MongoDB ObjectId values. The column value must be 12 bytes long.

Consistency

Yes, can be made self-consistent.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 3 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator:

  1. A MongoDB ObjectId consists of an epoch timestamp, a random value, and an incremented counter. To only change the random value portion of the identifier, but keep the timestamp and counter portions, toggle Preserve Timestamp and Incremental Counter to the on position.

  2. Toggle the Consistency setting to indicate whether to make the generator self-consistent. By default, the generator is not consistent.

  3. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
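The Preserve Timestamp and Incremental Counter option can be sketched in Python as follows. The sketch relies on the standard 12-byte ObjectId layout: a 4-byte timestamp, a 5-byte random value, and a 3-byte counter. The function name is hypothetical, and the sketch does not enforce the uniqueness that the generator provides.

```python
import secrets


def mask_object_id(object_id_hex: str, preserve_timestamp_and_counter: bool = True) -> str:
    """Replace the random portion of an ObjectId, or the whole identifier."""
    raw = bytes.fromhex(object_id_hex)
    if len(raw) != 12:
        raise ValueError("an ObjectId must be 12 bytes long")
    if preserve_timestamp_and_counter:
        # Bytes 0-3: timestamp. Bytes 4-8: random value. Bytes 9-11: counter.
        return (raw[:4] + secrets.token_bytes(5) + raw[9:]).hex()
    return secrets.token_bytes(12).hex()


source = "507f1f77bcf86cd799439011"
masked = mask_object_id(source)
```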

Name

Generates a random name string from a dictionary of first and last names.

You specify the name information that is contained in the column. A column might only contain a first name or last name, or might contain a full name. A full name might be first name first or last name first.

For example, a Name column contains a full name in the format Last, First. For the input value Smith, John, the output value would be something like Jones, Mary.

Consistency

Yes, can be made self-consistent or consistent with another column. Note that all Name generator columns that have the same consistency configuration are automatically consistent with each other. The columns must either be all self-consistent or all consistent with the same other column. For example, you can use this to ensure that a first name and last name column value always match the first name and last name in a full name column.

Linking

No, cannot be linked.

Differential privacy

Yes, if consistency is not enabled.

Data-free

Yes, if consistency is not enabled.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 1 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator:

  1. From the name format dropdown list, select the type of name value that the column contains:

    • First. This also is commonly used for standalone middle name fields.

    • Last

    • First Last

    • First Middle Last

    • First Middle Initial Last

    • Last, First

    • Last, First Middle

    • Middle Initial

  2. Toggle the Preserve Capitalization setting to indicate whether to preserve the capitalization of the column value. By default, the capitalization is not preserved.

  3. Toggle the Consistency setting to indicate whether to make the column consistent. By default, consistency is disabled.

  4. If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.

  5. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Noise Generator

Masks values in numeric columns. Either adds random noise to the original value, or multiplies the original value by a random scaling factor.

The additive noise generator draws noise from an interval around 0 that is scaled to the magnitude of the original value. For example, the default scale is 10% of the underlying value. The larger the value, the larger the amount of noise that is added.

The multiplicative noise generator multiplies the original value by a random scaling factor that falls within a specified range.

Consistency

Yes, can be made self-consistent or consistent with another column.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 3 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator:

  1. To use the additive noise generator:

    1. From the dropdown list, choose Additive.

    2. In the Relative noise scale field, type the percentage of the underlying value to scale the noise to. The default value is 10. Tonic samples the additive noise from the range [-(scale/100) * |value|, (scale/100) * |value|), where scale is the noise scale, and value is the original data value. The lower bound of the range is inclusive, and the upper bound is exclusive. For example, for the default noise scale of 10 and a data value of 20, the additive noise range is [-0.1 * 20, 0.1 * 20). In other words, between -2 (inclusive) and 2 (exclusive).

  2. To use the multiplicative noise generator:

    1. From the dropdown list, choose Multiplicative.

    2. In the Min field, type the minimum value for the scaling factor. The minimum value is inclusive. The default value is 0.5.

    3. In the Max field, type the maximum value for the scaling factor. The maximum value is exclusive. The default value is 5.

    Tonic multiplies the original value by a scaling factor from the range [min, max), where min is the minimum scaling factor, and max is the maximum scaling factor. For example, for the default values of 0.5 and 5, Tonic multiplies the original data value by a value between 0.5 (inclusive) and 5 (exclusive).

  3. Toggle the Consistency setting to indicate whether to make the column consistent. By default, consistency is disabled.

  4. If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. If the generator is self-consistent, then a given value in the source database is masked in exactly the same way to produce the value in the destination database. If the generator is consistent with another column, then for a given value in that other column, the column that is assigned the Noise generator is always masked in exactly the same way in the destination database. For example, a field containing a salary value is assigned the Noise Generator and is consistent with the username field. For each instance of User1, the Noise Generator masks the salary value in exactly the same way.

  5. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
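
The two noise modes can be sketched as follows. This is a minimal illustration of the documented ranges, not Structural's implementation. The function names and the rng parameter are assumptions.

```python
import random


def additive_noise(value, scale=10.0, rng=random):
    """Add noise drawn from [-(scale/100)*|value|, (scale/100)*|value|)."""
    bound = (scale / 100.0) * abs(value)
    # rng.random() returns a float in [0.0, 1.0).
    noise = -bound + 2.0 * bound * rng.random()
    return value + noise


def multiplicative_noise(value, minimum=0.5, maximum=5.0, rng=random):
    """Multiply the value by a factor drawn from [minimum, maximum)."""
    factor = minimum + (maximum - minimum) * rng.random()
    return value * factor


noisy = additive_noise(20)          # default scale: between 18 and 22
scaled = multiplicative_noise(20)   # default factors: between 10 and 100
```

With the default settings, additive noise keeps the result close to the original value, while multiplicative noise can change it by up to a factor of 5.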

Null

Generates NULL values to fill the rows of the specified column.

Consistency

No, cannot be made consistent.

Linking

No, cannot be linked.

Differential privacy

Yes

Data-free

Yes

Allowed for primary keys

No

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

No

Privacy ranking

1

Generator ID (for the API)

The Null generator has no configuration options.

Numeric String Key

Generates unique numeric strings of the same length as the input value.

For example, for the input value 123456, the output value would be something like 832957.

You can apply this generator only to columns that contain numeric strings.

Consistency

Yes, can be made self-consistent.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

Yes

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

Yes

Privacy ranking

  • 3 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.

By default, the generator is not consistent.

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Passthrough

Passthrough is the default option.

It passes through the value from the source database to the destination database without masking it.

Consistency

No, cannot be made consistent.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

No

Privacy ranking

6

Generator ID (for the API)

Passthrough has no configuration options.

Phone

Generates a random phone number that matches the country or region of the input phone number while maintaining the format. For example, (123) 456-7890 or 123-456-7890.

If the input is not a valid phone number, the generator randomly replaces numeric characters. You can also replace invalid numbers with valid numbers.

By default, the numbers are United States phone numbers. Generated numbers pass Google's libphonenumber verification if the input is a valid phone number or if you replace invalid numbers.

Consistency

Yes, can be made self-consistent.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

3

Generator ID (for the API)

To configure the generator:

  1. Toggle the Replace invalid numbers setting to indicate whether to replace invalid input values with a valid output value. By default, the generator does not replace invalid values. It randomly replaces numeric characters.

  2. Toggle the Consistency setting to indicate whether to make the generator self-consistent. By default, consistency is disabled.

  3. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
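
When Replace invalid numbers is off, invalid input has its numeric characters replaced at random. The following sketch shows only that behavior. It does not validate numbers or use libphonenumber, and the function name is hypothetical.

```python
import random

DIGITS = "0123456789"


def scramble_digits(value, rng=random):
    """Replace each digit with a random digit and keep every other character."""
    return "".join(
        rng.choice(DIGITS) if ch in DIGITS else ch
        for ch in value
    )


masked = scramble_digits("(123) 456-7890")
```

The output keeps the parentheses, space, and dash in their original positions, so the format of the input is maintained.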

Random Boolean

Generates a random boolean value.

Consistency

No, cannot be made consistent.

Linking

No, cannot be linked.

Differential privacy

Yes

Data-free

Yes

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

1

Generator ID (for the API)

To configure the generator, in the Percent True field, enter the percentage of values to set to True in the output.

For example, if you set this to 60, then 60% of the output values are True, and 40% of the output values are False.

If you set this to 100, then all of the output values are True.

If you set this to 0, then all of the output values are False.

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
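
One way to picture the Percent True setting is as a per-row probability. The following sketch assumes that each row is sampled independently, so the share of True values only approximates the configured percentage. Whether Structural enforces an exact split is not stated here.

```python
import random


def random_boolean(percent_true, rng=random):
    """Return True with probability percent_true / 100."""
    # rng.random() is in [0.0, 1.0), so 100 always yields True and 0 never does.
    return rng.random() * 100.0 < percent_true


column = [random_boolean(60) for _ in range(10)]
```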

Random Double

Generates a random double number between the specified minimum (inclusive) and maximum (exclusive).

Consistency

No, cannot be made consistent.

Linking

No, cannot be linked.

Differential privacy

Yes

Data-free

Yes

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

1

Generator ID (for the API)

To configure the generator:

  1. In the Minimum field, type the minimum value to use in the output values. The minimum value is inclusive. The output values can be that value or higher.

  2. In the Maximum field, type the maximum value to use in the output values. The maximum value is exclusive. The output values are lower than that value.

  3. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Random Hash

Generates a random hash string.

Consistency

No, cannot be made consistent.

Linking

No, cannot be linked.

Differential privacy

Yes

Data-free

Yes

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

1

Generator ID (for the API)

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Random Integer

Returns a random integer between the specified minimum (inclusive) and maximum (exclusive).

For example, for a column that contains a percentage value, you can set the minimum to 0 and the maximum to 101. Because the maximum is exclusive, the output values range from 0 through 100.

Consistency

No, cannot be made consistent.

Linking

No, cannot be linked.

Differential privacy

Yes

Data-free

Yes

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

1

Generator ID (for the API)

To configure the generator:

  1. In the Minimum field, type the minimum value to use in the output values. The minimum value is inclusive. The output values can be that value or higher.

  2. In the Maximum field, type the maximum value to use in the output values. The maximum value is exclusive. The output values are lower than that value.

  3. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
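
The inclusive minimum and exclusive maximum behave like a half-open range. As a small sketch, using a hypothetical function name:

```python
import random


def random_integer(minimum, maximum, rng=random):
    """Return an integer n where minimum <= n < maximum."""
    return rng.randrange(minimum, maximum)


# A maximum of 101 is needed for 100 to be a possible output.
percentages = [random_integer(0, 101) for _ in range(1000)]
```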

Random Timestamp

Generates random dates, times, and timestamps that fall within a specified range.

For example, you might want the output dates to all fall within a specific year or month.

Consistency

No, cannot be made consistent.

Linking

No, cannot be linked.

Differential privacy

Yes

Data-free

Yes

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

1

Generator ID (for the API)

To configure the generator, in the Range fields, provide the start and end dates, times, or timestamps to use for the output values.

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
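
A random timestamp within a range can be sketched as a random offset from the start of the range. The function name is hypothetical, and this sketch does not claim to match how Structural treats the end of the range.

```python
import random
from datetime import datetime, timedelta


def random_timestamp(start, end, rng=random):
    """Return a timestamp between start and end."""
    span_seconds = (end - start).total_seconds()
    return start + timedelta(seconds=rng.random() * span_seconds)


# All output values fall within the year 2021.
start = datetime(2021, 1, 1)
end = datetime(2022, 1, 1)
sample = random_timestamp(start, end)
```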

Random UUID

Generates a random new UUID string.

Consistency

No, cannot be made consistent.

Linking

No, cannot be linked.

Differential privacy

Yes

Data-free

Yes

Allowed for primary keys

No

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

No

Privacy ranking

1

Generator ID (for the API)

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Regex Mask

This is a composite generator.

Uses regular expressions to parse strings and replace specified substrings with the output of specified generators. The parts of the string to replace are specified inside unnamed top-level capture groups.

Defining multiple expressions allows you to attach completely different sets of sub-generators to a given cell, depending on the cell's value.

If multiple regular expressions match a given string, the regular expressions and their associated generators are applied in the order that they are specified. The first expression defined that matches has the selected sub-generators applied.

With the Replace all matches option, the Regex Mask generator behaves similarly to a traditional regex parser. It matches all occurrences of a pattern before the next pattern is encountered. For example, the pattern (a) applied to the string aaab matches every occurrence of the letter a, instead of just the first.

Note that for Spark-based data connectors, depending on your environment, there might be slight differences in the regular expression support. To ensure consistent results across all data connectors, use regular expression patterns that are compatible with both Java and C#.

For more information about regular expressions in C#, go to this reference. For more information about regular expressions in Java, go to this reference.

Example expressions

In a cell that contains the string ProductId:123-BuyerId:234, to mask the substrings 123 and 234, specify the regular expression:

^ProductId:([0-9]{3})-BuyerId:([0-9]{3})$

This captures the two occurrences of three-digit numbers in the pattern ProductId:xxx-BuyerId:xxx. This makes it possible to define a sub-generator on neither, either, or both of these captured substrings.

The following regular expression defines a broader capture that matches more cell values:

^(\w+).(\d+).(\w+).(\d+)$

This captures pairs of words ((\w+)) and numbers ((\d+)) if there is a single character of any value between them, instead of the relatively more specific pattern of the first expression.
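
To see what the two example expressions capture, you can try them outside of Structural. The following sketch uses Python's re module, which handles these particular patterns in the same way as Java and C#. The helper mask_groups is hypothetical, applies a single masking function to every capture group, and assumes that the groups are not nested.

```python
import re


def mask_groups(pattern, value, mask):
    """Replace the text of each capture group with mask(text)."""
    match = re.search(pattern, value)
    if match is None:
        return value  # no match: leave the value unchanged
    pieces = []
    cursor = 0
    for index in range(1, match.re.groups + 1):
        start, end = match.span(index)
        if start == -1:
            continue  # this group did not take part in the match
        pieces.append(value[cursor:start])
        pieces.append(mask(match.group(index)))
        cursor = end
    pieces.append(value[cursor:])
    return "".join(pieces)


specific = r"^ProductId:([0-9]{3})-BuyerId:([0-9]{3})$"
broad = r"^(\w+).(\d+).(\w+).(\d+)$"
cell = "ProductId:123-BuyerId:234"

masked = mask_groups(specific, cell, lambda text: "#" * len(text))
broad_groups = re.search(broad, cell).groups()
```

Here the specific pattern exposes only the two three-digit numbers, while the broad pattern also captures the words ProductId and BuyerId.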

Consistency

Determined by the selected sub-generators.

Linking

Determined by the selected sub-generators.

Differential privacy

Determined by the selected sub-generators.

Data-free

Determined by the selected sub-generators.

Allowed for primary keys

No

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

No

Privacy ranking

5

Generator ID (for the API)

To configure the generator:

  1. To add a regular expression:

    1. Click Add Regex. On the configuration panel, Cell Value shows a sample value from the source database. You can use the previous and next options to navigate through the values.

    2. By default, Replace all matches is enabled. To only match the first occurrence of a pattern, toggle Replace all matches to the off position.

    3. In the Pattern field, enter a regular expression. If the expression is valid, then Tonic displays the capture groups for the expression.

    4. For each capture group, to select and configure the generator to apply, click the selected generator. You cannot select another composite generator.

    5. To save the configuration and immediately add another regular expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.

  2. From the Regexes list:

    1. To edit a regex, click the edit icon.

    2. To remove a regex, click the delete icon.

Sequential Integer

Generates a column of unique integer values. The values increment by 1.

Consistency

No, cannot be made consistent.

Linking

Yes, can be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

No

Privacy ranking

3

Generator ID (for the API)

To configure the generator:

  1. From the Link To dropdown list, select the other columns to link to the current column. You can only select columns that also use the Sequential Integer generator.

  2. In the Starting Point field, type the number to use as the starting point. By default, the starting point is 0. This means that the column value in the first processed row is 0. The value in the next processed row is 1. The generator continues to increment the value by 1 in each row that it processes.

  3. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
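
The Starting Point behavior corresponds to a simple counter. As a brief sketch, with a hypothetical function name:

```python
from itertools import count


def sequential_integers(starting_point=0):
    """Yield starting_point, starting_point + 1, and so on, one per row."""
    return count(starting_point)


ids = sequential_integers(0)
first_three = [next(ids) for _ in range(3)]
```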

Shipping Container

Generates ISO 6346-compliant shipping container codes. All generated codes are in the freight category ("U").

Consistency

Yes, can be made self-consistent or consistent with another column.

Linking

No, cannot be linked.

Differential privacy

Yes, if consistency is not enabled.

Data-free

Yes, if consistency is not enabled.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 1 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.

By default, the generator is not consistent.

If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.

When the generator is self-consistent, then a given value in the source database is always mapped to the same value in the destination database.

When the generator is consistent with another column, then a given value for the other column in the source database always results in the same shipping container code value in the destination database. For example, a shipping container column is consistent with an owner column. Every instance of an owner column from the source database has the same shipping container value in the destination database.

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
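
An ISO 6346 code consists of a three-letter owner code, a category letter, a six-digit serial number, and a check digit. The following sketch generates a random freight ("U") code and computes the check digit. It illustrates the standard's check digit rule only and is not Structural's implementation. The function names are hypothetical.

```python
import random
import string


def _letter_values():
    """Map A-Z to 10-38, skipping multiples of 11 as ISO 6346 requires."""
    values, number = {}, 10
    for letter in string.ascii_uppercase:
        if number % 11 == 0:
            number += 1
        values[letter] = number
        number += 1
    return values


LETTER_VALUES = _letter_values()


def check_digit(first_ten):
    """Compute the check digit for the first ten characters of a code."""
    total = 0
    for position, ch in enumerate(first_ten):
        number = LETTER_VALUES[ch] if ch.isalpha() else int(ch)
        total += number * (2 ** position)
    return total % 11 % 10


def random_container_code(rng=random):
    """Return a random 11-character code in the freight category."""
    owner = "".join(rng.choice(string.ascii_uppercase) for _ in range(3))
    serial = "".join(rng.choice(string.digits) for _ in range(6))
    body = owner + "U" + serial
    return body + str(check_digit(body))


code = random_container_code()
```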

SIN

Generates a new valid Canadian Social Insurance Number that preserves the formatting of the original value.

For example, the original value might be 123456789, 123 456 789, or 123-456-789. The output value uses the same format.

Consistency

Yes, can be made self-consistent.

Linking

No, cannot be linked.

Differential privacy

No, cannot be made differentially private.

Data-free

Yes, if consistency is not enabled.

Allowed for primary keys

No

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

Yes

Privacy ranking

  • 1 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.

By default, the generator is not consistent.

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
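
Canadian Social Insurance Numbers carry a Luhn check digit. The following sketch generates nine digits that pass the Luhn check and writes them into the positions of the original digits, so that spaces or dashes are kept. It assumes that the input contains exactly nine digits, checks only the Luhn rule, and uses random digits instead of format-preserving encryption. The function names are hypothetical.

```python
import random

DIGITS = "0123456789"


def luhn_is_valid(digits):
    """Return True if the list of integer digits passes the Luhn check."""
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def random_sin_digits(rng=random):
    """Return nine digits whose last digit is a valid Luhn check digit."""
    body = [rng.randrange(10) for _ in range(8)]
    for check in range(10):
        if luhn_is_valid(body + [check]):
            return body + [check]


def mask_sin(value, rng=random):
    """Replace the nine digits in value and keep its separators."""
    new_digits = iter(random_sin_digits(rng))
    return "".join(
        str(next(new_digits)) if ch in DIGITS else ch
        for ch in value
    )


masked = mask_sin("123-456-789")
```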

SSN

Generates a new valid United States Social Security Number.

You specify the percentage of values for which to include the dashes.

Consistency

Yes, can be made self-consistent or consistent with another column.

Linking

No, cannot be linked.

Differential privacy

Yes, if consistency is not enabled.

Data-free

Yes, if consistency is not enabled.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 1 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator:

  1. In the Percent with -'s field, type the percentage of output values for which to include dashes in the format. For example, if you set this to 60, then 60% of the output values are formatted 123-45-6789, and 40% are formatted 123456789. If you set this to 100, then all of the output values are formatted 123-45-6789. If you set this to 0, then all of the output values are formatted 123456789.

  2. Toggle the Consistency setting to indicate whether to make the generator consistent. By default, consistency is disabled.

  3. If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. When a generator is self-consistent, then a given value in the source database is always mapped to the same value in the destination database. When a generator is consistent with another column, then a given value for that other column in the source database results in the same SSN in the destination database. For example, if the SSN column is consistent with a Name column, then every instance of John Smith in the source database results in the same SSN in the destination database.

  4. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
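
The dash percentage can be pictured as a per-value probability. The following sketch generates a number that follows the basic Social Security Number rules (the area is not 000, 666, or 900 through 999, the group is not 00, and the serial is not 0000) and then applies the setting. It is an illustration under those assumptions, with a hypothetical function name.

```python
import random


def random_ssn(percent_with_dashes, rng=random):
    """Return a random SSN, with dashes at the given percentage rate."""
    area = rng.randrange(1, 900)
    while area == 666:
        area = rng.randrange(1, 900)
    group = rng.randrange(1, 100)
    serial = rng.randrange(1, 10000)
    if rng.random() * 100.0 < percent_with_dashes:
        return f"{area:03d}-{group:02d}-{serial:04d}"
    return f"{area:03d}{group:02d}{serial:04d}"


with_dashes = random_ssn(100)
without_dashes = random_ssn(0)
```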

Struct Mask

This is a composite generator.

Applies selected generators to specific StructFields within a StructType in a Spark database (Databricks and Amazon EMR).

For example, for the following StructType:

root
 |-- firstname: string (nullable = true)
 |-- lastname: string (nullable = true)
 |-- age: integer (nullable = true)
 |-- occupation: string (nullable = true)
 |-- salary: integer (nullable = true)


firstname | lastname | age | occupation | salary
-------------------------------------------------
John      | Smith    | 25  | Teacher    | 45000

To get the value of the occupation field, you would use the expression root.occupation.

Consistency

Determined by the selected sub-generators.

Linking

Determined by the selected sub-generators.

Differential privacy

Determined by the selected sub-generators.

Data-free

Determined by the selected sub-generators.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

5

Generator ID (for the API)

To configure the generator:

  1. To assign a generator to a path expression:

    1. Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell Struct field contains a sample value from the source database. You can use the previous and next icons to page through different values.

    2. In the Path Expression field, type the path expression to identify the value to apply the generator to. Matched Struct Values shows the result from the value in Cell Struct.

    3. From the Generator Configuration dropdown list, select the generator to apply to the path expression. You cannot select another composite generator.

    4. Configure the selected generator. You cannot configure the selected generator to be consistent with another column.

    5. To save the configuration and close the add generator panel, click Save.

  2. From the Sub-Generators list:

    1. To edit a generator assignment, click the edit icon.

    2. To remove a generator assignment, click the delete icon.

    3. To move a generator assignment up or down in the list, click the up or down arrow.

Timestamp Shift Generator

Shifts timestamps by a random amount of a specific unit of time within a set range.

For date-only values, the Timestamp Shift Generator supports the following date formats. The example values are all for February 23, 2021.

  • MM/dd/yyyy - 02/23/2021

  • MM/dd/yy - 02/23/21

  • MM-dd-yyyy - 02-23-2021

  • yyyyMMdd - 20210223

  • yyyy/MM/dd - 2021/02/23

  • MMddyyyy - 02232021

Consistency

Yes, can be made self-consistent or consistent with another column.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 3 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator:

  1. From the Date Part dropdown list, select the unit of time to use for the minimum and maximum shift.

  2. In the Minimum Shift field, type the minimum amount by which the value can be shifted from the original value. Use negative numbers to indicate to shift the date to the past. For example, assume that the date part is Day. -3 indicates that the date cannot be shifted earlier than 3 days before the original date. 3 indicates that the date cannot be shifted earlier than 3 days after the original date.

  3. In the Maximum Shift field, type the maximum amount by which the value can be shifted from the original value. For example, assume that the date part is Day. 5 indicates that the date cannot be shifted later than 5 days after the original date.

  4. Toggle the Consistency setting to indicate whether to make the generator consistent. By default, consistency is disabled.

  5. If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. When a column is consistent with itself, then the same date part value is always shifted by the same amount.

    When a column is consistent with another column, then for the same value in the other column, the date part value is always shifted by the same amount. For example, for the same value of username, the birthdate column value is always shifted by the same amount.

    If multiple columns that use the Timestamp Shift generator are consistent with the same other column, then for those columns, the date part value shifts by the same amount. For example, the startdate and enddate columns are both consistent with the username column. Both startdate and enddate use the Timestamp Shift generator. For the same value of username, both startdate and enddate are shifted by the same amount.

  6. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
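
The effect of making two timestamp columns consistent with the same other column can be sketched by deriving the shift from that column's value. This example assumes a date part of Day, whole-day shifts, and an inclusive maximum. The function name and the hashing approach are hypothetical and are not Structural's implementation.

```python
import hashlib
import random
from datetime import datetime, timedelta


def shift_days(value, min_shift, max_shift, consistent_key=None):
    """Shift value by a whole number of days in [min_shift, max_shift]."""
    if consistent_key is None:
        rng = random.Random()  # not consistent: a fresh random shift
    else:
        # Same key, same seed, same shift.
        digest = hashlib.sha256(str(consistent_key).encode()).digest()
        rng = random.Random(int.from_bytes(digest, "big"))
    return value + timedelta(days=rng.randint(min_shift, max_shift))


startdate = datetime(2021, 2, 23)
enddate = datetime(2021, 3, 1)

# Both columns are consistent with the username column.
new_start = shift_days(startdate, -3, 5, consistent_key="User1")
new_end = shift_days(enddate, -3, 5, consistent_key="User1")
```

Because both columns are shifted by the same amount for the same username, the interval between startdate and enddate is unchanged.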

Unique Email

Generates unique email addresses. Replaces the username with a randomly generated GUID, and masks the domain with a character scramble.

This generator only guarantees uniqueness if the underlying column is unique.

Consistency

Yes, can be made self-consistent.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 3 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator:

  1. In the Email Domain field, enter a domain to use for all of the output values. For example, use @mycompany.com for all of the generated values. If you do not provide a value, then the generator uses a character scramble on the domain.

  2. In the Excluded Email Domains field, enter a comma-separated list of domains for which email addresses are not masked in the output values. This allows you, for example, to maintain internal or testing email addresses that are not considered sensitive.

  3. Toggle the Replace invalid emails setting to indicate whether to replace an invalid email address with a generated valid email address. By default, invalid email addresses are not replaced. In the replacement values, the username is generated. If you specify a value for Email Domain, then that value is used for the domain. Otherwise, the domain is generated.

  4. Toggle the Consistency setting to indicate whether to make the generator self-consistent. By default, consistency is disabled.

  5. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
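
The interaction of the Email Domain and Excluded Email Domains settings can be sketched as follows. The function names are hypothetical. The character scramble shown here is a simple stand-in that also scrambles the top-level domain, which might differ from what Structural does.

```python
import random
import string
import uuid


def scramble(text, rng=random):
    """Replace letters and digits with random characters of the same kind."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.digits:
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)  # keep dots and other punctuation
    return "".join(out)


def mask_email(value, email_domain=None, excluded_domains=(), rng=random):
    """Replace the username with a GUID and mask or replace the domain."""
    _, _, domain = value.rpartition("@")
    if domain.lower() in {d.lower() for d in excluded_domains}:
        return value  # excluded domains pass through unmasked
    if email_domain:
        new_domain = email_domain.lstrip("@")
    else:
        new_domain = scramble(domain, rng)
    return f"{uuid.uuid4().hex}@{new_domain}"


masked = mask_email("john@example.com")
fixed = mask_email("john@example.com", email_domain="@mycompany.com")
kept = mask_email("qa@internal.test", excluded_domains=["internal.test"])
```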

URL

This is a substitution cipher that preserves formatting, but keeps the URL scheme and top-level domain intact.

For example, for the following input value:

http://www.example.com/products/clothes

The output value would be something like:

http://www.example.com/sowrmsl/kwctlsn

This mask is not secure.

Consistency

No, cannot be made consistent.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

No

Privacy ranking

3

Generator ID (for the API)

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

UUID Key

Generates UUIDs on primary key columns.

All foreign key columns that reference the configured column automatically have their UUID values masked.

Consistency

Yes, can be made self-consistent.

Linking

No, cannot be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

Yes

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

Yes

Privacy ranking

  • 3 if not consistent

  • 4 if consistent

Generator ID (for the API)

To configure the generator:

  1. To preserve the version and variant bits from the source UUID in the output value, toggle Preserve Version and Variant to the on position.

  2. Toggle the Consistency setting to indicate whether to make the generator self-consistent. By default, the generator is not consistent.

  3. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
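
To show what preserving the version and variant bits means, the following sketch copies those bits from a source UUID into a freshly generated one. It treats the variant as the top two bits of the variant field, which covers the common RFC 4122 variant, and it uses a random UUID in place of format-preserving encryption. The names are hypothetical.

```python
import uuid

VERSION_MASK = 0xF << 76   # the four version bits
VARIANT_MASK = 0x3 << 62   # the top two variant bits
PRESERVED = VERSION_MASK | VARIANT_MASK


def random_uuid_preserving_bits(source):
    """Return a random UUID that keeps the source's version and variant."""
    fresh = uuid.uuid4().int
    combined = (fresh & ~PRESERVED) | (source.int & PRESERVED)
    return uuid.UUID(int=combined)


source = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")  # a version 1 UUID
masked = random_uuid_preserving_bits(source)
```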

XML Mask

This is a composite generator.

Runs a selected generator on values that match a user-specified path expression.

Path expressions are defined using the XPath syntax.

For example, for the following XML content:

<?xml version="1.0" encoding="UTF-8"?>
<household>
    <member>
        <first_name>John</first_name>
        <last_name>Smith</last_name>
        <age>25</age>
        <occupation>Teacher</occupation>
        <salary>45000</salary>
    </member>
</household>

To get the first_name value, you would use /household/member/first_name.

You can also select a fallback generator to run on the entire XML value if there is any error during data generation.
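
The path expression and fallback behavior can be tried with Python's standard library. The following sketch uses a well-formed household document and a hypothetical mask_xml helper. Note that xml.etree.ElementTree supports only a subset of XPath and evaluates paths relative to the root element, so /household/member/first_name is written as ./member/first_name.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<household>
    <member>
        <first_name>John</first_name>
        <last_name>Smith</last_name>
        <age>25</age>
        <occupation>Teacher</occupation>
        <salary>45000</salary>
    </member>
</household>"""


def mask_xml(xml_text, path, generator, fallback):
    """Apply generator to each match of path; on error, use fallback."""
    try:
        root = ET.fromstring(xml_text)
        for element in root.findall(path):
            element.text = generator(element.text)
        return ET.tostring(root, encoding="unicode")
    except ET.ParseError:
        # The fallback runs on the entire XML value.
        return fallback(xml_text)


masked = mask_xml(SAMPLE, "./member/first_name", lambda _: "Mary", lambda _: None)
failed = mask_xml("<household>", "./member/first_name", lambda _: "Mary", lambda _: None)
```

In this sketch the fallback returns None for the whole value, which is similar in spirit to falling back to a Null generator.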

Consistency

Determined by the selected sub-generators.

Linking

Determined by the selected sub-generators.

Differential privacy

Determined by the selected sub-generators.

Data-free

Determined by the selected sub-generators.

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

5

Generator ID (for the API)

To configure the generator:

  1. To assign a generator to a path expression:

    1. Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell XML field contains a sample value from the source database. You can use the previous and next icons to page through different values.

    2. In the Path Expression field, type the path expression to identify the value to apply the generator to. Matched XML Values shows the result from the value in Cell XML.

    3. From the Generator Configuration dropdown list, select the generator to apply to the value at the path expression. You cannot select another composite generator.

    4. Configure the selected generator. You cannot configure the selected generator to be consistent with another column.

    5. To save the configuration and immediately add a generator for another path expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.

  2. From the Sub-Generators list:

    1. To edit a generator assignment, click the edit icon.

    2. To remove a generator assignment, click the delete icon.

    3. To move a generator assignment up or down in the list, click the up or down arrow.

  3. From the Fallback Generator dropdown list, select the generator to use if any error occurs in the generation. The fallback generator is then used for the entire XML value. The options are:
