Generator reference

Here are the details for the supported generators in Tonic.

Address

Generates a random address-like string.
You can indicate which part of an address the column contains. For example, the column might contain only the street address or the city, or it might contain the full address.
Consistency
Yes, can be made self-consistent or consistent with another column.
Linking
Yes, can be linked.
Differential privacy
Yes, if consistency is not enabled.
Data-free
Yes, if consistency is not enabled.
To configure the generator:
  1. From the Link To dropdown list, select the columns to link this column to. You can link columns that use the Address generator to mask one of the following address components:
    • City
    • City State
    • Country
    • Country Code
    • State
    • State Abbreviation
    • Zip Code
    • Latitude
    • Longitude
    Note that when linked to another address column, a country or country code is always the United States.
  2. From the address component dropdown list, select the address component that this column contains. The available options are:
    • Building Number
    • Cardinal Direction (North, South, East, West)
    • City
    • City Prefix (Examples: North, South, East, West, Port, New)
    • City Suffix (Examples: land, ville, furt, town)
    • City with State (Example: Spokane, Washington)
    • City with State Abbr (Example: Houston, TX)
    • Country (Examples: Spain, Canada)
    • Country Code (Uses the 2-character country code. Examples: ES, CA)
    • County
    • Direction (Examples: North, Northeast, Southwest, East)
    • Full Address
    • Latitude (Examples: 33.51, 41.32)
    • Longitude (Examples: -84.05, -74.21)
    • Ordinal Direction (Examples: Northeast, Southwest)
    • Secondary Address (Examples: Apt 123, Suite 530)
    • State (Examples: Alabama, Wisconsin)
    • State Abbr (Examples: AL, WI)
    • Street Address (Example: 123 Main Street)
    • Street Suffix (Examples: Way, Hill, Drive)
    • Street Name (Examples: Broad, Elm)
    • US Address
    • US Address with Country
    • Zip Code (Example: 12345)
  3. Toggle the Consistency setting to indicate whether to make the column consistent. By default, consistency is disabled.
  4. If consistency is enabled, then by default, the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.
     When the Address generator is consistent with itself, the same value in the source database is always mapped to the same destination value. For example, for a column that contains a state name, Alabama is always mapped to Illinois.
     When the Address generator is consistent with another column, the same value in the other column always results in the same destination value for the address column. For example, if the address column is consistent with a name column, then every instance of John Smith in the name column in the source database has the same address value in the destination database.
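The consistency behavior described above can be pictured as a deterministic function of a key value. The following is a minimal sketch, not Tonic's implementation: the replacement pool, the secret, and the `consistent_pick` function are all hypothetical, and a keyed hash is simply one common way to get a repeatable mapping.

```python
import hashlib
import hmac

# Hypothetical replacement pool and secret; neither comes from Tonic.
STATES = ["Illinois", "Ohio", "Nevada", "Vermont", "Oregon"]
SECRET = b"example-secret"

def consistent_pick(key_value: str, pool: list[str] = STATES) -> str:
    """Map the same key value to the same pool entry every time."""
    digest = hmac.new(SECRET, key_value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return pool[index]

# Self-consistent: the column's own source value is the key.
print(consistent_pick("Alabama") == consistent_pick("Alabama"))  # True

# Consistent with another column: the other column's value is the key.
rows = [{"name": "John Smith"}, {"name": "Jane Doe"}, {"name": "John Smith"}]
masked = [consistent_pick(row["name"]) for row in rows]
print(masked[0] == masked[2])  # True
```

Without consistency, the generator would instead draw a fresh random value for every row, which is why the same source value can map to different outputs.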

AI Synthesizer

Within a table, the AI synthesizer uses the columns that are assigned the AI Synthesizer to train a model and generate the synthetic data.
It uses deep neural networks for high-fidelity data mimicking.
For details, see Using the AI Synthesizer.

Algebraic

The algebraic generator identifies the algebraic relationship between three or more numeric values and generates new values to match. At least one of the values must be a non-integer.
If a relationship cannot be found, then the generator defaults to the Categorical generator.
This generator can be linked with other Algebraic generators.
Consistency
No, cannot be made consistent.
Linking
Yes, can be linked.
Differential privacy
No
Data-free
No
To configure the generator, from the Link To dropdown list, select the columns to link this column to. You can select other columns that are assigned the Algebraic generator.
You must select at least three columns.
The column values must be numeric. At least one of the columns must contain a value other than an integer.

Alphanumeric String Key

Generates unique alphanumeric strings of the same length as the input. For example, for the origin value ABC123, the output value is a six-character alphanumeric string such as D24N05.
Can be assigned to primary key columns.
Can be assigned to columns that have uniqueness constraints.
Consistency
Yes, can be made self-consistent.
Linking
No, cannot be linked.
Differential privacy
No
Data-free
No
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.

Array Character Scramble

A version of the Character Scramble generator that can be used for array values.
This generator replaces letters with random other letters, and numbers with random other numbers. Punctuation and whitespace are preserved.
For example, for the following array value:
["ABC.123", 3, "last week"]
The output might be something like:
["KFR.860", 7, "sdrw mwoc"]
This generator securely masks letters and numbers. There is no way to recover the original data.
Consistency
Yes, can be made self-consistent.
Linking
No, cannot be linked.
Differential privacy
No
Data-free
No
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.

Array JSON Mask

A version of the JSON Mask generator that can be used for array values.
Runs a selected generator on values that match a user-specified JSONPath.
Consistency
Determined by the specified sub-generators.
Linking
Determined by the specified sub-generators.
Differential privacy
Determined by the specified sub-generators.
Data-free
Determined by the specified sub-generators.
To configure the generator:
  1. To assign a generator to a path expression:
    1. Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell JSON field contains a sample value from the source database. You can use the previous and next icons to page through different values.
    2. In the Path Expression field, type the JSONPath expression to identify the value to apply the generator to. Matched JSON Values shows the result from the value in Cell JSON.
    3. By default, the selected generator is applied to any value that matches the expression. To limit the types of values to apply the generator to, from the Type Filter, specify the applicable types. You can select Any, or you can select any combination of String, Number, and Null.
    4. From the Generator Configuration dropdown list, select the generator to apply to the path expression. You cannot select another composite generator.
    5. Configure the selected generator.
    6. To save the configuration and immediately add a generator for another path expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
  2. From the Sub-Generators list:
    1. To edit a generator assignment, click the edit icon.
    2. To remove a generator assignment, click the delete icon.
    3. To move a generator assignment up or down in the list, click the up or down arrow.

Array Regex Mask

A version of the Regex Mask generator that can be used for array values.
Uses regular expressions to parse strings and replace specified substrings with the output of specified generators. The parts of the string to replace are specified inside unnamed top-level capture groups.
Consistency
Determined by the selected sub-generators.
Linking
Determined by the selected sub-generators.
Differential privacy
Determined by the selected sub-generators.
Data-free
Determined by the selected sub-generators.
To configure the generator:
  1. To add a regular expression:
    1. Click Add Regex. On the configuration panel, Cell Value shows a sample value from the source database. You can use the previous and next options to navigate through the values.
    2. By default, Replace all matches is enabled. To only match the first occurrence of a pattern, toggle Replace all matches to the off position.
    3. In the Pattern field, enter a regular expression. If the expression is valid, then Tonic displays the capture groups for the expression.
    4. For each capture group, to select and configure the generator to apply, click the selected generator. You cannot select another composite generator.
    5. To save the configuration and immediately add another regular expression, click Save and Add Another. To save the configuration and close the panel, click Save.
  2. From the Regexes list:
    1. To edit a regex, click the edit icon.
    2. To remove a regex, click the delete icon.
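As an illustration of replacing only the text inside capture groups, here is a minimal sketch using Python's `re` module. The `mask_groups` helper, its parameters, and the zero-filling stand-in generator are hypothetical. The sketch assumes that the groups are top-level and not nested, and it is not a description of how Tonic evaluates patterns.

```python
import re
from typing import Callable

def mask_groups(pattern: str, text: str,
                generators: list[Callable[[str], str]],
                replace_all: bool = True) -> str:
    """Replace the text captured by group i with generators[i - 1](captured)."""
    def rebuild(match: re.Match) -> str:
        out, cursor = [], match.start()
        for i, generator in enumerate(generators, start=1):
            start, end = match.span(i)
            if start == -1:  # this group did not take part in the match
                continue
            out.append(text[cursor:start])           # text outside the group is kept
            out.append(generator(match.group(i)))    # text inside the group is masked
            cursor = end
        out.append(text[cursor:match.end()])
        return "".join(out)

    # count=0 replaces every match; count=1 replaces only the first match.
    return re.sub(pattern, rebuild, text, count=0 if replace_all else 1)

def zero_fill(digits: str) -> str:
    """Stand-in for a real sub-generator."""
    return "0" * len(digits)

values = ["order-1234", 7, "order-5678 order-9"]
masked = [mask_groups(r"order-(\d+)", v, [zero_fill]) if isinstance(v, str) else v
          for v in values]
print(masked)  # ['order-0000', 7, 'order-0000 order-0']
```

Passing `replace_all=False` mirrors turning off Replace all matches: only the first occurrence of the pattern in each string is rewritten.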

ASCII Key

Generates unique alphanumeric strings based on any printable ASCII characters. The length of the source string is not preserved.
Can be assigned to primary key columns.
Can be assigned to columns that have uniqueness constraints.
Consistency
Yes, can be made self-consistent.
Linking
No, cannot be linked.
Differential privacy
No
Data-free
No
To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.
By default, the generator is not consistent.

Categorical

A categorical generator creates values at the same frequency and using the same values as the underlying data. In other words, it shuffles the existing values within a field. Note that NULL is considered a separate value.
It maintains the values and value frequency, but disassociates the values from other pieces of data.
For example, a column contains the values Small, Medium, and Large. Small appears 3 times, Medium appears 4 times, and Large appears 5 times. In the output data, each value still appears the same number of times, but the values are shuffled to different rows.
This generator is optimized for categories with fewer than 10,000 unique values. If your underlying data has more unique values (for example, your field is populated by freeform text entry), we recommend that you use the Character Scramble or Custom Categorical generator.
Consistency
No, cannot be made consistent.
Linking
Yes, can be linked.
Differential privacy
Configurable
Data-free
No
To configure the generator:
  1. From the Link To dropdown list, select the columns to link to the current column. You can select from other columns that use the Categorical generator.
  2. Toggle the Differential Privacy setting to indicate whether to make the output data differentially private. By default, differential privacy is disabled.
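The shuffling behavior described for this generator can be sketched in a few lines. This is an illustrative sketch only: the `shuffle_column` function and its `seed` parameter are hypothetical, and it does not cover linking or differential privacy.

```python
import random
from collections import Counter
from typing import Optional

def shuffle_column(values: list, seed: Optional[int] = None) -> list:
    """Return the same values in a random row order; None counts as a value."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled

sizes = ["Small"] * 3 + ["Medium"] * 4 + ["Large"] * 5
masked = shuffle_column(sizes, seed=7)

# Frequencies are unchanged; only the row positions differ.
print(Counter(masked) == Counter(sizes))  # True
```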

Character Scramble

This generator replaces letters with random other letters and numbers with random other numbers. Punctuation, whitespace, and mathematical symbols are preserved.
For example, for the following input string:
ABC.123 123-456-789 Go!
The output would be something like:
PRX.804 296-915-378 Ab!
This generator securely masks letters and numbers. There is no way to recover the original data.
Consistency
Yes, can be made self-consistent.
Linking
No, cannot be linked.
Differential privacy
No
Data-free
No
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.
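The replacement rule described for this generator (letters become other letters, digits become other digits, everything else is kept) can be sketched as follows. The `character_scramble` function is hypothetical, handles only ASCII letters and digits, and assumes that letter case is preserved, which the source does not state.

```python
import random
import string

def character_scramble(value: str, rng: random.Random) -> str:
    """Replace ASCII letters and digits with random ones; keep all other characters."""
    out = []
    for ch in value:
        if ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))  # assumption: case is kept
        elif ch in string.digits:
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)  # punctuation, whitespace, and symbols pass through
    return "".join(out)

source = "ABC.123 123-456-789 Go!"
scrambled = character_scramble(source, random.Random())
print(scrambled)
```

Because each character is drawn independently at random, the output carries no information that could be used to recover the input, apart from its length and layout.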

Character Substitution

Performs a random character replacement that preserves formatting (spaces, capitalization, and punctuation).
Characters are replaced with other characters from within the same Unicode Block.
For example, for the following input string:
Miami Store #162
The output would be something like:
Vgkjg Gmlvf #681
This generator can be assigned to columns that have uniqueness constraints.
Consistency
This generator is implicitly self-consistent. You do not specify whether the generator is consistent. Every occurrence of a character always maps to the same substitute character. Because of this, it can be used to preserve a join between two text columns, such as a join on a name or email.
Linking
No, cannot be linked.
Differential privacy
No
Data-free
No
The Character Substitution generator has no configuration options.
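To show why a fixed per-character mapping preserves joins and uniqueness, here is a minimal sketch. It is not Tonic's algorithm: the `build_substitution` and `substitute` functions and the seed are hypothetical, and only ASCII letters and digits are covered rather than full Unicode blocks.

```python
import random
import string

def build_substitution(seed: int) -> dict[int, str]:
    """Build a fixed one-to-one mapping within each character class."""
    rng = random.Random(seed)
    table: dict[int, str] = {}
    for alphabet in (string.ascii_lowercase, string.ascii_uppercase, string.digits):
        shuffled = list(alphabet)
        rng.shuffle(shuffled)  # a permutation, so no two characters collide
        table.update(zip(map(ord, alphabet), shuffled))
    return table

TABLE = build_substitution(seed=42)  # hypothetical seed

def substitute(value: str) -> str:
    """Apply the mapping; characters outside the table are left unchanged."""
    return value.translate(TABLE)

a = substitute("Miami Store #162")
b = substitute("Miami Store #162")
print(a == b)  # True: equal inputs give equal outputs, so joins still match
```

Because the mapping is a permutation, two different source strings can never produce the same output, which is consistent with the generator being usable on columns that have uniqueness constraints.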

Company Name

Generates a random company name-like string.
Consistency
Yes, can be made self-consistent or consistent with another column.
Linking
No, cannot be linked.
Differential privacy
Yes, if consistency is not enabled.
Data-free
Yes, if consistency is not enabled.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.
By default, the generator is not consistent.
If consistency is enabled, then by default it is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.
When the generator is consistent with itself, then a given source value is always mapped to the same destination value. For example, My Company is always mapped to New Company.
When the generator is consistent with another column, then a given source value in that other column always results in the same destination value for the company name column. For example, if the company name column is consistent with a name column, then every instance of John Smith in the name column in the source database has the same company name in the destination database.

Conditional

Applies different generators to the value conditionally based on any value in the table.
For example, a Users table contains Name, Username, and Role columns. For the Username column, you can use a conditional generator to indicate that if the value of Role is something other than Test, then use the Character Scramble generator for the Username value. For Test users, the name is not masked.
It can be assigned to columns that have uniqueness constraints.
Consistency
Determined by the selected generators.
Linking
Determined by the selected generators.
Differential privacy
Determined by the selected generators.
Data-free
Determined by the selected generators.
The generator consists of a list of options. Each option includes the required conditions and the generator to use if those conditions are met.
The generator always contains a Default option. The Default option is used if the value does not meet any of the conditions. To configure the Default option:
  1. From the Default dropdown list, select the generator to use by default.
  2. Configure the selected generator.
To add a condition option:
  1. Click + Conditional Generator.
  2. To add a condition:
    1. Click + Condition.
    2. From the column list, select the column for which to check the value.
    3. Select the comparison type.
    4. Enter the column value to check for.
    To remove a condition, click the delete icon for the condition.
  3. From the Generator dropdown list, select the generator to run on the current column if the conditions are met. You cannot select another composite generator.
  4. Choose the configuration options for the selected generator.
To view details for and edit a condition option, click the expand icon for that option.
To remove a condition option, click the delete icon for the option.
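The option list with a Default fallback can be sketched as a simple dispatch. This is an illustrative sketch, not Tonic's implementation: `apply_conditional`, `mask`, and `passthrough` are hypothetical names, and the sketch assumes that the first option whose condition holds is the one that applies.

```python
from typing import Callable

Condition = Callable[[dict], bool]
Generator = Callable[[str], str]

def apply_conditional(row: dict, column: str,
                      options: list[tuple[Condition, Generator]],
                      default: Generator) -> str:
    """Run the generator of the first option whose condition holds, else the default."""
    for condition, generator in options:
        if condition(row):
            return generator(row[column])
    return default(row[column])

def mask(value: str) -> str:
    """Stand-in for a real generator such as Character Scramble."""
    return "x" * len(value)

def passthrough(value: str) -> str:
    return value

# Mask Username unless the Role is Test; Test users fall through to the default.
options = [(lambda row: row["Role"] != "Test", mask)]
users = [
    {"Username": "jsmith", "Role": "Admin"},
    {"Username": "qa_bot", "Role": "Test"},
]
print([apply_conditional(u, "Username", options, passthrough) for u in users])
# ['xxxxxx', 'qa_bot']
```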

Constant

Uses a single value to mask all of the values in the column.
For example, you can replace every value in a string column with the value String1. Or you can replace every value in a numeric column with the value 12345.
Consistency
No, cannot be made consistent.
Linking
No, cannot be linked.
Differential privacy
Yes
Data-free
Yes
To configure the generator, in the Constant Value field, provide the value to use.
The value must be compatible with the field type. For example, you cannot provide a string value for an integer column.

Continuous

Generates a continuous distribution to fit the underlying data.
This generator can be linked to other Continuous generators to create multivariate distributions and can be partitioned by other columns.
Consistency
No, cannot be made consistent.
Linking
Yes, can be linked.
Differential privacy
Configurable
Data-free
No
To configure the generator:
  1. From the Link To dropdown list, select the other Continuous generator columns to link to. The linking creates a multivariate distribution.
  2. From the Partition By dropdown list, select one or more columns to use to partition the data. The selected columns must have the generator set to either Passthrough or Categorical. For more information about partitioning and how it works, see Partitioning a column.
  3. Toggle the Differential Privacy setting to indicate whether to make the output data differentially private. By default, the generator is not differentially private.

Cross Table Sum

Links columns in two tables. This column value is the sum of the values in a column in another table.
This generator does not provide a preview. The sums are not computed until the other table is generated.
For example, a Customers table contains a Total_Sales column. The Transactions table uses a foreign key Customer_ID column to identify the customer who made the transaction, and an Amount column that contains the amount of the sale. The Customer_ID value in the Transactions table is a value from the ID primary key column in the Customers table.
You assign the Cross Table Sum generator to the Total_Sales column. In the generator configuration, you indicate that the value is the sum of the Amount values for the Customer_ID value that matches the primary key ID value for the current row.
For the Customers row for ID 123, the Total_Sales column contains the sum of the Amount column for Transactions rows where Customer_ID is 123.
Consistency
No, cannot be made consistent.
Linking
No, cannot be linked.
Differential privacy
No
Data-free
No
To configure the generator:
  1. From the Foreign Table dropdown list, select the table that contains the column for which to sum the values.
  2. From the Foreign Key dropdown list, select the foreign key. The foreign key identifies the row from the current table that is referred to in the foreign table.
  3. From the Sum Over dropdown list, select the column for which to sum the values.
  4. From the Primary Key dropdown list, select the primary key for the current table.
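The Customers and Transactions example can be expressed as a grouped sum. The sketch below uses plain Python dictionaries as stand-ins for table rows. The `cross_table_sum` function and its parameter names are hypothetical and only echo the four configuration fields. Writing 0 for a customer with no transactions is an assumption, not documented behavior.

```python
from collections import defaultdict

customers = [{"ID": 123, "Total_Sales": None}, {"ID": 456, "Total_Sales": None}]
transactions = [
    {"Customer_ID": 123, "Amount": 20.0},
    {"Customer_ID": 123, "Amount": 5.5},
    {"Customer_ID": 456, "Amount": 7.0},
]

def cross_table_sum(primary_rows, primary_key, target,
                    foreign_rows, foreign_key, sum_over):
    """Set target in each primary row to the sum of sum_over over matching foreign rows."""
    totals = defaultdict(float)
    for row in foreign_rows:
        totals[row[foreign_key]] += row[sum_over]
    for row in primary_rows:
        # Assumption: a row with no matching foreign rows gets 0.
        row[target] = totals.get(row[primary_key], 0.0)
    return primary_rows

cross_table_sum(customers, "ID", "Total_Sales",
                transactions, "Customer_ID", "Amount")
print(customers[0]["Total_Sales"])  # 25.5
```

The sums depend on the generated Amount values, which is why they can only be computed after the other table is generated.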

CSV Mask

Masks text columns by parsing the values as rows whose columns are delimited by a specified character.
You can assign specific generators to specific indexes. You can also use the generator that is assigned to a specific index as the default. This applies the generator to every index that does not have an assigned generator.
The output value maintains the quotes around the index values.
For example, a column contains the following value:
"first","second","third"
You assign the Character Scramble generator to index 0 and assign Passthrough to index 2. You select index 0 as the index to use for the default generator.
In the output, the first and second values are masked by the Character Scramble generator. The third value is not masked. The output looks something like:
"wmcop","xjorsl","third"
Consistency
Determined by the selected sub-generators.
Linking
Determined by the selected sub-generators.
Differential privacy
Determined by the selected sub-generators.
Data-free
Determined by the selected sub-generators.
To configure the generator:
  1. In the Delimiter field, type the delimiter that is used as a separator for the value. For example, for the value "first","second","third", the delimiter is a comma.
  2. You can configure a generator for any or all of the indexes. To add a sub-generator for an index:
    1. Under Sub-Generators, click Add Generator. On the add generator dialog, the Cell CSV field contains a sample value from the source data. You can use the navigation icons to page through the values.
    2. In the CSV Index field, type the index to assign a generator to. The index numbers start with 0. You cannot use an index that already has an assigned generator. Matched CSV values shows the value at that index for the current sample column value.
    3. Under Generator Configuration, from the Select a Generator dropdown list, select the generator to use for the selected index. You cannot select another composite generator. To remove the selection, click the delete icon.
    4. Configure the selected generator.
    5. To save the configuration and immediately add a generator for another index, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
  3. From the Sub-Generators list:
    1. To edit a generator assignment, click the edit icon.
    2. To remove a generator assignment, click the delete icon.
    3. To move a generator assignment up or down in the list, click the up or down arrow.
  4. After you configure a generator for at least one index, the Default Link dropdown list is displayed. From the Default Link dropdown list, select the index to use to determine how to mask values for indexes that do not have an assigned generator. For example, you assign the Character Scramble generator to index 2. If you set Default Link to 2, then all indexes that do not have an assigned generator use the Character Scramble generator.
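The per-index assignment and the Default Link fallback can be sketched with Python's `csv` module. This is an illustrative sketch only: `csv_mask` and its parameters are hypothetical, `str.upper` stands in for a real sub-generator, and the sketch always writes quoted fields, which may differ from how Tonic treats unquoted input.

```python
import csv
import io

def csv_mask(value, generators, default_index=None, delimiter=","):
    """Apply generators[i] to field i; other fields use the default index's generator."""
    fields = next(csv.reader([value], delimiter=delimiter))
    fallback = generators.get(default_index)  # None when no Default Link is set
    masked = []
    for i, field in enumerate(fields):
        generator = generators.get(i, fallback)
        masked.append(generator(field) if generator else field)
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, quoting=csv.QUOTE_ALL).writerow(masked)
    return buffer.getvalue().rstrip("\r\n")

# Index 0 is masked, index 2 is passed through, and index 1 falls back to index 0.
generators = {0: str.upper, 2: lambda field: field}
print(csv_mask('"first","second","third"', generators, default_index=0))
# "FIRST","SECOND","third"
```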

Custom Categorical

A version of the Categorical generator that selects from values that you provide instead of shuffling the original values.
Consistency
Yes, can be made self-consistent.
Linking
Yes, can be linked.
Differential privacy
Yes, if consistency is not enabled.
Data-free
Yes, if consistency is not enabled.
To configure the generator:
  1. From the Link To dropdown list, select the columns to link this column to. You can only select other columns that use the Custom Categorical generator.
  2. In the Custom Categories text area, enter the list of values that the generator can choose from. Put each value on a separate line.
  3. Toggle the Consistency setting to indicate whether to make the generator self-consistent. By default, the generator is not consistent.

Date Truncation

Truncates a date value or a timestamp to a specific part.
For a date or a timestamp, you can truncate to the year, month, or day.
For a timestamp, you can also truncate to the hour, minute, or second.
Consistency
No, cannot be made consistent.
Linking
No, cannot be linked.
Differential privacy
No
Data-free
No
To configure the generator:
  1. From the dropdown list, select the part of the date or timestamp to truncate to.
     For both date and timestamp values, you can truncate to the year, month, or day. When you select one of these options, the time portion of a timestamp is set to 00:00:00. For the date, the values below the selected truncation value are set to 01. For example, when you truncate to month, the day value is set to 01, and the time is set to 00:00:00.
     For a timestamp value, you can also truncate to the hour, minute, or second. The date values remain the same as the original data. The time values below the selected truncation value are set to 00. For example, when you truncate to minute, the seconds value is set to 00.
  2. Toggle the Birth Date option. When you enable Birth Date, the generator shifts dates that are more than 90 years before the generation date to the date exactly 90 years before the generation date. For example, a generation occurs on January 1, 2023. Any date that occurs before January 1, 1933 is changed to January 1, 1933.
     This is mostly intended for birthdate values, to group birthdates for everyone who is older than 90 into a single year. This is used to comply with HIPAA Safe Harbor.
Here are examples of date and time values and how the selected truncation affects the output:
| Option             | Date value     | Timestamp value     |
|--------------------|----------------|---------------------|
| Original value     | 2021-12-20     | 2021-12-20 13:42:55 |
| Truncate to year   | 2021-01-01     | 2021-01-01 00:00:00 |
| Truncate to month  | 2021-12-01     | 2021-12-01 00:00:00 |
| Truncate to day    | 2021-12-20     | 2021-12-20 00:00:00 |
| Truncate to hour   | Not applicable | 2021-12-20 13:00:00 |
| Truncate to minute | Not applicable | 2021-12-20 13:42:00 |
| Truncate to second | Not applicable | 2021-12-20 13:42:55 |
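The truncation rules and the Birth Date cap can be reproduced with the standard `datetime` module. This is a minimal sketch under stated assumptions: `truncate` and `cap_birth_date` are hypothetical helper names, and the February 29 fallback is a choice made for this sketch, not documented behavior.

```python
from datetime import datetime

UNITS = ["year", "month", "day", "hour", "minute", "second"]
LOWEST = [1, 1, 1, 0, 0, 0]  # values used for the parts below the truncation unit

def truncate(value: datetime, unit: str) -> datetime:
    """Keep the parts down to unit; reset everything below it."""
    keep = UNITS.index(unit) + 1
    parts = [value.year, value.month, value.day,
             value.hour, value.minute, value.second]
    return datetime(*(parts[:keep] + LOWEST[keep:]))

def cap_birth_date(value: datetime, generation_date: datetime) -> datetime:
    """Move dates older than 90 years before generation_date up to that cutoff."""
    try:
        cutoff = generation_date.replace(year=generation_date.year - 90)
    except ValueError:  # February 29 with no matching day 90 years earlier
        cutoff = generation_date.replace(year=generation_date.year - 90, day=28)
    return max(value, cutoff)

original = datetime(2021, 12, 20, 13, 42, 55)
print(truncate(original, "month"))   # 2021-12-01 00:00:00
print(truncate(original, "minute"))  # 2021-12-20 13:42:00
print(cap_birth_date(datetime(1920, 5, 1), datetime(2023, 1, 1)))  # 1933-01-01 00:00:00
```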

Email

This generator scrambles the characters in an email address. It preserves formatting and keeps the @ and . characters.
By default, the generator scrambles the domain. You can configure the generator to not mask specific domains. You can also specify a domain to use for all of the output email addresses.
For example, if you configure the generator to not scramble the domain company.com, then the output for an address at company.com keeps the company.com domain intact.
This generator securely masks letters and numbers. There is no way to recover the original data.
Consistency
Yes, can be made self-consistent.
Linking
No, cannot be linked.
Differential privacy
No
Data-free
No
To configure the generator:
  1. In the Email Domain field, enter a domain to use for all of the output values. For example, use @mycompany.com for all of the generated values. The generator scrambles the content before the @.
  2. In the Excluded Email Domains field, enter a comma-separated list of domains for which email addresses are not masked in the output values. This allows you, for example, to maintain internal or testing email addresses that are not considered sensitive.
  3. Toggle the Replace invalid emails setting to indicate whether to replace an invalid email address with a generated valid email address. By default, invalid email addresses are not replaced. In the replacement values, the username is generated. If you specify a value for Email Domain, then the email addresses use that domain. Otherwise, the domain is generated.
  4. Toggle the Consistency setting to indicate whether to make the column self-consistent. By default, consistency is disabled.

Event Timestamps

Generates timestamps fitting an event distribution. The source timestamp must include a date. It cannot be a time-only value.
Link columns to create a sequence of events across multiple columns. This generator can be partitioned by other columns.
Consistency
No, cannot be made consistent.
Linking
Yes, can be linked.
Differential privacy
No
Data-free
No
To configure the generator:
  1. From the Link To dropdown list, select the other Event Timestamps generator columns to link this column to. Linking creates a sequence across multiple columns.
  2. From the Partition dropdown list, select one or more columns to use to partition the data. The selected columns must have their generator set to either Passthrough or Categorical. For more information about partitioning and how it works, see Partitioning a column.
  3. The Options list displays the current column and linked columns. Use the Up and Down buttons to configure the column sequence.

File Name

This generator scrambles characters while preserving formatting and keeping the file extension intact.
For example, for the following input value:
DataSummary1.pdf
The output value would look something like:
RsnoPwcsrtv5.pdf
This generator securely masks letters and numbers. There is no way to recover the original data.
Consistency
Yes, can be made self-consistent.
Linking
No, cannot be linked.
Differential privacy
No
Data-free
No
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.

Find and Replace

This generator replaces all instances of the find string with the replace string.
For example, you can indicate to replace all instances of abc with 123.
Consistency
No, cannot be made consistent.
Linking
No, cannot be linked.
Differential privacy
No
Data-free
No
To configure the generator:
  1. In the Find field, type the string to look for in the source column value. To use a regular expression to identify the source value, check the Use Regex check box. If you use a regular expression, use backslash ( \ ) as the escape character.
  2. In the Replace field, type the string to replace the matching string with.
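The difference between a literal find string and a regular expression can be illustrated with Python's `re` module. The `find_and_replace` function and its `use_regex` flag are hypothetical stand-ins for the Find, Replace, and Use Regex fields. Tonic's regular expression dialect may differ from Python's.

```python
import re

def find_and_replace(value: str, find: str, replace: str,
                     use_regex: bool = False) -> str:
    """Replace every occurrence of find with replace."""
    if use_regex:
        # A function replacement keeps the replace text literal.
        return re.sub(find, lambda _match: replace, value)
    return value.replace(find, replace)

print(find_and_replace("abc-abc", "abc", "123"))                 # 123-123
print(find_and_replace("a.c abc", "a.c", "X"))                   # X abc
print(find_and_replace("a.c abc", "a.c", "X", use_regex=True))   # X X
print(find_and_replace("a.c abc", r"a\.c", "X", use_regex=True)) # X abc
```

The last two calls show why the escape character matters: an unescaped period in a regular expression matches any character, while `\.` matches only a literal period.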

Geo

This generator can be used to mask columns of latitude and longitude.
It can be assigned to columns that have uniqueness constraints.
The Geo generator divides the globe into grids that are approximately 4.9 x 4.9 km. It then counts the number of points within each grid.
During data generation, each (latitude, longitude) pair is mapped to its grid.
  • If the grid contains a sufficient number of points to preserve privacy, then the generator returns a randomly chosen point in that grid.
  • If the grid does not contain enough points to preserve privacy, then the generator returns a random coordinate from the nearest grid that contains enough points.
Consistency
No, cannot be made consistent.
Linking
Yes, can be linked.
Differential privacy
No
Data-free
No
To configure the generator:
  1. From the Link To dropdown list, select the column to link to this one. You typically assign the Geo generator to both the latitude and longitude column, then link those columns.
  2. From the value type dropdown list, select whether this column contains a latitude value or a longitude value.
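The grid approach described for this generator can be sketched as follows. This is a rough illustration, not Tonic's algorithm: the cell size in degrees, the `MIN_POINTS` threshold, and all function names are assumptions, and the sketch ignores the fact that a degree of longitude covers less distance away from the equator.

```python
import math
import random
from collections import Counter

CELL_DEG = 0.044  # assumption: roughly 4.9 km of latitude per cell
MIN_POINTS = 3    # hypothetical privacy threshold

def cell_of(lat: float, lon: float) -> tuple[int, int]:
    """Return the grid cell that contains the point."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))

def random_point_in(cell: tuple[int, int], rng: random.Random) -> tuple[float, float]:
    """Return a uniformly random point inside the cell."""
    row, col = cell
    return ((row + rng.random()) * CELL_DEG, (col + rng.random()) * CELL_DEG)

def mask_points(points, rng):
    counts = Counter(cell_of(lat, lon) for lat, lon in points)
    dense = [cell for cell, n in counts.items() if n >= MIN_POINTS]
    if not dense:
        raise ValueError("no grid cell contains enough points")
    masked = []
    for lat, lon in points:
        home = cell_of(lat, lon)
        if counts[home] >= MIN_POINTS:
            target = home
        else:
            # Fall back to the nearest cell that contains enough points.
            target = min(dense, key=lambda c: (c[0] - home[0]) ** 2 + (c[1] - home[1]) ** 2)
        masked.append(random_point_in(target, rng))
    return masked

points = [(33.51, -84.05), (33.511, -84.051), (33.512, -84.049), (41.32, -74.21)]
for lat, lon in mask_points(points, random.Random(1)):
    print(round(lat, 3), round(lon, 3))
```

In this sample, the first three points share a cell, so the isolated fourth point is moved into that cell instead of being returned near its true location.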

HIPAA Address

This generator can be used to generate cities, states, and zip codes that follow HIPAA guidelines for safe harbor.
Zip Codes
How the HIPAA Address generator handles zip codes is based on whether the Replace zeros in truncated Zip Code toggle in the generator configuration is off or on.
By default, the setting is off. In this case, the last two digits of the zip code in the column are replaced with zeros, unless the zip code is a low population area as designated by the current census. For a low population area, all of the digits in the zip code are replaced with zeros.
If the setting is on, then the generator selects a real zip code that starts with the same three digits as the original zip code. For a low population area, if a state is linked, then the generator selects a random zip code from within that state. Otherwise the generator selects a random zip code from the United States.
Cities
When a zip code column is not linked, a random city is chosen in the United States. When a zip code is already added to the link, a city is chosen at random that has at least some overlap with the zip code.
If the original zip code is designated as a low population area and a State column is linked, then a random city is chosen within the state. If no State column is linked, then a random city within the United States is chosen.
For example, if the original city and zip code are (Atlanta, 30305), then the zip code is replaced with 30300. Many cities contain zip codes that begin with 303, such as Atlanta, Decatur, Chamblee, Hapeville, Dunwoody, and College Park. One of these cities is chosen at random, so the final value might be (Chamblee, 30300).
States
HIPAA guidelines allow for information at the state level to be kept. Therefore, these values are passed through.
Latitude and longitude (GPS) coordinates
GPS coordinates are randomly generated in descending order of dependence of the linked HIPAA address components:
  1. 1.
    If a zip code is linked, a random point within the same 3-digit zip code prefix is generated, if the 3-digit zip code prefix is not designated a low population area. If it is a low population area, use the linked state.
  2. 2.
    If a state is available and a zip code and city are not, or the zip code or city are in a 3-digit zip code prefix that is designated a low population area, then a random GPS coordinate is generated somewhere within the state.
  3. 3.
    If no zip code, city, or state is linked, or one or more of them were provided, but there was a problem generating a random GPS coordinate within the linked areas, then a GPS coordinate is generated at a random location within the United States.
Note: If the city component of the HIPAA address is linked with latitude and/or longitude, the GPS coordinate components are randomly generated independently of city.
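The fallback order above can be sketched as a small Python function that returns the area in which the random coordinate would be generated. The function and parameter names are hypothetical. The sketch ignores the city, as described in the note above, and does not model the case where coordinate generation fails within the linked area.

```python
def gps_scope(zip3=None, state=None, zip3_is_low_population=False):
    """Return the area used to generate a random GPS coordinate.

    zip3 is the 3-digit prefix of a linked zip code, or None if no zip
    code is linked. state is the linked state, or None.
    """
    # 1. A linked zip code whose prefix is not a low population area.
    if zip3 is not None and not zip3_is_low_population:
        return ("zip3", zip3)
    # 2. Otherwise, the linked state, if there is one.
    if state is not None:
        return ("state", state)
    # 3. Otherwise, anywhere in the United States.
    return ("country", "US")
```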
Other address parts
All other address parts are generated randomly. Their values are not influenced by the underlying value in the column.
Consistency
Yes, can be made self-consistent.
Linking
Yes, can be linked.
Differential privacy
No
Data-free
No
To configure the generator:
  1. 1.
    From the Link To dropdown list, select the other columns to link to. You can only select columns that are also assigned the HIPAA Address generator.
  2. 2.
    From the address part dropdown list, select the type of address value that is in the column.
  3. 3.
    Toggle the Replace zeros in truncated Zip Code setting to indicate how to generate zip codes. If the setting is off, then the last two digits are replaced with zeros. For low population areas, the entire zip code is populated with zeros. If the setting is on, then a real zip code is selected that starts with the first three digits of the original zip code. For low population areas, if a state is linked, a random zip code from the state is used. Otherwise, a random zip code from the United States is used.
  4. 4.
    Toggle the Consistency setting to indicate whether to make the column self-consistent. By default, consistency is disabled.

Spark support

Spark workspaces (Amazon EMR, Databricks, and self-managed Spark clusters) support only the following address parts:
  • City
  • City with State
  • City with State Abbr
  • State
  • State Abbr
  • US Address
  • US Address with Country
  • Zip Code
To mask other address parts, except for Latitude and Longitude, you can use the Address generator. Spark does not support the Latitude and Longitude address parts.

Hostname

Generates random host names, based on the English language.
Consistency
Yes, can be made self-consistent or consistent with another column.
Linking
No, cannot be linked.
Differential privacy
Yes, if consistency is not enabled.
Data-free
Yes, if consistency is not enabled.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.
By default, the generator is not consistent.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from Consistent to, select the column.
When the generator is consistent with itself, then a given value in the source database is mapped to the same value in the destination database. For example, Host123 in the source database always produces MyHostABC in the destination database.
When the generator is consistent with another column, then a given source value in the other column results in the same host name value in the destination database. For example, a host name column is consistent with a department column. Every instance of Sales in the source data is given the same host name in the destination database.
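One common way to get this kind of consistency is to derive the output deterministically from the input value with a keyed hash. The following Python sketch illustrates the idea only. It is an assumption about how such a mapping could work, not a description of how Tonic implements consistency. The key, the word list, and the function name are all hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"example-key"  # hypothetical key, not a Tonic setting
WORDS = ["amber", "birch", "cedar", "delta", "ember", "flint"]


def consistent_hostname(source_value):
    """Map a source value to a host name; equal inputs give equal outputs."""
    digest = hmac.new(
        SECRET_KEY, source_value.encode("utf-8"), hashlib.sha256
    ).digest()
    # Use bytes of the digest to pick words and a numeric suffix.
    first = WORDS[digest[0] % len(WORDS)]
    second = WORDS[digest[1] % len(WORDS)]
    number = int.from_bytes(digest[2:4], "big") % 1000
    return f"{first}-{second}-{number:03d}"
```

For self-consistency, the input would be the original host name, such as Host123. For consistency with another column, the input would be the value from that column, such as Sales, so that every row with the same department gets the same host name.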

HStore Mask

Runs selected generators on specified key values in an HStore column in a PostgreSQL database. HStore columns contain a set of key-value pairs.
Consistency
Determined by the selected sub-generators.
Linking
Determined by the selected sub-generators.
Differential privacy
Determined by the selected sub-generators.
Data-free
Determined by the selected sub-generators.
To configure the generator:
  1. 1.
    To assign a generator to a key:
    1. 1.
      Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell HStore field contains a sample value from the source database. You can use the previous and next icons to page through different values.
    2. 2.
      Under Enter a key, enter the name of a key from the column value. Matched HStore Values shows the result from the value in Cell HStore.
    3. 3.
      From the Generator Configuration dropdown list, select the generator to apply to the key value. You cannot select another composite generator.
    4. 4.
      Configure the selected generator.
    5. 5.
      To save the configuration and immediately add a generator for another key, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
  2. 2.
    From the Sub-Generators list:
    1. 1.
      To edit a generator assignment, click the edit icon.
    2. 2.
      To remove a generator assignment, click the delete icon.
    3. 3.
      To move a generator assignment up or down in the list, click the up or down arrow.
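Conceptually, the configuration above is an ordered list of key and generator pairs that is applied to each HStore value. The following Python sketch shows that idea with an HStore value represented as a dictionary. The function name and the callables that stand in for generators are hypothetical, and the sketch is not Tonic's implementation.

```python
def mask_hstore(pairs, sub_generators):
    """Apply each (key, generator) assignment to a copy of the HStore pairs.

    pairs is a dict of HStore keys to values. sub_generators is an ordered
    list of (key, callable) tuples that stand in for the configured
    sub-generators.
    """
    masked = dict(pairs)  # leave the source value untouched
    for key, generator in sub_generators:
        # Keys without an assignment, missing keys, and NULL values
        # pass through unchanged.
        if key in masked and masked[key] is not None:
            masked[key] = generator(masked[key])
    return masked
```

For example, mask_hstore({"email": "a@example.com", "plan": "pro"}, [("email", lambda v: "REDACTED")]) replaces only the email value and leaves plan as it is.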

HTML Mask

This is a composite generator.
Masks text columns by parsing the contents as HTML, and applying sub-generators to specified path expressions.
If applying a sub-generator fails because of an error, the generator selected as the fallback generator is applied instead.
Path expressions are defined using the XPath syntax.
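The following Python sketch illustrates the general idea of applying sub-generators to XPath matches with a fallback. It is not Tonic's implementation. It uses the standard library xml.etree.ElementTree module, which supports only a subset of XPath and requires well-formed markup, so real HTML would need a more lenient parser. The function name and the callables that stand in for generators are hypothetical. The sketch also assumes that the fallback is applied to the matched value only.

```python
import xml.etree.ElementTree as ET


def mask_html(document, assignments, fallback):
    """Apply each (xpath, generator) pair to the text of matching elements.

    If a generator raises an error for a value, the fallback callable is
    applied to that value instead.
    """
    root = ET.fromstring(document)  # requires well-formed markup
    for xpath, generator in assignments:
        for element in root.findall(xpath):
            if element.text is None:
                continue
            try:
                element.text = generator(element.text)
            except Exception:
                element.text = fallback(element.text)
    return ET.tostring(root, encoding="unicode")
```

For example, the path expression .//p[@class='name'] selects every p element whose class attribute is name, and leaves other p elements unchanged.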
Consistency
Determined by the selected sub-generators.
Linking
Determined by the selected sub-generators.
Differential privacy
Determined by the selected sub-generators.
Data-free
Determined by the selected sub-generators.
To configure the generator:
  1. 1.
    To assign a generator to a path expression:
    1. 1.
      Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell HTML field contains a sample value from the source database. You can use the previous and next icons to page through different values.
    2. 2.
      In the Path Expression field, type the path expression to identify the value to apply the generator to. Matched HTML Values shows the result from the value in Cell HTML.
    3. 3.
      From the Generator Configuration dropdown list, select the generator to apply to the path expression. You cannot select another composite generator.
    4. 4.
      Configure the selected generator.
    5. 5.
      To save the configuration and immediately add a generator for another path expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
  2. 2.
    From the Sub-Generators list:
    1. 1.
      To edit a generator assignment, click the edit icon.
    2. 2.
      To remove a generator assignment, click the delete icon.
    3. 3.
      To move a generator assignment up or down in the list, click the up or down arrow.
  3. 3.
    From the Fallback Generator dropdown list, select the generator to use if the assigned generator for a path expression fails. The options are:

Integer Key

Generates integer values that are between 0 and 2^32 - 1.
The input values must be in the range 0 to 2^31 - 1.
This generator can be assigned to primary key columns.
It can be assigned to columns that have uniqueness constraints.
Consistency
Yes, can be made self-consistent.
Linking
No, cannot be linked.
Differential privacy
Yes, if consistency is not enabled.
Data-free
Yes, if consistency is not enabled.
To configure the generator:
  1. 1.
    In the Minimum field, enter the minimum value to use for an output value. The minimum value cannot be larger than any of the values in the source data.
  2. 2.
    In the Maximum field, enter the maximum value to use for an output value. The maximum value cannot be smaller than any of the values in the source data.
  3. 3.
    Toggle the Consistency setting to indicate whether to make the column self-consistent. By default, consistency is disabled.
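The constraints on the Minimum and Maximum fields can be checked against the source data before you configure the generator. The following Python sketch shows one way to do that. The helper name is hypothetical and is not part of Tonic.

```python
def check_integer_key_range(source_values, minimum, maximum):
    """Return a list of problems with the configured range, if any."""
    lowest = min(source_values)
    highest = max(source_values)
    problems = []
    # The minimum cannot be larger than any value in the source data.
    if minimum > lowest:
        problems.append(f"Minimum {minimum} is larger than source value {lowest}")
    # The maximum cannot be smaller than any value in the source data.
    if maximum < highest:
        problems.append(f"Maximum {maximum} is smaller than source value {highest}")
    return problems
```

An empty list means that the configured range covers every value in the source data.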

IP Address

Generates a random string that is formatted as an IP address.
Consistency
Yes, can be made self-consistent or consistent with another column.
Linking
No, cannot be linked.
Differential privacy
Yes, if consistency is not enabled.
Data-free
Yes, if consistency is not enabled.
To configure the generator:
  1. 1.
    In the Percent IPv4 field, type the percentage of output values that are IPv4 addresses. For example, if you set this to 60, then 60% of the generated IP addresses are IPv4 addresses, and 40% of the generated IP addresses are IPv6 addresses. If you set this to 100, then all of the generated IP addresses are IPv4 addresses. If you set this to 0, then all of the generated IP addresses are IPv6 addresses.
  2. 2.
    Toggle the Consistency setting to indicate whether to make the column consistent. By default, consistency is disabled.
  3. 3.
    If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. When a generator is self-consistent, then a given value in the source database is always mapped to the same value in the destination database. When a generator is consistent with another column, then a given source value in that column always results in the same IP address value in the destination database. For example, an IP address column is consistent with a username column. For each instance of User1 in the source database, the value in the IP address column is the same.
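The effect of the Percent IPv4 setting can be sketched in Python with the standard library ipaddress module. This is a minimal illustration of the percentage split described above, not Tonic's implementation. The function name is hypothetical, and the sketch does not model consistency.

```python
import ipaddress
import random


def random_ip(percent_ipv4, rng=random):
    """Return a random IPv4 or IPv6 address string.

    percent_ipv4 is the percentage (0 to 100) of outputs that should be
    IPv4 addresses. The remaining outputs are IPv6 addresses.
    """
    if not 0 <= percent_ipv4 <= 100:
        raise ValueError("percent_ipv4 must be between 0 and 100")
    # rng.random() is in [0, 1), so 100 always yields IPv4 and 0 never does.
    if rng.random() * 100 < percent_ipv4:
        return str(ipaddress.IPv4Address(rng.getrandbits(32)))
    return str(ipaddress.IPv6Address(rng.getrandbits(128)))
```

With percent_ipv4 set to 60, roughly 60% of the calls return an IPv4 address over a large number of calls.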

JSON Mask

Runs a selected generator on values that match a user-specified JSONPath.
If an error occurs, the selected fallback generator is used for the entirety of the JSON value.
Sub-generators are applied sequentially, from the sub-generator at the top of the list to the sub-generator at the bottom of the list.
If multiple JSONPath expressions point to the same key, the most recently added generator takes priority.
JSON paths can also contain regular expressions and comparison logic, which allows the configured sub-generators to be applied only when there are properties that satisfy the query.
For example, a column contains this JSON:
[ { "file_name": "foo.txt", "b": 10 }, ... ]
The following JSON path only applies to array elements that contain a file_name property that ends in .txt:
$.[?(@.file_name =~ /^.*\.txt$/)]
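The Python standard library does not include a JSONPath engine, so the following sketch reproduces the effect of the filter described above (array elements with a file_name property that ends in .txt) in plain Python. It is an illustration only, not Tonic's implementation. The function name and the callables that stand in for the sub-generator and the fallback generator are hypothetical.

```python
import json
import re

# Matches names that end in ".txt".
TXT_FILE = re.compile(r"^.*\.txt$")


def mask_txt_entries(column_value, generator, fallback):
    """Apply generator to array elements whose file_name ends in .txt.

    If any error occurs, the fallback is applied to the entire JSON value.
    """
    try:
        items = json.loads(column_value)
        for index, item in enumerate(items):
            name = item.get("file_name") if isinstance(item, dict) else None
            if isinstance(name, str) and TXT_FILE.match(name):
                items[index] = generator(item)
        return json.dumps(items)
    except Exception:
        return fallback(column_value)
```

Elements that do not satisfy the filter, such as an entry with a file_name of bar.png, are left unchanged.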
Consistency
Determined by the selected sub-generators.