
Generators

Generators transform the data in a source database column. You assign the generator to use for each column. Tonic Structural offers a variety of generators to transform different types of data. For the sensitive columns that it detects, Structural also recommends the generator configuration to use.

For Enterprise instances, generator presets allow you to create custom generator configurations that you can then assign to columns.


About the available generators

Generator characteristics and types

Assigning a generator to a column

Generator summary

The following table summarizes the available generators. The table includes generator characteristics that you might take into account when you select the generator to use for a column.

Generator hints and tips also provides some suggestions for generators to use for specific use cases.

Information in the table

The generator summary includes the following columns:

Generator - The name of the generator, linked to the entry in the generator reference.
Description - An overview description of the generator.
Supported features - Includes the following information:
- The generator characteristics that the generator supports
- Whether the generator is a composite generator or a primary key generator
- The generator privacy ranking

Generator

Description

Supported features

Generator reference

This generator reference provides the details for each of the supported generators in Tonic Structural.

Information provided for each generator

For each generator, the reference provides:

Overview description
A table that contains:
- Generator characteristics that you might want to take into account when you select the generator.
- The generator privacy ranking, which indicates the level of protection that the generator provides.
- The generator ID to use in the Structural API. The generator ID is linked to the API details for the generator.
Instructions for how to configure the generator

The generator characteristics include:

- Whether you can configure the generator to base the destination values on the source values.
- Whether you can link columns that use the generator to indicate that there is a relationship between them.
- Whether the generator supports differential privacy, which ensures that the source value cannot be reverse engineered from the output value.
- Whether the generator is data-free, meaning that the output data is completely unrelated to the source data.
- Whether you can assign the generator to primary key columns.
- Whether you can assign the generator to columns that require unique values.
- Whether the generator uses format-preserving encryption (FPE) to encrypt the values.

The generators are in alphabetical order by the generator name.

Here are some groupings to help you identify the generators to use for different types of values. Generator hints and tips also provides some suggestions for generators to use for specific use cases.

Composite generators

These generators transform data that uses complex formats, or transform data based on a condition. For more information, go to .

Information type generators

These generators produce specific types of values.


Datetime value generators

These generators are specifically used to transform datetime values.

Key generators

Intended for use with primary key columns. For more information, go to .

Numeric value generators

These generators are specifically intended to work with numeric values.

String value generators

These generators are useful for transforming string values that aren't covered by a specific information type generator.

Other value substitution and replacement generators

These generators perform other types of transformation on column values.

Address

Generates a random address-like string.

You can indicate which part of an address the column contains. For example, the column might contain only the street address or the city, or it might contain the full address.

Characteristics

How to configure

To configure the generator:

From the Link To dropdown list, select the columns to link this column to. You can link columns that use the Address generator to mask one of the following address components:
- City
- City State
- Country
- Country Code
- State
- State Abbreviation
- Zip Code
- Latitude
- Longitude
Note that when linked to another address column, a country or country code is always the United States.
From the address component dropdown list, select the address component that this column contains. The available options are:
- Building Number
- Cardinal Direction (North, South, East, West)
- City
- City Prefix (Examples: North, South, East, West, Port, New)
- City Suffix (Examples: land, ville, furt, town)
- City with State (Example: Spokane, Washington)
- City with State Abbr (Example: Houston, TX)
- Country (Examples: Spain, Canada)
- Country Code (Uses the 2-character country code. Examples: ES, CA)
- County
- Direction (Examples: North, Northeast, Southwest, East)
- Full Address
- Latitude (Examples: 33.51, 41.32)
- Longitude (Examples: -84.05, -74.21)
- Ordinal Direction (Examples: Northeast, Southwest)
- Secondary Address (Examples: Apt 123, Suite 530)
- State (Examples: Alabama, Wisconsin)
- State Abbr (Examples: AL, WI)
- Street Address (Example: 123 Main Street)
- Street Name (Examples: Broad, Elm)
- Street Suffix (Examples: Way, Hill, Drive)
- US Address
- US Address with Country
- Zip Code (Example: 12345)
Toggle the Consistency setting to indicate whether to make the column consistent. By default, the consistency is disabled.
If consistency is enabled, then by default, the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. When the Address generator is consistent with itself, then the same value in the source database is always mapped to the same destination value. For example, for a column that contains a state name, Alabama is always mapped to Illinois. When the Address generator is consistent with another column, then the same value in the other column always results in the same destination value for the address column. For example, if the address column is consistent with a name column, then every instance of John Smith in the name column in the source database has the same address value in the destination database.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
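To illustrate what consistency means, here is a minimal Python sketch. It assumes that a consistent mapping can be modeled as a keyed hash of the key value that selects a replacement from a fixed pool. This is not a description of how Structural implements consistency; the `STATES` pool, the `SECRET_KEY` value, and the `consistent_state` function are hypothetical.

```python
import hashlib
import hmac

# Hypothetical pool of replacement values for the State address component.
STATES = ["Alabama", "Illinois", "Ohio", "Texas", "Wisconsin"]

# Hypothetical secret key. Keying the hash prevents anyone who knows only the
# pool from recomputing the mapping.
SECRET_KEY = b"example-secret"


def consistent_state(key_value: str) -> str:
    """Deterministically map a key value to a replacement state.

    For self-consistency, pass the column's own source value.
    For consistency with another column, pass the value from that column.
    """
    digest = hmac.new(SECRET_KEY, key_value.encode("utf-8"), hashlib.sha256).digest()
    return STATES[int.from_bytes(digest[:8], "big") % len(STATES)]


# Self-consistent: the same source value always yields the same output.
print(consistent_state("Alabama") == consistent_state("Alabama"))  # True

# Consistent with a name column: each row for John Smith gets the same state.
rows = [("John Smith", "Ohio"), ("Jane Doe", "Texas"), ("John Smith", "Utah")]
masked = [(name, consistent_state(name)) for name, _state in rows]
print(masked[0][1] == masked[2][1])  # True
```

Because the output depends only on the key value, this kind of mapping is not guaranteed to be unique: two different source values can map to the same state.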

Spark supported address parts

For the Address generator, Spark workspaces (Amazon EMR, Databricks, and self-managed Spark clusters) only support the following address parts:

Building Number
City
Country
Country Code
Full Address
Latitude
Longitude
State
State Abbr
Street Address
Street Name
Street Suffix
US Address
US Address with Country
Zip Code

Algebraic

The Algebraic generator identifies the algebraic relationship between three or more numeric values and generates new values to match. At least one of the values must be a non-integer.

If a relationship cannot be found, then the generator defaults to the Categorical generator.

This generator can be linked with other Algebraic generators.
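The documentation does not describe how the relationship is detected. As a rough illustration only, the following Python sketch handles exactly three columns and tests two hypothetical candidate relationships, c = a + b and c = a * b. When one holds, it shuffles the input columns and recomputes the result column so that the relationship is preserved. Otherwise it falls back to shuffling each column, as the Categorical generator does. The function names and the price/quantity/total data are hypothetical.

```python
import math
import operator
import random


def find_relationship(a, b, c):
    """Return "sum" if c == a + b on every row, "product" if c == a * b, else None."""
    if all(math.isclose(x + y, z) for x, y, z in zip(a, b, c)):
        return "sum"
    if all(math.isclose(x * y, z) for x, y, z in zip(a, b, c)):
        return "product"
    return None


def algebraic_generate(a, b, c, rng):
    """Shuffle the input columns independently, then recompute c to keep the relationship."""
    relationship = find_relationship(a, b, c)
    if relationship is None:
        # No relationship found: shuffle each column on its own (Categorical-style).
        return [rng.sample(col, len(col)) for col in (a, b, c)]
    new_a = rng.sample(a, len(a))
    new_b = rng.sample(b, len(b))
    combine = operator.add if relationship == "sum" else operator.mul
    new_c = [combine(x, y) for x, y in zip(new_a, new_b)]
    return [new_a, new_b, new_c]


price = [2.5, 10.0, 4.25]
quantity = [4, 2, 3]
total = [10.0, 20.0, 12.75]
print(find_relationship(price, quantity, total))  # product
print(algebraic_generate(price, quantity, total, random.Random(0)))
```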

Characteristics

How to configure

To configure the generator, from the Link To dropdown list, select the columns to link this column to. You can select other columns that are assigned the Algebraic generator.

You must select at least three columns.

The column values must be numeric. At least one of the columns must contain a value other than an integer.

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Alphanumeric String Key

Generates unique alphanumeric strings of the same length as the input.

For example, for the source value ABC123, the output value is a six-character alphanumeric string such as D24N05.

Characteristics

How to configure

To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.

By default, the generator is not consistent.

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Array Character Scramble

A version of the Character Scramble generator that can be used for array values.

This generator replaces letters with random other letters, and numbers with random other numbers. Punctuation and whitespace are preserved.

For example, for the following array value:

["ABC.123", 3, "last week"]

The output might be something like:

["KFR.860", 7, "sdrw mwoc"]

This generator securely masks letters and numbers. There is no way to recover the original data.
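The following Python sketch shows one way to apply a character scramble to each element of an array value. It is an illustration, not Structural's implementation. The assumptions are that letters keep their case (as in the example above), that integers are scrambled digit by digit, and that other element types are left unchanged. The function names are hypothetical.

```python
import random
import string


def scramble_text(text: str, rng: random.Random) -> str:
    """Replace letters with random letters of the same case and digits with random digits."""
    out = []
    for ch in text:
        if ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.digits:
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)  # punctuation and whitespace are preserved
    return "".join(out)


def scramble_array(value, rng: random.Random):
    """Recursively scramble the strings and integers inside an array value."""
    if isinstance(value, list):
        return [scramble_array(item, rng) for item in value]
    if isinstance(value, bool):  # bool is a subclass of int; leave it unchanged
        return value
    if isinstance(value, int):
        return int(scramble_text(str(value), rng))
    if isinstance(value, str):
        return scramble_text(value, rng)
    return value  # other types (floats, None) are left unchanged in this sketch


print(scramble_array(["ABC.123", 3, "last week"], random.Random()))
```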

Characteristics

How to configure

To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.

By default, the generator is not consistent.

Array JSON Mask

This is a composite generator.

A version of the JSON Mask generator that can be used for array values.

Runs a selected generator on values that match a user-specified JSONPath expression.

Characteristics

How to configure

Adding a sub-generator

To assign a generator to a path expression:

Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell JSON field contains a sample value from the source database. You can use the previous and next icons to page through different values.
In the Path Expression field, type the JSONPath expression to identify the value to apply the generator to. To populate a path expression, you can also click a value in the Cell JSON field. Matched JSON Values shows the result from the value in Cell JSON.
By default, the selected generator is applied to any value that matches the expression. To limit the types of values to apply the generator to, from the Type Filter, specify the applicable types. You can select Any, or you can select any combination of String, Number, and Null.
From the Generator Configuration dropdown list, select the generator to apply to the path expression. You cannot select another composite generator.
Configure the selected generator. You cannot configure the selected generator to be consistent with another column.
To save the configuration and immediately add a generator for another path expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.

Managing the sub-generator list

From the Sub-Generators list:

To edit a generator assignment, click the edit icon.
To remove a generator assignment, click the delete icon.
To move a generator assignment up or down in the list, click the up or down arrow.

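Enabling data encryption

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.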

ASCII Key

Generates unique alphanumeric strings based on any printable ASCII characters. The length of the source string is not preserved. You can choose to exclude lowercase letters from the generated values.

Characteristics

How to configure

To configure the generator:

To exclude lowercase letters from the generated values, toggle Exclude Lowercase Alphabet to the on position.
Toggle the Consistency setting to indicate whether to make the generator consistent. By default, the generator is not consistent.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Business Name

Generates a random company name-like string.

Characteristics

How to configure

To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.

By default, the generator is not consistent.

If consistency is enabled, then by default it is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.

When the generator is consistent with itself, then a given source value is always mapped to the same destination value. For example, My Business is always mapped to New Business.

When the generator is consistent with another column, then a given source value in that other column always results in the same destination value for the company name column. For example, if the company name column is consistent with a name column, then every instance of John Smith in the name column in the source database has the same company name in the destination database.

Categorical

The Categorical generator shuffles the existing values within a field while maintaining the overall frequency of the values. It disassociates the values from other pieces of data. Note that NULL is considered a separate value.

For example, a column contains the values Small, Medium, and Large. Small appears 3 times, Medium appears 4 times, and Large appears 5 times. In the output data, each value still appears the same number of times, but the values are shuffled to different rows.
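The frequency-preserving shuffle can be sketched in Python as a random permutation of the column. This is an illustration under the assumption that a plain permutation is representative; it does not model linking or the differential privacy option. The `categorical_shuffle` function is hypothetical, and `None` stands in for NULL.

```python
import random
from collections import Counter


def categorical_shuffle(values, rng: random.Random):
    """Return the same values in a random order, so each value keeps its frequency."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


# None stands in for NULL, which is treated as its own value.
column = ["Small"] * 3 + ["Medium"] * 4 + ["Large"] * 5 + [None]
output = categorical_shuffle(column, random.Random(1))
print(Counter(output) == Counter(column))  # True
```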

This generator is optimized for categories with fewer than 10,000 unique values. If your underlying data has more unique values (for example, your field is populated by freeform text entry), we recommend that you use the Character Scramble or Custom Categorical generator.

Characteristics

How to configure

To configure the generator:

From the Link To dropdown, select the columns to link to the current column. You can select from other columns that use the Categorical generator.
Toggle the Differential Privacy setting to indicate whether to make the output data differentially private. By default, differential privacy is disabled.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Character Scramble

This generator replaces letters with random other letters and numbers with random other numbers. Punctuation, whitespace, and mathematical symbols are preserved.

For example, for the following input string:

ABC.123 123-456-789 Go!

The output would be something like:

PRX.804 296-915-378 Ab!

This generator securely masks letters and numbers. There is no way to recover the original data.
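The following Python sketch illustrates the behavior described here, under stated assumptions: letters keep their case, and consistency is modeled by seeding the random choices from a keyed hash of the whole value. It is not Structural's implementation; `SECRET_KEY` and `character_scramble` are hypothetical.

```python
import hashlib
import hmac
import random
import string

# Hypothetical secret, used only when consistency is enabled.
SECRET_KEY = b"example-secret"


def character_scramble(text: str, consistent: bool = False) -> str:
    """Replace letters with random letters of the same case and digits with random digits."""
    if consistent:
        # Seed from a keyed hash of the whole value: the same input string always
        # yields the same output string, but repeated characters inside it do not
        # necessarily map to the same output character.
        digest = hmac.new(SECRET_KEY, text.encode("utf-8"), hashlib.sha256).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
    else:
        rng = random.Random()
    out = []
    for ch in text:
        if ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.digits:
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)  # punctuation, whitespace, and symbols are preserved
    return "".join(out)


print(character_scramble("ABC.123 123-456-789 Go!"))
print(character_scramble("ABC.123", consistent=True) == character_scramble("ABC.123", consistent=True))  # True
```

Because each character is drawn independently, two different source values can produce the same output, which is why a scramble like this cannot guarantee unique values.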

Character Scramble is similar to Character Substitution, with a couple of key differences.

While you can enable consistency for the entire value, Character Scramble does not always replace the same source character with the same destination character. Because there is no guarantee of unique values, you cannot use Character Scramble on unique columns.

Character Substitution, however, does always map the same source character to the same destination character. Character Substitution is always consistent, which makes it less secure than Character Scramble. You can use Character Substitution on unique columns.

Characteristics

How to configure

To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.

By default, the generator is not consistent.

Character Substitution

Performs a random character replacement that preserves formatting (spaces, capitalization, and punctuation).

Characters are replaced with other characters from within the same Unicode Block. A given source character is always mapped to the same destination character. For example, M might always map to V.

For example, for the following input string:

Miami Store #162

The output would be something like:

Vgkjg Gmlvf #681

Note that for a numeric column, when a generated number starts with a 0, the starting 0 is removed. This could result in matching output values in different columns. For example, one column is changed to 113 and the other to 0113, which also becomes 113.
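The following Python sketch illustrates a fixed character mapping. Structural maps characters within the same Unicode block; this sketch only covers ASCII letters and digits, and treats uppercase letters, lowercase letters, and digits as separate classes (an assumption that matches the example above). Because the mapping is one-to-one, distinct inputs stay distinct, until a numeric conversion drops a leading 0. The function names and the seed are hypothetical.

```python
import random
import string


def build_substitution(seed: int) -> dict:
    """Build a fixed one-to-one mapping within each character class."""
    rng = random.Random(seed)
    mapping = {}
    for alphabet in (string.ascii_uppercase, string.ascii_lowercase, string.digits):
        shuffled = rng.sample(alphabet, len(alphabet))
        mapping.update(zip(alphabet, shuffled))
    return mapping


def substitute(text: str, mapping: dict) -> str:
    # Characters outside the mapping (spaces, punctuation) are kept as-is.
    return "".join(mapping.get(ch, ch) for ch in text)


mapping = build_substitution(seed=7)
print(substitute("Miami Store #162", mapping))

# In a numeric column, a leading 0 is dropped, so "0113" and "113" collide.
print(int("0113") == int("113"))  # True
```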

Character Substitution is similar to Character Scramble, with a couple of key differences. Because Character Substitution always maps the same source character to the same destination character, it is always consistent. It also can be used for unique columns.

In Character Scramble, the character mapping is random, which makes Character Scramble slightly more secure. However, Character Scramble cannot be used for unique columns.

Characteristics

How to configure

If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Conditional

This is a composite generator.

Applies different generators to the value conditionally based on any value in the table.

For example, a Users table contains Name, Username, and Role columns. For the Username column, you can use a Conditional generator to indicate that if the value of Role is something other than Test, then use the Character Scramble generator for the Username value. For Test users, the username is not masked.
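The Users example can be sketched in Python as a list of condition options plus a default. This is a hypothetical model, not Structural's API: it assumes that the first option whose conditions all match is used, and the `scramble` function is a simplified stand-in for Character Scramble.

```python
import random
import string
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Condition = Callable[[Row], bool]
Generator = Callable[[str], str]


def scramble(value: str) -> str:
    """Simplified stand-in for Character Scramble: random lowercase letters and digits."""
    return "".join(
        random.choice(string.ascii_lowercase) if ch.isalpha()
        else random.choice(string.digits) if ch.isdigit()
        else ch
        for ch in value
    )


def passthrough(value: str) -> str:
    return value


def conditional(row: Row, column: str,
                options: List[Tuple[List[Condition], Generator]],
                default: Generator) -> str:
    """Apply the generator of the first option whose conditions all match, else the default."""
    for conditions, generator in options:
        if all(condition(row) for condition in conditions):
            return generator(row[column])
    return default(row[column])


# If Role is not Test, scramble Username. Otherwise, pass it through unchanged.
options = [([lambda row: row["Role"] != "Test"], scramble)]
users = [
    {"Name": "Ana Diaz", "Username": "adiaz42", "Role": "Admin"},
    {"Name": "Test User", "Username": "qa_tester", "Role": "Test"},
]
print([conditional(u, "Username", options, default=passthrough) for u in users])
```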

Characteristics

How to configure

The generator consists of a list of options. Each option includes the required conditions and the generator to use if those conditions are met.

Setting the default generator

The generator always contains a Default option. The Default option is used if the value does not meet any of the conditions. To configure the Default option:

From the Default dropdown list, select the generator to use by default.
Configure the selected generator.

Adding a condition option

To add a condition option:

Click + Conditional Generator.
To add a condition:
1. Click + Condition.
2. From the column list, select the column for which to check the value.
3. Select the comparison type.
4. Enter the column value to check for.
To remove a condition, click the delete icon for the condition.
From the Generator dropdown list, select the generator to run on the current column if the conditions are met. You cannot select another composite generator.
Choose the configuration options for the selected generator.

Viewing and editing condition options

To view details for and edit a condition option, click the expand icon for that option.

Removing a condition option

To remove a condition option, click the delete icon for the option.

Continuous

Generates a continuous distribution to fit the underlying data.

This generator can be linked to other Continuous generators to create multivariate distributions and can be partitioned by other columns.
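The documentation does not state which distribution family the generator fits. The following Python sketch assumes a single normal distribution per partition, and only illustrates how partitioning changes the fit; it does not model linking (multivariate distributions) or differential privacy. The `region` and `amount` columns and the function names are hypothetical.

```python
import random
from collections import defaultdict
from statistics import NormalDist


def fit_partitions(rows, value_key, partition_key):
    """Fit one normal distribution per partition value (each partition needs 2+ rows)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row[value_key])
    return {key: NormalDist.from_samples(values) for key, values in groups.items()}


def generate(rows, value_key, partition_key, rng):
    """Replace each value with a sample drawn from its own partition's distribution."""
    fits = fit_partitions(rows, value_key, partition_key)
    return [
        rng.gauss(fits[row[partition_key]].mean, fits[row[partition_key]].stdev)
        for row in rows
    ]


rows = [
    {"region": "A", "amount": 9.0}, {"region": "A", "amount": 10.0}, {"region": "A", "amount": 11.0},
    {"region": "B", "amount": 99.0}, {"region": "B", "amount": 100.0}, {"region": "B", "amount": 101.0},
]
print(generate(rows, "amount", "region", random.Random(0)))
```

Without partitioning, one distribution would be fit across both regions, and generated amounts would blur the difference between them.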

Characteristics

How to configure

To configure the generator:

From the Link To drop-down list, select the other Continuous generator columns to link to. The linking creates a multivariate distribution.
From the Partition By drop-down list, select one or more columns to use to partition the data. The selected columns must have the generator set to either Passthrough or Categorical. For more information about partitioning and how it works, go to Partitioning a column.
Toggle the Differential Privacy setting to indicate whether to make the output data differentially private. By default, the generator is not differentially private.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Cross Table Sum

Links columns in two tables. The value of this column is the sum of the values in a column in another table.

This generator does not provide a preview. The sums are not computed until the other table is generated.

For example, a Customers table contains a Total_Sales column. The Transactions table uses a foreign key Customer_ID column to identify the customer who made the transaction, and an Amount column that contains the amount of the sale. The Customer_ID value in the Transactions table is a value from the ID primary key column in the Customers table.

You assign the Cross Table Sum generator to the Total_Sales column. In the generator configuration, you indicate that the value is the sum of the Amount values for the Customer_ID value that matches the primary key ID value for the current row.

For the Customers row for ID 123, the Total_Sales column contains the sum of the Amount column for Transactions rows where Customer_ID is 123.
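The following Python sketch shows the computation for the Customers and Transactions example, with each table modeled as a list of dictionaries. In Structural, the sums are based on the generated Transactions rows, which is why there is no preview; here, `transactions` stands in for that generated data. The function name and the handling of customers with no transactions (a total of 0) are assumptions.

```python
from collections import defaultdict


def cross_table_sum(primary_rows, primary_key, foreign_rows, foreign_key, sum_over):
    """Return {primary key value: sum of sum_over across matching foreign rows}."""
    totals = defaultdict(float)
    for row in foreign_rows:
        totals[row[foreign_key]] += row[sum_over]
    # Assumption: a customer with no transactions gets a total of 0.
    return {row[primary_key]: totals.get(row[primary_key], 0.0) for row in primary_rows}


customers = [{"ID": 123}, {"ID": 456}, {"ID": 789}]
transactions = [
    {"Customer_ID": 123, "Amount": 20.0},
    {"Customer_ID": 123, "Amount": 5.5},
    {"Customer_ID": 456, "Amount": 12.0},
]
totals = cross_table_sum(customers, "ID", transactions, "Customer_ID", "Amount")
print(totals)  # {123: 25.5, 456: 12.0, 789: 0.0}
```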

Characteristics

How to configure

To configure the generator:

From the Foreign Table dropdown list, select the table that contains the column for which to sum the values.
From the Foreign Key dropdown list, select the foreign key. The foreign key identifies the row from the current table that is referred to in the foreign table.
From the Sum Over dropdown list, select the column for which to sum the values.
From the Primary Key dropdown list, select the primary key for the current table.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

CSV Mask

This is a composite generator.

Masks text columns by parsing the values as rows whose columns are delimited by a specified character.

You can assign specific generators to specific indexes. You can also use the generator that is assigned to a specific index as the default. This applies the generator to every index that does not have an assigned generator.

The output value maintains the quotes around the index values.

For example, a column contains the following value:

"first","second","third"

You assign the Character Scramble generator to index 0 and assign Passthrough to index 2. You select index 0 as the index to use for the default generator.

In the output, the first and second values are masked by the Character Scramble generator. The third value is not masked. The output looks something like:

"wmcop", "xjorsl", "third"

Characteristics

How to configure

Setting the delimiter

In the Delimiter field, type the delimiter that is used as a separator for the value.

For example, for the value "first","second","third", the delimiter is a comma.

Adding a sub-generator

You can configure a generator for any or all of the indexes. To add a sub-generator for an index:

Under Sub-Generators, click Add Generator. On the add generator dialog, the Cell CSV field contains a sample value from the source data. You can use the navigation icons to page through the values.
In the CSV Index field, type the index to assign a generator to. The index numbers start with 0. You cannot use an index that already has an assigned generator. Matched CSV values shows the value at that index for the current sample column value.
Under Generator Configuration, from the Select a Generator dropdown list, select the generator to use for the selected index. You cannot select another composite generator. To remove the selection, click the delete icon.
Configure the selected generator. You cannot configure the selected generator to be consistent with another column.
To save the configuration and immediately add a generator for another index, click Save and Add Another. To save the configuration and close the add generator panel, click Save.

Managing the sub-generator list

From the Sub-Generators list:

To edit a generator assignment, click the edit icon.
To remove a generator assignment, click the delete icon.
To move a generator assignment up or down in the list, click the up or down arrow.

Setting the default for indexes without a generator

After you configure a generator for at least one index, the Default Link dropdown list is displayed.

From the Default Link dropdown list, select the index to use to determine how to mask values for indexes that do not have an assigned generator.

For example, you assign the Character Scramble generator to index 2. If you set Default Link to 2, then all indexes that do not have an assigned generator use the Character Scramble generator.
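The following Python sketch models the index assignments and the Default Link behavior, using the earlier example in this section (Character Scramble on index 0, Passthrough on index 2, default index 0). It is a hypothetical illustration: `scramble` is a simplified stand-in for Character Scramble, and the sketch always quotes every field (`csv.QUOTE_ALL`), which matches the quoted example but may not match how Structural handles unquoted input.

```python
import csv
import io
import random
import string


def scramble(value: str) -> str:
    """Simplified stand-in for Character Scramble that preserves case."""
    return "".join(
        random.choice(string.ascii_uppercase) if ch.isupper()
        else random.choice(string.ascii_lowercase) if ch.islower()
        else random.choice(string.digits) if ch.isdigit()
        else ch
        for ch in value
    )


def passthrough(value: str) -> str:
    return value


def csv_mask(value, delimiter, generators, default_index=None):
    """Parse one delimited row, apply a generator per index, and re-quote every field."""
    fields = next(csv.reader([value], delimiter=delimiter))
    # Indexes without an assigned generator use the generator of the default index.
    default = generators.get(default_index)
    masked = []
    for index, field in enumerate(fields):
        generator = generators.get(index, default)
        masked.append(generator(field) if generator else field)
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, quoting=csv.QUOTE_ALL).writerow(masked)
    return buffer.getvalue().rstrip("\r\n")


print(csv_mask('"first","second","third"', ",", {0: scramble, 2: passthrough}, default_index=0))
```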

Custom Categorical

A version of the Categorical generator that selects from values that you provide instead of shuffling the original values.

Characteristics

How to configure

To configure the generator:

From the Link To dropdown list, select the columns to link this column to. You can only select other columns that use the Custom Categorical generator.
In the Custom Categories text area, enter the list of values that the generator can choose from. Put each value on a separate line. To add a NULL value to the list, use the keyword {NULL}.
Toggle the Consistency setting to indicate whether to make the column consistent. By default, consistency is disabled.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. When a generator is self-consistent, then a given value in the source database is always mapped to the same value in the destination database. When a generator is consistent with another column, then a given source value in that column always results in the same value for the current column in the destination database. For example, a department column is consistent with a username column. For each instance of User1 in the source database, the value in the department column is the same.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
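The following Python sketch illustrates parsing the Custom Categories list and choosing a value, either randomly or consistently based on a key value. It is not Structural's implementation. Skipping blank lines, the unkeyed SHA-256 hash, and the function names are all assumptions made for illustration.

```python
import hashlib
import random


def parse_categories(text: str) -> list:
    """One value per line; the {NULL} keyword becomes None. Blank lines are skipped (assumption)."""
    return [None if line == "{NULL}" else line for line in text.splitlines() if line]


def pick_category(categories, key_value, consistent, rng):
    """Pick a category, randomly or deterministically from key_value."""
    if consistent:
        # Same key value -> same category. For self-consistency, key_value is the
        # source value; for consistency with another column, it is that column's value.
        digest = hashlib.sha256(repr(key_value).encode("utf-8")).digest()
        return categories[int.from_bytes(digest[:8], "big") % len(categories)]
    return rng.choice(categories)


departments = parse_categories("Sales\nEngineering\nSupport\n{NULL}\n")
print(departments)  # ['Sales', 'Engineering', 'Support', None]

rng = random.Random(0)
# Department consistent with username: every User1 row gets the same department.
print(pick_category(departments, "User1", True, rng) == pick_category(departments, "User1", True, rng))  # True
```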

Date Truncation

Truncates a date value or a timestamp to a specific part.

For a date or a timestamp, you can truncate to the year, month, or day.

For a timestamp, you can also truncate to the hour, minute, or second.

Characteristics

How to configure

To configure the generator:

From the dropdown list, select the part of the date or timestamp to truncate to. For both date and timestamp values, you can truncate to the year, month, or day. When you select one of these options, the time portion of a timestamp is set to 00:00:00. For the date, the values below the selected truncation value are set to 01. For example, when you truncate to month, the day value is set to 01, and the timestamp is set to 00:00:00. For a timestamp value, you also can truncate to the hour, minute, or second. The date values remain the same as the original data. The time values below the selected truncation value are set to 00. For example, when you truncate to minute, the seconds value is set to 00.
Toggle the Birth Date option. When you enable Birth Date, the generator shifts dates that are more than 90 years before the generation date to the date exactly 90 years before the generation date. For example, a generation occurs on January 1, 2023. Any date that occurs before January 1, 1933 is changed to January 1, 1933.
This is mostly intended for birthdate values, to group birthdates for everyone who is older than 89 into a single year. This is used to comply with HIPAA Safe Harbor.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
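The truncation rules and the Birth Date option can be sketched with Python's `datetime` module. This is an illustration of the rules as described, not Structural's code. The function names are hypothetical; zeroing microseconds and moving a February 29 generation date to February 28 in a non-leap cutoff year are assumptions.

```python
from datetime import date, datetime

_PARTS = ["year", "month", "day", "hour", "minute", "second"]
_RESET = {"month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0, "microsecond": 0}


def truncate(value, part):
    """Reset every field below `part` to its lowest value (01 for dates, 00 for times)."""
    lower = _PARTS[_PARTS.index(part) + 1:] + ["microsecond"]
    if not isinstance(value, datetime):  # a plain date has no time fields
        if part not in ("year", "month", "day"):
            raise ValueError("a date can only be truncated to year, month, or day")
        lower = [field for field in lower if field in ("month", "day")]
    return value.replace(**{field: _RESET[field] for field in lower})


def cap_birth_date(value, generation_date):
    """Move dates more than 90 years before generation_date to exactly 90 years before it."""
    try:
        cutoff = generation_date.replace(year=generation_date.year - 90)
    except ValueError:  # February 29 in a cutoff year that is not a leap year
        cutoff = generation_date.replace(year=generation_date.year - 90, day=28)
    return max(value, cutoff)


print(truncate(datetime(2023, 5, 17, 14, 32, 45), "month"))  # 2023-05-01 00:00:00
print(truncate(datetime(2023, 5, 17, 14, 32, 45), "minute"))  # 2023-05-17 14:32:00
print(cap_birth_date(date(1920, 6, 1), date(2023, 1, 1)))  # 1933-01-01
```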

Truncation examples

Here are examples of date and time values and how the selected truncation affects the output:

Event Timestamps

Generates timestamps fitting an event distribution. The source timestamp must include a date. It cannot be a time-only value.

Link columns to create a sequence of events across multiple columns. This generator can be partitioned by other columns.

Characteristics

How to configure

To configure the generator:

From the Link To dropdown list, select the other Event Timestamps generator columns to link this column to. Linking creates a sequence across multiple columns.
From the Partition drop-down list, select one or more columns to use to partition the data. The selected columns must have their generator set to either Passthrough or Categorical. For more information about partitioning and how it works, go to Partitioning a column.
The Options list displays the current column and linked columns. Use the Up and Down buttons to configure the column sequence.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Reviewing and applying recommended generators

Required workspace permission: Configure column generators

The Tonic Structural sensitivity scan identifies specific types of sensitive data. For each sensitivity type that it detects, Structural can have a recommended generator. For example, for a value that the sensitivity scan identifies as a Social Security Number, Structural recommends the SSN generator. For a first name, Structural recommends the Name generator configured with First as the value type.

From Privacy Hub and Database View, you can review and apply the recommended generators.

Applying the recommended generator to a single column

Privacy Hub

In Privacy Hub, on the settings view of the column details panel, for a detected sensitive column that does not have an applied generator, and that has a recommended generator, Structural displays a button for the recommended generator.

To apply the recommended generator, click the button.

Database View

On Database View, when a column has a recommended generator, the generator dropdown displays the available recommendation icon.

To apply the recommended generator:

Click the generator dropdown.
On the recommended generator panel, click Apply.

Privacy Hub - reviewing and applying recommended generators by sensitivity type

When there are detected sensitive columns that are not protected, Privacy Hub displays a Sensitivity Recommendations banner. The banner displays the number of detected, unprotected columns.

To review the recommended generators, and determine whether to apply them, click Review Recommendations.

The Recommended Generators by Sensitivity Type panel displays the list of sensitivity types for which there are detected, unprotected columns.

Displaying the list of columns for a sensitivity type

To display the columns for a sensitivity type, click the expand icon for that type.

To hide the column list, click the collapse icon.

For each column, the list includes the following information:

The table and schema name
The column name, with the column data type
An example value from the source data (Original Data), with a corresponding destination value when the recommended generator is applied (Expected Output).

To display a larger sample of source and destination values, click the view icon in the Expected Output column.

Viewing linkable columns for addresses

Address columns that can be linked are displayed in groups.

For example, if a table includes columns for both city and state values, then those columns are displayed as a group. When you apply the recommended generators to the group, the columns are also linked.

There are separate Address entries for individual columns and for groups of columns to link.

Filtering the column lists

To filter the lists, you can use any of the following:

Schema name
Table name
Column name

Start to type text in the schema, table, or column name. As you type, Structural applies the filter to all of the lists.

Selecting and deselecting columns

When you first display the panel, all of the columns are selected. The selected columns are the columns that are affected when you apply recommended generators or ignore columns.

Within each sensitivity type, you can select or deselect individual columns.

You can use the checkbox in the column heading to select or deselect all of the columns for a sensitivity type.

Enabling and disabling consistency for columns

Before you apply a recommended generator, you can enable or disable consistency for each individual column, or for all of the columns for a sensitivity type.

You can only enable self-consistency. You cannot configure consistency with other columns.

The recommended generators panel contains a Consistency toggle for each column. To enable or disable consistency for all of the columns for a sensitivity type, use the toggle in the column heading. If the recommended generator does not support consistency, then the toggle is disabled.

To enable self-consistency, toggle Consistency to the on position. To disable self-consistency, toggle Consistency to the off position.

Applying the recommended generator for a sensitivity type

To apply the recommended generator to the selected columns for a sensitivity type, click the Apply option for that sensitivity type.

When you apply the recommended generator, Structural removes the affected columns from the list.

Removing the generator recommendation for a sensitivity type

If the recommended generator is incorrect, then you can ignore the recommendation.

To ignore the recommended generator for the selected columns in a sensitivity type:

Click the Ignore option for the sensitivity type.
In the Ignore dropdown list, click Ignore generator recommendation.

When you ignore the generator recommendation:

The column is removed from the list.
The recommended generator is removed. This includes the recommendation on the Privacy Hub column configuration panel.
The column continues to be marked as sensitive.

Marking a column as not sensitive

Required workspace permission: Configure column sensitivity

You can mark selected columns for a sensitivity type as not sensitive. For example, a value might be correctly identified as a first name, but be a test value that is not actually sensitive and does not need to be transformed.

To mark selected columns in a sensitivity type as not sensitive:

Click the Ignore option for the sensitivity type.
In the Ignore dropdown list, click Mark as not sensitive.

When you mark a column as not sensitive, it is removed from the list.

Applying the recommended generators to all of the selected columns

To apply the recommended generators to all of the selected columns across all of the sensitivity types, click Apply All.

Database View - Applying recommended generators to multiple columns

On Database View, the Bulk Edit panel includes an option to apply the recommended generators to the selected columns for which there is an available recommendation.

From Database View, to apply recommended generators to multiple columns:

Check the checkbox for each column to update.
Click Bulk Edit.
On the bulk editing panel, in the Generator recommendations found panel, click Apply.

Generator hints and tips

These hints and tips can help you to choose generators and address some specific use cases.

Recommended generators for specific types of data

Names

Tonic Structural provides several options for de-identifying the names of individuals. The method that you select depends on the specific use case, including the required realism of the output and privacy needs.

The following are a few of the generator options and how and why you might use them.

Name generator - Randomly returns a name from a dictionary of primarily Westernized names, unrelated to the original value. Can provide complete privacy, unless you use Consistency. The output is realistic because the values returned are real names.
Categorical generator - Shuffles all of the values in the field while preserving the overall frequency of the values. It ensures that the output contains realistic-looking names, and that the output uses the names from the original data set. This can be beneficial if the original data contains, for example, names that are common to a particular region and that should be maintained. When you use this generator with the Differential Privacy option, it ensures that the output is secure from re-identification. However, if the source data set is small or each name is highly unique, Structural might prevent you from using this option.
Custom Categorical generator - Allows you to provide your own dictionary of values. These values are included in the output at the same frequency that the original values occur in the source data.
Character Scramble generator - Randomly replaces characters with other characters. The output does not contain realistic-looking names, but it provides a high level of privacy that prevents recovery of the original data. It does preserve whitespace, punctuation (such as the hyphen in a hyphenated name), and capitalization. Because it is a character-level replacement, it preserves the length of the input string.
Character Substitution generator - Similar to Character Scramble, but uses a single character mapping throughout the generated data. This reduces the privacy level, but ensures consistency and uniqueness. This generator also supports additional Unicode blocks, so that the output characters more closely match the input. This might be helpful if the input includes names with characters outside of the basic Latin (a-z, A-Z) characters.

Dates, events, timestamps

Rows of data often have multiple date or timestamp fields that have a logical dependency, such as START_DATE and END_DATE.

In this case, a randomly generated date is not viable, because it could produce a nonsensical output where events occur chronologically out of order.

The following generator options handle these scenarios:

Timestamp Shift generator (with Consistency) - To solve the problem described above, make sure that two or more timestamps are randomly shifted by the same amount instead of independently from each other. The key is to use the consistency option. For example, a row of data represents an individual that is identified by a primary key of PERSON_ID. The row also contains START_DATE and END_DATE columns. You can apply a timestamp shift within a desired range to the START_DATE and END_DATE columns, and make both columns consistent with PERSON_ID. Whenever the generator encounters the same PERSON_ID value, it shifts the dates by the same amount.
Event Timestamps generator - You can apply the Event Timestamps generator to multiple date columns on the same table. You can link them to follow the underlying distribution of dates. For more information, go to the blog post Simulating event pipelines for fun and profit (and for testing too).
Date Truncation generator - This generator can sometimes address the described problem. You can configure this generator to truncate the input to the year, month, day, hour, minute, or second. It guarantees that a secondary event does not occur before a primary event. However, truncation might cause both events to have the same date or timestamp value. Whether you can use this generator for this purpose depends on the typical time separation between the two events relative to the truncation option, and on whether truncation provides an adequate level of privacy for the particular use case.
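The following Python sketch illustrates why a consistent shift keeps events in order, and what truncation does to them. The HMAC-based derivation of the shift, the SEED value, and the 30-day range are assumptions for illustration; this section does not describe how Structural computes the shift.

```python
import hashlib
import hmac
from datetime import datetime, timedelta

SEED = b"example-seed"  # hypothetical secret, not a Structural setting

def consistent_shift(key: str, max_days: int) -> timedelta:
    # Derive a whole-day shift in [-max_days, +max_days] from the key, so
    # every row with the same PERSON_ID is shifted by the same amount.
    digest = hmac.new(SEED, key.encode("utf-8"), hashlib.sha256).digest()
    offset = int.from_bytes(digest[:8], "big") % (2 * max_days + 1)
    return timedelta(days=offset - max_days)

def truncate_to_month(ts: datetime) -> datetime:
    # Truncation never moves a later timestamp before an earlier one,
    # but it can make two different timestamps equal.
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

row = {"PERSON_ID": "1001",
       "START_DATE": datetime(2023, 3, 10),
       "END_DATE": datetime(2023, 3, 25)}

shift = consistent_shift(row["PERSON_ID"], max_days=30)
shifted = {col: row[col] + shift for col in ("START_DATE", "END_DATE")}
print(shifted)
# The interval between the two events is unchanged by the shared shift.
print(shifted["END_DATE"] - shifted["START_DATE"])
# Truncating both dates to the month keeps their order but collapses them.
print(truncate_to_month(row["START_DATE"]), truncate_to_month(row["END_DATE"]))
```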

Free text

Free text refers to text fields in the source database that might come from an "uncontrolled" source such as user text entry. In these cases, any record might or might not contain sensitive information.

Some possible examples include:

Notes from a doctor or healthcare provider that contain Protected Health Information (PHI)
Other personally identifiable information, such as a Social Security number or telephone number, that a user enters into an open-ended text entry form

Structural provides several suitable options. The method that you select depends on the specific use case, including the required realism of the output and any privacy requirements.

Here are a few generator options for free text fields, with information on how and why you might use them.

Character Scramble generator - Randomly replaces characters with other characters. The output does not contain meaningful text, but it provides a high level of privacy that prevents recovery of the original data. The Character Scramble generator does preserve whitespace, punctuation, and capitalization. Because it is a character-level replacement, it preserves the length of the input string.
Regex Mask generator - Uses regular expressions to parse strings. It then replaces specified substrings with the output of selected generators. The parts of the string to replace are specified in unnamed top-level capture groups. The Regex Mask generator can preserve more of the realism of the underlying text, but introduces privacy risks. Any sensitive information that does not conform to a known and configured pattern is not captured and replaced. As an example of matching specific formats, a configuration that includes the following two patterns would replace both SSNs that use the ###-##-#### format and telephone numbers that use the ###-###-#### format, but leave the surrounding text unmodified:
- SSN: ([0-9]{3}-[0-9]{2}-[0-9]{4})
- Telephone Number: ([0-9]{3}-[0-9]{3}-[0-9]{4})
You can configure multiple regular expression patterns to handle all known or expected sensitive information formats. You cannot use this method to replace values that a regular expression cannot reliably identify, such as names within free text. When you use this option, make sure to enable Replace all matches for each pattern.
Constant, Custom Categorical, and Null generators - Each of these options provides the highest level of privacy, because they completely remove or replace the original text. You might use each one for different reasons:
- Null: If the field is nullable and the use case does not require any data in the field, you can use the Null generator to replace the values with NULL.
- Constant: Allows you to provide a fixed value to replace all of the source values. For example, you could provide a "Lorem ipsum" string or other dummy value that is appropriate for your data set.
- Custom Categorical: Similar to the Constant generator, it replaces the original value with a fixed value. To increase the cardinality of the output, you enter a list of possible values. The values are randomly used on the output records.
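As a rough illustration of the Regex Mask approach, this Python sketch applies the two example patterns and replaces only the text inside each capture group. `random_digits` is a hypothetical stand-in for whichever sub-generator you select, and Python's `re` engine might differ in details from the regular expression engine that Structural uses.

```python
import random
import re
import string

# The two example patterns; each has a single unnamed top-level capture group.
PATTERNS = [
    re.compile(r"([0-9]{3}-[0-9]{2}-[0-9]{4})"),  # SSN
    re.compile(r"([0-9]{3}-[0-9]{3}-[0-9]{4})"),  # telephone number
]

def random_digits(text: str, rng: random.Random) -> str:
    # Hypothetical sub-generator: replace each digit, keep the dashes.
    return "".join(rng.choice(string.digits) if ch in string.digits else ch for ch in text)

def regex_mask(text: str, patterns, sub_generator) -> str:
    for pattern in patterns:
        # pattern.sub replaces every match, similar to enabling "Replace all matches".
        # Only the span of capture group 1 is replaced; other matched text is kept.
        text = pattern.sub(
            lambda m: m.string[m.start():m.start(1)]
            + sub_generator(m.group(1))
            + m.string[m.end(1):m.end()],
            text,
        )
    return text

rng = random.Random()
note = "Josh called from 555-123-4567. SSN: 123-45-6789."
print(regex_mask(note, PATTERNS, lambda s: random_digits(s, rng)))
# "Josh" is left as-is: no configured pattern describes names.
```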

Maintaining empty values

Most Structural generators preserve NULL values that are in the data.

They do not automatically preserve empty values.

To make sure that any empty values stay empty in the destination database:

Assign the Conditional generator to the column.
For the default generator, select the generator to apply to the non-empty values.
Create a condition to look for empty values. You can either:
- Use the regex comparison against the regex whitespace value (\s*).
- Use the = operator and leave the value empty or empty except for a single space.
If you are not sure which characters the empty strings use, the regex option is more flexible. However, it is less efficient.
For the empty value condition, set the generator to Passthrough.
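The condition logic above can be sketched in Python as follows. The function name and the use of `str.upper` as the default generator are hypothetical. The sketch assumes that the regex comparison must match the entire value, which is why it calls `fullmatch`; this section does not state how Structural anchors the pattern.

```python
import re

WHITESPACE_ONLY = re.compile(r"\s*")

def conditional_generator(value, default_generator):
    # NULL values are preserved, as most Structural generators do.
    if value is None:
        return None
    # Condition: empty or whitespace-only value -> Passthrough.
    if WHITESPACE_ONLY.fullmatch(value):
        return value
    # Otherwise, apply the default generator to the non-empty value.
    return default_generator(value)

print([conditional_generator(v, str.upper) for v in ["alice", "", "  ", None]])
# ['ALICE', '', '  ', None]
```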

Path expressions to replace all text values

You sometimes might want to apply the same generator to all of the text values in a JSON, HTML, or XML value. For example, you might want to apply the Character Scramble generator to all of the text.

Instead of creating separate path expressions for each path, you can use one or two path expressions that capture all of the values.

For the Array JSON Mask or JSON Mask generator, the path expression $..* captures all of the text values. You can then select the generator to apply to the values.
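The effect of a catch-all path expression can be approximated with a recursive walk. This Python sketch is an illustration only, not Structural's JSONPath evaluation. `mask_all_values` is a hypothetical helper, and leaving numbers, booleans, and nulls unchanged is a simplifying assumption.

```python
import json

def mask_all_values(node, generator):
    # Recurse through objects and arrays and apply the generator to every
    # string value, roughly what a $..* path expression selects for masking.
    # Object keys are not values, so they are kept.
    if isinstance(node, dict):
        return {key: mask_all_values(value, generator) for key, value in node.items()}
    if isinstance(node, list):
        return [mask_all_values(item, generator) for item in node]
    if isinstance(node, str):
        return generator(node)
    return node  # numbers, booleans, and null are left unchanged in this sketch

doc = json.loads('{"name": "Josh", "tags": ["vip", "beta"], "address": {"city": "Boston"}, "age": 41}')
print(json.dumps(mask_all_values(doc, str.upper)))
```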

For the HTML Mask and XML Mask generators, you create two path expressions:

//text() gets all of the text nodes.
//@* gets all of the attribute values.

You apply the generator to each expression.
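For comparison, this Python sketch uses the standard library `xml.etree.ElementTree` module to visit the same nodes that //text() and //@* select: the text inside each element, the tail text that follows each element, and attribute values. ElementTree does not evaluate these XPath expressions itself, so the sketch walks the tree directly. `mask_text_and_attributes` is a hypothetical helper, and skipping whitespace-only text is a choice made here to keep the formatting intact.

```python
import xml.etree.ElementTree as ET

def mask_text_and_attributes(xml: str, generator) -> str:
    root = ET.fromstring(xml)
    for elem in root.iter():
        # Text nodes (//text()): element text and the tail text after the element.
        if elem.text and elem.text.strip():
            elem.text = generator(elem.text)
        if elem.tail and elem.tail.strip():
            elem.tail = generator(elem.tail)
        # Attribute values (//@*).
        for name in list(elem.attrib):
            elem.set(name, generator(elem.attrib[name]))
    return ET.tostring(root, encoding="unicode")

print(mask_text_and_attributes('<a id="x1"><b>hi</b>there</a>', str.upper))
```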

Sub-generators are applied sequentially. You can apply the wildcard paths in addition to more specific paths and generators.

For example, one path expression references a specific name or address and uses the Name or Address generator. The wildcard path expressions use the Character Scramble generator to mask any unknown fields in the document that could contain sensitive information.

As another example, you might assign the Passthrough generator to specific known fields that never contain sensitive information.

Path expressions for XML with namespaces

When your XML includes namespaces, include the namespace prefix in the path expression by specifying each element as:

*[name()='namespace:elementName']

For example, for the following XML:

<ns0:Message xmlns:ns0=".">
    <ns0:Payload>
        <ns1:Customer xmlns:ns1=".">
            <ns1:name>
                Josh
            </ns1:name>
        </ns1:Customer>
    </ns0:Payload>
</ns0:Message>

A working XPath to mask the name value is:

/*[name()='ns0:Message']/*[name()='ns0:Payload']/*[name()='ns1:Customer']/*[name()='ns1:name']

Passing through default minimum and maximum date values

You might sometimes set default date values to the absolute minimum and maximum values that are allowed by the database. For example, for the SQL Server datetime data type, these values are January 1, 1753 and December 31, 9999.

When you assign the Timestamp Shift generator, the minimum value cannot be shifted backward and the maximum value cannot be shifted forward.

To skip those default values and shift the other values:

Assign the Conditional generator to the column.
For the default generator, select the Timestamp Shift generator.
Create conditions to look for the minimum or maximum values.
For those conditions, set the generator to Passthrough.
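A minimal Python sketch of this conditional setup, assuming the SQL Server datetime limits as the default values. The plain `timedelta` shift is a stand-in for the Timestamp Shift generator, and matching on the date part (so that both midnight and end-of-day values qualify) is an assumption. Python raises `OverflowError` if you try to shift December 31, 9999 forward, which mirrors why those values must be passed through.

```python
from datetime import date, datetime, timedelta

# Default values used as sentinels; for SQL Server datetime, these are the type's limits.
SENTINEL_DATES = {date(1753, 1, 1), date(9999, 12, 31)}

def shift_unless_sentinel(value: datetime, shift: timedelta) -> datetime:
    # Condition: the value falls on the minimum or maximum date -> Passthrough.
    if value.date() in SENTINEL_DATES:
        return value
    # Default generator: a plain shift stands in for Timestamp Shift.
    return value + shift

print(shift_unless_sentinel(datetime(9999, 12, 31, 23, 59, 59), timedelta(days=10)))
print(shift_unless_sentinel(datetime(2020, 1, 1), timedelta(days=-3)))
```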

Using Regex Mask to add values

You might sometimes want to add values that are the output of a generator to the results of the transformation by another generator.

For example, you use Character Scramble to mask a username. You might also want to prefix the value with a fixed constant value, or append a sequential integer.

To accomplish this:

Apply the Regex Mask generator to the column.
In addition to the capture groups that are specific to your data:
- Use (^) as a capture group for a prefix.
- Use ($) as a capture group for a suffix.
- Use () as an empty group at any point in the regex pattern.
Apply the relevant generators to each capture group.

So to implement the example above (prefix with a constant, scramble the value, append a sequential integer), you provide the expression (^)(.*)()($).

This produces four capture groups:

Group 0 is for the prefix. You assign the Constant generator and provide the value to use as the prefix.
Group 1 captures all of the original values. You assign the Character Scramble generator.
Group 2 captures any empty values. You assign the Constant generator to provide a value to use for those values.
Group 3 is for the suffix. You assign the Sequential Integer generator.
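The following Python sketch mimics how the four groups in (^)(.*)()($) are filled. Python numbers capture groups from 1, so Group 0 in the list above corresponds to group 1 here. The lambdas are hypothetical stand-ins for the Constant, Character Scramble, and Sequential Integer generators, and the "usr_" prefix and "-" value for the empty group are example values. With Python's `re`, the empty group always matches an empty string at the end of the value, so its Constant output is inserted between the scrambled value and the suffix.

```python
import itertools
import random
import re
import string

PATTERN = re.compile(r"(^)(.*)()($)")

def regex_mask_groups(value: str, generators) -> str:
    # Replace the text captured by each group with its generator's output.
    # Text outside of the groups (none, for this pattern) is copied through.
    match = PATTERN.search(value)
    if match is None:
        return value
    pieces, pos = [], 0
    for group, generator in enumerate(generators, start=1):
        start, end = match.span(group)
        pieces.append(value[pos:start])
        pieces.append(generator(value[start:end]))
        pos = end
    pieces.append(value[pos:])
    return "".join(pieces)

rng = random.Random()
counter = itertools.count(1)
generators = [
    lambda _: "usr_",                                                  # prefix: Constant
    lambda s: "".join(rng.choice(string.ascii_lowercase) for _ in s),  # value: scramble
    lambda _: "-",                                                     # empty group: Constant
    lambda _: str(next(counter)),                                      # suffix: Sequential Integer
]
print(regex_mask_groups("jsmith", generators))
print(regex_mask_groups("adoe", generators))
```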

Aligning email addresses to names

A table that contains user data might include both name and email address columns. If a user's email address is based on their name, then in the destination data, you might want to also tie the email addresses to the names.

For example, your email addresses might use the format firstName.lastName@mycompany.com. In the source data, the email address for John Smith is John.Smith@mycompany.com. In the destination data, assuming John Smith is replaced by Michael Jones, you want the email address to be Michael.Jones@mycompany.com.

At a high level, to line up name and email address columns:

Assign the Name generator to the name fields. Make the Name generator consistent with an identifier column.
Assign the Regex Mask generator to the email address field.
Create a regular expression with capture groups that extract the name portions of the email address. The specific expression varies based on the email address format.
Assign the Name generator to each name capture group. Make the Name generator consistent with the same identifier column.

In this example, the source data contains userId, firstName, lastName, and emailAddress fields, and the email address is firstName.lastName@mycompany.com.

To ensure that the destination data email addresses are aligned to the destination data names:

For the firstName field, assign the Name generator, configured to produce a first name. Make the generator consistent with the userId column.
For the lastName field, assign the Name generator, configured to produce a last name. Make the generator consistent with the userId column.
For the emailAddress field, assign the Regex Mask generator. Use the following regular expression to extract the parts of the email address to capture groups: ([a-zA-Z]+)\.([a-zA-Z]+)@(.*)
For the first name and last name capture groups:
1. Assign the Name generator. For the first capture group, configure it to produce a first name. For the second capture group, configure it to produce a last name.
2. Make the Name generator consistent with the userId column.
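Here is a Python sketch of why this works: the firstName column and the first capture group both derive their output from the same userId value, so they agree. The HMAC-based lookup, the seed, and the name lists are hypothetical. The sketch passes the domain (the third capture group) through unchanged, which is an assumption about how you configure that group.

```python
import hashlib
import hmac
import re

SEED = b"example-seed"                                    # hypothetical
FIRST_NAMES = ["Michael", "Susan", "Gregory", "Rosella"]  # hypothetical dictionaries
LAST_NAMES = ["Jones", "Walton", "Linn", "Feynman"]
EMAIL = re.compile(r"([a-zA-Z]+)\.([a-zA-Z]+)@(.*)")

def name_consistent_with(user_id: str, names: list, part: str) -> str:
    # Same userId (and name part) -> same output name, wherever it is used.
    digest = hmac.new(SEED, f"{part}:{user_id}".encode("utf-8"), hashlib.sha256).digest()
    return names[int.from_bytes(digest[:8], "big") % len(names)]

def transform_user(row: dict) -> dict:
    user_id = row["userId"]
    # Column generators, consistent with userId.
    first = name_consistent_with(user_id, FIRST_NAMES, "first")
    last = name_consistent_with(user_id, LAST_NAMES, "last")
    email = row["emailAddress"]
    match = EMAIL.fullmatch(email)
    if match:
        # Capture group generators, also consistent with userId, so they
        # produce the same names as the columns. Group 3 is passed through.
        group1 = name_consistent_with(user_id, FIRST_NAMES, "first")
        group2 = name_consistent_with(user_id, LAST_NAMES, "last")
        email = f"{group1}.{group2}@{match.group(3)}"
    return {"userId": user_id, "firstName": first, "lastName": last, "emailAddress": email}

print(transform_user({"userId": "42", "firstName": "John", "lastName": "Smith",
                      "emailAddress": "John.Smith@mycompany.com"}))
```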

Enabling consistency

Consistency is an option for some generators. When consistency is turned on, the generator maps the same input to the same output across an entire database.

Consistency can also be maintained across multiple databases of varying types. For example, if consistency is turned on for the Name generator, it always maps the same input name (for example, Albert Einstein) to the same output (for example, Richard Feynman).

You can also view this video overview of consistency.

Why use consistency?

The primary reasons for using consistency are to:

Enable joining on columns that don't have explicit database constraints in the schema. This is often seen with values such as email addresses. With consistency, you can completely anonymize an email address and still use it in a join.
Preserve the approximate cardinality of a column. For example, a city column contains 50 different cities. To randomize this column but still have approximately 50 cities, you can use consistency. Because consistency does not guarantee uniqueness, the cardinality might change. However, it is guaranteed not to increase. If you require unique 1:1 mappings, use a Key generator.
Match duplicated data across one or more databases. For example, you have a user database that contains a username in both a column and a JSON blob, and another database that contains user website activity, identified by the same username values. To anonymize the username, but still have the username be the same in all locations and databases, use consistency.

Types of consistency

Self-consistency

Self-consistency indicates that the value in the destination database is consistent with the value of the same column in the source database.

For example, a column contains a first name. You make the assigned generator self-consistent. A given first name in the source database is always replaced by the same first name in the destination database. For example, the first name value John is always replaced by the value Michael.
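One way to picture self-consistency is a keyed hash of the source value that selects an entry from a name dictionary. The following Python sketch uses that approach purely for illustration; the dictionary, the key, and the function are hypothetical, and this section does not describe how Structural implements consistency.

```python
import hashlib
import hmac

FIRST_NAMES = ["Michael", "Susan", "Gregory", "Rosella"]  # hypothetical dictionary

def consistent_first_name(source: str, key: bytes) -> str:
    # The output depends only on the source value (and the key), so every
    # occurrence of the same source name gets the same replacement.
    digest = hmac.new(key, source.encode("utf-8"), hashlib.sha256).digest()
    return FIRST_NAMES[int.from_bytes(digest[:8], "big") % len(FIRST_NAMES)]

key = b"per-run-key"  # hypothetical
column = ["John", "Melissa", "John", "Ana", "Melissa"]
print([consistent_first_name(name, key) for name in column])
```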

Consistency with another column

Consistency with another column indicates that the value in the destination database is consistent with the value of a different column in the source database.

For example, a column contains an IP address. You make the assigned generator consistent with the username column. Every row that has the username User1 in the input database has the same IP address in the destination database.
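In the same hypothetical style, consistency with another column means that the mapping is keyed on the other column's value. In this Python sketch, the source IP address is ignored and the output IP address is derived from the username, so both User1 rows get the same address. The key and the helper are illustrative assumptions, not Structural behavior.

```python
import hashlib
import hmac
import ipaddress

def ip_consistent_with(other_value: str, key: bytes) -> str:
    # The output depends on the other column's value (the username),
    # not on the IP address that is in the source row.
    digest = hmac.new(key, other_value.encode("utf-8"), hashlib.sha256).digest()
    return str(ipaddress.IPv4Address(int.from_bytes(digest[:4], "big")))

key = b"per-run-key"  # hypothetical
rows = [
    {"username": "User1", "ip": "10.0.0.1"},
    {"username": "User1", "ip": "192.168.1.5"},
    {"username": "User2", "ip": "10.0.0.1"},
]
for row in rows:
    row["ip"] = ip_consistent_with(row["username"], key)
print(rows)
```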

When you select a generator as the sub-generator for a composite generator, in most cases you cannot configure the generator to be consistent with another column. Only the Conditional generator and the Regex Mask generator allow a sub-generator to be consistent with another column.

Note that consistency with another column cannot be configured in a generator preset. You can only configure it when you configure an individual column.

Enabling consistency

To enable consistency, on the generator configuration panel, toggle the Consistency switch.

Not all generators support consistency.

Consistency is a function of both the data type and the value.

For example, a numeric field contains the value 123, and a string/varchar field contains the value "123". Both fields have consistent generators applied.

Because the data types differ, the output is not consistent between the two fields.
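A hypothetical way to get this behavior is to include the data type in the value that the consistent mapping is keyed on. This Python sketch shows such a key; it illustrates the documented outcome, not Structural's actual key format.

```python
def consistency_key(value) -> bytes:
    # Hypothetical key scheme: prefix the value with its type name, so the
    # number 123 and the string "123" are treated as different inputs.
    return f"{type(value).__name__}:{value}".encode("utf-8")

print(consistency_key(123), consistency_key("123"))
# b'int:123' b'str:123'
```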

Consistency example

To demonstrate the effect of consistency on the output, we'll use a column that contains a first name, and that uses the Name generator.

When consistency is not enabled, the same input value can produce different output values. In the sample data, the first name Melissa appears twice, but is mapped to Walton the first time and Linn the second time.

When consistency is enabled, the first name Melissa is mapped to Rosella both times.

Consistency considerations

Consistency does not imply uniqueness

A consistent generator ensures that the same input value always produces the same output value.

It does not guarantee that two different input values produce two different output values.

Consistent generators are not 1:1 mappings.
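The following Python sketch shows why a consistent mapping cannot promise uniqueness when the output dictionary is smaller than the set of inputs. The four-name dictionary and the keyed hash are hypothetical. Because each input maps to exactly one output, the number of distinct outputs can never exceed the number of distinct inputs, which matches the cardinality guarantee described earlier.

```python
import hashlib
import hmac

OUTPUT_NAMES = ["Michael", "Susan", "Gregory", "Rosella"]  # hypothetical 4-name dictionary

def consistent_name(source: str, key: bytes) -> str:
    digest = hmac.new(key, source.encode("utf-8"), hashlib.sha256).digest()
    return OUTPUT_NAMES[int.from_bytes(digest[:8], "big") % len(OUTPUT_NAMES)]

key = b"per-run-key"  # hypothetical
sources = ["Jane", "Michael", "John", "Ana", "Priya", "Omar"]
mapping = {name: consistent_name(name, key) for name in sources}
print(mapping)
# Six distinct inputs but at most four distinct outputs, so some inputs must collide.
print(len(set(sources)), len(set(mapping.values())))
```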

Consistency can reduce data privacy

Consistency can reduce the privacy of your data, because it reveals something about the frequency of the data values.

For example, someone who is familiar with the source data values and frequency might be able to connect the source and destination values. Suppose they know that Jane appears 20 times and Michael appears 3 times in the source. When they see 20 instances of Susan and 3 instances of John, they might infer that Susan is mapped from Jane and John from Michael.

However, this risk does require some knowledge of the source data. Tonic Structural does not store mappings of the source data to the destination data. In other words, someone can see that in the destination data the name Susan appears 20 times and the name John appears 3 times. But without any knowledge of the source data, they cannot determine that Susan is mapped from Jane and John is mapped from Michael.

Also, the mapping of source to destination values is not guaranteed to be unique. Both Jane and Michael could be mapped to John. In that case there would be 23 instances of John, which would not match the frequency of a specific source value. To guarantee unique values, use a primary key generator.

Consistency is across an entire database

Any column, regardless of which table it resides in, is consistent with any other column that uses the same consistent generator.

For example, your database includes a Customers table and an Employees table. Each table contains a column for the first name of the customer or employee. You assign the Name generator to both columns to generate a first name, and make the generators consistent. The same first name value in either column is mapped to the same destination value. For example, the first name John is always mapped to Michael, whether the name John appears in the Customers table or the Employees table.

However, by default, consistency is not guaranteed between data generation runs, even if the run is on the same database.

Enabling consistency across runs or multiple databases

By default, consistency is only guaranteed across a single data generation for a single workspace.

For example, for a column that contains a first name value, you assign the Name generator and configure the generator to be consistent. The first time you run data generation, all instances of the name John might be replaced with Michael. The next time you run data generation, all instances of the name John might instead be replaced with Gregory.

You can enable consistency across runs and workspaces so that, for example, every time you run a data generation, John is always replaced with Michael.
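The role of the seed can be sketched in Python as follows. The key derivation and the name dictionary are hypothetical; the point is that a fixed seed produces the same key, and therefore the same mapping, on every run, while a missing seed produces a fresh random key for each run.

```python
import hashlib
import hmac
import secrets

NAMES = ["Michael", "Gregory", "Susan", "Rosella"]  # hypothetical dictionary

def pick(value: str, key: bytes) -> str:
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return NAMES[int.from_bytes(digest[:8], "big") % len(NAMES)]

def run_generation(values, seed=None):
    # With a seed, every run derives the same key, so the mapping repeats.
    # Without one, each run draws a fresh random key, so the mapping is only
    # consistent within that run.
    key = seed.to_bytes(4, "big", signed=True) if seed is not None else secrets.token_bytes(16)
    return {v: pick(v, key) for v in values}

print(run_generation(["John", "Jane"], seed=12345))
print(run_generation(["John", "Jane"], seed=12345))  # same mapping as the previous run
print(run_generation(["John", "Jane"]))              # can differ from run to run
```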

To control consistency across runs and workspaces, you can do one of the following:

Configure the Structural environment setting TONIC_STATISTICS_SEED. This ensures consistency across all workspaces and data generation runs.
Configure a seed value for a workspace. This ensures consistency across all data generation runs for that workspace, as well as across other workspaces that have the same seed value.
Disable cross-data generation consistency for a workspace. This indicates to not have consistency across data generation runs or with other workspaces.

Configuring a Structural seed value

To ensure consistency across all data generations and workspaces, add the following environment setting to the Structural worker and web server containers:

TONIC_STATISTICS_SEED: <ANY 32-BIT SIGNED INTEGER>

When you configure a value for this environment setting, consistency applies across all data generations for all workspaces, except for workspaces that:

Have a workspace seed value configured.
Have disabled consistency across data generations.

Overriding the Structural seed value for a workspace

For an individual workspace, you can override the Structural seed value. When you override the Structural seed value, you can either:

Disable consistency across data generation runs for the workspace.
Provide a seed value for the workspace.

When a workspace has a configured seed value, consistency applies across the data generation runs for that workspace.

Consistency also applies across all of the data generations for all of the workspaces that have the same seed value.

On the workspace details view, to override the Structural seed value:

Toggle Override Statistics Seed to the on position.
To disable consistency across data generations, click Don't use consistency.
To provide a seed value for the workspace:
1. Click Consistency value.
2. In the field, enter the seed value. It must be a 32-bit signed integer. The value defaults to the current value of TONIC_STATISTICS_SEED.
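The precedence rules above can be summarized in a small Python sketch. `WorkspaceSeedSettings` and `effective_seed` are hypothetical names that model the workspace options; they are not part of any Structural API.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

@dataclass
class WorkspaceSeedSettings:
    override: bool = False              # Override Statistics Seed toggle
    dont_use_consistency: bool = False  # "Don't use consistency" option
    seed: Optional[int] = None          # "Consistency value" field

def effective_seed(workspace: WorkspaceSeedSettings, env: Mapping[str, str]) -> Optional[int]:
    """Return the seed for a data generation, or None for no cross-run consistency."""
    if workspace.override:
        if workspace.dont_use_consistency:
            return None
        # The UI prefills this field from TONIC_STATISTICS_SEED; this sketch just requires a value.
        seed = workspace.seed
    else:
        raw = env.get("TONIC_STATISTICS_SEED")
        if raw is None:
            return None  # default: consistency only within a single data generation
        seed = int(raw)
    if seed is None or not INT32_MIN <= seed <= INT32_MAX:
        raise ValueError("the seed must be a 32-bit signed integer")
    return seed

env = {"TONIC_STATISTICS_SEED": "42"}
print(effective_seed(WorkspaceSeedSettings(), env))                                          # 42
print(effective_seed(WorkspaceSeedSettings(override=True, seed=7), env))                     # 7
print(effective_seed(WorkspaceSeedSettings(override=True, dont_use_consistency=True), env))  # None
```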

Generators that can only be made self-consistent

The following generators can be made consistent to themselves. This means that the same input value in the column always produces the same output value.

Alphanumeric String Key
Array Character Scramble
ASCII Key
Character Scramble
Email
File Name
HIPAA Address
Integer Key
International Address
MAC Address
Mongo ObjectId Key
Numeric String Key
Phone
SIN
Unique Email
UUID Key

Generators that can be made self-consistent and to other columns

The following generators can be made consistent either to themselves or to other columns.

When a column is consistent to another column, the output value is based on the other column.

For example, a column contains a company name. You assign the Company Name generator, and make it consistent with the username column. Every row that has the username User1 in the input database has the same company name in the destination database.

Address
Business Name
Company Name (Deprecated)
Custom Categorical
FNR
Hostname
IP Address
Name
Noise Generator
Shipping Container
SSN
Timestamp Shift

Managing generator presets

Required license: Enterprise

On Basic or Professional instances, you select and configure generators separately for each column.

Required global permission: Create and manage generator presets

Not available on Structural Cloud.

A generator preset is a saved configuration for a generator.

Tonic Structural provides a built-in preset for every generator. You can update the configuration of the built-in presets.

You can also create custom generator presets that have different configurations. For example, for the Address generator, you can have one generator preset to use for city columns, and another generator preset to use for full addresses. You can edit and delete the custom generator presets. The custom generator presets are available to assign to columns throughout the Structural instance.

Generator presets allow you to standardize the configuration for generators, and save your users from having to replicate the same configuration selections across different columns, tables, and workspaces. For example, you might modify the generator preset for the Integer Key generator to enable consistency. Whenever a user assigns the Integer Key generator to a column, consistency is enabled.

For information about assigning and updating generator presets for a column, go to Assigning and configuring generators.

You can also view the video tutorial about generator presets.

Viewing the list of presets

The Generator Presets view contains the list of built-in and custom generator presets for the entire Structural instance. The configured presets are not specific to a workspace or a user.

To display the Generator Presets view, in the Tonic heading, click Generator Presets.

Information in the generator preset list

For each generator preset, the list provides the following information:

The name of the generator preset. For the built-in presets, the generator preset name always matches the generator name.
Whether the generator preset is built-in or custom.
The number of occurrences. Includes the number of occurrences that use the current baseline configuration, and the number of occurrences that have overrides to the baseline configuration.
An occurrence has an override if, after a user assigns the generator preset to a column, one of the following occurs:
- A user changes the generator configuration options for that occurrence.
- A user changes the baseline configuration for the generator preset.
When the preset configuration was most recently modified.

You cannot create or configure generator presets for generators that do not have any configuration options. For example, the Null generator does not have any configuration options.

For composite generators, you cannot create or configure generator presets from the Generator Presets view, because the Generator Presets view does not have access to data from which to create path expressions. You can create a new preset or update a preset baseline configuration from a column configuration panel in Privacy Hub, Database View, or Table View.

The list indicates when a generator does not allow you to configure a preset.

Filtering the generator preset list

You can filter the list of generator presets by the preset name, whether it is built-in or custom, and by the underlying generator type.

Filtering by preset name

To filter by the preset name, begin typing text from the name. As you type, Structural filters the list to only include the matching presets.

Filtering by preset type (built-in or custom)

To filter the list based on whether the preset is built-in or custom:

Click Filter by Type.
In the dropdown list: To only include built-in presets, click Built-in. To only include custom presets, click Custom.

Structural adds the selection to the selected filters.

Filtering by generator type

Every generator preset is based on a Structural generator type. For example, there is a built-in generator preset for the Address generator, and you can also create custom generator presets based on the Address generator.

To filter the list based on the generator type:

Click Filter by Generator.
In the generator list, click a generator to include. You can use the search field to search for a specific generator. When you click the generator name, Structural adds the generator to the selected filters.

Sorting the generator preset list

You can sort the generator preset list by the preset name and by the modification date.

To sort the generator preset list by a column, click the column heading. To reverse the sort order, click the column heading again.

Configuring generator presets

Creating a custom generator preset

To create a new custom generator preset, you can either create a completely new preset, or copy an existing preset.

For composite generators such as JSON Mask, you cannot create a generator preset from Generator Presets view. Generator Presets view does not have access to data to use for path expressions. You can create presets for composite generators from a column configuration panel in Privacy Hub, Database View, or Table View.

You cannot create a custom preset at all for a generator that has no configuration options. For example, you cannot create a custom preset for the Null generator.

Creating a completely new custom generator preset

To create a completely new custom generator preset:

On the Generator Presets view, click Create Preset.
On the Create Preset panel, configure the generator preset.
Click Create.

Copying an existing generator preset

When you copy an existing generator preset, the new generator preset by default inherits the configuration from the copied generator preset.

To copy an existing generator preset:

On the Generator Presets view, click the copy icon for the generator preset that you want to copy.
On the Copy Preset dialog, enter a name for the new generator preset, then click Copy. The new preset is added to the Generator Presets list, and the details panel is displayed to allow you to change the new preset configuration.
After you update the configuration, click Save and Apply.
On the confirmation panel, click Confirm.

Updating a generator preset

To edit a preset, you must be either an editor or owner of at least one workspace in the Structural instance. If you are not an editor or owner of a workspace, then you can view the list of presets, but you cannot edit the presets.

When you change the configuration of a generator preset, the updated configuration becomes the new baseline configuration for the generator preset.

The baseline configuration is used whenever you select the generator preset. Existing occurrences of the generator preset keep their current configuration. You can reset those occurrences to use the current baseline configuration.
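The relationship between a baseline and its occurrences can be modeled as follows. This Python sketch is a hypothetical model, not Structural's data model: `Preset`, `Occurrence`, and the `consistent` option are illustrative names. It shows that an occurrence copies the baseline when it is assigned, that a later baseline change marks existing occurrences as overrides, and that resetting an occurrence restores the current baseline.

```python
import copy
from dataclasses import dataclass

@dataclass
class Preset:
    name: str
    description: str  # editing the description is not a baseline change
    baseline: dict    # hypothetical generator options

@dataclass
class Occurrence:
    workspace: str
    preset: Preset
    config: dict

    @property
    def has_override(self) -> bool:
        # True if the column's options differ from the preset's current baseline,
        # whether the column was edited or the baseline changed afterward.
        return self.config != self.preset.baseline

    def reset_to_baseline(self) -> None:
        self.config = copy.deepcopy(self.preset.baseline)

def assign(preset: Preset, workspace: str) -> Occurrence:
    # A new assignment starts from a copy of the current baseline.
    return Occurrence(workspace, preset, copy.deepcopy(preset.baseline))

integer_key = Preset("Integer Key", "Built-in preset", {"consistent": False})
occurrences = [assign(integer_key, "ws-a"), assign(integer_key, "ws-b")]

# Changing the baseline does not rewrite existing occurrences...
integer_key.baseline = {"consistent": True}
print([o.has_override for o in occurrences])  # [True, True]

# ...until an occurrence is reset, and new assignments pick up the new baseline.
occurrences[0].reset_to_baseline()
occurrences.append(assign(integer_key, "ws-a"))
print(sum(o.has_override for o in occurrences), "override(s)")  # 1 override(s)
```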

A change to the generator preset description is not considered a change to the baseline configuration.
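The relationship between a preset's baseline configuration and its existing occurrences can be modeled as follows. This is a conceptual sketch, not how Structural stores presets. `Preset`, `Occurrence`, `assign_preset`, `update_baseline`, and the `placeholder_option` key are all hypothetical. The sketch shows that updating the baseline does not change existing occurrences, that those occurrences then count as overrides until they are reset, and that the description plays no part in the comparison.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical model; class, function, and option names are illustrative assumptions.
@dataclass
class Preset:
    name: str
    description: str
    baseline: dict[str, Any]


@dataclass
class Occurrence:
    column: str
    preset: Preset
    config: dict[str, Any]

    def has_override(self) -> bool:
        # Only the generator configuration is compared; the description is ignored.
        return self.config != self.preset.baseline

    def reset(self) -> None:
        # Copy the current baseline into the column's own configuration.
        self.config = dict(self.preset.baseline)


def assign_preset(column: str, preset: Preset) -> Occurrence:
    # Selecting a preset starts the column from a copy of the baseline configuration.
    return Occurrence(column, preset, dict(preset.baseline))


def update_baseline(preset: Preset, new_config: dict[str, Any]) -> None:
    # Existing occurrences keep their own config, so they now differ from the baseline.
    preset.baseline = dict(new_config)


preset = Preset("Custom names", "Names for test data", {"placeholder_option": False})
occ = assign_preset("customers.first_name", preset)
update_baseline(preset, {"placeholder_option": True})
print(occ.has_override())  # True: the column kept the previous configuration
occ.reset()
print(occ.has_override())  # False
```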

For composite generators such as JSON Mask, you cannot update a generator preset from the Generator Presets view, because that view does not have access to the data that is used for path expressions. Instead, update the baseline configuration from a column configuration panel in Privacy Hub, Database View, or Table View.

To update the baseline configuration of a generator preset:

On the Generator Presets view, click the edit icon for the preset.
On the Configuration tab of the Edit Preset panel, update the configuration. You cannot change the selected generator for the preset.
Click Save and Apply.
On the confirmation panel, click Confirm.

Configuration options for generator presets

Each generator preset includes the following configuration:

Preset Name - The name of the generator preset. You cannot change the name of built-in presets. Built-in presets always use the generator name.
Preset Description - A longer description of the generator preset and how it is intended to be used.
Generator Type - Used to select the generator for a new generator preset. When you copy or edit a generator preset, you cannot change the selected generator type.
Generator configuration - The configuration options for the selected generator. For details on the specific configuration options for each generator, go to the Generator reference.

The following items are not included in the generator preset configuration. They are always configured for individual columns after you select the generator preset:

Linking
Consistency with another column
Partitioning
Custom value processors
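The split between what a preset stores and what stays on each column can be pictured as two separate records. This is a hypothetical data model, not Structural's schema. `GeneratorPreset`, `ColumnGeneratorAssignment`, and every field name are assumptions. It also enforces the rule that the generator type cannot change after the preset is created.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


# Hypothetical model; not the Structural schema.
@dataclass
class GeneratorPreset:
    preset_name: str
    preset_description: str
    generator_type: str  # fixed once the preset exists
    generator_config: dict[str, Any] = field(default_factory=dict)

    def __setattr__(self, name: str, value: Any) -> None:
        # Allow the initial assignment in __init__, but reject later changes.
        if name == "generator_type" and "generator_type" in self.__dict__:
            raise AttributeError("generator_type cannot be changed after creation")
        super().__setattr__(name, value)


@dataclass
class ColumnGeneratorAssignment:
    preset: GeneratorPreset
    # Column-level settings that are never part of the preset configuration:
    linked_columns: list[str] = field(default_factory=list)
    consistent_with: Optional[str] = None
    partitioning: Optional[str] = None
    custom_value_processors: list[str] = field(default_factory=list)
```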

Viewing generator preset occurrences

On the generator preset details panel, the Occurrences tab indicates where the generator preset is used. You can also see whether each occurrence overrides the current baseline configuration.

The Occurrences tab displays the list of workspaces that contain occurrences of the preset. For each workspace, it shows the number of occurrences that use the current baseline configuration and the number of occurrences that override it.

For workspaces that you have access to:

You can expand the workspace to display the list of columns that use the generator preset. For each column, the entry indicates whether the column uses the current baseline configuration.
You can click the Database View icon to navigate to Database View.

For workspaces that you do not have access to, you can only see the total number of occurrences. You cannot display the column list or navigate to Database View.
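The per-workspace summary on the Occurrences tab can be sketched as a grouping step. This is an illustrative model only. `OccurrenceRecord` and `summarize_occurrences` are hypothetical names. The sketch assumes that the counts appear for every workspace, while the column list is included only for workspaces that you have access to.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical model; not part of the Structural API.
@dataclass
class OccurrenceRecord:
    workspace: str
    column: str
    uses_baseline: bool


def summarize_occurrences(records, accessible_workspaces):
    """Group preset occurrences by workspace, hiding column details where access is missing."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.workspace].append(rec)

    summary = {}
    for workspace, recs in grouped.items():
        entry = {
            "uses_baseline": sum(1 for r in recs if r.uses_baseline),
            "overrides": sum(1 for r in recs if not r.uses_baseline),
        }
        # Only workspaces you can access expose the per-column list.
        if workspace in accessible_workspaces:
            entry["columns"] = [(r.column, r.uses_baseline) for r in recs]
        summary[workspace] = entry
    return summary
```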

Deleting a custom generator preset

You can delete custom generator presets. You cannot delete built-in generator presets.

When you delete a custom generator preset, its existing occurrences are assigned the built-in generator preset for the same generator. If an occurrence's current configuration does not match the baseline configuration of the built-in generator preset, then that occurrence is also marked as having overrides.

For example, a column is assigned a custom generator preset for the Name generator, and the column configuration differs from the baseline configuration of the built-in Name preset. The custom generator preset is deleted. The column is then assigned the built-in generator preset for the Name generator, and is marked as having overrides.
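The fallback behavior on deletion can be sketched as follows. This is a conceptual model, not Structural's implementation. `Preset`, `Occurrence`, `delete_custom_preset`, and the `opt` option key are hypothetical. Each affected column keeps its current configuration, is reassigned to the built-in preset for the same generator, and is flagged only if that configuration differs from the built-in baseline.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical model; not part of the Structural API.
@dataclass
class Preset:
    name: str
    generator: str
    baseline: dict[str, Any]
    built_in: bool = False


@dataclass
class Occurrence:
    column: str
    preset: Preset
    config: dict[str, Any]
    has_override: bool = False


def delete_custom_preset(preset: Preset, built_ins: dict[str, Preset],
                         occurrences: list[Occurrence]) -> None:
    if preset.built_in:
        raise ValueError("Built-in generator presets cannot be deleted")
    fallback = built_ins[preset.generator]
    for occ in occurrences:
        if occ.preset is preset:
            # Reassign to the built-in preset for the same generator.
            occ.preset = fallback
            # The column keeps its configuration; flag it if it differs from the built-in baseline.
            occ.has_override = occ.config != fallback.baseline
```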

To delete a custom generator preset:

On the Generator Presets view, click the delete icon for the generator preset.
On the confirmation dialog, click Delete Preset.