The algebraic generator identifies the algebraic relationship between three or more numeric values and generates new values to match. At least one of the values must be a non-integer.
If a relationship cannot be found, then the generator defaults to the Categorical generator.
This generator can be linked with other Algebraic generators.
To configure the generator, from the Link To dropdown list, select the columns to link this column to. You can select other columns that are assigned the Algebraic generator.
You must select at least three columns.
The column values must be numeric. At least one of the columns must contain a value other than an integer.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates a random address-like string.
You can indicate which part of an address string that the column contains. For example, the column might contain only the street address or the city, or it might contain the full address.
To configure the generator:
From the Link To dropdown list, select the columns to link this column to. You can link columns that use the Address generator to mask one of the following address components:
City
City State
Country
Country Code
State
State Abbreviation
Zip Code
Latitude
Longitude
Note that when linked to another address column, a country or country code is always the United States.
From the address component dropdown list, select the address component that this column contains. The available options are:
Building Number
Cardinal Direction (North, South, East, West)
City
City Prefix (Examples: North, South, East, West, Port, New)
City Suffix (Examples: land, ville, furt, town)
City with State (Example: Spokane, Washington)
City with State Abbr (Example: Houston, TX)
Country (Examples: Spain, Canada)
Country Code (Uses the 2-character country code. Examples: ES, CA)
County
Direction (Examples: North, Northeast, Southwest, East)
Full Address
Latitude (Examples: 33.51, 41.32)
Longitude (Examples: -84.05, -74.21)
Ordinal Direction (Examples: Northeast, Southwest)
Secondary Address (Examples: Apt 123, Suite 530)
State (Examples: Alabama, Wisconsin)
State Abbr (Examples: AL, WI)
Street Address (Example: 123 Main Street)
Street Name (Examples: Broad, Elm)
Street Suffix (Examples: Way, Hill, Drive)
US Address
US Address with Country
Zip Code (Example: 12345)
Toggle the Consistency setting to indicate whether to make the column consistent. By default, the consistency is disabled.
If consistency is enabled, then by default, the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. When the Address generator is consistent with itself, then the same value in the source database is always mapped to the same destination value. For example, for a column that contains a state name, Alabama is always mapped to Illinois. When the Address generator is consistent with another column, then the same value in the other column always results in the same destination value for the address column. For example, if the address column is consistent with a name column, then every instance of John Smith in the name column in the source database has the same address value in the destination database.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
For the Address generator, Spark workspaces (Amazon EMR, Databricks, and self-managed Spark clusters) only support the following address parts:
Building Number
City
Country
Country Code
Full Address
Latitude
Longitude
State
State Abbr
Street Address
Street Name
Street Suffix
US Address
US Address with Country
Zip Code
Generators transform the data in a source database column. You assign the generators to use. Tonic Structural offers a variety of generators to transform different types of data. For the sensitive columns that it detects, Structural also recommends the generator configuration to use.
For Enterprise instances, generator presets allow you to configure custom configurations of generators that you can then assign to columns.
A version of the Character Scramble generator that can be used for array values.
This generator replaces letters with random other letters, and numbers with random other numbers. Punctuation and whitespace are preserved.
For example, for the following array value:
["ABC.123", 3, "last week"]
The output might be something like:
["KFR.860", 7, "sdrw mwoc"]
This generator securely masks letters and numbers. There is no way to recover the original data.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.
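The array behavior described above can be illustrated with a minimal sketch. This is not Structural's implementation. The function names are hypothetical, the sketch handles only ASCII letters and digits, and it assumes that letter case is kept, as the example output suggests. Because the replacement is random, a character can occasionally map to itself.

```python
import random
import string

def scramble_text(text, rng):
    """Replace ASCII letters with random letters and digits with random digits."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        elif ch.isascii() and ch.isdigit():
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)  # punctuation and whitespace pass through unchanged
    return "".join(out)

def scramble_array(values, rng=None):
    """Apply the scramble to each string or integer element of an array value."""
    rng = rng or random.Random()
    result = []
    for value in values:
        if isinstance(value, str):
            result.append(scramble_text(value, rng))
        elif isinstance(value, int) and not isinstance(value, bool):
            sign = "-" if value < 0 else ""
            result.append(int(sign + scramble_text(str(abs(value)), rng)))
        else:
            result.append(value)  # other types are left as-is in this sketch
    return result

masked = scramble_array(["ABC.123", 3, "last week"], random.Random(0))
print(masked)
```

The output keeps the array shape, the element types, and the position of the period and the space, while the letters and digits change.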
This is a composite generator.
A version of the generator that can be used for array values.
Runs a selected generator on values that match a user-specified JSONPath expression.
To assign a generator to a path expression:
Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell JSON field contains a sample value from the source database. You can use the previous and next icons to page through different values.
In the Path Expression field, type the JSONPath expression to identify the value to apply the generator to. To populate a path expression, you can also click a value in the Cell JSON field. Matched JSON Values shows the result from the value in Cell JSON.
By default, the selected generator is applied to any value that matches the expression. To limit the types of values to apply the generator to, from the Type Filter, specify the applicable types. You can select Any, or you can select any combination of String, Number, and Null.
From the Generator Configuration dropdown list, select the generator to apply to the path expression. You cannot select another composite generator.
Configure the selected generator. You cannot configure the selected generator to be consistent with another column.
To save the configuration and immediately add a generator for another path expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
From the Sub-Generators list:
To edit a generator assignment, click the edit icon.
To remove a generator assignment, click the delete icon.
To move a generator assignment up or down in the list, click the up or down arrow.
Generates unique alphanumeric strings of the same length as the input.
For example, for the origin value ABC123, the output value is a six-character alphanumeric string such as D24N05.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates a random company name-like string.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.
By default, the generator is not consistent.
If consistency is enabled, then by default it is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.
When the generator is consistent with itself, then a given source value is always mapped to the same destination value. For example, My Business is always mapped to New Business.
When the generator is consistent with another column, then a given source value in that other column always results in the same destination value for the company name column. For example, if the company name column is consistent with a name column, then every instance of John Smith in the name column in the source database has the same company name in the destination database.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Consistency | No, cannot be made consistent. |
Linking | Yes, can be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | Yes, can be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking |
|
Generator ID (for the API) |
Consistency | Determined by the specified sub-generators. |
Linking | Determined by the specified sub-generators. |
Differential privacy | Determined by the specified sub-generators. |
Data-free | Determined by the specified sub-generators. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking |
|
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | Yes |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | Yes |
Privacy ranking |
|
Generator ID (for the API) |
Performs a random character replacement that preserves formatting (spaces, capitalization, and punctuation).
Characters are replaced with other characters from within the same Unicode Block. A given source character is always mapped to the same destination character. For example, M might always map to V.
For example, for the following input string:
Miami Store #162
The output would be something like:
Vgkjg Gmlvf #681
Note that for a numeric column, when a generated number starts with a 0, the starting 0 is removed. This could result in matching output values in different columns. For example, one column is changed to 113 and the other to 0113, which also becomes 113.
Character Substitution is similar to Character Scramble, with a couple of key differences. Because Character Substitution always maps the same source character to the same destination character, it is always consistent. It also can be used for unique columns.
In Character Scramble, the character mapping is random, which makes Character Scramble slightly more secure. However, Character Scramble cannot be used for unique columns.
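To make the difference concrete, here is a minimal sketch of a fixed character-to-character mapping in the style of Character Substitution. It is an illustration under stated assumptions, not Structural's implementation: the function names and the seed are hypothetical, and the sketch covers only ASCII letters and digits instead of full Unicode Blocks.

```python
import random
import string

def build_substitution_table(seed):
    """Build a fixed one-to-one mapping within each character class."""
    rng = random.Random(seed)
    table = {}
    for alphabet in (string.ascii_uppercase, string.ascii_lowercase, string.digits):
        shuffled = list(alphabet)
        rng.shuffle(shuffled)
        table.update(zip(alphabet, shuffled))  # permutation, so no two sources collide
    return table

def substitute(text, table):
    """Replace each mapped character; spaces and punctuation are kept as-is."""
    return "".join(table.get(ch, ch) for ch in text)

table = build_substitution_table(seed=42)  # hypothetical seed
first = substitute("Miami Store #162", table)
second = substitute("Miami Store #162", table)
print(first)
```

Because the table is a permutation, two different text values never produce the same output, which is why a mapping like this can preserve uniqueness for text. The same property makes it less private than a random scramble, because repeated characters remain visibly repeated.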
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates unique alpha-numeric strings based on any printable ASCII characters. The length of the source string is not preserved. You can choose to exclude lowercase letters from the generated values.
To configure the generator:
To exclude lowercase letters from the generated values, toggle Exclude Lowercase Alphabet to the on position.
Toggle the Consistency setting to indicate whether to make the generator consistent. By default, the generator is not consistent.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
This generator replaces letters with random other letters and numbers with random other numbers. Punctuation, whitespace, and mathematical symbols are preserved.
For example, for the following input string:
ABC.123 123-456-789 Go!
The output would be something like:
PRX.804 296-915-378 Ab!
This generator securely masks letters and numbers. There is no way to recover the original data.
Character Scramble is similar to Character Substitution, with a couple of key differences.
While you can enable consistency for the entire value, Character Scramble does not always replace the same source character with the same destination character. Because there is no guarantee of unique values, you cannot use Character Scramble on unique columns.
Character Substitution, however, does always map the same source character to the same destination character. Character Substitution is always consistent, which makes it less secure than Character Scramble. You can use Character Substitution on unique columns.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.
This generator reference provides the details for each of the supported generators in Tonic Structural.
The generators are in alphabetical order by the generator name.
Here are some groupings to help to identify generators that are used for different types of values. Generator hints and tips also provides some suggestions for generators to use for specific use cases.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
- Also useable for numeric columns.
- Also useable for numeric columns.
Consistency | This generator is implicitly self-consistent. You do not specify whether the generator is consistent. Every occurrence of a character always maps to the same substitute character. Because of this, it can be used to preserve a join between two text columns, such as a join on a name or email. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 4 |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | Yes |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | Yes |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent |
Linking | No, cannot be linked |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking |
|
Generator ID (for the API) |
The following table summarizes the available generators. The table includes generator characteristics that you might take into account when you select the generator to use for a column.
Generator hints and tips also provides some suggestions for generators to use for specific use cases.
Generator | Description | Supported features |
---|---|---|
The Categorical generator shuffles the existing values within a field while maintaining the overall frequency of the values. It disassociates the values from other pieces of data. Note that NULL is considered a separate value.
For example, a column contains the values Small, Medium, and Large. Small appears 3 times, Medium appears 4 times, and Large appears 5 times. In the output data, each value still appears the same number of times, but the values are shuffled to different rows.
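The Small, Medium, and Large example can be sketched in a few lines. This is a simplified illustration, not Structural's implementation: the function name is hypothetical, and the sketch ignores linking and differential privacy.

```python
import random
from collections import Counter

def shuffle_column(values, rng=None):
    """Return the same values in a random row order; frequencies are unchanged."""
    rng = rng or random.Random()
    shuffled = list(values)  # copy so the source column is not modified
    rng.shuffle(shuffled)
    return shuffled

source = ["Small"] * 3 + ["Medium"] * 4 + ["Large"] * 5
output = shuffle_column(source, random.Random(7))

print(Counter(source) == Counter(output))
```

A None entry in the list would be shuffled like any other value, which mirrors the note that NULL is considered a separate value.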
This generator is optimized for categories with fewer than 10,000 unique values. If your underlying data has more unique values (for example, your field is populated by freeform text entry), we recommend that you use the Character Scramble or Custom Categorical generator.
To configure the generator:
From the Link To dropdown, select the columns to link to the current column. You can select from other columns that use the Categorical generator.
Toggle the Differential Privacy setting to indicate whether to make the output data differentially private. By default, differential privacy is disabled.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
A version of the Categorical generator that selects from values that you provide instead of shuffling the original values.
To configure the generator:
From the Link To dropdown list, select the columns to link this column to. You can only select other columns that use the Custom Categorical generator.
In the Custom Categories text area, enter the list of values that the generator can choose from.
Put each value on a separate line.
To add a NULL value to the list, use the keyword {NULL}.
Toggle the Consistency setting to indicate whether to make the column consistent. By default, consistency is disabled.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. When a generator is self-consistent, then a given value in the source database is always mapped to the same value in the destination database. When a generator is consistent with another column, then a given source value in that column always results in the same value for the current column in the destination database. For example, a department column is consistent with a username column. For each instance of User1 in the source database, the value in the department column is the same.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
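The configuration above can be sketched as follows. The keyed hash used for consistency is an assumption made for illustration only; the documentation does not describe how Structural implements consistency. The function names and the key are hypothetical.

```python
import hashlib
import hmac
import random

def parse_categories(text):
    """Parse one value per line; the keyword {NULL} becomes None."""
    values = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            values.append(None if stripped == "{NULL}" else stripped)
    return values

def custom_categorical(source_value, categories, consistent=False,
                       key=b"hypothetical-secret", rng=None):
    """Pick a replacement from the provided categories."""
    if not consistent:
        return (rng or random).choice(categories)
    # Assumption: derive the choice from a keyed hash of the source value,
    # so that the same input always selects the same category.
    digest = hmac.new(key, repr(source_value).encode("utf-8"), hashlib.sha256).digest()
    return categories[int.from_bytes(digest[:8], "big") % len(categories)]

categories = parse_categories("Sales\nSupport\n{NULL}")
first = custom_categorical("User1", categories, consistent=True)
second = custom_categorical("User1", categories, consistent=True)
print(first == second)
```

To model consistency with another column, pass that column's value (for example, the username) as source_value when generating the department value for the same row.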
Generates a continuous distribution to fit the underlying data.
This generator can be linked to other Continuous generators to create multivariate distributions and can be partitioned by other columns.
To configure the generator:
From the Link To drop-down list, select the other Continuous generator columns to link to. The linking creates a multivariate distribution.
From the Partition By drop-down list, select one or more columns to use to partition the data. The selected columns must have the generator set to either Passthrough or Categorical. For more information about partitioning and how it works, go to Partitioning a column.
Toggle the Differential Privacy setting to indicate whether to make the output data differentially private. By default, the generator is not differentially private.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
This is a composite generator.
Applies different generators to the value conditionally based on any value in the table.
For example, a Users table contains Name, Username, and Role columns. For the Username column, you can use a conditional generator to indicate that if the value of Role is something other than Test, then use the Character Scramble generator for the Username value. For Test users, the name is not masked.
The generator consists of a list of options. Each option includes the required conditions and the generator to use if those conditions are met.
The generator always contains a Default option. The Default option is used if the value does not meet any of the conditions. To configure the Default option:
From the Default dropdown list, select the generator to use by default.
Configure the selected generator.
To add a condition option:
Click + Conditional Generator.
To add a condition:
Click + Condition.
From the column list, select the column for which to check the value.
Select the comparison type.
Enter the column value to check for.
To remove a condition, click the delete icon for the condition.
From the Generator dropdown list, select the generator to run on the current column if the conditions are met. You cannot select another composite generator.
Choose the configuration options for the selected generator.
To view details for and edit a condition option, click the expand icon for that option.
To remove a condition option, click the delete icon for the option.
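The option list and Default option described above can be sketched as a small dispatcher. This is an illustration, not Structural's implementation. The function names and the comparison set are hypothetical, the string reversal is a stand-in for a real generator such as Character Scramble, and the sketch assumes that all conditions in an option must match.

```python
import operator

# Hypothetical subset of comparison types.
COMPARISONS = {"=": operator.eq, "!=": operator.ne}

def conditional_generate(row, column, options, default_generator):
    """Apply the first option whose conditions all match; otherwise use the default."""
    for conditions, generator in options:
        if all(COMPARISONS[op](row[col], expected) for col, op, expected in conditions):
            return generator(row[column])
    return default_generator(row[column])

def stand_in_scramble(value):
    return value[::-1]  # placeholder transformation, not a real mask

def passthrough(value):
    return value

# Mask Username unless Role is Test.
options = [([("Role", "!=", "Test")], stand_in_scramble)]

admin = conditional_generate({"Username": "alice", "Role": "Admin"}, "Username", options, passthrough)
tester = conditional_generate({"Username": "bob", "Role": "Test"}, "Username", options, passthrough)
print(admin, tester)
```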
Links columns in two tables. This column value is the sum of the values in a column in another table.
This generator does not provide a preview. The sums are not computed until the other table is generated.
For example, a Customers table contains a Total_Sales column. The Transactions table uses a foreign key Customer_ID column to identify the customer who made the transaction, and an Amount column that contains the amount of the sale. The Customer_ID value in the Transactions table is a value from the ID primary key column in the Customers table.
You assign the Cross Table Sum generator to the Total_Sales column. In the generator configuration, you indicate that the value is the sum of the Amount values for the Customer_ID value that matches the primary key ID value for the current row.
For the Customers row for ID 123, the Total_Sales column contains the sum of the Amount column for Transactions rows where Customer_ID is 123.
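The Customers and Transactions example can be sketched with plain dictionaries. This is a minimal illustration, not Structural's implementation: the function name and the sample rows are hypothetical, and the foreign rows stand for the already generated Transactions data.

```python
from collections import defaultdict

customers = [{"ID": 123, "Total_Sales": None}, {"ID": 456, "Total_Sales": None}]
transactions = [
    {"Customer_ID": 123, "Amount": 10.0},
    {"Customer_ID": 123, "Amount": 5.5},
    {"Customer_ID": 456, "Amount": 2.0},
]

def cross_table_sum(primary_rows, primary_key, foreign_rows, foreign_key, sum_over, target):
    """Set target on each primary row to the sum of sum_over across matching foreign rows."""
    totals = defaultdict(float)
    for row in foreign_rows:
        totals[row[foreign_key]] += row[sum_over]
    # Rows with no matching foreign rows get 0.0 in this sketch.
    return [{**row, target: totals[row[primary_key]]} for row in primary_rows]

result = cross_table_sum(customers, "ID", transactions, "Customer_ID", "Amount", "Total_Sales")
print(result)
```

The four arguments after the row lists correspond to the Primary Key, Foreign Key, and Sum Over settings, plus the column that the generator is assigned to.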
To configure the generator:
From the Foreign Table dropdown list, select the table that contains the column for which to sum the values.
From the Foreign Key dropdown list, select the foreign key. The foreign key identifies the row from the current table that is referred to in the foreign table.
From the Sum Over dropdown list, select the column for which to sum the values.
From the Primary Key dropdown list, select the primary key for the current table.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
This is a composite generator.
Masks text columns by parsing the values as rows whose columns are delimited by a specified character.
You can assign specific generators to specific indexes. You can also use the generator that is assigned to a specific index as the default. This applies the generator to every index that does not have an assigned generator.
The output value maintains the quotes around the index values.
For example, a column contains the following value:
"first","second","third"
You assign the Character Scramble generator to index 0 and assign Passthrough to index 2. You select index 0 as the index to use for the default generator.
In the output, the first and second values are masked by the Character Scramble generator. The third value is not masked. The output looks something like:
"wmcop", "xjorsl", "third"
In the Delimiter field, type the delimiter that is used as a separator for the value.
For example, for the value "first","second","third", the delimiter is a comma.
You can configure a generator for any or all of the indexes. To add a sub-generator for an index:
Under Sub-Generators, click Add Generator. On the add generator dialog, the Cell CSV field contains a sample value from the source data. You can use the navigation icons to page through the values.
In the CSV Index field, type the index to assign a generator to. The index numbers start with 0. You cannot use an index that already has an assigned generator. Matched CSV values shows the value at that index for the current sample column value.
Under Generator Configuration, from the Select a Generator dropdown list, select the generator to use for the selected index. You cannot select another composite generator. To remove the selection, click the delete icon.
Configure the selected generator. You cannot configure the selected generator to be consistent with another column.
To save the configuration and immediately add a generator for another index, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
From the Sub-Generators list:
To edit a generator assignment, click the edit icon.
To remove a generator assignment, click the delete icon.
To move a generator assignment up or down in the list, click the up or down arrow.
After you configure a generator for at least one index, the Default Link dropdown list is displayed.
From the Default Link dropdown list, select the index to use to determine how to mask values for indexes that do not have an assigned generator.
For example, you assign the Character Scramble generator to index 2. If you set Default Link to 2, then all indexes that do not have an assigned generator use the Character Scramble generator.
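The index-based masking and the default link can be sketched with Python's csv module. This is an illustration under stated assumptions, not Structural's implementation: the function name is hypothetical, and str.upper stands in for a real generator so that the result is easy to follow.

```python
import csv
import io

def mask_csv_value(value, sub_generators, default_index=None, delimiter=","):
    """Mask each delimited field with the generator assigned to its index."""
    fields = next(csv.reader([value], delimiter=delimiter))
    # Indexes without an assigned generator reuse the generator of default_index.
    default = sub_generators.get(default_index)
    masked = []
    for index, field in enumerate(fields):
        generator = sub_generators.get(index, default)
        masked.append(generator(field) if generator else field)
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, quoting=csv.QUOTE_ALL).writerow(masked)
    return buffer.getvalue().rstrip("\r\n")  # keep the quotes, drop the row terminator

sub_generators = {
    0: str.upper,          # stand-in for a masking generator
    2: lambda field: field # Passthrough
}
output = mask_csv_value('"first","second","third"', sub_generators, default_index=0)
print(output)
```

Index 1 has no assigned generator, so it falls back to the generator for index 0, while index 2 passes through unchanged.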
Truncates a date value or a timestamp to a specific part.
For a date or a timestamp, you can truncate to the year, month, or day.
For a timestamp, you can also truncate to the hour, minute, or second.
To configure the generator:
From the dropdown list, select the part of the date or timestamp to truncate to. For both date and timestamp values, you can truncate to the year, month, or day. When you select one of these options, the time portion of a timestamp is set to 00:00:00. For the date, the values below the selected truncation value are set to 01. For example, when you truncate to month, the day value is set to 01, and the timestamp is set to 00:00:00. For a timestamp value, you also can truncate to the hour, minute, or second. The date values remain the same as the original data. The time values below the selected truncation value are set to 00. For example, when you truncate to minute, the seconds value is set to 00.
Toggle the Birth Date option. When you enable Birth Date, the generator shifts dates that are more than 90 years before the generation date to the date exactly 90 years before the generation date. For example, a generation occurs on January 1, 2023. Any date that occurs before January 1, 1933 is changed to January 1, 1933.
This is mostly intended for birthdate values, to group birthdates for everyone who is older than 89 into a single year. This is used to comply with HIPAA Safe Harbor.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
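The truncation rules and the Birth Date option can be sketched with the standard datetime module. This is an illustration, not Structural's implementation. The function names are hypothetical, the sketch assumes that fractional seconds are always dropped, and it does not specify how the two settings are combined.

```python
from datetime import datetime

ORDER = ["year", "month", "day", "hour", "minute", "second"]
FLOOR = {"month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0}

def truncate(value, part):
    """Reset every component below the selected part to its lowest value."""
    lower_parts = ORDER[ORDER.index(part) + 1:]
    return value.replace(microsecond=0, **{name: FLOOR[name] for name in lower_parts})

def clamp_birth_date(value, generation_date):
    """Move dates more than 90 years before the generation date up to that cutoff."""
    try:
        cutoff = generation_date.replace(year=generation_date.year - 90)
    except ValueError:
        # Generation date is February 29 and the cutoff year is not a leap year.
        cutoff = generation_date.replace(year=generation_date.year - 90, day=28)
    return max(value, cutoff)

source = datetime(2021, 7, 15, 13, 45, 30)
print(truncate(source, "month"))
print(truncate(source, "minute"))
print(clamp_birth_date(datetime(1920, 5, 5), datetime(2023, 1, 1)))
```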
Here are examples of date and time values and how the selected truncation affects the output:
Generates timestamps fitting an event distribution. The source timestamp must include a date. It cannot be a time-only value.
Link columns to create a sequence of events across multiple columns. This generator can be partitioned by other columns.
To configure the generator:
From the Link To dropdown list, select the other Event Timestamps generator columns to link this column to. Linking creates a sequence across multiple columns.
From the Partition drop-down list, select one or more columns to use to partition the data. The selected columns must have their generator set to either Passthrough or Categorical. For more information about partitioning and how it works, go to Partitioning a column.
The Options list displays the current column and linked columns. Use the Up and Down buttons to configure the column sequence.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Option | Date value | Timestamp value |
---|
Address API: AddressGenerator
Generates replacement values for U.S. mailing addresses. You select the address component or format for the replacement values. For example, the column might only contain a street address or a postal code, or it might contain a full address.
Consistency: self and other. Linkable. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent, 4 if consistent.
Identifies the algebraic relationship between 3 or more numeric values, including at least one non-integer. Based on the relationship, generates new values to match. If there is no relationship, uses the Categorical generator.
Linkable (linking is required). Privacy ranking: 3.
Generates unique alphanumeric strings of the same length as the input.
For example, for the origin value ABC123, the output value is a six-character alphanumeric string such as D24N05.
Consistency: self only. Primary key generator. Unique columns allowed. Format-preserving encryption (FPE). Privacy ranking: 3 if not consistent, 4 if consistent.
Within an array, replaces letters with random other letters, and numbers with random other numbers. Preserves punctuation and whitespace.
Consistency: self only. Privacy ranking: 3 if not consistent, 4 if consistent.
Used to transform array values in JSON.
To identify values to transform, you provide a list of JSONPaths. For each JSONPath, you assign a sub-generator to apply to matching values.
Composite generator. Feature support is based on the sub-generators. Privacy ranking: 5
Used to transform values in an array. To identify values to transform, you provide a regular expression. For each capture group in an expression, you assign a sub-generator to apply to matching values.
Composite generator. Feature support is based on the sub-generators. Privacy ranking: 5
Generates unique alpha-numeric strings based on any printable ASCII characters. You can optionally exclude lowercase letters from the generated values. The replacement value does not preserve the length of the original value.
Consistency: self only. Primary key generator. Unique columns allowed. Format-preserving encryption (FPE). Privacy ranking: 3 if not consistent, 4 if consistent.
Generates a random company name-like string.
Consistency: self or other. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent, 4 if consistent.
Shuffles the original values for a column to different rows. Maintains the overall frequency of each value.
For example, a column contains the values Small (3 times), Medium (4 times), and Large (5 times).
In the transformed data, each value appears the same number of times, but the values are shuffled to different rows.
Linkable. Differential privacy is configurable. Privacy ranking: 2 with differential privacy, 3 without differential privacy.
Replaces letters with random other letters and numbers with random other numbers. Preserves punctuation, whitespace, and mathematical symbols.
Consistency: self only. Privacy ranking: 3 if not consistent, 4 if consistent.
Replaces characters with other random characters. Preserves punctuation, capitalization, and whitespace.
A replacement character is always from within the same Unicode Block as the source character.
A source character is always mapped to the same destination character. For example, M might always map to V.
Always self-consistent. Unique columns allowed. Privacy ranking: 4.
Company Name (Deprecated) API: CompanyNameGenerator
This generator is deprecated. Use the Business Name generator instead. Generates a random company name-like string.
Consistency: self or other. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent, 4 if consistent.
Applies different generators to rows conditionally based on the column value. For example, apply the Character Scramble generator for values other than Test. You configure a list of conditions. Each condition performs a check against the column value. For each condition, you assign a sub-generator to apply to matching values.
Unique columns allowed Composite generator. Other feature support is based on the sub-generators. Privacy ranking: If a fallback generator is selected, then the lower of 5 or the fallback generator. 5 if no fallback generator is selected.
Uses a single specified value to replace all of the values in the column. The replacement value must be compatible with the column data type.
Differential privacy Data-free Privacy ranking: 1
Generates a continuous distribution to fit the underlying data. Can link to other columns to create multivariate distributions. Can also be partitioned by other columns.
Linkable Differential privacy is configurable Privacy ranking: - 2 with differential privacy - 3 without differential privacy
Populates the column using the sum of values from a column in another table. To select the rows to use, uses a foreign key value that matches the primary key value for the current row. For example, to transform the Total_Sales column in the Customers table, from the Transactions table, use the sum of the Amount values for rows where the Customer_ID value matches the primary key value for the current customer.
Privacy ranking: 3
CSV Mask API: CsvMaskGenerator
Used to mask text in a delimited format.
Parses the text as a row where the columns are delimited by a specified character. For each index, you assign a sub-generator to apply to the index value.
Composite generator. Feature support is based on the sub-generators. Privacy ranking: 5
Replaces the original column value with a value from a list of values that you provide.
Consistency - Self and other Linkable Differential privacy if not consistent Data-free if not consistent Privacy ranking: - 1 if not consistent - 4 if consistent
Truncates dates or timestamps to a specific date or time component. For example, you might truncate a date value to the month or a timestamp to the hour.
Privacy ranking: 5
Email API: EmailGenerator
Scrambles characters in an email address.
Preserves the formatting and keeps the @ and . characters.
You can identify specific email domains to not scramble.
Consistency - Self only Privacy ranking: - 3 if not consistent - 4 if consistent
Generates timestamps that fit an event distribution. You can link columns to create a sequence of events across multiple columns. You can also partition the generator by other columns.
Linkable Privacy ranking: 3
Scrambles characters in a file name.
Preserves the formatting and the file extension.
Consistency - Self only Privacy ranking: - 3 if not consistent - 4 if consistent
Replaces all instances of the find string with the replace string. For the find string, you can optionally provide a regular expression.
Privacy ranking: 5
FNR API: FnrGenerator
Transforms Norwegian national identity numbers. You can optionally preserve the gender and birthdate portions of the identifier values.
Consistency - Self and other Unique columns allowed Privacy ranking: - 3 if not consistent - 4 if consistent
Geo API: GeoGenerator
Used to transform columns that contain latitude and longitude values.
Linkable Unique columns allowed Privacy ranking: 3
Can be used to generate cities, states, zip codes, and latitude/longitude values that follow HIPAA guidelines for safe harbor.
Consistency - Self only Privacy ranking: - 3 if not consistent - 4 if consistent
Generates random host names, based on the English language.
Consistency - Self and other Differential privacy if not consistent Data-free if not consistent Privacy ranking: - 1 if not consistent - 4 if consistent
Used to transform values in an HStore column in a PostgreSQL database. You specify a list of keys for which to transform the values. For each key, you assign a generator to apply to the key value.
Composite generator. Feature support is based on the sub-generators. Privacy ranking: 5
Used to transform columns that contain HTML content. To identify the values to transform, you provide a list of path expressions. For each path expression, you assign a generator to apply to the matching value.
Composite generator. Feature support is based on the sub-generators. Privacy ranking: 5
Generates unique integer values.
By default, the generated values are within the range of the column’s data type.
You can also specify a range for the generated values. The source values must be within that range.
Differential privacy if not consistent Data-free if not consistent Primary key generator Unique columns allowed Format-preserving encryption (FPE) Privacy ranking: - 1 if not consistent - 4 if consistent
For Canadian mailing addresses, can generate:
Street name
Postal code
For United Kingdom (UK) mailing addresses, can generate postal codes.
Consistency - Self only Differential privacy if not consistent Data-free if not consistent Privacy ranking: - 1 if not consistent - 4 if consistent
Generates a random IP address-formatted string. You specify the percentage of IPv4 addresses. The remaining addresses are IPv6.
Consistency - Self or other Differential privacy if not consistent Data-free if not consistent Privacy ranking: - 1 if not consistent - 4 if consistent
Used to transform values in JSON columns. To identify values to transform, you provide a list of JSONPaths.
For each JSONPath, you assign a sub-generator to apply to matching values.
Composite generator. Feature support is based on the sub-generators. Privacy ranking: 5
Generates a random MAC address formatted string.
Consistency - Self only Differential privacy if not consistent Data-free if not consistent Format-preserving encryption (FPE) Privacy ranking: - 1 if not consistent - 4 if consistent
Generates unique MongoDB objectId values. Can be assigned to text columns that contain MongoDB ObjectId values. The column value must be 12 bytes long.
Consistency - Self only Privacy ranking: - 3 if not consistent - 4 if consistent
Name API: NameGenerator
Generates a random name string from a dictionary of first and last names. You specify the name format. For example, a column might contain only a first name, or a full name that is last name first.
Consistency - Self or other Differential privacy if not consistent Data-free if not consistent Privacy ranking: - 1 if not consistent - 4 if consistent
Masks values in numeric columns.
Either adds or multiplies the original value by random noise.
Consistency - Self or other Privacy ranking: - 3 if not consistent - 4 if consistent
Null API: NullGenerator
Replaces all of the column values with NULL values.
Differential privacy Data-free Unique columns allowed Privacy ranking: 1
Generates unique numeric strings of the same length as the input numeric string.
Consistency - Self only Primary key generator Unique columns allowed Format-preserving encryption (FPE) Privacy ranking: - 3 if not consistent - 4 if consistent
Default generator. Does not perform any transformation on the source data.
Unique columns allowed Privacy ranking: 6
Generates a random phone number that matches the country or region and format of the input phone number. For invalid phone numbers, either replaces individual numbers or generates a valid replacement number.
Consistency - Self only Privacy ranking: 3
Generates a random boolean value. You specify the percentage of true values. The remaining values are false.
Differential privacy Data-free Privacy ranking: 1
Generates a random double number that is between the specified minimum (inclusive) and maximum (exclusive) values.
Differential privacy Data-free Privacy ranking: 1
Generates a random hash string.
Differential privacy Data-free Privacy ranking: 1
Returns a random integer that is between the specified minimum (inclusive) and maximum (exclusive) values.
Differential privacy Data-free Privacy ranking: 1
Generates random dates, times, and timestamps that fall within a specified range.
Differential privacy Data-free Privacy ranking: 1
Random UUID API: UUIDGenerator
Generates a random new UUID string.
Differential privacy Data-free Unique columns allowed Privacy ranking: 1
To identify values to transform, you provide a regular expression.
For each capture group in an expression, you assign a sub-generator to apply to matching values.
Unique columns allowed Composite generator. Other feature support is based on the sub-generators. Privacy ranking: 5
Generates a column of unique integer values that start with a specified value, and then increment by 1 for each processed row.
Linkable Unique columns allowed Privacy ranking: 3
Generates values of ISO 6346 compliant shipping container codes. The codes are all in the freight ("U") category.
Consistency - Self or other Differential privacy if not consistent Data-free if not consistent Privacy ranking: - 1 if not consistent - 4 if consistent
SIN API: SINGenerator
Generates a new valid Canadian Social Insurance Number. Preserves the formatting from the original value.
Consistency - Self only Data-free if not consistent Unique columns allowed Format-preserving encryption (FPE) Privacy ranking: - 1 if not consistent - 4 if consistent
SSN API: SsnGenerator
Generates a new valid United States Social Security Number. For numeric columns, the dashes (xxx-xx-xxxx) are always excluded. Otherwise, you can specify the percentage of values for which to include the dashes.
Consistency - Self or other Differential privacy if not consistent Data-free if not consistent Privacy ranking: - 1 if not consistent - 4 if consistent
Used to transform StructFields within a StructType in Spark databases (Databricks and Amazon EMR). To identify the StructField value to transform, you provide a path expression. For each path expression, you assign a sub-generator to apply to the matching values.
Composite generator. Feature support is based on the sub-generators. Privacy ranking: 5
Shifts timestamps by a random amount of a specific unit of time, within a set range. The range can start before the original value.
Consistency - Self or other Privacy ranking: - 3 if not consistent - 4 if consistent
Generates unique email addresses.
Replaces the username with a randomly generated GUID, and masks the domain with a character scramble.
Consistency - Self only Unique columns allowed Privacy ranking: - 3 if not consistent - 4 if consistent
URL API: UrlGenerator
Used to transform URLs. Preserves the formatting. Keeps the URL scheme and top-level domain intact.
Unique columns allowed Privacy ranking: 3
UUID Key API: UuidPkGenerator
Generates UUIDs.
Consistency - Self only Primary key generator Unique columns allowed Format-preserving encryption (FPE) Privacy ranking: - 3 if not consistent - 4 if consistent
XML Mask API: XmlMaskGenerator
Used to transform values in XML columns. To identify the values to transform, you provide XPaths. For each XPath, you assign a sub-generator to apply to the matching values.
Composite generator. Feature support is based on the sub-generators. Privacy ranking: 5
Consistency | No, cannot be made consistent. |
Linking | Yes, can be linked. |
Differential privacy | Configurable |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 2 if differential privacy enabled; 3 if differential privacy not enabled |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | Yes, can be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | Yes, can be linked. |
Differential privacy | Configurable |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 2 if differential privacy enabled; 3 if differential privacy not enabled |
Generator ID (for the API) |
Consistency | Determined by the selected generators. |
Linking | Determined by the selected generators. |
Differential privacy | Determined by the selected generators. |
Data-free | Determined by the selected generators. |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | If a fallback generator is selected, then the lower of either 5 or the fallback generator; 5 if no fallback generator is selected |
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 |
Generator ID (for the API) |
Consistency | Determined by the selected sub-generators. |
Linking | Determined by the selected sub-generators. |
Differential privacy | Determined by the selected sub-generators. |
Data-free | Determined by the selected sub-generators. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) |
Truncation option | Date column | Timestamp column |
Original value | 2021-12-20 | 2021-12-20 13:42:55 |
Truncate to year | 2021-01-01 | 2021-01-01 00:00:00 |
Truncate to month | 2021-12-01 | 2021-12-01 00:00:00 |
Truncate to day | 2021-12-20 | 2021-12-20 00:00:00 |
Truncate to hour | Not applicable | 2021-12-20 13:00:00 |
Truncate to minute | Not applicable | 2021-12-20 13:42:00 |
Truncate to second | Not applicable | 2021-12-20 13:42:55 |
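The truncation behavior in the table above can be sketched in a few lines of Python. This is only an illustration of the idea, not the product's implementation; the truncate function and the level names are hypothetical.

```python
from datetime import datetime

# For each (hypothetical) truncation level, the components that are
# reset to their minimum value.
_RESET = {
    "year": dict(month=1, day=1, hour=0, minute=0, second=0, microsecond=0),
    "month": dict(day=1, hour=0, minute=0, second=0, microsecond=0),
    "day": dict(hour=0, minute=0, second=0, microsecond=0),
    "hour": dict(minute=0, second=0, microsecond=0),
    "minute": dict(second=0, microsecond=0),
    "second": dict(microsecond=0),
}

def truncate(value: datetime, level: str) -> datetime:
    """Return value with every component below `level` reset."""
    return value.replace(**_RESET[level])

original = datetime(2021, 12, 20, 13, 42, 55)
print(truncate(original, "month"))  # 2021-12-01 00:00:00
print(truncate(original, "hour"))   # 2021-12-20 13:00:00
```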
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | Yes, can be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 |
Generator ID (for the API) |
This generator can be used to generate cities, states, and zip codes that follow HIPAA guidelines for safe harbor.
How the HIPAA Address generator handles zip codes is based on whether the Replace zeros in truncated Zip Code toggle in the generator configuration is off or on.
By default, the setting is off. In this case, the last two digits of the zip code in the column are replaced with zeros, unless the zip code is a low population area as designated by the current census. For a low population area, all of the digits in the zip code are replaced with zeros.
If the setting is on, then the generator selects a real zip code that starts with the same three digits as the original zip code. For a low population area, if a state is linked, then the generator selects a random zip code from within that state. Otherwise the generator selects a random zip code from the United States.
When a zip code column is not linked, a random city is chosen in the United States. When a zip code is already added to the link, a city is chosen at random that has at least some overlap with the zip code.
If the original zip code is designated as a low population area and a State column is linked, then a random city is chosen within the state. If a State column is not linked, a random city within the United States is chosen.
For example, if the original city and zip code were (Atlanta, 30305), the zip code would be replaced with 30300. Many cities contain zip codes that begin with 303, such as Atlanta, Decatur, Chamblee, Hapeville, Dunwoody, and College Park. One of these cities is chosen at random, so that the final value is, for example, (Chamblee, 30300).
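The zip code handling described above can be sketched as follows. This is a simplified illustration, not the generator's actual code: the mask_zip function, the low population prefix set, and the zip code lookup are all hypothetical stand-ins for the census-based data that the generator uses, and state linking is not modeled.

```python
import random

# Illustrative 3-digit prefixes treated as low population areas (assumption).
LOW_POPULATION_PREFIXES = {"036", "059", "102"}

# Hypothetical lookup of real zip codes by 3-digit prefix.
ZIPS_BY_PREFIX = {
    "303": ["30303", "30305", "30309", "30341"],
    "100": ["10001", "10002"],
}

def mask_zip(zip_code: str, replace_zeros: bool = False, rng=random) -> str:
    prefix = zip_code[:3]
    low_population = prefix in LOW_POPULATION_PREFIXES
    if not replace_zeros:
        # Setting off: zero out the last two digits, or every digit
        # for a low population area.
        return "00000" if low_population else prefix + "00"
    # Setting on: pick a real zip code with the same prefix. For a low
    # population area (or an unknown prefix), pick from all known zip codes.
    all_zips = [z for zips in ZIPS_BY_PREFIX.values() for z in zips]
    candidates = all_zips if low_population else ZIPS_BY_PREFIX.get(prefix, all_zips)
    return rng.choice(candidates)

print(mask_zip("30305"))  # 30300
print(mask_zip("03601"))  # 00000
```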
HIPAA guidelines allow for information at the state level to be kept. Therefore, these values are passed through.
GPS coordinates are randomly generated based on the linked HIPAA address components, in the following order of precedence:
If a zip code is linked and its 3-digit zip code prefix is not designated a low population area, a random point within the same 3-digit zip code prefix is generated. If the prefix is a low population area, the linked state is used.
If a state is available and a zip code and city are not, or the zip code or city are in a 3-digit zip code prefix that is designated a low population area, then a random GPS coordinate is generated somewhere within the state.
If no zip code, city, or state is linked, or one or more of them were provided, but there was a problem generating a random GPS coordinate within the linked areas, then a GPS coordinate is generated at a random location within the United States.
Note: If the city component of the HIPAA address is linked with latitude and/or longitude, the GPS coordinate components are randomly generated independently of the city.
All other address parts are generated randomly. The output value is not influenced at all by the underlying value in the column.
To configure the generator:
From the Link To dropdown list, select the other columns to link to. You can only select columns that are also assigned the HIPAA Address generator.
From the address part dropdown list, select the type of address value that is in the column.
Toggle the Replace zeros in truncated Zip Code setting to indicate how to generate zip codes. If the setting is off, then the last two digits are replaced with zeros. For low population areas, the entire zip code is populated with zeros. If the setting is on, then a real zip code is selected that starts with the first three digits of the original zip code. For low population areas, if a state is linked, a random zip code from the state is used. Otherwise, a random zip code from the United States is used.
Toggle the Consistency setting to indicate whether to make the column self-consistent. By default, consistency is disabled.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
For the HIPAA Address generator, Spark workspaces (Amazon EMR, Databricks, and self-managed Spark clusters) only support the following address parts:
City
City with State
City with State Abbr
State
State Abbr
US Address
US Address with Country
Zip Code
The Address generator provides support for additional address parts in Spark workspaces.
Scrambles the characters in an email address. It preserves formatting and keeps the @ and . characters.
For example, for the following input value:
johndoe@company.com
The output value would be something like:
brwomse@xorwxlt.slt
By default, the generator scrambles the domain. You can configure the generator to not mask specific domains. You can also specify a domain to use for all of the output email addresses.
For example, if you configure the generator to not scramble the domain company.com, then the output for johndoe@company.com would look something like:
brwomse@company.com
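As a rough sketch of this behavior, the following Python replaces each letter with a random letter and each digit with a random digit, and passes the @ and . characters through. It follows the example above, where the username is still scrambled for an excluded domain. The scramble and mask_email functions are hypothetical and are not part of the product API.

```python
import random
import string

def scramble(text: str, rng: random.Random) -> str:
    """Replace ASCII letters and digits with random ones; keep other characters."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            pool = string.ascii_lowercase if ch.islower() else string.ascii_uppercase
            out.append(rng.choice(pool))
        elif ch.isascii() and ch.isdigit():
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)  # '@', '.', and other symbols pass through
    return "".join(out)

def mask_email(email: str, excluded_domains=(), rng=None) -> str:
    rng = rng or random.Random()
    user, sep, domain = email.partition("@")
    if domain.lower() in excluded_domains:
        # Excluded domain: scramble only the part before the @.
        return scramble(user, rng) + sep + domain
    return scramble(user, rng) + sep + scramble(domain, rng)

print(mask_email("johndoe@company.com"))
print(mask_email("johndoe@company.com", excluded_domains={"company.com"}))
```

Because the replacement characters are drawn at random, the original value cannot be recovered from the output alone.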
This generator securely masks letters and numbers. There is no way to recover the original data.
If your email addresses include name values - for example, John.Smith@mycompany.com - then you can use the Regex Mask generator to produce email addresses that are tied to name values in the same table. For information on how to do this, go to #generator-tips-email-name-alignment.
To configure the generator:
In the Email Domain field, enter a domain to use for all of the output values.
For example, use @mycompany.com for all of the generated values. The generator scrambles the content before the @.
In the Excluded Email Domains field, enter a comma-separated list of domains for which email addresses are not masked in the output values. This allows you, for example, to maintain internal or testing email addresses that are not considered sensitive.
Toggle the Replace invalid emails setting to indicate whether to replace an invalid email address with a generated valid email address. By default, invalid email addresses are not replaced. In the replacement values, the username is generated. If you specify a value for Email Domain, then the email addresses use that domain. Otherwise, the domain is generated.
Toggle the Consistency setting to indicate whether to make the column self-consistent. By default, consistency is disabled.
This generator scrambles characters while preserving formatting and keeping the file extension intact.
For example, for the following input value:
DataSummary1.pdf
The output value would look something like:
RsnoPwcsrtv5.pdf
This generator securely masks letters and numbers. There is no way to recover the original data.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
This generator can be used to mask columns of latitude and longitude.
The Geo generator divides the globe into grids that are approximately 4.9 x 4.9 km. It then counts the number of points within each grid.
During data generation, each (latitude, longitude) pair is mapped to its grid.
If the grid contains a sufficient number of points to preserve privacy, then the generator returns a randomly chosen point in that grid.
If the grid does not contain enough points to preserve privacy, then the generator returns a random coordinate from the nearest grid that contains enough points.
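The grid approach can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the cell size is expressed in degrees (about 0.044 degrees of latitude is roughly 4.9 km, with no longitude correction), the privacy threshold MIN_POINTS is a made-up value, and the function names are hypothetical.

```python
import math
import random
from collections import Counter

CELL_DEG = 0.044  # assumption: roughly 4.9 km of latitude per cell
MIN_POINTS = 3    # assumption: hypothetical privacy threshold

def cell_of(lat: float, lon: float) -> tuple:
    """Map a coordinate to the integer index of its grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))

def mask_points(points, rng: random.Random):
    counts = Counter(cell_of(lat, lon) for lat, lon in points)
    dense = [cell for cell, n in counts.items() if n >= MIN_POINTS]
    if not dense:
        raise ValueError("no grid cell contains enough points")
    masked = []
    for lat, lon in points:
        home = cell_of(lat, lon)
        target = home
        if counts[home] < MIN_POINTS:
            # Sparse cell: fall back to the nearest cell with enough points.
            target = min(
                dense,
                key=lambda c: (c[0] - home[0]) ** 2 + (c[1] - home[1]) ** 2,
            )
        # Return a random point inside the target cell.
        masked.append(
            ((target[0] + rng.random()) * CELL_DEG,
             (target[1] + rng.random()) * CELL_DEG)
        )
    return masked

pts = [(33.80, -84.40), (33.81, -84.41), (33.805, -84.405), (40.0, -100.0)]
print(mask_points(pts, random.Random(0)))
```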
To configure the generator:
From the Link To dropdown list, select the column to link to this one. You typically assign the Geo generator to both the latitude and longitude column, then link those columns.
From the value type dropdown, select whether this column contains a latitude value or a longitude value.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates unique integer values. By default, the generated values are within the range of the column’s data type.
You can also specify a range for the generated values. The source values must be within that range.
This generator cannot be used to transform negative numbers.
To configure the generator:
In the Minimum field, enter the minimum value to use for an output value. The minimum value cannot be larger than any of the values in the source data.
In the Maximum field, enter the maximum value to use for an output value. The maximum value cannot be smaller than any of the values in the source data.
Toggle the Consistency setting to indicate whether to make the column self-consistent. By default, consistency is disabled.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
This is a composite generator.
Runs selected generators on specified key values in an HStore column in a PostgreSQL database. HStore columns contain a set of key-value pairs.
To assign a generator to a key:
Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell HStore field contains a sample value from the source database. You can use the previous and next icons to page through different values.
Under Enter a key, enter the name of a key from the column value.
For example, for the column value:
"pages"=>"446", "title"=>"The Iliad", "category"=>"mythology"
To apply a generator to the title, you would enter title as the key.
Matched HStore Values shows the result from the value in Cell HStore.
From the Generator Configuration dropdown list, select the generator to apply to the key value. You cannot select another composite generator.
Configure the selected generator. You cannot configure the selected generator to be consistent with another column.
To save the configuration and immediately add a generator for another key, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
From the Sub-Generators list:
To edit a generator assignment, click the edit icon.
To remove a generator assignment, click the delete icon.
To move a generator assignment up or down in the list, click the up or down arrow.
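To make the key-to-generator assignment concrete, here is a minimal Python sketch that applies a function to the values of selected keys in an HStore-style string. It assumes that keys and values are double-quoted and does not handle unquoted NULL values or re-escape the generated output. The mask_hstore function is hypothetical.

```python
import re

# Matches "key"=>"value" pairs, allowing backslash-escaped characters.
PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"\s*=>\s*"((?:[^"\\]|\\.)*)"')

def mask_hstore(value: str, sub_generators: dict) -> str:
    """Apply the assigned function to each matching key's value."""
    def replace(match):
        key, val = match.group(1), match.group(2)
        generator = sub_generators.get(key)
        if generator is None:
            return match.group(0)  # no generator assigned: keep the pair
        return f'"{key}"=>"{generator(val)}"'
    return PAIR.sub(replace, value)

row = '"pages"=>"446", "title"=>"The Iliad", "category"=>"mythology"'
print(mask_hstore(row, {"title": lambda v: "X" * len(v)}))
# "pages"=>"446", "title"=>"XXXXXXXXX", "category"=>"mythology"
```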
This is a composite generator.
Masks text columns by parsing the contents as HTML, and applying sub-generators to specified path expressions.
If applying a sub-generator fails because of an error, the generator selected as the fallback generator is applied instead.
Path expressions are defined using XPath syntax.
For example, for the following HTML:
To get the value of h1, the expression is //h1/text().
To get the value of the first list item, the expression is //ul/li[1]/text().
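The following Python sketch shows the general idea of applying sub-generators to path expressions, with a fallback when a sub-generator raises an error. It uses a hypothetical, well-formed HTML fragment and the standard xml.etree.ElementTree module, which supports only a subset of XPath and has no text() function, so the sketch selects the element and updates its text. The mask_html function is an illustration, not the product's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample value; real HTML may need a more forgiving parser.
HTML = (
    "<html><body><h1>First Heading</h1>"
    "<ul><li>Coffee</li><li>Tea</li></ul></body></html>"
)

def mask_html(doc: str, assignments, fallback) -> str:
    """Apply each (path, generator) pair; use fallback if a generator fails."""
    root = ET.fromstring(doc)
    for path, generator in assignments:
        for node in root.findall(path):
            try:
                node.text = generator(node.text)
            except Exception:
                node.text = fallback(node.text)
    return ET.tostring(root, encoding="unicode")

masked = mask_html(
    HTML,
    [(".//h1", str.upper), (".//ul/li[1]", lambda t: "*" * len(t))],
    fallback=lambda t: "",
)
print(masked)
```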
To assign a generator to a path expression:
Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell HTML field contains a sample value from the source database. You can use the previous and next icons to page through different values.
In the Path Expression field, type the path expression to identify the value to apply the generator to. Matched HTML Values shows the result from the value in Cell HTML.
From the Generator Configuration dropdown list, select the generator to apply to the path expression. You cannot select another composite generator.
Configure the selected generator. You cannot configure the selected generator to be consistent with another column.
To save the configuration and immediately add a generator for another path expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
From the Sub-Generators list:
To edit a generator assignment, click the edit icon.
To remove a generator assignment, click the delete icon.
To move a generator assignment up or down in the list, click the up or down arrow.
From the Fallback Generator dropdown list, select the generator to use if the assigned generator for a path expression fails.
The options are:
Generates random host names, based on the English language.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.
By default, the generator is not consistent.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from Consistent to, select the column.
When the generator is consistent with itself, then a given value in the source database is mapped to the same value in the destination database. For example, Host123 in the source database always produces MyHostABC in the destination database.
When the generator is consistent with another column, then a given source value in the other column results in the same host name value in the destination database. For example, a host name column is consistent with a department column. Every instance of Sales in the source data is given the same host name in the destination database.
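One common way to get this kind of consistency is to derive the output deterministically from a keyed hash of the input value. The sketch below illustrates that technique only; it is an assumption, not a description of how the generator is implemented. The word list, the secret, and the consistent_host function are hypothetical.

```python
import hashlib
import hmac

WORDS = ["amber", "birch", "cedar", "delta", "ember", "frost", "grove", "harbor"]
SECRET = b"example-secret"  # hypothetical key; keep a real key private

def consistent_host(seed_value: str) -> str:
    """Derive a host name deterministically from seed_value."""
    digest = hmac.new(SECRET, seed_value.encode("utf-8"), hashlib.sha256).digest()
    first = WORDS[digest[0] % len(WORDS)]
    second = WORDS[digest[1] % len(WORDS)]
    number = int.from_bytes(digest[2:4], "big") % 1000
    return f"{first}-{second}-{number:03d}"

# Self-consistent: the same source host name always maps to the same output.
print(consistent_host("Host123") == consistent_host("Host123"))  # True

# Consistent with another column: seed from the department value instead.
rows = [
    {"department": "Sales", "host": "Host123"},
    {"department": "Sales", "host": "Host456"},
]
masked = [consistent_host(row["department"]) for row in rows]
print(masked[0] == masked[1])  # True
```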
This generator replaces all instances of the find string with the replace string.
For example, you can indicate to replace all instances of abc with 123.
To configure the generator:
In the Find field, type the string to look for in the source column value.
To use a regular expression to identify the source value, check the Use Regex checkbox.
If you use a regular expression, use backslash ( \ ) as the escape character.
In the Replace field, type the string to replace the matching string with.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
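The difference between a literal find string and a regular expression can be sketched in Python as follows. The find_replace function is hypothetical, and the regular expression dialect that the product uses might differ from Python's re module.

```python
import re

def find_replace(value: str, find: str, replace: str, use_regex: bool = False) -> str:
    """Replace every match of find in value with replace."""
    if use_regex:
        return re.sub(find, replace, value)
    return value.replace(find, replace)

print(find_replace("abc-abc", "abc", "123"))             # 123-123
# As a regular expression, an unescaped . matches any character.
print(find_replace("v1.0", ".", "_", use_regex=True))    # ____
# Escaping with a backslash matches only the literal period.
print(find_replace("v1.0", r"\.", "_", use_regex=True))  # v1_0
```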
Generates an address-like string to replace either:
For a Canadian postal address, the street name or postal code
For a United Kingdom (UK) mailing address, the postal code
To replace a Canadian postal code:
The generator selects a real postal code that starts with the same three characters - has the same Forward Sortation Area (FSA) - as the original postal code, but that has a different Local Delivery Unit (LDU).
For a postal code whose FSA is not on the list that the generator uses, you can provide a fallback value to use.
To replace a UK postal code, the generator selects a real postal code.
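The Canadian postal code replacement can be sketched as follows. This is an illustration under assumptions: the lookup of postal codes by FSA contains placeholder values, and the mask_postal_code function and its fallback_fsa parameter are hypothetical names. When no usable fallback is provided, the sketch returns the string "NULL", which mirrors the default fallback behavior.

```python
import random

# Placeholder lookup of postal codes grouped by FSA (assumption).
POSTAL_CODES_BY_FSA = {
    "K1A": ["K1A 0A6", "K1A 0B1", "K1A 0G9"],
    "M5V": ["M5V 2T6", "M5V 3L9"],
}

def mask_postal_code(code: str, fallback_fsa=None, rng=random) -> str:
    normalized = code.replace(" ", "").upper()
    fsa = normalized[:3]
    if fsa not in POSTAL_CODES_BY_FSA:
        fsa = fallback_fsa  # FSA is unknown: use the fallback value
    if fsa not in POSTAL_CODES_BY_FSA:
        return "NULL"
    # Keep the FSA, but choose a code with a different LDU when possible.
    candidates = [
        c for c in POSTAL_CODES_BY_FSA[fsa] if c.replace(" ", "") != normalized
    ]
    return rng.choice(candidates or POSTAL_CODES_BY_FSA[fsa])

print(mask_postal_code("K1A 0A6"))
print(mask_postal_code("Z9Z 9Z9"))                      # NULL
print(mask_postal_code("Z9Z 9Z9", fallback_fsa="K1A"))
```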
To configure the generator:
From the Generator Type dropdown list, select International Address.
From the Country dropdown list, select the country (Canada or United Kingdom).
From the Address Component dropdown list, select the address component that this column contains. For Canada, the available options are:
Street Name
Postal Code
For the UK, the only option is to generate a postal code.
For a Canadian postal code, in the Fallback Value field, type the FSA to use if the value in the data does not exist.
For example, the FSA in the data might be new and not yet in the list that Tonic uses, or the FSA might be invalid.
By default, the fallback value is NULL, meaning that the postal code value will also be the string literal "NULL" in the destination data.
Toggle the Consistency setting to indicate whether to make the column consistent. By default, consistency is disabled.
If Tonic data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates a random IP address formatted string.
To configure the generator:
In the Percent IPv4 field, type the percentage of output values that are IPv4 addresses.
For example, if you set this to 60, then 60% of the generated IP addresses are IPv4 addresses, and 40% of the generated IP addresses are IPv6 addresses.
If you set this to 100, then all of the generated IP addresses are IPv4 addresses.
If you set this to 0, then all of the generated IP addresses are IPv6 addresses.
Toggle the Consistency setting to indicate whether to make the column consistent. By default, consistency is disabled.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. When a generator is self-consistent, then a given value in the source database is always mapped to the same value in the destination database. When a generator is consistent with another column, then a given source value in that column always results in the same IP address value in the destination database. For example, an IP address column is consistent with a username column. For each instance of User1 in the source database, the value in the IP address column is the same.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
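The effect of the Percent IPv4 setting can be sketched with Python's standard ipaddress module. The random_ip function is a hypothetical illustration. It treats the percentage as the probability that each generated address is IPv4, so the split is approximate for small samples.

```python
import ipaddress
import random

def random_ip(percent_ipv4: float, rng: random.Random) -> str:
    """Return a random IPv4 address with probability percent_ipv4 / 100."""
    if rng.random() * 100 < percent_ipv4:
        return str(ipaddress.IPv4Address(rng.getrandbits(32)))
    return str(ipaddress.IPv6Address(rng.getrandbits(128)))

rng = random.Random(42)
sample = [random_ip(60, rng) for _ in range(5)]
print(sample)
```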
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | Yes, can be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent. |
Linking | Yes, can be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | Yes |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | Yes |
Privacy ranking | 1 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | Determined by the selected sub-generators. |
Linking | Determined by the selected sub-generators. |
Differential privacy | Determined by the selected sub-generators. |
Data-free | Determined by the selected sub-generators. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) |
Consistency | Determined by the selected sub-generators. |
Linking | Determined by the selected sub-generators. |
Differential privacy | Determined by the selected sub-generators. |
Data-free | Determined by the selected sub-generators. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking |
|
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking |
|
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking |
|
Generator ID (for the API) |
This is a composite generator.
Runs a selected sub-generator on values that match a user-specified JSONPath. You can only search for and apply sub-generators to individual key values. You cannot apply a sub-generator to an object or to an array.
If an error occurs, the selected fallback generator is used for the entirety of the JSON value.
Sub-generators are applied sequentially, from the sub-generator at the top of the list to the sub-generator at the bottom of the list.
If multiple JSONPath expressions point to the same key, the most recently added generator takes priority.
JSON paths can also contain regular expressions and comparison logic, which allows the configured sub-generators to be applied only when there are properties that satisfy the query.
For example, a column contains this JSON:
[ { "file_name": "foo.txt", "b": 10 }, ... ]
The following JSON path only applies to array elements that contain a file_name key for which the value ends in .txt:
$.[?(@.file_name =~ /^.*\.txt$/)]
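As a rough illustration of what this filter selects, the following Python sketch emulates it with the standard `json` and `re` modules instead of a JSONPath library. The second array element and the `matches_txt` helper are hypothetical and are only there to show a non-matching case.

```python
import json
import re

# Sample column value. Only the first element comes from the example above;
# the second element is made up to show a value that does not match.
cell = json.loads(
    '[{"file_name": "foo.txt", "b": 10}, {"file_name": "bar.csv", "b": 7}]'
)

# Same idea as the filter expression: file_name must end in ".txt".
TXT = re.compile(r"^.*\.txt$")


def matches_txt(element):
    """Return True if the element has a string file_name that ends in .txt."""
    name = element.get("file_name")
    return isinstance(name, str) and TXT.match(name) is not None


# Keep only the array elements that satisfy the filter.
selected = [e for e in cell if isinstance(e, dict) and matches_txt(e)]
print(selected)
```

Only the elements in `selected` would have the configured sub-generator applied. The other elements are left as they are.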
A JSON path can also be used to point to a key name recursively. For example, a column contains this JSON:
The following JSON path applies to all properties for which the key is first_name:
$..first_name
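To make the recursive behavior concrete, here is a minimal Python sketch of the same idea: walk the whole document and collect every value stored under a given key, at any depth. The `find_key` helper and the sample document are hypothetical. This is not how Structural evaluates JSON paths.

```python
def find_key(node, key):
    """Yield every value stored under `key`, at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending, because nested objects can repeat the key.
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Hypothetical document with the key at three different depths.
doc = {
    "first_name": "Ana",
    "contacts": [
        {"first_name": "Bo"},
        {"info": {"first_name": "Cy"}},
    ],
}
print(list(find_key(doc, "first_name")))
```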
To assign a generator to a path expression:
Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell JSON field contains a sample value from the source database. You can use the previous and next icons to page through different values.
In the Path Expression field, type the path expression to identify the value to apply the generator to. To create a path expression, you can also click the value in Cell JSON that you want the expression to point to. The path expression must identify a key value. You cannot apply sub-generators to an object or to an array. Matched JSON Values shows the result from the value in Cell JSON.
By default, the selected generator is applied to any value that matches the expression. To limit the types of values to apply the generator to, from the Type Filter, specify the applicable types. You can select Any, or you can select any combination of String, Number, Boolean, and Null.
From the Generator Configuration dropdown list, select the generator to apply to the path expression. You cannot select another composite generator.
Configure the selected generator. You cannot configure the selected generator to be consistent with another column.
To save the configuration and immediately add a generator for another path expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
From the Sub-Generators list:
To edit a generator assignment, click the edit icon.
To remove a generator assignment, click the delete icon.
To move a generator assignment up or down in the list, click the up or down arrow.
From the Fallback Generator dropdown list, select the generator to use if the assigned generator for a path expression fails.
The options are:
Generates unique object identifiers.
Can be assigned to text columns that contain MongoDB ObjectId values. The column value must be 12 bytes long.
To configure the generator:
A MongoDB ObjectId consists of an epoch timestamp, a random value, and an incremented counter. To change only the random value portion of the identifier, and keep the timestamp and counter portions, toggle Preserve Timestamp and Incremental Counter to the on position.
Toggle the Consistency setting to indicate whether to make the generator self-consistent. By default, the generator is not consistent.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
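The effect of the Preserve Timestamp and Incremental Counter option can be sketched in Python. The byte layout (4-byte timestamp, 5-byte random value, 3-byte counter) is the standard ObjectId layout. The `mask_object_id` function is hypothetical, and the behavior when the option is off (replacing all 12 bytes) is an assumption. The sketch also does not enforce uniqueness.

```python
import os


def mask_object_id(hex_id, preserve_timestamp_and_counter=True):
    """Return a masked ObjectId as a 24-character hex string."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId must be 12 bytes long")
    if preserve_timestamp_and_counter:
        # Bytes 0-3: timestamp, bytes 4-8: random value, bytes 9-11: counter.
        # Only the 5-byte random portion is replaced.
        return (raw[:4] + os.urandom(5) + raw[9:]).hex()
    # Assumption: with the option off, every byte is replaced.
    return os.urandom(12).hex()


masked = mask_object_id("507f1f77bcf86cd799439011")
print(masked)
```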
Generates a random MAC address formatted string.
To configure the generator:
In the Bytes Preserved field, enter the number of bytes to preserve in the generated address.
Toggle the Consistency setting to indicate whether to make the column self-consistent. By default, consistency is disabled.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Masks values in numeric columns. Adds or multiplies the original value by random noise.
The additive noise generator draws noise from an interval around 0 that is scaled to the magnitude of the original value. For example, the default scale is 10% of the underlying value. The larger the value, the larger the amount of noise that is added.
The multiplicative noise generator multiplies the original value by a random scaling factor that falls within a specified range.
You can use either the additive noise generator or the multiplicative noise generator, then set the other generator settings.
To use the additive noise generator:
From the dropdown list, choose Additive.
In the Relative noise scale field, type the percentage of the underlying value to scale the noise to. The default value is 10.
Tonic samples the additive noise from the range [-(scale / 100) * |value|, (scale / 100) * |value|), where scale is the noise scale, and value is the original data value.
The lower value of the range is inclusive, and the upper value of the range is exclusive.
For example, for the default noise scale of 10 and a data value of 20, the additive noise range is [-0.1 * 20, 0.1 * 20). In other words, between -2 (inclusive) and 2 (exclusive).
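The additive range can be sketched in a few lines of Python. The text above does not state how noise is distributed within the range, so the uniform draw here is an assumption, and `sample_additive_noise` is a hypothetical name.

```python
import random


def sample_additive_noise(value, scale=10.0, rng=random):
    """Sample noise from [-(scale/100)*|value|, (scale/100)*|value|)."""
    bound = (scale / 100.0) * abs(value)
    # rng.random() is in [0, 1), which keeps the upper bound exclusive.
    # Assumption: the noise is uniform over the range.
    return -bound + 2.0 * bound * rng.random()


original = 20
noise = sample_additive_noise(original)  # between -2 (inclusive) and 2 (exclusive)
print(original + noise)
```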
To use the multiplicative noise generator:
From the dropdown list, choose Multiplicative.
In the Min field, type the minimum value for the scaling factor. The minimum value is inclusive. The default value is 0.5.
In the Max field, type the maximum value for the scaling factor. The maximum value is exclusive. The default value is 5.
Tonic multiplies the original value by a scaling factor from the range [min, max), where min is the minimum scaling factor, and max is the maximum scaling factor.
For example, for the default values of 0.5 and 5, Tonic multiplies the original data value by a value from between 0.5 (inclusive) and 5 (exclusive).
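A comparable sketch for the multiplicative option follows. As with the additive case, the uniform distribution of the scaling factor is an assumption, and `multiplicative_noise` is a hypothetical name.

```python
import random


def multiplicative_noise(value, min_factor=0.5, max_factor=5.0, rng=random):
    """Multiply value by a factor drawn from [min_factor, max_factor)."""
    # Assumption: the factor is uniform over the range.
    factor = min_factor + (max_factor - min_factor) * rng.random()
    return value * factor


print(multiplicative_noise(100))  # somewhere from 50 up to, but not including, 500
```

Unlike additive noise, the default multiplicative range is not centered on the original value, so most outputs are larger than the input.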
To configure the generator consistency and data encryption:
Toggle the Consistency setting to indicate whether to make the column consistent. By default, the consistency is disabled.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. If the generator is self-consistent, then a given value in the source database is masked in exactly the same way to produce the value in the destination database. If the generator is consistent with another column, then for a given value in that other column, the column that is assigned the Noise generator is always masked in exactly the same way in the destination database. For example, a field containing a salary value is assigned the Noise Generator and is consistent with the username field. For each instance of User1, the Noise Generator masks the salary value in exactly the same way.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
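The difference between self-consistency and consistency with another column can be pictured as a choice of which value seeds the randomness. The sketch below derives a random generator from a keyed hash of either the salary itself or the username. The `SECRET` value, the helper names, and the hashing approach are all assumptions for illustration and do not describe Structural's implementation.

```python
import hashlib
import hmac
import random

SECRET = b"hypothetical-seed"  # assumption: some stable secret per workspace


def rng_for(key_value):
    """Build a deterministic random generator from a key value."""
    digest = hmac.new(SECRET, str(key_value).encode("utf-8"), hashlib.sha256).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def noisy_salary(salary, consistent_to=None):
    """Add 10% additive noise, seeded by the salary or by another column."""
    # Self-consistent: seed from the value itself.
    # Consistent with another column: seed from that column's value.
    key = salary if consistent_to is None else consistent_to
    rng = rng_for(key)
    bound = 0.10 * abs(salary)
    return salary + (-bound + 2.0 * bound * rng.random())


# Every row for User1 is masked in exactly the same way.
print(noisy_salary(50000, consistent_to="User1"))
print(noisy_salary(50000, consistent_to="User1"))
```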
Generates a random name string from a dictionary of first and last names.
You specify the name information that is contained in the column. A column might only contain a first name or last name, or might contain a full name. A full name might be first name first or last name first.
For example, a Name column contains a full name in the format Last, First. For the input value Smith, John, the output value would be something like Jones, Mary.
To configure the generator:
From the name format dropdown list, select the type of name value that the column contains:
First. This also is commonly used for standalone middle name fields.
Last
First Last
First Middle Last
First Middle Initial Last
Last, First
Last, First Middle
Middle Initial
Toggle the Preserve Capitalization setting to indicate whether to preserve the capitalization of the column value. By default, the capitalization is not preserved.
Toggle the Consistency setting to indicate whether to make the column consistent. By default, consistency is disabled.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates a random boolean value.
To configure the generator, in the Percent True field, enter the percentage of values to set to True in the output.
For example, if you set this to 60, then 60% of the output values are True, and 40% of the output values are False.
If you set this to 100, then all of the output values are True.
If you set this to 0, then all of the output values are False.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates a random double number between the specified minimum (inclusive) and maximum (exclusive).
To configure the generator:
In the Minimum field, type the minimum value to use in the output values. The minimum value is inclusive. The output values can be that value or higher.
In the Maximum field, type the maximum value to use in the output values. The maximum value is exclusive. The output values are lower than that value.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates a random hash string.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates a random phone number that matches the country or region of the input phone number while maintaining the format. For example, (123) 456-7890 or 123-456-7890.
If the input is not a valid phone number, the generator randomly replaces numeric characters. You can also replace invalid numbers with valid numbers.
By default, the numbers are United States phone numbers. Generated numbers pass Google's libphonenumber verification if the input is a valid phone number or if you replace invalid numbers.
To configure the generator:
Toggle the Replace invalid numbers setting to indicate whether to replace invalid input values with a valid output value. By default, the generator does not replace invalid values. It randomly replaces numeric characters.
Toggle the Consistency setting to indicate whether to make the generator self-consistent. By default, consistency is disabled.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates unique numeric strings of the same length as the input value.
For example, for the input value 123456, the output value would be something like 832957.
You can apply this generator only to columns that contain numeric strings.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates random dates, times, and timestamps that fall within a specified range.
For example, you might want the output dates to all fall within a specific year or month.
To configure the generator, in the Range fields, provide the start and end dates, times, or timestamps to use for the output values.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates a column of unique integer values. The values increment by 1.
To configure the generator:
From the Link To dropdown list, select the other columns to link to the current column. You can only select columns that also use the Sequential Integer generator.
In the Starting Point field, type the number to use as the starting point.
By default, the starting point is 0. This means that the column value in the first processed row is 0. The value in the next processed row is 1. The generator continues to increment the value by 1 in each row that it processes.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
This is a composite generator.
Uses regular expressions to parse strings and replace specified substrings with the output of specified generators. The parts of the string to replace are specified inside unnamed top-level capture groups.
Defining multiple expressions allows you to attach completely different sets of sub-generators to a given cell, depending on the cell's value.
If multiple regular expressions match a given string, the regular expressions and their associated generators are applied in the order that they are specified. The first expression defined that matches has the selected sub-generators applied.
With the Replace all matches option, the Regex Mask generator behaves similarly to a traditional regex parser. It matches all occurrences of a pattern before the next pattern is encountered. For example, the pattern (a) applied to the string aaab matches every occurrence of the letter a, instead of just the first.
Note that for Spark-based data connectors, depending on your environment, there might be slight differences in the regular expression support.
To ensure consistent results across all data connectors, use regular expression patterns that are compatible with both Java and C#.
For more information, see the regular expression documentation for C# (.NET) and for Java.
In a cell that contains the string ProductId:123-BuyerId:234, to mask the substrings 123 and 234, specify the regular expression:
^ProductId:([0-9]{3})-BuyerId:([0-9]{3})$
This captures the two occurrences of three-digit numbers in the pattern ProductId:xxx-BuyerId:xxx. This makes it possible to define a sub-generator on neither, either, or both of these captured substrings.
The following regular expression defines a broader capture that matches more cell values:
^(\w+).(\d+).(\w+).(\d+)$
This captures pairs of words ((\w+)) and numbers ((\d+)) if there is a single character of any value between them, instead of the relatively more specific pattern of the first expression.
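The capture-group mechanics can be sketched with Python's `re` module. The example applies a stand-in sub-generator to the first capture group of the ProductId pattern and leaves the second group untouched. The `mask` and `random_digits` helpers are hypothetical, and Python's regex dialect is used here only for illustration.

```python
import random
import re

PATTERN = re.compile(r"^ProductId:([0-9]{3})-BuyerId:([0-9]{3})$")


def random_digits(text, rng):
    """Stand-in sub-generator: replace each character with a random digit."""
    return "".join(rng.choice("0123456789") for _ in text)


def mask(cell, sub_generators, rng):
    """Replace each capture group with its sub-generator's output.

    sub_generators[i] handles capture group i + 1. None keeps the group as-is.
    """
    m = PATTERN.match(cell)
    if m is None:
        return cell  # no match: this expression does not apply to the cell
    out, last = [], 0
    for i, gen in enumerate(sub_generators, start=1):
        start, end = m.span(i)
        out.append(cell[last:start])  # text between groups is copied unchanged
        out.append(gen(m.group(i), rng) if gen else m.group(i))
        last = end
    out.append(cell[last:])
    return "".join(out)


rng = random.Random(0)
print(mask("ProductId:123-BuyerId:234", [random_digits, None], rng))
```

Text outside the capture groups, such as the ProductId: and -BuyerId: labels, always passes through unchanged.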
To add a regular expression:
Click Add Regex. On the configuration panel, Cell Value shows a sample value from the source database. You can use the previous and next options to navigate through the values.
By default, Replace all matches is enabled. To only match the first occurrence of a pattern, toggle Replace all matches to the off position.
In the Pattern field, enter a regular expression. If the expression is valid, then Tonic displays the capture groups for the expression.
For each capture group, to select and configure the generator to apply, click the selected generator. You cannot select another composite generator.
To save the configuration and immediately add a generator for another path expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
From the Regexes list:
To edit a regex, click the edit icon.
To remove a regex, click the delete icon.
Generates a new valid United States Social Security Number.
You specify the percentage of values for which to include the dashes. For columns that have a numeric data type, Structural automatically excludes the dashes.
To configure the generator:
In the Percent with -'s field, type the percentage of output values for which to include dashes in the format.
For example, if you set this to 60, then 60% of the output values are formatted 123-45-6789, and 40% are formatted 123456789.
If you set this to 100, then all of the output values are formatted 123-45-6789.
If you set this to 0, then all of the output values are formatted 123456789.
For columns that have a numeric data type, Structural automatically generates the output values without dashes.
Toggle the Consistency setting to indicate whether to make the generator consistent. By default, consistency is disabled.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. When a generator is self-consistent, then a given value in the source database is always mapped to the same value in the destination database. When a generator is consistent with another column, then a given value for that other column in the source database results in the same SSN in the destination database. For example, if the SSN column is consistent with a Name column, then every instance of John Smith in the source database results in the same SSN in the destination database.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates values of ISO 6346 compliant shipping container codes. All generated codes are in the freight category ("U").
To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.
By default, the generator is not consistent.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.
When the generator is self-consistent, then a given value in the source database is always mapped to the same value in the destination database.
When the generator is consistent with another column, then a given value for the other column in the source database always results in the same shipping container code value in the destination database. For example, a shipping container column is consistent with an owner column. Every instance of a given owner value in the source database results in the same shipping container code in the destination database.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
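For readers unfamiliar with ISO 6346, a compliant code is a three-letter owner code, a category letter ("U" for freight containers), a six-digit serial number, and a check digit. The Python sketch below computes the standard check digit and builds a random code. The function names are hypothetical, and the random choice of owner code and serial number is an assumption, not a description of how Structural selects values.

```python
import random
import string

# ISO 6346 letter values: start at A=10 and skip multiples of 11.
LETTER_VALUES = {}
_value = 10
for _letter in string.ascii_uppercase:
    if _value % 11 == 0:
        _value += 1
    LETTER_VALUES[_letter] = _value
    _value += 1


def check_digit(first_ten):
    """Compute the check digit for the first ten characters of a code."""
    total = 0
    for position, char in enumerate(first_ten):
        value = int(char) if char.isdigit() else LETTER_VALUES[char]
        total += value * (2 ** position)  # weights are 1, 2, 4, ..., 512
    return total % 11 % 10  # a remainder of 10 becomes 0


def random_container_code(rng=random):
    """Build a random freight ("U") container code with a valid check digit."""
    owner = "".join(rng.choice(string.ascii_uppercase) for _ in range(3))
    serial = "".join(rng.choice(string.digits) for _ in range(6))
    body = owner + "U" + serial
    return body + str(check_digit(body))


print(check_digit("CSQU305438"))  # commonly cited example code CSQU3054383
print(random_container_code())
```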
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes |
Data-free | Yes |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | Determined by the selected sub-generators. |
Linking | Determined by the selected sub-generators. |
Differential privacy | Determined by the selected sub-generators. |
Data-free | Determined by the selected sub-generators. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | Yes |
Privacy ranking | 1 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent or consistent with another column. Note that all Name generator columns that have the same consistency configuration are automatically consistent with each other. The columns must either be all self-consistent or all consistent with the same other column. For example, you can use this to ensure that a first name and last name column value always match the first name and last name in a full name column. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 6 |
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes |
Data-free | Yes |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 |
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes |
Data-free | Yes |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 |
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes |
Data-free | Yes |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | Yes |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | Yes |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | Determined by the selected sub-generators. |
Linking | Determined by the selected sub-generators. |
Differential privacy | Determined by the selected sub-generators. |
Data-free | Determined by the selected sub-generators. |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking |
|
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking |
|
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes |
Data-free | Yes |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 |
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | Yes, can be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 |
Generator ID (for the API) |
This is a substitution cipher that preserves formatting, but keeps the URL scheme and top-level domain intact.
For example, for the following input value:
http://www.example.com/products/clothes
The output value would be something like:
http://www.example.com/sowrmsl/kwctlsn
This mask is not secure.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Returns a random integer between the specified minimum (inclusive) and maximum (exclusive).
For example, for a column that contains a percentage value, you can indicate to use a value between 0 and 101.
To configure the generator:
In the Minimum field, type the minimum value to use in the output values. The minimum value is inclusive. The output values can be that value or higher.
In the Maximum field, type the maximum value to use in the output values. The maximum value is exclusive. The output values are lower than that value.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
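Because the maximum is exclusive, covering the percentages 0 through 100 requires a maximum of 101. Python's `random.randrange` uses the same half-open convention, which makes it a convenient way to picture the behavior. The `random_percentage` name is hypothetical.

```python
import random


def random_percentage(rng=random):
    """Return an integer from 0 (inclusive) up to 101 (exclusive)."""
    return rng.randrange(0, 101)


samples = [random_percentage() for _ in range(10)]
print(samples)  # every value is between 0 and 100
```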
Generates UUIDs on primary key columns.
All foreign key columns that reference the configured column automatically have their UUID values masked.
To configure the generator:
To preserve the version and variant bits from the source UUID in the output value, toggle Preserve Version and Variant to the on position.
Toggle the Consistency setting to indicate whether to make the generator self-consistent. By default, the generator is not consistent.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
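The Preserve Version and Variant option can be illustrated with Python's `uuid` module. In the standard UUID layout, the version is the high nibble of byte 6, and the variant occupies the top bits of byte 8. The sketch below assumes the common two-bit RFC 4122 variant, and `mask_uuid` is a hypothetical name. It does not handle uniqueness or foreign key propagation.

```python
import os
import uuid


def mask_uuid(source, preserve_version_and_variant=True):
    """Return a random UUID string, optionally keeping version and variant."""
    src = uuid.UUID(source).bytes
    out = bytearray(os.urandom(16))
    if preserve_version_and_variant:
        # Version: high nibble of byte 6.
        out[6] = (src[6] & 0xF0) | (out[6] & 0x0F)
        # Variant: top two bits of byte 8 (assumes the RFC 4122 variant).
        out[8] = (src[8] & 0xC0) | (out[8] & 0x3F)
    return str(uuid.UUID(bytes=bytes(out)))


source = "123e4567-e89b-12d3-a456-426614174000"
masked = mask_uuid(source)
print(masked, uuid.UUID(masked).version)
```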
This is a composite generator.
Applies selected generators to specific StructFields within a StructType in a Spark database (Databricks and Amazon EMR).
For example, for the following StructType:
To get the value of the occupation field, you would use the expression root.occupation.
To assign a generator to a path expression:
Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell Struct field contains a sample value from the source database. You can use the previous and next icons to page through different values.
In the Path Expression field, type the path expression to identify the value to apply the generator to. Matched Struct Values shows the result from the value in Cell Struct.
From the Generator Configuration dropdown list, select the generator to apply to the path expression. You cannot select another composite generator.
Configure the selected generator. You cannot configure the selected generator to be consistent with another column.
To save the configuration and close the add generator panel, click Save.
From the Sub-Generators list:
To edit a generator assignment, click the edit icon.
To remove a generator assignment, click the delete icon.
To move a generator assignment up or down in the list, click the up or down arrow.
Generates a random new UUID string.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Shifts timestamps by a random amount of a specific unit of time within a set range.
For date-only values, the Timestamp Shift Generator supports the following date formats. The example values are all for February 23, 2021.
MM/dd/yyyy - 02/23/2021
MM/dd/yy - 02/23/21
MM-dd-yyyy - 02-23-2021
yyyyMMdd - 20210223
yyyy/MM/dd - 2021/02/23
MMddyyyy - 02232021
To configure the generator:
From the Date Part dropdown list, select the unit of time to use for the minimum and maximum shift.
In the Minimum Shift field, type the minimum amount the value can be shifted from the original value.
Use negative numbers to indicate to shift the date to the past.
For example, assume that the date part is Day. -3 indicates that the day cannot be shifted earlier than 3 days before the original day. 3 indicates that the date cannot be shifted earlier than 3 days after the original day.
In the Maximum Shift field, type the maximum amount by which the value can be shifted from the original value.
For example, assume that the date part is Day. 5 indicates that the date cannot be shifted later than 5 days after the original day.
Toggle the Consistency setting to indicate whether to make the generator consistent. By default, consistency is disabled.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. When a column is consistent with itself, then the same date part value is always shifted by the same amount.
When a column is consistent with another column, then for the same value in the other column, the date part value is always shifted by the same amount. For example, for the same value of username, the birthdate column value is always shifted by the same amount.
If multiple columns that use the Timestamp Shift generator are consistent with the same other column, then for those columns, the date part value shifts by the same amount. For example, the startdate and enddate columns are both consistent with the username column. Both startdate and enddate use the Timestamp Shift generator. For the same value of username, both startdate and enddate are shifted by the same amount.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
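The shift and consistency behavior can be sketched in Python with Day as the date part. When a consistency key such as a username is supplied, the shift amount is derived from that key, so every timestamp for the same key moves by the same amount and intervals between them are preserved. Whole-day shifts, inclusive bounds, the SHA-256 seeding, and the `shift_days` name are all assumptions made for this illustration.

```python
import hashlib
import random
from datetime import datetime, timedelta


def shift_days(ts, min_shift, max_shift, consistent_key=None):
    """Shift ts by a whole number of days between min_shift and max_shift."""
    if consistent_key is None:
        rng = random  # not consistent: a fresh random shift each time
    else:
        # Consistent: derive the shift from the key, so it repeats.
        digest = hashlib.sha256(str(consistent_key).encode("utf-8")).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
    # Assumption: both bounds are inclusive.
    return ts + timedelta(days=rng.randint(min_shift, max_shift))


start = datetime(2021, 2, 23)
end = datetime(2021, 3, 1)
new_start = shift_days(start, -3, 5, consistent_key="User1")
new_end = shift_days(end, -3, 5, consistent_key="User1")
print(new_start, new_end, new_end - new_start)
```

Because startdate and enddate share the same key, the duration between them is the same before and after the shift.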
This generator is deprecated. Use the Business Name generator instead.
Generates a random company name-like string.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.
By default, the generator is not consistent.
If consistency is enabled, then by default it is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.
When the generator is consistent with itself, then a given source value is always mapped to the same destination value. For example, My Company is always mapped to New Company.
When the generator is consistent with another column, then a given source value in that other column always results in the same destination value for the company name column. For example, if the company name column is consistent with a name column, then every instance of John Smith in the name column in the source database has the same company name in the destination database.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Uses a single value to mask all of the values in the column.
For example, you can replace every value in a string column with the value String1. Or you can replace every value in a numeric column with the value 12345.
To configure the generator, in the Constant Value field, provide the value to use.
The value must be compatible with the field type. For example, you cannot provide a string value for an integer column.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Differential privacy is one technique that Tonic Structural uses to ensure the privacy of your data.
Differential privacy limits the effect of a single source record or user on the destination data. Someone who views the output of a process that has differential privacy cannot determine whether a particular individual's information was used to generate that output.
Data that is protected by a process with differential privacy cannot be reverse engineered, re-identified, or otherwise compromised.
Any generator that does not use the underlying data at all is considered "data-free". A data-free generator always has differential privacy.
Several Structural generators are either always data-free, or are data-free if consistency is not enabled.
The configuration options for the Categorical and Continuous generators include a Differential Privacy toggle to enable or disable differential privacy.
The Categorical generator shuffles the values of a column while preserving the overall frequency of the values. Note that NULL is considered its own category of value.
Differential privacy (disabled by default) further protects the privacy of your data by:
First, adding noise to the frequencies of categories.
After that, if needed, removing rare categories from the possible samples.
These steps ensure that a single row of source data has limited influence on the output values. By default, the for this generator is with , where is the number of rows.
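As a rough illustration of the two steps above, the following sketch adds Laplace noise to the category counts and then drops categories whose noisy count falls below a threshold. This is not Structural's mechanism and is not a proof of differential privacy. The function names, the epsilon value, and the threshold are assumptions made for the example.

```python
import random
from collections import Counter


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_categorical_sample(values, epsilon, threshold, n_samples, rng):
    """Sample categories from noisy frequencies, dropping rare categories."""
    noisy_counts = {}
    for category, count in Counter(values).items():
        # Step 1: add noise to the frequency of each category.
        noisy = count + laplace_noise(1.0 / epsilon, rng)
        # Step 2: remove categories that are too rare after noise is added.
        if noisy >= threshold:
            noisy_counts[category] = noisy
    if not noisy_counts:
        raise ValueError("every category was suppressed")
    categories = list(noisy_counts)
    weights = [noisy_counts[c] for c in categories]
    return rng.choices(categories, weights=weights, k=n_samples)


# NULL (None) is treated as its own category; "teal" appears only once.
source = ["red"] * 100 + ["blue"] * 80 + [None] * 40 + ["teal"]
sample = noisy_categorical_sample(source, epsilon=1.0, threshold=15.0,
                                  n_samples=10, rng=random.Random(7))
print(sample)
```

In this sketch, a category that appears in only one row is almost always suppressed, which matches the intent that a single source row has limited influence on the output.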
Differential privacy is not appropriate when the data in each row is unique or nearly unique. As a general rule of thumb, categories that are represented by fewer than 15 rows are at risk of being suppressed.
Structural warns you when a column isn’t suitable for differential privacy. A column is not suitable for differential privacy if most or all categories have fewer than 15 rows.
The Continuous generator produces samples that preserve the individual column distributions and correlations between columns.
Suppose we want to count the number of users in a database that have some sensitive property. For example, the number of users with a particular medical diagnosis.
A common relaxation, called approximate differential privacy, allows for flexible privacy analysis with noise that is drawn from a wider array of distributions than the Laplace distribution.
This is a composite generator.
Runs a selected generator on values that match a user-specified path expression.
Path expressions are defined using the XPath syntax.
For example, for the following XML content:
To get the first_name value, you would use /household/member/first_name.
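To make the path expression concrete, here is a small sketch that uses the Python standard library. The XML value is hypothetical, because the sample content is not reproduced here, and the replacement value stands in for whatever sub-generator you assign.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML value, used only for illustration.
cell_xml = """
<household>
  <member>
    <first_name>Jane</first_name>
    <last_name>Doe</last_name>
  </member>
</household>
""".strip()

root = ET.fromstring(cell_xml)  # root is the <household> element

# ElementTree evaluates paths relative to the root element, so the absolute
# path /household/member/first_name becomes member/first_name here.
matches = root.findall("member/first_name")
original_values = [m.text for m in matches]
print(original_values)

# Stand-in for a sub-generator: replace only the matched values.
for m in matches:
    m.text = "REPLACED"

masked = ET.tostring(root, encoding="unicode")
print(masked)
```

Only the value at the path expression changes. The last_name value is left untouched.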
You can also select a fallback generator to run on the entire XML value if there is any error during data generation.
To assign a generator to a path expression:
Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell XML field contains a sample value from the source database. You can use the previous and next icons to page through different values.
In the Path Expression field, type the path expression to identify the value to apply the generator to. Matched XML Values shows the result from the value in Cell XML.
From the Generator Configuration dropdown list, select the generator to apply to the value at the path expression. You cannot select another composite generator.
Configure the selected generator. You cannot configure the selected generator to be consistent with another column.
To save the configuration and immediately add a generator for another path expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
From the Sub-Generators list:
To edit a generator assignment, click the edit icon.
To remove a generator assignment, click the delete icon.
To move a generator assignment up or down in the list, click the up or down arrow.
From the Fallback Generator dropdown list, select the generator to use if any error occurs in the generation.
The fallback generator is then used for the entire XML value.
The options are:
Using format-preserving encryption (FPE) means to encrypt data in such a way that the output is in the same format as the input. For example, a number in the input produces a number in the generated output.
For the following generators, Tonic Structural uses FPE to encrypt the generated values. Note that the Structural implementation of FPE might not guarantee compliance with standards. For example, the ASCII Key generator does not guarantee that the length of the output data matches the length of the input data.
Each generator supports a specific input character set or domain.
When a generator attempts to process data that is not within the expected domain, it results in encryption errors. For example, the generator cannot process a string that includes non-numeric characters such as letters or symbols. The generator cannot process any value that is not a valid UUID.
If you see encryption errors, then it probably means that the column contains values that are incompatible with the selected generator. To address this, you need to choose a different generator.
One option is the generator, which has very few restrictions on the allowed values.
Partitioning allows the value of a column to be based on the values of other related columns. It is one way to generate more realistic destination values.
The following generators support partitioning:
Note that partitioning cannot be configured as part of a . You can only configure partitioning when you configure a specific column.
To enable partitioning, from the Partition by dropdown list, you choose one or more columns to partition by.
You can only choose columns that have the generator set to or .
For each value or combination of values in the partitioning columns, Tonic Structural generates a distribution of values for the original column.
For example, you assign the Continuous generator to an Income column, and partition it by an Occupation column. For each Occupation value, Structural generates a distribution of Income values. In other words, it generates a range of incomes for each occupation, such as Doctor and Construction Worker.
If you choose multiple columns, then the distribution is for each combination of column values. For example, you partition by both Occupation and Region. Structural creates a distribution of income values for each combination of occupation and region. So there is a distribution for Doctor and Northeast, and a different distribution for Doctor and Southeast.
In the destination database, Structural sets the value of the partitioned column to a value from the appropriate distribution. The distribution that Structural uses is based on the value of the partitioning columns in the destination database, not the original value of the partitioning columns in the source database.
To continue our example, assume that the Occupation column uses the Categorical generator. During data generation, Structural assigns to each record a random occupation value from the current values. For one of the records, the occupation value is Doctor in the source database and Construction Worker in the destination database.
For the Income column for that record, Structural assigns a value from the distribution of income values for the Construction Worker occupation. In other words, it assigns an income value that is realistic for the destination occupation value based on the source data.
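The Income and Occupation example can be sketched as follows. This is an illustration only. The sample rows are invented, a shuffle stands in for the Categorical generator, and a uniform draw between the observed minimum and maximum stands in for the distribution that the Continuous generator would build.

```python
import random
from collections import defaultdict

# Hypothetical source rows.
source_rows = [
    {"occupation": "Doctor", "income": 210_000},
    {"occupation": "Doctor", "income": 185_000},
    {"occupation": "Doctor", "income": 240_000},
    {"occupation": "Construction Worker", "income": 52_000},
    {"occupation": "Construction Worker", "income": 61_000},
    {"occupation": "Construction Worker", "income": 47_000},
]


def build_partitions(rows, partition_col, value_col):
    """Group the source values of value_col by the value of partition_col."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_col]].append(row[value_col])
    return partitions


def generate(rows, rng):
    """Assign destination occupations, then draw income from the matching partition."""
    partitions = build_partitions(rows, "occupation", "income")
    occupations = [row["occupation"] for row in rows]
    rng.shuffle(occupations)  # stand-in for the Categorical generator
    destination = []
    for occupation in occupations:
        incomes = partitions[occupation]
        # The partition is chosen by the destination occupation, not the source one.
        income = rng.uniform(min(incomes), max(incomes))
        destination.append({"occupation": occupation, "income": round(income, 2)})
    return destination


for row in generate(source_rows, random.Random(0)):
    print(row)
```

Each destination income falls within the range observed for the destination occupation, regardless of which occupation the record had in the source data.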
The linking option for a generator allows multiple columns within the same table to use a single generator.
At a high level, consider using linking when columns share a strong interdependency or correlation.
When you link columns, you tell Tonic Structural that the columns are related to each other, and that Structural should take this relationship into account when it generates new data.
In a child workspace, if you change the configuration of a linked column, the columns that it is linked to are also marked as having overrides to the parent workspace configuration.
Note that you cannot configure linking as part of a . You can only configure linking when you configure specific columns.
To link columns, you first assign the same generator to those columns.
After you assign the generator, then on the generator configuration panel for any of the columns, you can link the columns.
Categorical generators support linking and can be used to preserve hierarchical data. Examples of hierarchical data include:
City, State, Zip
Job Title, Department
Day of Month, Month, Year
To illustrate how linking works, we'll use an example of city and state columns. Here is the original data:
The below image shows the results when you apply the Categorical generator to city and state columns, but do not link the columns. Because the columns are not linked, the values in each column are shuffled independently. In the output, the city and state combinations are not valid. For example, Phoenix is not in Florida and Baltimore is not in Tennessee.
The next image shows the results when you apply the Categorical generator to and link the city and state columns. This preserves the data hierarchy and ensures that the city and state combinations are valid.
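The difference between unlinked and linked shuffling can be sketched in a few lines. The rows below are hypothetical, and a plain shuffle stands in for the Categorical generator.

```python
import random

# Hypothetical source rows with valid city and state pairs.
rows = [
    {"city": "Phoenix", "state": "Arizona"},
    {"city": "Baltimore", "state": "Maryland"},
    {"city": "Miami", "state": "Florida"},
    {"city": "Nashville", "state": "Tennessee"},
]


def shuffle_unlinked(rows, rng):
    """Shuffle each column independently, which can create invalid pairs."""
    cities = [row["city"] for row in rows]
    states = [row["state"] for row in rows]
    rng.shuffle(cities)
    rng.shuffle(states)
    return [{"city": c, "state": s} for c, s in zip(cities, states)]


def shuffle_linked(rows, rng):
    """Shuffle the columns together, so that every pair remains valid."""
    pairs = [(row["city"], row["state"]) for row in rows]
    rng.shuffle(pairs)
    return [{"city": c, "state": s} for c, s in pairs]


print(shuffle_unlinked(rows, random.Random(1)))
print(shuffle_linked(rows, random.Random(1)))
```

In both cases the frequency of each value is preserved. Only the linked version keeps each city with its own state.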
The following generators can be linked:
Some generators can be data-free. When a generator is data-free, it means that the output data is completely unrelated to the source data. There is no way to use the output data to uncover the source data. Data-free generators implicitly have differential privacy. A generator is not data-free if consistency is enabled.
The following generators are always data-free:
The following generators are data-free only when consistency is disabled:
(deprecated)
A column that has a uniqueness constraint must have a unique value for every record.
Primary key columns automatically require uniqueness. Uniqueness can also be required for other columns. For example, in a users table, userid is the primary key column, but username also must be unique.
The following generators can be used with columns that have uniqueness constraints:
Generators that are applied to primary key columns are different from other generators in the following ways:
The generated data must be unique in order to not break constraints
The generators are (same input → same output), so that when this generator is applied to a primary key column and its linked foreign key columns, no links are broken.
This is accomplished using .
For more information on this, and details on how to provide your own encryption key, contact .
You apply a primary key generator in the same way as you do any other generator.
Tonic Structural then automatically applies the same generator to all foreign key columns that reference the primary key.
Foreign keys are either defined by the source schema or added from the Foreign Key Relationships page. For more information, go to .
Structural currently supports the following generators for primary key columns:
The ASCII Key generator does not preserve the format of the input value. It uses the ASCII alphabet for input and the alphanumeric alphabet for output. This leads to output values that are longer than the input values.
You also cannot assign a primary key generator on a table that is related to a Scale mode table through a foreign key.
These topics talk about groups of related generators that have similar functions and configurations.
Required workspace permission: Configure column generators
The Tonic Structural sensitivity scan identifies specific types of sensitive data. For each sensitivity type that it detects, Structural can have a recommended generator. For example, for a value that the sensitivity scan identifies as a Social Security Number, Structural recommends the SSN generator. For a first name, Structural recommends the Name generator configured with First as the value type.
From Privacy Hub and Database View, you can review and apply the recommended generators.
In Privacy Hub, on the settings view of the column details panel, for a detected sensitive column that does not have an applied generator, and that has a recommended generator, Structural displays a button for the recommended generator.
To apply the recommended generator, click the button.
On Database View, when a column has a recommended generator, the generator dropdown displays the available recommendation icon.
To apply the recommended generator:
Click the generator dropdown.
On the recommended generator panel, click Apply.
When there are detected sensitive columns that are not protected, Privacy Hub displays a Sensitivity Recommendations banner. The banner displays the number of detected, unprotected columns.
To review the recommended generators, and determine whether to apply them, click Review Recommendations.
The Recommended Generators by Sensitivity Type panel displays the list of sensitivity types for which there are detected, unprotected columns.
To display the columns for a sensitivity type, click the expand icon for that type.
To hide the column list, click the collapse icon.
For each column, the list includes the following information:
The table and schema name
The column name, with the column data type
An example value from the source data (Original Data), with a corresponding destination value when the recommended generator is applied (Expected Output).
To display a larger sample of source and destination values, click the view icon in the Expected Output column.
Address columns that can be linked are displayed in groups.
For example, if a table includes columns for both city and state values, then those columns are displayed as a group. When you apply the recommended generators to the group, the columns are also linked.
There are separate Address entries for individual columns and for groups of columns to link.
To filter the lists, you can use any of the following:
Schema name
Table name
Column name
Start to type text in the schema, table, or column name. As you type, Structural applies the filter to all of the lists.
When you first display the panel, all of the columns are selected. The selected columns are the columns that are affected when you apply recommended generators or ignore columns.
Within each sensitivity type, you can select or deselect individual columns.
You can use the checkbox in the column heading to select or deselect all of the columns for a sensitivity type.
Before you apply a recommended generator, you can enable or disable consistency for each individual column, or for all of the columns for a sensitivity type.
You can only enable self-consistency. You cannot configure consistency with other columns.
The recommended generators panel contains a Consistency toggle for each column. You use the toggle in the column heading to enable or disable consistency for all of the columns for a sensitivity type. If the recommended generator does not support consistency, then the toggle is disabled.
To enable self-consistency, toggle Consistency to the on position. To disable self-consistency, toggle Consistency to the off position.
To apply the recommended generator to the selected columns for a sensitivity type, click the Apply option for that sensitivity type.
When you apply the recommended generator, Structural removes the column from the list.
If the recommended generator is incorrect, then you can ignore the recommendation.
To ignore the recommended generator for the selected columns in a sensitivity type:
Click the Ignore option for the sensitivity type.
In the Ignore dropdown list, click Ignore generator recommendation.
When you ignore the generator recommendation:
The column is removed from the list.
The recommended generator is removed. This includes the recommendation on the Privacy Hub column configuration panel.
The column continues to be marked as sensitive.
Required workspace permission: Configure column sensitivity
You can mark selected columns for a sensitivity type as not sensitive. For example, a value might be correctly identified as a first name, but be a test value that is not actually sensitive and does not need to be transformed.
To mark selected columns in a sensitivity type as not sensitive:
Click the Ignore option for the sensitivity type.
In the Ignore dropdown list, click Mark as not sensitive.
When you mark a column as not sensitive, it is removed from the list.
To apply the recommended generators to all of the selected columns across all of the sensitivity types, click Apply All.
On Database View, the Bulk Edit panel includes an option to apply the recommended generators to the selected columns for which there is an available recommendation.
From Database View, to apply recommended generators to multiple columns:
Check the checkbox for each column to update.
Click Bulk Edit.
On the bulk editing panel, in the Generator recommendations found panel, click Apply.
The FNR generator transforms Norwegian national identity numbers. In Norwegian, the term for national identity number abbreviates to FNR.
The first six digits of an FNR reflect the person's birthdate. You can choose to preserve the birthdates from the source values in the destination values. If you do not preserve the source values, the destination values are still within the same date range as the source values.
Another digit in an FNR indicates whether the person is male or female. You can specify whether to preserve in the generated value the gender indicated in the source value.
The last digits in an FNR are a checksum value. The last digits in the destination value are not a checksum - the values are random.
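The following sketch shows one way that the preserve options could behave. It is not Structural's implementation. It assumes the common FNR layout of an 11-digit value with the birthdate as DDMMYY, a gender indicator in the ninth digit (odd or even), and two trailing digits. The function names and the default date range are hypothetical.

```python
import random
from datetime import date, timedelta


def random_birth_digits(start: date, end: date, rng: random.Random) -> str:
    """Pick a date in [start, end] and format it as DDMMYY."""
    day = start + timedelta(days=rng.randint(0, (end - start).days))
    return day.strftime("%d%m%y")


def transform_fnr(fnr, preserve_birthdate, preserve_gender, rng,
                  date_range=(date(1950, 1, 1), date(1999, 12, 31))):
    """Return an FNR-like value with random trailing digits (not a checksum)."""
    if len(fnr) != 11 or not fnr.isdigit():
        raise ValueError("expected an 11-digit string")
    birth = fnr[:6] if preserve_birthdate else random_birth_digits(*date_range, rng)
    middle = f"{rng.randint(0, 99):02d}"
    if preserve_gender:
        # Assumption: the ninth digit is odd or even depending on gender.
        parity = int(fnr[8]) % 2
        gender_digit = str(rng.choice([d for d in range(10) if d % 2 == parity]))
    else:
        gender_digit = str(rng.randint(0, 9))
    tail = f"{rng.randint(0, 99):02d}"  # random, deliberately not a checksum
    return birth + middle + gender_digit + tail


source_fnr = "01019912345"  # made-up value, not a real identity number
print(transform_fnr(source_fnr, True, True, random.Random(3)))
```

With both options enabled, the first six digits and the parity of the ninth digit match the source value, and the remaining digits are random.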
To configure the generator:
To preserve the gender from the source value in the destination value, toggle Preserve Gender to the on position.
To preserve the birthdate from the source value in the destination value, toggle Preserve Birthdate to the on position.
Toggle the Consistency setting to indicate whether to make the generator consistent. By default, consistency is disabled.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. When a generator is self-consistent, then a given value in the source database is always mapped to the same value in the destination database. When a generator is consistent with another column, then a given value for that other column in the source database results in the same value in the destination database. For example, if the FNR column is consistent with a Name column, then every instance of John Smith in the source database results in the same FNR in the destination database.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
This is a composite generator.
A version of the Regex Mask generator that can be used for array values.
Uses regular expressions to parse strings and replace specified substrings with the output of specified generators. The parts of the string to replace are specified inside unnamed top-level capture groups.
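As an illustration of capture-group replacement, the sketch below applies a separate stand-in generator to each top-level capture group and leaves the rest of the match unchanged. The function name, the pattern, and the generators are assumptions for the example. The sketch also assumes that the capture groups are not nested.

```python
import re


def regex_mask(value, pattern, group_generators, replace_all=True):
    """Replace the text of each capture group with the output of its generator."""
    compiled = re.compile(pattern)

    def replace_match(match):
        text = match.group(0)
        offset = match.start(0)
        # Work from the last group to the first so earlier spans stay valid.
        for index in range(compiled.groups, 0, -1):
            generator = group_generators.get(index)
            if generator is None or match.group(index) is None:
                continue
            start, end = match.span(index)
            text = text[:start - offset] + generator(match.group(index)) + text[end - offset:]
        return text

    # count=0 replaces every match; count=1 replaces only the first match.
    return compiled.sub(replace_match, value, count=0 if replace_all else 1)


generators = {
    1: lambda s: "0" * len(s),  # stand-in generator for the order number
    2: lambda s: "user",        # stand-in generator for the mailbox name
}
masked = regex_mask("Order 12345 for jane@example.com", r"Order (\d+) for (\w+)@", generators)
print(masked)  # Order 00000 for user@example.com
```

Passing replace_all=False mirrors turning off Replace all matches, so that only the first occurrence of the pattern is processed.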
To add a regular expression:
Click Add Regex. On the configuration panel, Cell Value shows a sample value from the source database. You can use the previous and next options to navigate through the values.
By default, Replace all matches is enabled. To only match the first occurrence of a pattern, toggle Replace all matches to the off position.
In the Pattern field, enter a regular expression. If the expression is valid, then Structural displays the capture groups for the expression.
For each capture group, to select and configure the generator to apply, click the selected generator. You cannot select another composite generator.
To save the configuration and immediately add another regular expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
From the Regexes list:
To edit a regex, click the edit icon.
To remove a regex, click the delete icon.
When differential privacy is enabled, noise is added to the individual distributions and the correlation matrix, using the mechanism described in [].
The default privacy budget for this generator is with .
Differential privacy is a property of a randomized algorithm $\mathcal{M}$, which takes as input a database $D$ and produces some output $\mathcal{M}(D)$. The outputs could be counts, summary statistics, or synthetic databases. The specific type is not important for this formulation.
For this formulation, we say two databases $D$ and $D'$ are neighbors if they differ by a single row.
For a given $\epsilon$, we say that $\mathcal{M}$ is $\epsilon$-differentially private if, for all neighboring databases $D$ and $D'$ and all subsets of outputs $S$, we have:
$$\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \, \Pr[\mathcal{M}(D') \in S]$$
When $\delta$ is non-zero, this is sometimes called approximately differentially private.
The parameter $\epsilon$ is the privacy budget of the algorithm, and quantifies in a precise sense an upper bound on how much information an adversary can gain from observing the outputs of the algorithm on an unknown database.
Suppose an attacker suspects that our secret database $D$ is one of two possible neighboring databases $D_1$, $D_2$, with some fixed odds.
If $\mathcal{M}$ is $\epsilon$-differentially private, then observing $\mathcal{M}(D)$ updates the attacker's log odds of $D_1$ vs $D_2$ by at most $\epsilon$.
The closer $\epsilon$ is to $0$, the better the privacy guarantee, as an attacker is more and more limited in what information they can learn from $\mathcal{M}(D)$.
Conversely, larger values of $\epsilon$ mean that an attacker can possibly learn significant information by observing $\mathcal{M}(D)$.
Dwork, McSherry, Nissim, and Smith [2006] introduced the Laplace Mechanism as a way to publish these counts securely, by adding noise sampled from the Laplace distribution.
This noise affords us plausible deniability. If the underlying count changed by $1$, then the probability of observing the same noisy output does not change by much:
We illustrate this visually, showing the probability density function (pdf) of the observed values given true counts of (blue), (orange), and (green).
The blue shaded region shows that the possibly noisy count values for and lie within a factor of of the noisy count values of , so this mechanism is differentially private with .
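A small numeric sketch can make the Laplace Mechanism more tangible. The counts 100 and 101 and the value of epsilon below are assumptions chosen for illustration, not the values used in the figure. The noise scale of 1/epsilon reflects that a count changes by at most 1 when a single row changes.

```python
import math
import random


def laplace_pdf(x: float, mu: float, scale: float) -> float:
    """Density of the Laplace distribution centered at mu."""
    return math.exp(-abs(x - mu) / scale) / (2.0 * scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Publish a count with Laplace noise of scale 1/epsilon."""
    # The difference of two exponentials with rate epsilon is Laplace(0, 1/epsilon).
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


epsilon = 0.5
scale = 1.0 / epsilon

# Compare the densities of the noisy output for neighboring true counts 100 and 101.
grid = [90 + 0.1 * i for i in range(201)]
worst_ratio = max(laplace_pdf(x, 100, scale) / laplace_pdf(x, 101, scale) for x in grid)

print(worst_ratio <= math.exp(epsilon) + 1e-9)  # the ratio never exceeds e^epsilon
print(noisy_count(100, epsilon, random.Random(0)))
```

For any observed value, the two densities differ by at most a factor of e^epsilon, which is the bound that the definition of differential privacy requires.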
For example, the AnalyzeGauss mechanisms of [Dwork, Talwar, Thakurta, and Zhang 2014], and differentially private gradient descent of [Abadi et al. 2016], use Gaussian noise as a fundamental ingredient, which requires the following relaxation:
For a given $\epsilon$ and $\delta$, we say that $\mathcal{M}$ is $(\epsilon, \delta)$-differentially private if, for all neighboring databases $D$ and $D'$ and all subsets of outputs $S$, we have:
$$\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta$$
The parameter $\delta$ is often described as the risk of a (possibly catastrophic) privacy violation. While this formal definition does allow, for example, a mechanism that reveals a sensitive database with probability $\delta$, in practice this is not a plausible outcome with carefully designed mechanisms. Also, taking $\delta$ to be small relative to the size of the database ensures that the risk of disclosure is low.
Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep Learning with Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS '16). Association for Computing Machinery, New York, NY, USA, 308–318.
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In: Halevi S., Rabin T. (eds) Theory of Cryptography (TCC '06). Lecture Notes in Computer Science, vol 3876. Springer, Berlin, Heidelberg.
Cynthia Dwork and Aaron Roth. 2014. The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci. 9, 3–4 (August 2014), 211–407.
Cynthia Dwork, Kunal Talwar, Abhradeep Thakurta, and Li Zhang. 2014. Analyze Gauss: Optimal Bounds for Privacy-Preserving Principal Component Analysis. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing (STOC '14). Association for Computing Machinery, New York, NY, USA, 11–20.
Another option is to use the generator, which allows you to assign different generators based on column values.
If you need support for additional types, contact .
Primary key generators are not supported in the table mode. The process requires control over the key columns to make sure that all of the relationships are maintained.
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 |
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes |
Data-free | Yes |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | Yes |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | Yes |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | Determined by the selected sub-generators. |
Linking | Determined by the selected sub-generators. |
Differential privacy | Determined by the selected sub-generators. |
Data-free | Determined by the selected sub-generators. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes |
Data-free | Yes |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes |
Data-free | Yes |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 |
Generator ID (for the API) |
Consistency | Map the same input values to the same output values across multiple columns, tables, and databases. |
Linking | Identify columns that use the same generator and that are inter-dependent or correlated. |
Differential privacy | Ensures that the output does not reveal anything that is attributable to a specific member of the source data. |
Data-free generators | Indicates that the generator output is completely unrelated to the input. |
Column partitioning | Base the value of a column on other related columns. |
Uniqueness constraints | Generators that you can use on columns that have uniqueness constraints. |
Format-preserving encryption (FPE) | Encrypts data in such a way that the output is in the same format as the input. |
Some data values require custom processing before or after the generator is applied.
If you require custom processing for data values, Tonic.ai can work with you to develop and deploy custom value processors for your Tonic Structural instance. Once a custom value processor is deployed, you can select the processor as part of the generator configuration for each column.
One common use case for custom processing is to decrypt source data before applying a generator, and encrypt destination data before writing it to the destination database.
Structural data encryption allows you to set up decryption and encryption to apply to columns. For more information, go to Configuring and using Structural data encryption.
Most Tonic Structural generators consume source data and perform an operation on it to produce destination data.
For example, the Character Scramble generator takes the original data from the source database, replaces the letters and numbers with random letters and numbers, and then writes the result to the destination database.
Composite generators do not generate data directly. Instead, they apply other generators, referred to as sub-generators, to specific sub-values within the column or based on conditions.
Structural provides the following composite generators:
Most composite generators treat the input as structured data that the generator parses using a domain-specific syntax, such as:
XPath for XML or HTML
JSONPath for JSON or a Spark StructType
Regular expressions for text
These generators allow you to select a sub-value of the input, and then configure a specific generator to apply to only that sub-value. This means that you can take your original structured data and selectively mask the content.
For example, for the following structured content:
{ name: { first: "Tj", last: "Bass" } }
You indicate to use the Name generator to replace the value of last. The result is something like:
{ name: { first: "Tj", last: "Pine" } }
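The selective masking shown above can be sketched as follows. The path is given as a list of keys rather than a JSONPath string, and the replacement function is a stand-in for the Name generator. Both are assumptions made to keep the example small.

```python
import copy


def mask_path(document, path, generator):
    """Return a copy of document with the value at path replaced by generator output."""
    result = copy.deepcopy(document)
    node = result
    for key in path[:-1]:
        node = node[key]
    node[path[-1]] = generator(node[path[-1]])
    return result


source = {"name": {"first": "Tj", "last": "Bass"}}

# Stand-in for the Name generator: always returns the same replacement.
masked = mask_path(source, ["name", "last"], lambda _old: "Pine")

print(masked)  # {'name': {'first': 'Tj', 'last': 'Pine'}}
```

Only the selected sub-value changes. The rest of the structure, including the first value, is passed through as-is.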
The Conditional generator is slightly different. It allows you to apply a specific generator when the column value matches a specific condition. For example, you can indicate to apply a Character Scramble generator only if the column value is something other than "test".
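The condition-based behavior can be sketched in the same spirit. The scramble function below is a simplified stand-in for the Character Scramble generator, and the helper names are hypothetical.

```python
import random
import string


def scramble(value: str, rng: random.Random) -> str:
    """Simplified stand-in for Character Scramble: replace letters with random letters."""
    return "".join(rng.choice(string.ascii_lowercase) if ch.isalpha() else ch for ch in value)


def apply_conditionally(value, condition, generator):
    """Apply the generator only when the condition matches; otherwise keep the value."""
    return generator(value) if condition(value) else value


rng = random.Random(11)
not_test = lambda value: value != "test"

kept = apply_conditionally("test", not_test, lambda v: scramble(v, rng))
changed = apply_conditionally("alice", not_test, lambda v: scramble(v, rng))

print(kept)     # test
print(changed)  # five random lowercase letters
```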
You cannot configure generator presets for composite generators from the Generator Presets view. The Generator Presets view does not have access to data to use for path expressions or conditions.
From a column configuration panel, you can save the current configuration as the new baseline configuration, and reset the configuration to the current baseline.
For any composite generator, when you select the generator to apply to a selected sub-value or based on a specified condition, you cannot select another composite generator.
For example, you cannot apply a Conditional or XML Mask generator to the value of a specified path expression.
For composite generators other than the Conditional or Regex Mask generators, you cannot configure a sub-generator to be consistent with another column.
Consistency | Determined by the selected sub-generators. |
Linking | Determined by the selected sub-generators. |
Differential privacy | Determined by the selected sub-generators. |
Data-free | Determined by the selected sub-generators. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | No, cannot be linked |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking |
Generator ID (for the API) |
Consistency | Determined by the selected sub-generators. |
Linking | Determined by the selected sub-generators. |
Differential privacy | Determined by the selected sub-generators. |
Data-free | Determined by the selected sub-generators. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) |
These hints and tips can help you to choose generators and address some specific use cases.
Tonic Structural provides several options for de-identifying the names of individuals. The method that you select depends on the specific use case, including the required realism of the output and privacy needs.
The following are a few of the generator options and how and why you might use them.
Name generator: Randomly returns a name from a dictionary of primarily Westernized names, unrelated to the original value. Can provide complete privacy, unless you use Consistency. The output is realistic because the values returned are real names.
Categorical generator: Shuffles all of the values in the field while preserving the overall frequency of the values. It ensures that the output contains realistic-looking names, and that the output uses the names from the original data set. This can be beneficial if the original data contains, for example, names that are common to a particular region and that should be maintained. When you use this generator with the Differential Privacy option, it ensures the output is secure from re-identification. However, if the source data set is small or each name is highly unique, Structural might prevent you from using this option.
Custom Categorical: Allows you to provide your own dictionary of values. These values are included in the output at the same frequency that the original values occur in the source data.
Character Scramble: Randomly replaces characters with other characters. The output does not provide realistic-looking names, but it provides a high level of privacy that prevents recovery of the original data. It does preserve whitespace, punctuation (such as hyphenated names), and capitalization. Because it is a character-level replacement, it preserves the length of the input string.
Character Substitution: Similar to Character Scramble, but uses a single character mapping throughout the generated data. This reduces the privacy level, but ensures consistency and uniqueness. This generator also has more support for additional Unicode blocks to ensure that the output characters more closely match the input. This might be helpful if the input includes names with characters outside of the basic Latin (a-z, A-Z) characters.
Rows of data often have multiple date or timestamp fields that have a logical dependency, such as START_DATE and END_DATE.
In this case, a randomly generated date is not viable, because it could produce a nonsensical output where events occur chronologically out of order.
The following generator options handle these scenarios:
Timestamp Shift generator (with Consistency)
To solve the problem described above, you ensure that two or more timestamps are randomly shifted by the same amount instead of independently from each other.
The key is to use the consistency option.
For example, a row of data represents an individual that is identified by a primary key of PERSON_ID. The row also contains START_DATE and END_DATE columns. You can apply a timestamp shift to the START_DATE and END_DATE columns within a desired range, and make both columns consistent to PERSON_ID.
Whenever the generator encounters the same PERSON_ID value, it shifts the dates by the same amount.
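The idea of a consistent shift can be sketched in Python. This is only an illustration of the concept, not how Structural implements it: the `SECRET` key, the `MAX_SHIFT_DAYS` range, and the `shift_for` and `shift_row` helpers are all hypothetical names introduced for this example.

```python
import hashlib
import hmac
from datetime import datetime, timedelta

SECRET = b"example-secret"  # hypothetical key, not a Structural setting
MAX_SHIFT_DAYS = 30         # assumed shift range of -30 to +30 days


def shift_for(key: str) -> timedelta:
    """Derive a repeatable shift from the consistency key (for example, PERSON_ID)."""
    digest = hmac.new(SECRET, key.encode("utf-8"), hashlib.sha256).digest()
    span = 2 * MAX_SHIFT_DAYS + 1
    days = int.from_bytes(digest[:8], "big") % span - MAX_SHIFT_DAYS
    return timedelta(days=days)


def shift_row(row: dict) -> dict:
    """Shift START_DATE and END_DATE by the same amount for a given PERSON_ID."""
    delta = shift_for(str(row["PERSON_ID"]))
    return {
        **row,
        "START_DATE": row["START_DATE"] + delta,
        "END_DATE": row["END_DATE"] + delta,
    }


row = {
    "PERSON_ID": 42,
    "START_DATE": datetime(2023, 1, 10),
    "END_DATE": datetime(2023, 3, 5),
}
shifted = shift_row(row)
```

Because both columns receive the same offset, the interval between START_DATE and END_DATE is unchanged, so the events cannot end up out of order.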
Event Timestamps generator: You can apply the Event Timestamps generator to multiple date columns on the same table. You can link them to follow the underlying distribution of dates. For more information, go to the blog post Simulating event pipelines for fun and profit (and for testing too).
Date Truncation generator: This generator can sometimes address the described problem. You can configure this generator to truncate the input to the year, month, day, hour, minute, or second. It guarantees that a secondary event does not occur before a primary event. However, truncation might cause them to become the same date value or timestamp. Whether you can use this generator for this purpose depends on the typical time separation between the two events relative to the truncation option, and whether truncation provides an adequate level of privacy for the particular use case.
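To see why truncation preserves order but can collapse two events into one value, here is a minimal Python sketch. The `truncate` helper is a hypothetical stand-in for the generator, written only for illustration.

```python
from datetime import datetime

# Truncation units, coarsest first, and the value each finer field resets to.
_UNITS = ["year", "month", "day", "hour", "minute", "second"]
_RESET = {"month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0, "microsecond": 0}


def truncate(ts: datetime, unit: str) -> datetime:
    """Reset every field that is finer than the selected unit."""
    finer = list(_RESET)[_UNITS.index(unit):]
    return ts.replace(**{name: _RESET[name] for name in finer})


start = datetime(2023, 5, 3, 9, 15)
end = datetime(2023, 5, 20, 17, 0)

by_day = (truncate(start, "day"), truncate(end, "day"))        # order is kept
by_month = (truncate(start, "month"), truncate(end, "month"))  # values collide
```

With these sample values, truncating to the day keeps the two events distinct, while truncating to the month maps both to the first day of May.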
Free text refers to text fields in the source database that might come from an "uncontrolled" source such as user text entry. In these cases, any record might or might not contain sensitive information.
Some possible examples include:
Notes from a doctor or healthcare provider that contain Protected Health Information (PHI)
Other personally identifiable information, such as a Social Security number or telephone number, that a user enters into an open-ended text entry form
Structural provides several suitable options. The method that you select depends on the specific use case, including the required realism of the output and any privacy requirements.
Here are a few generator options for free text fields, with information on how and why you might use them.
Character Scramble generator: Randomly replaces characters with other characters. The output does not contain meaningful text, but it provides a high level of privacy that prevents recovery of the original data. The Character Scramble generator does preserve whitespace, punctuation, and capitalization. Because it is a character-level replacement, it preserves the length of the input string.
Regex Mask generator
Uses regular expressions to parse strings. It then replaces specified substrings with the output of selected generators. The parts of the string to replace are specified in unnamed top-level capture groups.
The Regex Mask generator can preserve more realism of the underlying text, but introduces privacy risks. Any sensitive information that does not conform to a known and configured pattern is not captured and replaced.
As an example of matching specific formats, a configuration that includes the following two patterns would replace both telephone numbers that use the ###-###-#### format, and SSNs that use the ###-##-#### format, but leave the surrounding text unmodified:
SSN: ([0-9]{3}-[0-9]{2}-[0-9]{4})
Telephone Number: ([0-9]{3}-[0-9]{3}-[0-9]{4})
You can configure multiple regular expression patterns to handle all known or expected sensitive information formats. You cannot use this method to replace values that you cannot use a regular expression to reliably identify, such as names within free text.
When you use this option, make sure to enable Replace all matches for each pattern.
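The effect of these two patterns can be approximated outside of Structural with Python's `re` module. This is a rough sketch only: the `random_digits` replacement and the `mask_free_text` helper are hypothetical stand-ins for whichever generators you assign to the capture groups.

```python
import random
import re

# The two patterns from the example above, each with one capture group.
PATTERNS = {
    "ssn": re.compile(r"([0-9]{3}-[0-9]{2}-[0-9]{4})"),
    "phone": re.compile(r"([0-9]{3}-[0-9]{3}-[0-9]{4})"),
}


def random_digits(match: re.Match, rng: random.Random) -> str:
    """Replace each digit in the captured text; keep the separators."""
    return "".join(
        str(rng.randrange(10)) if ch.isdigit() else ch for ch in match.group(1)
    )


def mask_free_text(text: str, rng: random.Random) -> str:
    """Apply every pattern to every match, similar to Replace all matches."""
    for pattern in PATTERNS.values():
        text = pattern.sub(lambda m: random_digits(m, rng), text)
    return text


note = "Patient SSN 123-45-6789, call 555-867-5309 or 555-000-1111."
masked = mask_free_text(note, random.Random(0))
```

The surrounding words are untouched, and anything that does not match one of the configured patterns passes through unchanged, which is the privacy risk described above.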
Constant, Custom Categorical, and Null generators: Each of these options provides the highest level of privacy, because they completely remove or replace the original text. You might use each one for different reasons:
Null: If the field is nullable and the use case does not require any data in the field, you can use the Null generator to replace the values with NULL.
Constant: Allows you to provide a fixed value that replaces every source value. For example, you could provide a "Lorem ipsum" string or other dummy value that is appropriate for your data set.
Custom Categorical: Similar to the Constant generator, it replaces the original value with a fixed value. To increase the cardinality of the output, you enter a list of possible values. The values are randomly used on the output records.
Most Structural generators preserve NULL values that are in the data.
They do not automatically preserve empty values.
To make sure that any empty values stay empty in the destination database:
Assign the Conditional generator to the column.
For the default generator, select the generator to apply to the non-empty values.
Create a condition to look for empty values. You can either:
Use the regex comparison against the regex whitespace value (\s*).
Use the = operator and leave the value empty or empty except for a single space.
If you are not sure which characters the empty strings use, the regex option is more flexible. However, it is less efficient.
For the empty value condition, set the generator to Passthrough.
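The logic of that Conditional configuration can be sketched as follows. The `conditional` function and the `default_generator` argument are hypothetical, and the sketch assumes that the regex comparison must match the entire value, which the text above does not state.

```python
import re
from typing import Callable, Optional

# Whitespace-only pattern from the regex option above.
EMPTY = re.compile(r"\s*")


def conditional(
    value: Optional[str], default_generator: Callable[[str], str]
) -> Optional[str]:
    """Pass empty values through; send everything else to the default generator."""
    if value is None:
        return None  # NULL is preserved, as most generators do
    if EMPTY.fullmatch(value):  # assumption: the whole value must match
        return value  # Passthrough for empty or whitespace-only values
    return default_generator(value)


# str.upper stands in for a real generator here.
results = [conditional(v, str.upper) for v in ["", "   ", "abc", None]]
```

The regex form also catches tabs and multiple spaces, which is why it is the more flexible of the two conditions.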
You sometimes might want to apply the same generator to all of the text values in a JSON, HTML, or XML value. For example, you might want to apply the Character Scramble to all of the text.
Instead of creating separate path expressions for each path, you can use one or two path expressions that capture all of the values.
For the Array JSON Mask or JSON Mask generator, the path expression $..* captures all of the text values. You can then select the generator to apply to the values.
For the HTML Mask and XML Mask generators, you create two path expressions:
//text() gets all of the text nodes.
//@* gets all of the attribute values.
You apply the generator to each expression.
Sub-generators are applied sequentially. You can apply the wildcard paths in addition to more specific paths and generators.
For example, one path expression references a specific name or address and uses the Name or Address generator. The wildcard path expressions use the Character Scramble generator to mask any unknown fields in the document that could contain sensitive information.
As another example, you might assign the Passthrough generator to specific known fields that never contain sensitive information.
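As a rough illustration of what a catch-all path such as $..* accomplishes for JSON, the following Python sketch walks a parsed document and scrambles every string value. The `scramble` and `mask_all_strings` helpers are hypothetical and only approximate the Character Scramble behavior. Leaving non-string values untouched is an assumption of this sketch.

```python
import json
import random
import string


def scramble(text: str, rng: random.Random) -> str:
    """Replace letters and digits; keep case, whitespace, and punctuation."""
    out = []
    for ch in text:
        if ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        elif ch.isdigit():
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)
    return "".join(out)


def mask_all_strings(node, rng: random.Random):
    """Visit every value in the document and scramble the string values."""
    if isinstance(node, dict):
        return {key: mask_all_strings(val, rng) for key, val in node.items()}
    if isinstance(node, list):
        return [mask_all_strings(item, rng) for item in node]
    if isinstance(node, str):
        return scramble(node, rng)
    return node  # numbers, booleans, and null are left as-is in this sketch


doc = json.loads('{"name": "Ada Lovelace", "tags": ["VIP", "x-1"], "age": 36}')
masked = mask_all_strings(doc, random.Random(0))
```

The keys and the document shape are unchanged, so fields that were not known in advance are still masked.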
When your XML includes namespaces, then to include the namespaces in the path expression, specify the elements as:
For example, for the following XML:
A working XPath to mask the name value is:
You might sometimes set default date values to the absolute minimum and maximum values that are allowed by the database. For example, for SQL Server, these values are January 1, 1753 and December 31, 9999.
When you assign the Timestamp Shift generator, the minimum value cannot be shifted backward and the maximum value cannot be shifted forward.
To skip those default values and shift the other values:
Assign the Conditional generator to the column.
For the default generator, select the Timestamp Shift generator.
Create conditions to look for the minimum or maximum values.
For those conditions, set the generator to Passthrough.
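The same Conditional logic can be sketched in Python. The `SENTINEL_DATES` set and the `shift_unless_sentinel` helper are hypothetical names, and the sentinel values are the SQL Server examples given above.

```python
from datetime import date, datetime, timedelta

# Default minimum and maximum dates from the SQL Server example.
SENTINEL_DATES = {date(1753, 1, 1), date(9999, 12, 31)}


def shift_unless_sentinel(value: datetime, delta: timedelta) -> datetime:
    """Pass sentinel dates through unchanged; shift everything else."""
    if value.date() in SENTINEL_DATES:
        return value  # Passthrough condition
    return value + delta  # default generator: timestamp shift


delta = timedelta(days=3)
kept = shift_unless_sentinel(datetime(9999, 12, 31, 23, 59, 59), delta)
moved = shift_unless_sentinel(datetime(2020, 1, 1), delta)
```

Comparing on the date portion means that a maximum value that carries a time of day is still recognized as a sentinel.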
You might sometimes want to add values that are the output of a generator to the results of the transformation by another generator.
For example, you use Character Scramble to mask a username. You might also want to prefix the value with a fixed constant value, or append a sequential integer.
To accomplish this:
Apply the Regex Mask generator to the column.
In addition to the capture groups that are specific to your data:
Use (^) as a capture group for a prefix.
Use ($) as a capture group for a suffix.
Use () as an empty group at any point in the regex pattern.
Apply the relevant generators to each capture group.
So to implement the example above (prefix with a constant, scramble the value, append a sequential integer), you provide the expression (^)(.*)()($).
This produces four capture groups:
Group 0 is for the prefix. You assign the Constant generator and provide the value to use as the prefix.
Group 1 captures the entire original value. You assign the Character Scramble generator.
Group 2 is the empty group. You assign the Constant generator to provide a value to insert at that position.
Group 3 is for the suffix. You assign the Sequential Integer generator.
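To check how the expression splits a value, you can try it with Python's `re` module. This sketch is an approximation only: the `GENERATORS` list, the `"user_"` prefix, the `"-"` separator, and the `scramble` helper are hypothetical stand-ins for the generators that you assign in Structural. Python numbers groups from 1, so the comments map them back to groups 0 through 3.

```python
import itertools
import random
import re
import string

PATTERN = re.compile(r"(^)(.*)()($)")
_counter = itertools.count(1)
_rng = random.Random(7)


def scramble(text: str) -> str:
    """Replace letters and digits; keep case and punctuation."""
    pools = [string.ascii_uppercase, string.ascii_lowercase, string.digits]
    return "".join(
        next((_rng.choice(p) for p in pools if ch in p), ch) for ch in text
    )


GENERATORS = [
    lambda _: "user_",               # group 0: constant prefix
    scramble,                        # group 1: the original value
    lambda _: "-",                   # group 2: constant at the empty group
    lambda _: str(next(_counter)),   # group 3: sequential integer suffix
]


def mask(value: str) -> str:
    """Apply one generator per capture group and join the results."""
    match = PATTERN.fullmatch(value)
    if match is None:
        return value
    # The groups are contiguous and cover the whole value, so joining works.
    return "".join(gen(part) for gen, part in zip(GENERATORS, match.groups()))


first = mask("jsmith")
second = mask("a.jones")
```

Only group 1 captures any text. The other three groups match empty positions, which is what lets a generator insert new content there.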
A table that contains user data might include both name and email address columns. If a user's email address is based on their name, then in the destination data, you might want to also tie the email addresses to the names.
For example, your email addresses might use the format firstName.lastName@mycompany.com. In the source data, the email address for John Smith is John.Smith@mycompany.com. In the destination data, assuming John Smith is replaced by Michael Jones, you want the email address to be Michael.Jones@mycompany.com.
At a high level, to line up name and email address columns:
Assign the Name generator to the name fields. Make the Name generator consistent with an identifier column.
Assign the Regex Mask generator to the email address field.
Create a regular expression that extracts to capture groups the name portion of the email address. The specific expression varies based on the email address format.
Assign the Name generator to each name capture group. Make the Name generator consistent with the same identifier column.
In this example, the source data contains userId, firstName, lastName, and emailAddress fields, and the email address is firstName.lastName@mycompany.com.
To ensure that the destination data email addresses are aligned to the destination data names:
For the firstName field, assign the Name generator, configured to produce a first name. Make the generator consistent with the userId column.
For the lastName field, assign the Name generator, configured to produce a last name. Make the generator consistent with the userId column.
For the emailAddress field, assign the Regex Mask generator. Use the following regular expression to extract the parts of the email address to capture groups:
([a-zA-Z]+)\.([a-zA-Z]+)@(.*)
For the first name and last name capture groups:
Assign the Name generator, configured to produce the first and last names.
Make the Name generator consistent with the userId column.
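The reason this works is that every name value, whether in a name column or inside the email address, is derived from the same userId. A minimal Python sketch of that idea follows. The name lists, the `SECRET` key, and the `pick` and `mask_user` helpers are hypothetical, and the keyed hash is only a stand-in for a consistent generator.

```python
import hashlib
import hmac
import re

FIRST_NAMES = ["Michael", "Susan", "Gregory", "Rosella"]  # sample dictionary
LAST_NAMES = ["Jones", "Walton", "Linn", "Feynman"]       # sample dictionary
SECRET = b"example-secret"                                # hypothetical key
EMAIL = re.compile(r"([a-zA-Z]+)\.([a-zA-Z]+)@(.*)")


def pick(names: list, user_id, label: str) -> str:
    """Choose a name that depends only on the consistency column value."""
    message = f"{label}:{user_id}".encode("utf-8")
    digest = hmac.new(SECRET, message, hashlib.sha256).digest()
    return names[int.from_bytes(digest[:8], "big") % len(names)]


def mask_user(row: dict) -> dict:
    """Mask the name columns and rebuild the email from the same names."""
    first = pick(FIRST_NAMES, row["userId"], "first")
    last = pick(LAST_NAMES, row["userId"], "last")
    match = EMAIL.fullmatch(row["emailAddress"])
    # The domain capture group is kept; only the name groups are replaced.
    email = f"{first}.{last}@{match.group(3)}" if match else row["emailAddress"]
    return {**row, "firstName": first, "lastName": last, "emailAddress": email}


row = {
    "userId": 1001,
    "firstName": "John",
    "lastName": "Smith",
    "emailAddress": "John.Smith@mycompany.com",
}
masked = mask_user(row)
```

Because the name columns and the email capture groups are all keyed on userId, the masked email address always lines up with the masked names in the same row.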
Consistency is an option for some generators that, when turned on, maps the same input to the same output across an entire database.
Consistency can also be maintained across multiple databases of varying types. For example, if consistency is turned on for a name generator, it always maps the same input name (for example, Albert Einstein) to the same output (for example, Richard Feynman).
You can also view this video overview of consistency.
The primary reasons for using consistency are to:
Enable joining on columns that don't have explicit database constraints in the schema. This is often seen with values such as email addresses. With consistency, you can completely anonymize an email address and still use it in a join.
Preserve the approximate cardinality of a column. For example, a city column contains 50 different cities. To randomize this column but still have ~50 cities, you can use consistency to maintain the approximate cardinality. Because consistency does not guarantee uniqueness, the cardinality might change. However, it is guaranteed to not increase. If unique 1-to-1 mappings are required, a Key generator should be used.
Match duplicated data across one or more databases. For example, you have a user database that contains a username in both a column and a JSON blob, and another database that contains their website activity, identified by the same username values. To anonymize the username, but still have the username be the same in all locations and databases, use consistency.
Self-consistency indicates that the value in the destination database is consistent with the value of the same column in the source database.
For example, a column contains a first name. You make the assigned generator self-consistent. A given first name in the source database is always replaced by the same first name in the destination database. For example, the first name value John is always replaced by the value Michael.
Consistency with another column indicates that the value in the destination database is consistent with the value of a different column in the source database.
For example, a column contains an IP address. You make the assigned generator consistent with the username column. Every row that has the username User1 in the input database has the same IP address in the destination database.
When you select a generator as the sub-generator for a composite generator, in most cases you cannot configure the generator to be consistent with another column. Only the Conditional generator and the Regex Mask generator allow a sub-generator to be consistent with another column.
Note that consistency with another column cannot be configured in a generator preset. You can only configure it when you configure an individual column.
To enable consistency, on the generator configuration panel, toggle the Consistency switch.
Not all generators support consistency.
Consistency is a function of both the data type and the value.
For example, a numeric field contains the value 123. A string/varchar field contains the value "123".
Both fields have consistent generators applied.
The output is not consistent between the two fields.
To demonstrate the effect of consistency on the output, we'll use a column that contains a first name, and that uses the Name generator.
Here is the sample input and output when consistency is not enabled:
In this sample data, the first name Melissa appears twice, but is mapped to Walton the first time and Linn the second time.
Here is the sample input and output when consistency is enabled:
In this case, the first name Melissa is mapped to Rosella both times.
A consistent generator ensures that the same input value always produces the same output value.
It does not guarantee that two different input values produce two different output values.
Consistent generators are not 1:1 mappings.
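A small Python sketch can make the difference between consistent and 1:1 concrete. It uses a keyed hash to choose from a short list of names. The `NAMES` list, the `SECRET` key, and the `consistent_name` helper are hypothetical, and this is not a description of how Structural computes its mappings.

```python
import hashlib
import hmac

NAMES = ["Michael", "Susan", "John"]  # deliberately small output dictionary
SECRET = b"example-secret"            # hypothetical key


def consistent_name(value: str) -> str:
    """Map the same input to the same output every time it is called."""
    digest = hmac.new(SECRET, value.encode("utf-8"), hashlib.sha256).digest()
    return NAMES[int.from_bytes(digest[:8], "big") % len(NAMES)]


inputs = ["Jane", "Melissa", "Albert", "Jane", "Richard", "Melissa"]
outputs = [consistent_name(value) for value in inputs]
```

Both occurrences of Jane and both occurrences of Melissa receive the same replacement. However, the four distinct inputs share only three possible outputs, so at least two different inputs must map to the same name. The number of distinct values can shrink but cannot grow.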
Consistency can reduce the privacy of your data, because it reveals something about the frequency of the data values.
For example, if someone is familiar with the source data values and frequency, they might be able to connect the source and destination values. For example, they know that Jane appears 20 times and Michael appears 3 times in the source. When they see 20 instances of Susan and 3 instances of John, they might infer that Susan is mapped from Jane and John from Michael.
However, this risk does require some knowledge of the source data. Tonic Structural does not store mappings of the source data to the destination data. In other words, someone can see that in the destination data the name Susan appears 20 times and the name John appears 3 times. But without any knowledge of the source data, they cannot determine that Susan is mapped from Jane and John is mapped from Michael.
Also, the mapping of source to destination values is not guaranteed to be unique. Both Jane and Michael could be mapped to John. In that case there would be 23 instances of John, which would not match the frequency of a specific source value. To guarantee unique values, use a primary key generator.
Any column, regardless of which table it resides in, is consistent with any other column that uses the same consistent generator.
For example, your database includes a Customers table and an Employees table. Each table contains a column for the first name of the customer or employee. You assign the Name generator to both columns to generate a first name, and make the generators consistent. The same first name value in either column is mapped to the same destination value. For example, the first name John is always mapped to Michael, whether the name John appears in the Customers table or the Employees table.
However, by default, consistency is not guaranteed between data generation runs, even if the run is on the same database.
By default, consistency is only guaranteed across a single data generation for a single workspace.
For example, for a column that contains a first name value, you assign the Name generator and configure the generator to be consistent. The first time you run data generation, all instances of the name John might be replaced with Michael. The next time you run data generation, all instances of the name John might instead be replaced with Gregory.
You can enable consistency across runs and workspaces so that, for example, every time you run a data generation, John is always replaced with Michael.
To control this, you configure a seed value. You can:
Configure the Structural environment setting TONIC_STATISTICS_SEED. This ensures consistency across all workspaces and data generation runs.
Configure a seed value for a workspace. This ensures consistency across all data generation runs for that workspace, as well as across other workspaces that have the same seed value.
Disable cross-data generation consistency for a workspace. This indicates to not have consistency across data generation runs or with other workspaces.
To ensure consistency across all data generations and workspaces, add the following environment setting to the Structural worker and web server containers:
TONIC_STATISTICS_SEED: <ANY 32-BIT SIGNED INTEGER>
When you configure a value for this environment setting, consistency applies across all data generations for all workspaces that do not either:
Have a workspace seed value configured.
Have disabled consistency across data generations.
For an individual workspace, you can override the Structural seed value. When you override the Structural seed value, you can either:
Disable consistency across data generation runs for the workspace.
Provide a seed value for the workspace.
When a workspace has a configured seed value, consistency applies across the data generation runs for that workspace.
Consistency also applies across all of the data generations for all of the workspaces that have the same seed value.
On the workspace details view, to override the Structural seed value:
Toggle Override Statistics Seed to the on position.
To disable consistency across data generations, click Don't use consistency.
To provide a seed value for the workspace:
Click Consistency value.
In the field, enter the seed value. It must be a 32-bit signed integer. The value defaults to the current value of TONIC_STATISTICS_SEED.
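The role of the seed can be illustrated with a short Python sketch. It assumes, purely for illustration, that the seed is mixed into a hash of each value. The `validate_seed` and `run_generation` helpers and the `NAMES` list are hypothetical.

```python
import hashlib
import secrets
from typing import Optional

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
NAMES = ["Michael", "Gregory", "Susan", "Rosella"]


def validate_seed(seed: int) -> int:
    """Reject values that do not fit in a 32-bit signed integer."""
    if not INT32_MIN <= seed <= INT32_MAX:
        raise ValueError(f"seed {seed} is not a 32-bit signed integer")
    return seed


def run_generation(values: list, seed: Optional[int]) -> dict:
    """With a fixed seed, every run yields the same mapping."""
    if seed is None:
        # No configured seed: pick a fresh one for this run only.
        run_seed = secrets.randbelow(2**32) - 2**31
    else:
        run_seed = validate_seed(seed)

    def replace(value: str) -> str:
        digest = hashlib.sha256(f"{run_seed}:{value}".encode("utf-8")).digest()
        return NAMES[int.from_bytes(digest[:8], "big") % len(NAMES)]

    return {value: replace(value) for value in values}


run_a = run_generation(["John", "Jane"], seed=1234)
run_b = run_generation(["John", "Jane"], seed=1234)
```

Two runs that share a seed produce identical mappings, which mirrors the behavior of workspaces that share a seed value. A run without a seed is consistent only within itself.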
The following generators can be made consistent to themselves. This means that the same input value in the column always produces the same output value.
The following generators can be made consistent either to themselves or to other columns.
When a column is consistent to another column, the output value is based on the other column.
For example, a column contains a company name. You assign the Company Name generator, and make it consistent with the username column. Every row that has the username User1 in the input database has the same company name in the destination database.
Company Name (Deprecated)
Required license: Professional or Enterprise
Not available on Tonic Structural Cloud.
Required global permission: Configure Tonic Structural data encryption
A common use case for custom processing is encrypted source data. The data might need to be decrypted before a generator is applied, and encrypted before it is saved to the destination database.
Structural data encryption allows you to configure decryption and encryption to use during data generation. The data encryption process supports AES encryption, and allows you to use the CBC, ECB, or CFB cipher mode.
When Structural data encryption is enabled, the configuration panel for each column includes a toggle to use Structural data encryption for that column.
For columns that use both Structural data encryption and a custom value processor:
Decryption occurs before a pre-processing custom value processor.
Encryption occurs after a post-processing custom value processor.
You enable and configure the data encryption from the Data Encryption tab of the Structural Settings view. To display the Structural Settings view, in the Structural heading, click Structural Settings.
To use Structural data encryption, you must provide:
A Base64-encoded decryption key as the value of the TONIC_DATA_DECRYPTION_KEY environment setting.
A Base64-encoded encryption key as the value of the TONIC_DATA_ENCRYPTION_KEY environment setting.
Both key values must use the same key size: 128, 192, or 256 bits.
For more information, go to Configuring environment settings.
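If you need to produce or check key values before you set them, a sketch such as the following might help. It assumes the standard Base64 alphabet, which the text above does not specify. The `new_key`, `key_bits`, and `same_size` helpers are hypothetical and are not part of Structural.

```python
import base64
import secrets

VALID_KEY_BITS = (128, 192, 256)  # AES key sizes


def new_key(bits: int = 256) -> str:
    """Generate a random AES key and return it Base64-encoded."""
    if bits not in VALID_KEY_BITS:
        raise ValueError(f"unsupported key size: {bits}")
    return base64.b64encode(secrets.token_bytes(bits // 8)).decode("ascii")


def key_bits(encoded: str) -> int:
    """Decode a Base64 key and return its size in bits."""
    raw = base64.b64decode(encoded, validate=True)
    bits = len(raw) * 8
    if bits not in VALID_KEY_BITS:
        raise ValueError(f"decoded key is {bits} bits")
    return bits


def same_size(decryption_key: str, encryption_key: str) -> bool:
    """Check that both keys decode to the same valid size."""
    return key_bits(decryption_key) == key_bits(encryption_key)


decryption_key = new_key(256)
encryption_key = new_key(256)
ok = same_size(decryption_key, encryption_key)
```

A check like this catches a mismatched pair, such as a 128-bit decryption key with a 256-bit encryption key, before you update the environment settings.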
Structural validates whether the values are set correctly. Structural enables the rest of the Data Encryption tab settings only if the keys are set correctly.
By default, Structural data encryption is disabled. To enable it, toggle Enable Data Encryption to the on position.
When you enable Structural data encryption, you choose whether to use decryption, encryption, or both.
You use decryption if the source data is encrypted and must be decrypted before the generators are applied.
You use encryption to encrypt the transformed data before saving it to the destination database.
To use decryption only, select Use Decryption.
To use encryption only, select Use Encryption.
To both decrypt and encrypt data, select Use Decryption and Encryption.
Structural only supports AES encryption. The AES Encryption setting shows the current key size.
The key size is based on the values you provided for the decryption and encryption key environment settings.
From the Cipher Mode dropdown list, select the cipher mode to use for Structural data encryption. The available cipher modes are:
CBC
ECB
CFB
Before it decrypts or encrypts data, Structural applies an initialization vector.
By default, Structural generates a random initialization vector, and Use custom Initialization Vector (IV) is in the off position.
To provide custom initialization vectors for Structural to use:
Toggle Use custom Initialization Vector (IV) to the on position.
If the Structural data encryption configuration includes encryption, then in the Encryption IV field, enter the static initialization vector to use to encrypt data.
If the Structural data encryption configuration includes decryption, then in the Decryption IV field, enter the static initialization vector to use to decrypt data.
After it encrypts the destination data, but before it stores it, Structural can prepend a string to the encrypted data.
To configure Structural data encryption to prepend a string:
Toggle Prepend value to encrypted data to the on position.
In the Custom Value field, enter the string to prepend.
After you complete the configuration, the Preview Results panel allows you to test the decryption and encryption.
If the configuration is incomplete, you cannot run the test.
If the configuration is for decryption only:
In the Ciphertext field, enter an encrypted text string.
Click Run Test.
Verify that the text in the Plaintext Result field is correct.
If the configuration is for encryption only:
In the Plaintext field, enter an unencrypted text string.
Click Run Test.
Verify that the text in the Ciphertext Result field is correct.
If the configuration is for both decryption and encryption, then you provide an encrypted string. The test decrypts the string into plain text, then re-encrypts that string.
In the Ciphertext field, enter an encrypted text string.
Click Run Test.
Verify that the text in the Plaintext Result field and the Ciphertext Result field is correct.
To save the configuration, click Save.
To revert any changes since you last saved the configuration, click Revert.
Privacy Hub, Database View, and Table View all provide an option to assign a generator to a column.
For self-hosted Enterprise instances, the selected generator is a generator preset. A generator preset provides a specific configuration for a generator. Whenever a user selects the preset, the generator automatically uses the saved configuration for the preset, which we call the baseline configuration. Tonic Structural provides a built-in preset for most generators. You can also create custom presets.
After you select the preset, you can:
Override the baseline generator preset configuration. For example, if the built-in preset for the Name generator uses the First Last format, but the column contains a first name, you can change the format to First.
Remove the overrides to the baseline configuration.
Save the updated configuration as the new baseline for the generator preset.
Save the updated configuration as a new custom generator preset.
For more information about generator presets, go to Managing generator presets.
Required license to manage generator presets: Enterprise
For Basic and Professional instances, users select and configure generators separately for each column.
Required workspace permission: Configure column generators
From the Generator Type dropdown, select the generator to assign to the column.
The list contains the names of the generators that can be applied to the column.
Use the filter field to search by generator name.
For self-hosted Enterprise instances, the generator names represent built-in and custom generator presets. When you select a generator preset, the configuration is updated to match the current baseline configuration for that preset.
To remove the selected generator and set the generator to Passthrough, click the delete icon next to the generator dropdown list.
After you select a generator preset, you can change the generator configuration. For details about the available configuration options for each generator, see the Generator reference.
Overriding the configuration does not affect the baseline configuration for the generator preset.
A column is also considered to have overrides when someone changed the baseline configuration of the generator preset after it was assigned to the column.
Note that the following configuration options are not part of the preset configuration:
On the column configuration panel, you use the Reset to baseline button to remove any overrides to the current baseline configuration for the generator preset.
From the column configuration panel, you can save the updated configuration as the baseline configuration for the generator preset.
To do this, click Preset Options, then select Update baseline configuration. On the confirmation panel, click Confirm.
When you update the baseline configuration for the generator preset, Structural does not change the configuration of other columns that use the previous baseline configuration.
Whenever you select a generator preset, it uses the current baseline configuration.
From the generator configuration panel, you can save the current configuration as a new custom generator preset.
When you create a new custom generator preset, it is selected as the generator preset for the column.
To do this:
Click Preset Options, then select Create a new generator preset.
On the Create New Preset dialog, in the New Preset Name field, provide a name for the new custom generator preset.
Click Create.
Required license for workspace inheritance: Enterprise
In a child workspace, the configuration panel indicates whether the column currently inherits the configuration from the parent workspace.
The inheritance stops if you select a different generator or generator preset (including the Passthrough generator) or change the configuration.
When the column overrides the parent configuration, to reset to the parent configuration and restore the inheritance, click Reset.
Required license: Enterprise
On Basic or Professional instances, you select and configure generators separately for each column.
Required global permission: Create and manage generator presets
Not available on Structural Cloud.
A generator preset is a saved configuration for a generator.
Tonic Structural provides a built-in preset for every generator. You can update the configuration of the built-in presets.
You can also create custom generator presets that have different configurations. For example, for the Address generator, you can have one generator preset to use for city columns, and another generator preset to use for full addresses. You can edit and delete the custom generator presets. The custom generator presets are available to assign to columns throughout the Structural instance.
Generator presets allow you to standardize the configuration for generators, and save your users from having to replicate the same configuration selections across different columns, tables, and workspaces. For example, you might modify the generator preset for the Integer Key generator to enable consistency. Whenever a user assigns the Integer Key generator to a column, consistency is enabled.
For information about assigning and updating generator presets for a column, go to Assigning and configuring generators.
You can also view the video tutorial about generator presets.
The Generator Presets view contains the list of built-in and custom generator presets for the entire Structural instance. The configured presets are not specific to a workspace or a user.
To display the Generator Presets view, in the Tonic heading, click Generator Presets.
For each generator preset, the list provides the following information:
The name of the generator preset. For the built-in presets, the generator preset name always matches the generator name.
Whether the generator preset is built-in or custom.
The number of occurrences. Includes the number of occurrences that use the current baseline configuration, and the number of occurrences that have overrides to the baseline configuration.
An occurrence has an override if, after a user assigns the generator preset to a column, one of the following occurs:
A user changes the generator configuration options for that occurrence.
A user changes the baseline configuration for the generator preset.
When the preset configuration was most recently modified.
You cannot create or configure generator presets for generators that do not have any configuration options. For example, the Null generator does not have any configuration options.
For composite generators, you cannot create or configure generator presets from the Generator Presets view. The Generator Presets view does not have access to data from which to create path expressions. You can create a new preset or update a preset baseline configuration from a column configuration panel in Privacy Hub, Database View, or Table View.
The list indicates when a generator does not allow you to configure a preset.
You can filter the list of generator presets by the preset name, whether it is built-in or custom, and by the underlying generator type.
To filter by the preset name, begin typing text from the name. As you type, Structural filters the list to only include the matching presets.
To filter the list based on whether the preset is built-in or custom:
Click Filter by Type.
In the dropdown list: To only include built-in presets, click Built-in. To only include custom presets, click Custom.
Structural adds the selection to the selected filters.
Every generator preset is based on a Structural generator type. For example, there is a built-in generator preset for the Address generator, and you can also create custom generator presets based on the Address generator.
To filter the list based on the generator type:
Click Filter by Generator.
In the generator list, click a generator to include. You can use the search field to search for a specific generator. When you click the generator name, Structural adds the generator to the selected filters.
You can sort the generator preset list by the preset name and by the modification date.
To sort the generator preset list by a column, click the column heading. To reverse the sort order, click the column heading again.
To create a new custom generator preset, you can either create a completely new preset, or copy an existing preset.
For composite generators such as JSON Mask, you cannot create a generator preset from the Generator Presets view. The Generator Presets view does not have access to data to use for path expressions. You can create presets for composite generators from a column configuration panel in Privacy Hub, Database View, or Table View.
You cannot create a custom preset at all for a generator that has no configuration options. For example, you cannot create a custom preset for the Null generator.
To create a completely new custom generator preset:
On the Generator Presets view, click Create Preset.
On the Create Preset panel, configure the generator preset.
Click Create.
When you copy an existing generator preset, the new generator preset by default inherits the configuration from the copied generator preset.
To copy an existing generator preset:
On the Generator Presets view, click the copy icon for the generator preset that you want to copy.
On the Copy Preset dialog, enter a name for the new generator preset, then click Copy. The new preset is added to the Generator Presets list, and the details panel is displayed to allow you to change the new preset configuration.
After you update the configuration, click Save and Apply.
On the confirmation panel, click Confirm.
To edit a preset, you must be either an editor or owner of at least one workspace in the Structural instance. If you are not an editor or owner of a workspace, then you can view the list of presets, but you cannot edit the presets.
When you change the configuration of a generator preset, the updated configuration becomes the new baseline configuration for the generator preset.
The baseline configuration is used whenever you select the generator preset. Existing occurrences of the generator preset keep their current configuration. You can reset those occurrences to use the current baseline configuration.
A change to the generator preset description is not considered a change to the baseline configuration.
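The relationship between a preset baseline and its occurrences can be sketched as a small data model. This is only an illustration of the behavior described above, not how Structural stores presets. The class names, function names, and the consistency configuration key are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Preset:
    """Hypothetical stand-in for a generator preset."""
    name: str
    generator: str
    baseline: dict
    description: str = ""


@dataclass
class Occurrence:
    """Hypothetical stand-in for one column that uses a preset."""
    column: str
    preset: Preset
    config: dict

    @property
    def has_override(self) -> bool:
        # An occurrence has an override when its configuration
        # differs from the preset's current baseline.
        return self.config != self.preset.baseline


def assign(preset: Preset, column: str) -> Occurrence:
    # A new occurrence starts with a copy of the current baseline.
    return Occurrence(column, preset, dict(preset.baseline))


def update_baseline(preset: Preset, new_config: dict) -> None:
    # Existing occurrences are deliberately left untouched.
    preset.baseline = dict(new_config)


def reset(occurrence: Occurrence) -> None:
    # Resetting copies the current baseline back onto the occurrence.
    occurrence.config = dict(occurrence.preset.baseline)


preset = Preset("Integer Key", "Integer Key", {"consistency": False})
column = assign(preset, "users.id")
update_baseline(preset, {"consistency": True})
```

In this sketch, the users.id occurrence reports an override after the baseline changes, because it keeps the configuration it was assigned with. Calling reset on the occurrence clears the override. Changing only the description field leaves the baseline, and therefore the override status, unchanged.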
For composite generators such as JSON Mask, you cannot update a generator preset from the Generator Presets view. The Generator Presets view does not have access to data to use for path expressions. You can update the baseline configuration from a column configuration panel in Privacy Hub, Database View, or Table View.
To update the baseline configuration of a generator preset:
On the Generator Presets view, click the edit icon for the preset.
On the Configuration tab of the Edit Preset panel, update the configuration. You cannot change the selected generator for the preset.
Click Save and Apply.
On the confirmation panel, click Confirm.
Each generator preset includes the following configuration:
Preset Name - The name of the generator preset. You cannot change the name of built-in presets. Built-in presets always use the generator name.
Preset Description - A longer description of the generator preset and how it is intended to be used.
Generator Type - Used to select the generator for a new generator preset. When you copy or edit a generator preset, you cannot change the selected generator type.
Generator configuration - The configuration options for the selected generator. For details on the specific configuration options for each generator, go to the Generator reference.
The following items are not included in the generator preset configuration. They are always configured for individual columns after you select the generator preset:
On the generator preset details panel, the Occurrences tab indicates where the generator preset is used. You can also see whether each occurrence overrides the current baseline configuration.
The Occurrences tab displays the list of workspaces that contain occurrences of the preset. Each workspace indicates the total number of occurrences that use the current baseline configuration and that have overrides to the current baseline configuration.
For workspaces that you have access to:
You can expand the workspace to display the list of columns that use the generator preset. For each column, the entry indicates whether the column uses the current baseline configuration.
You can click the Database View icon to navigate to Database View.
For workspaces that you do not have access to, you can only see the total number of occurrences. You cannot display the column list or navigate to Database View.
You can delete custom generator presets. You cannot delete built-in generator presets.
When you delete a custom generator preset, existing occurrences are assigned the built-in generator preset for that generator. If the current configuration does not match the baseline configuration for the built-in generator preset, then the occurrences also are marked as having overrides.
For example, a column is assigned a custom generator preset for the Name generator. The custom generator preset is deleted. The column is then assigned the built-in generator preset for the Name generator, and is marked as having overrides.
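The fallback on deletion can be sketched as follows. This is a minimal illustration under stated assumptions, not Structural's implementation: the function name, the dictionary shape used for an occurrence, and the format configuration key are hypothetical.

```python
def reassign_after_delete(occurrences, deleted_preset, builtin_preset, builtin_baseline):
    """Move occurrences of a deleted custom preset to the built-in preset.

    Each occurrence is a dict with 'column', 'preset', and 'config' keys
    (a hypothetical shape chosen for this sketch).
    """
    updated = []
    for occ in occurrences:
        if occ["preset"] != deleted_preset:
            # Occurrences of other presets are not affected.
            updated.append(dict(occ))
            continue
        updated.append({
            "column": occ["column"],
            "preset": builtin_preset,
            # The column keeps its current configuration.
            "config": dict(occ["config"]),
            # It is marked as an override if that configuration
            # differs from the built-in preset's baseline.
            "has_override": occ["config"] != builtin_baseline,
        })
    return updated


occurrences = [
    {"column": "users.full_name", "preset": "Full names", "config": {"format": "First Last"}},
    {"column": "users.first_name", "preset": "Full names", "config": {"format": "First"}},
]
result = reassign_after_delete(
    occurrences,
    deleted_preset="Full names",
    builtin_preset="Name",
    builtin_baseline={"format": "First"},
)
```

Here both columns end up on the built-in Name preset. Only users.full_name is marked as having an override, because its configuration differs from the built-in baseline assumed in the example.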
To delete a custom generator preset:
On the Generator Presets view, click the delete icon for the generator preset.
On the confirmation dialog, click Delete Preset.
Generates unique email addresses. Replaces the username with a randomly generated GUID, and masks the domain with a character scramble.
This generator only guarantees uniqueness if the underlying column is unique.
To configure the generator:
In the Email Domain field, enter a domain to use for all of the output values.
For example, use @mycompany.com for all of the generated values.
If you do not provide a value, then the generator uses a character scramble on the domain.
In the Excluded Email Domains field, enter a comma-separated list of domains for which email addresses are not masked in the output values. This allows you, for example, to maintain internal or testing email addresses that are not considered sensitive.
Toggle the Replace invalid emails setting to indicate whether to replace an invalid email address with a generated valid email address. By default, invalid email addresses are not replaced. In the replacement values, the username is generated. If you specify a value for Email Domain, then that value is used for the domain. Otherwise, the domain is generated.
Toggle the Consistency setting to indicate whether to make the generator self-consistent. By default, consistency is disabled.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
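To make the interaction between these settings concrete, the following Python sketch imitates the described behavior. It is not Structural's implementation. The function names, the simple validity check, the scramble rules, the seeding used for consistency, and the example.com placeholder for generated domains are all assumptions made for illustration.

```python
import random
import re
import string
import uuid

# Deliberately simple validity check; an assumption of this sketch.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def scramble(text: str, rng: random.Random) -> str:
    """Replace letters with random letters and digits with random digits."""
    out = []
    for ch in text:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # keep '.', '-', and other punctuation
    return "".join(out)


def mask_email(value: str, email_domain: str = "", excluded_domains: str = "",
               replace_invalid: bool = False, consistent: bool = False) -> str:
    # With consistency on, seed from the input so that the same input
    # always produces the same output (an assumed mechanism).
    rng = random.Random(value) if consistent else random.Random()
    excluded = {d.strip().lstrip("@").lower() for d in excluded_domains.split(",") if d.strip()}

    if EMAIL_RE.match(value):
        domain = value.rsplit("@", 1)[1]
        if domain.lower() in excluded:
            return value  # excluded domains are not masked
    elif replace_invalid:
        domain = "example.com"  # placeholder that is scrambled below
    else:
        return value  # invalid values are left alone by default

    username = uuid.UUID(int=rng.getrandbits(128), version=4).hex
    new_domain = email_domain.lstrip("@") if email_domain else scramble(domain, rng)
    return f"{username}@{new_domain}"
```

For example, mask_email("jane@corp.com", email_domain="@mycompany.com") returns a 32-character hexadecimal username followed by @mycompany.com, while mask_email("qa@internal.test", excluded_domains="internal.test") returns the input unchanged.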
Generates a new valid Canadian Social Insurance Number that preserves the formatting of the original value.
For example, the original value might be 123456789, 123 456 789, or 123-456-789. The output value uses the same format.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
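As an illustration of what a valid, format-preserving value means here, the following Python sketch generates a nine-digit number that passes the Luhn check used for Social Insurance Numbers, then writes the new digits into the layout of the original value. It is a sketch under stated assumptions, not Structural's implementation. The function names are hypothetical, and the sketch ignores any additional rules about which leading digits are issued.

```python
import random


def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a string of digits."""
    total = 0
    # Walk right to left; the digit next to the check digit is doubled first.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def is_luhn_valid(digits: str) -> bool:
    return luhn_check_digit(digits[:-1]) == int(digits[-1])


def generate_sin(original: str, rng: random.Random = None) -> str:
    """Return a new Luhn-valid number laid out like the original value."""
    if sum(ch.isdigit() for ch in original) != 9:
        raise ValueError("expected a value that contains exactly nine digits")
    rng = rng or random.Random()
    # Avoiding a leading zero is an assumption of this sketch.
    payload = str(rng.randint(1, 9)) + "".join(str(rng.randint(0, 9)) for _ in range(7))
    new_digits = iter(payload + str(luhn_check_digit(payload)))
    # Replace each digit in turn and copy separators through unchanged.
    return "".join(next(new_digits) if ch.isdigit() else ch for ch in original)
```

Calling generate_sin("123-456-789") returns a value with hyphens in the same positions, and generate_sin("123 456 789") returns one with spaces. In both cases, the nine digits pass is_luhn_valid.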
Consistency: Yes, can be made self-consistent.
Linking: No, cannot be linked.
Differential privacy: No
Data-free: No
Allowed for primary keys: No
Allowed for unique columns: Yes
Uses format-preserving encryption (FPE): No
Privacy ranking: 3 if not consistent, 4 if consistent
Generator ID (for the API)
Consistency: Yes, can be made self-consistent.
Linking: No, cannot be linked.
Differential privacy: No, cannot be made differentially private.
Data-free: Yes, if consistency is not enabled.
Allowed for primary keys: No
Allowed for unique columns: Yes
Uses format-preserving encryption (FPE): Yes
Privacy ranking: 1 if not consistent, 4 if consistent
Generator ID (for the API)