The Array Character Scramble generator is intended for array values. It replaces letters with random other letters, and numbers with random other numbers. It preserves punctuation and whitespace from the original value.
The Array Character Scramble generator can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the BaseMetadata object.
There is no generator-specific configuration.
The following example replacement configures a column to use the built-in generator preset for the Array Character Scramble generator. The generator is not consistent.
In this reference, each generator is identified by its name and, in parentheses, its generator ID. You use the generator ID to identify the generator in the API.
For each generator, this reference shows the structure of a link object, and provides an example of a replacement object.
Additional resources:
The Array Regex Mask generator is a version of the Regex Mask generator that can be used for array values.
It uses regular expressions to parse strings and replace specified substrings with the output of specified generators. The parts of the string to replace are specified inside unnamed top-level capture groups.
In the Array Regex Mask generator, each link object identifies a regular expression and the generators to apply to the resulting capture groups.
The generator does not in itself support consistency or allow you to configure differential privacy.
The metadata object for each link object is populated from the object, and includes:
Whether to replace all matches or only the first match.
The regular expression used to identify the capture groups to replace.
The list of generator types to apply to each capture group. The first sub-generator is applied to the first capture group, the second generator to the second group, and so on.
In the captureGroupMetadata object, the configuration for each generator in captureGroupSubGenerators. The sequence of the entries in captureGroupMetadata must match the sequence of the generators in captureGroupSubGenerators.
The following example provides a regex pattern that produces a single capture group.
For that capture group, the Constant generator is applied. The capture group value is replaced with test_value.
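The capture-group behavior described above can be illustrated with a minimal Python sketch. This is not how the generator is implemented. The function name regex_mask, its parameters, and the use of plain Python callables as stand-ins for sub-generators are all hypothetical, and the sketch assumes that the pattern contains only unnamed, top-level (non-nested) capture groups.

```python
import re


def regex_mask(value, pattern, sub_generators, replace_all=True):
    """Replace each capture group with the output of the matching callable.

    sub_generators[0] handles capture group 1, sub_generators[1] handles
    capture group 2, and so on. Hypothetical helper for illustration only.
    """

    def replace_match(match):
        text = match.group(0)
        base = match.start()
        group_count = min(len(sub_generators), match.re.groups)
        # Work right to left so earlier offsets stay valid after each splice.
        for index in range(group_count, 0, -1):
            if match.group(index) is None:
                continue  # Group did not participate in this match.
            start, end = match.span(index)
            replacement = sub_generators[index - 1](match.group(index))
            text = text[: start - base] + replacement + text[end - base :]
        return text

    # count=0 means "replace every match"; count=1 replaces only the first.
    return re.sub(pattern, replace_match, value, count=0 if replace_all else 1)


# A constant stand-in for the Constant generator from the example above.
constant = lambda _original: "test_value"
masked_all = regex_mask("id=123; id=456", r"id=(\d+)", [constant])
masked_first = regex_mask("id=123; id=456", r"id=(\d+)", [constant], replace_all=False)
```

Text outside the capture groups, such as the id= prefix, is left unchanged. Only the captured substring is replaced.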
The Algebraic generator identifies the algebraic relationship between three or more numeric values and generates new values to match. At least one of the values must be a non-integer.
The Algebraic generator must be linked to at least two other columns.
The Algebraic generator does not support consistency. You cannot configure differential privacy.
There is no generator-specific configuration.
The following example replacement contains three linked columns that are assigned the Algebraic generator.
The Alphanumeric String Key generator can be applied to primary key columns. It generates unique alphanumeric strings of the same length as the input. For example, for the origin value ABC123, the output value is a six-character alphanumeric string such as D24N05.
The Alphanumeric String Key generator can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object.
There is no generator-specific configuration.
The following example replacement configures a column to use the built-in generator preset for the Alphanumeric String Key generator. The generator is not consistent.
The Array JSON Mask generator is a composite generator. It is a version of the JSON Mask generator that can be used for array values. It runs a selected generator on values that match a specified JSONPath.
For the Array JSON Mask generator, you provide a link object for each sub-generator configuration.
The generator does not itself support consistency or differential privacy.
The metadata object is populated from the object. For the Array JSON Mask generator, metadata includes:
pathExpression, which is the path expression that identifies the value to apply the sub-generator to.
The types of values to apply the sub-generator to.
The subGeneratorMetadata object, which identifies and configures the sub-generator.
Here is the basic structure of a link object for an Array JSON Mask sub-generator.
The following example replacement applies the built-in generator preset for the Geo generator to the value at the specified path expression.
The configuration for the Geo generator indicates that it is a latitude value.
The Address generator replaces the source value with a random string based on the type of address data that the column contains.
The Address generator can be self-consistent or consistent with another column. You cannot configure differential privacy. It can be linked to other columns.
The metadata object is populated from .
For the Address generator, you specify the type of address value that is in the source column. Here is the basic structure of a link object for the Address generator.
The following example replacement shows two linked columns that are assigned the built-in generator preset for the Address generator. One column contains city names, and the other contains zip codes.
Both columns have consistency disabled.
The ASCII Key generator can be applied to primary key columns. It generates unique alphanumeric strings based on any printable ASCII characters.
The ASCII Key generator can be configured to be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object.
There is no generator-specific configuration.
In the following example replacement for the ASCII Key generator, consistency is disabled. The output values do not include lowercase letters.
The Business Name generator generates a random company name-like string.
The Business Name generator does not support linking. It can be self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object.
There is no generator-specific configuration.
In the following replacement, the Business Name generator is applied to a company column, and is consistent with a name column.
The Categorical generator creates values at the same frequency and using the same values, including NULL values, as the underlying data. In other words, it shuffles the existing values within a field.
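As a rough illustration of what shuffling the existing values within a field means, the following Python sketch permutes a column in memory. It is only a sketch under simple assumptions: the helper name shuffle_column is hypothetical, NULL is modeled as None, and the sketch does not model differential privacy.

```python
import random
from collections import Counter


def shuffle_column(values, rng=None):
    """Return the same values in a random order (hypothetical helper)."""
    rng = rng or random.Random()
    shuffled = list(values)  # Copy so the source column is not modified.
    rng.shuffle(shuffled)
    return shuffled


source = ["red", "blue", None, "red", "green", None]
destination = shuffle_column(source)

# Every value, including NULL (None), occurs exactly as often as before.
same_frequencies = Counter(destination) == Counter(source)
```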
The Categorical generator does not support consistency. You can configure differential privacy. You can link columns.
The metadata object is populated from the object. It contains the epsilon field, which provides the .
The following example replacement shows a single, un-linked column. Differential privacy is enabled, and epsilon is set to 1.
Structure of a generator assignment
How the Tonic Structural API represents a generator assignment and configuration.
Generated Structural API documentation
Generated document with detailed descriptions of the objects and fields.
Generator reference
Detailed descriptions of the generators and their available configuration options.
The Conditional generator applies different generators to the value conditionally based on any value in the table.
You do not configure consistency or differential privacy for the Conditional generator.
The metadata object is populated from the ConditionalMetadata object.
The defaultGenerator object specifies the generator to apply if none of the conditions are met. It includes a condition object with an empty list of conditions.
The conditionalGenerators object contains the generators to apply based on one or more conditions. For each entry in conditionalGenerators, you identify and configure the generator, and provide the conditions to meet in order to apply that generator. The conditions can be joined by AND or OR.
Each condition identifies the column for which to check the value, the type of check, the value to check, and the type of data in the column that is checked.
In the following example replacement for the Conditional generator, the default generator is the Address generator, which is configured with the zip code as the address type.
The column being configured is column1.
If column1 contains the value VALUE and column2 is not NULL, then the Random Integer generator is applied to column1. It applies a value between 0 and 10.
If column4 contains a value that matches the regular expression .*, then the Categorical generator is applied to column1. epsilon is 1, and differential privacy is disabled.
The Continuous generator generates a continuous distribution to fit the underlying data.
The Continuous generator supports linking. It cannot be made consistent, but you can configure differential privacy.
The metadata object is populated from the BaseMetadata object.
The Continuous generator does support partitioning, which is configured in the partitions object outside of the links object.
There is no generator-specific configuration.
In this example replacement for the Continuous generator, differential privacy is enabled. The capital-gain column is partitioned by the native-country and income columns.
The Constant generator uses a single value to mask all of the values in the column.
The Constant generator does not support linking or consistency. You cannot configure differential privacy.
The metadata object is populated from the ConstantMetadata object.
The constant field specifies the value to use to populate the column.
In the following example replacement, the education-num column value is replaced by the value 10.
The Company Name generator is deprecated. Use the Business Name generator instead.
The Company Name generator generates a random company name-like string.
The Company Name generator does not support linking. It can be self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the BaseMetadata object.
There is no generator-specific configuration.
In the following replacement, the Company Name generator is applied to a company column, and is consistent with a name column.
The Date Truncation generator truncates a date value or a timestamp to a specific part. For a date or a timestamp, you can truncate to the year, month, or day. For a timestamp, you can also truncate to the hour, minute, or second.
The Date Truncation generator does not support linking or consistency. You cannot configure differential privacy.
The metadata object is populated from the DateTruncationMetadata object. The generator-specific configuration includes the part of the datetime value to truncate to, and whether to change all dates that are more than 90 years before the generation date to a date exactly 90 years before the generation date.
In the following example replacement for the Date Truncation generator, the values are truncated to the year. Date values that are older than 90 years before the generation date are not changed.
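To make the idea of truncating to a part concrete, here is a minimal Python sketch. It assumes that truncation resets every smaller part to its lowest value (month and day to 1, time parts to 0), which is the usual meaning of date truncation but is not confirmed by this reference. The helper name truncate is hypothetical, and the sketch does not cover the 90-year option.

```python
from datetime import datetime

# Parts ordered from largest to smallest, with the value each part resets to.
_PARTS = ["year", "month", "day", "hour", "minute", "second"]
_RESET = [1, 1, 1, 0, 0, 0]


def truncate(value, part):
    """Truncate a naive datetime to the given part (hypothetical helper)."""
    keep = _PARTS.index(part) + 1
    fields = [value.year, value.month, value.day,
              value.hour, value.minute, value.second]
    # Keep the leading fields and reset the rest; microseconds are dropped.
    return datetime(*(fields[:keep] + _RESET[keep:]))


original = datetime(2021, 7, 15, 13, 45, 30)
to_year = truncate(original, "year")
to_hour = truncate(original, "hour")
```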
The Character Scramble generator replaces letters with random other letters and numbers with random other numbers. It preserves punctuation, whitespace, and mathematical symbols.
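The following Python sketch illustrates the described behavior for ASCII input. It is an illustration only, not the generator's algorithm. The helper name scramble is hypothetical, and preserving the case of each letter is an assumption that this reference does not state.

```python
import random
import string


def scramble(value, rng=None):
    """Swap letters for other letters and digits for other digits."""
    rng = rng or random.Random()
    result = []
    for char in value:
        for alphabet in (string.ascii_lowercase, string.ascii_uppercase, string.digits):
            if char in alphabet:
                # Pick a different character from the same class.
                result.append(rng.choice([c for c in alphabet if c != char]))
                break
        else:
            # Punctuation, whitespace, and symbols pass through unchanged.
            result.append(char)
    return "".join(result)


scrambled = scramble("Ab-12 c!")
```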
You can configure the Character Scramble generator to be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the BaseMetadata object.
There is no generator-specific configuration.
The following replacement for a Character Scramble generator has consistency disabled.
The Custom Categorical generator is a version of the Categorical generator that selects from values that you provide instead of shuffling the original values.
The Custom Categorical generator supports linking. It can be made self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the CustomCategoricalMetadata object. You use the customCategories field to provide a list of the values to use for the column in the destination database. The values are provided on a single line, separated with newline characters (\n). For example, "Small\nMedium\nLarge". To include NULL as an available value, use {NULL}.
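The customCategories format described above can be built and parsed with a few lines of Python. This is a small sketch for client-side convenience. Both helper names are hypothetical, and None is used here to stand for NULL.

```python
def build_custom_categories(values):
    """Join values into the newline-separated customCategories string."""
    return "\n".join("{NULL}" if value is None else value for value in values)


def parse_custom_categories(custom_categories):
    """Split a customCategories string back into a list of values."""
    return [None if item == "{NULL}" else item
            for item in custom_categories.split("\n")]


sizes = build_custom_categories(["Small", "Medium", "Large"])
with_null = build_custom_categories(["Red", "Yellow", "Blue", "White", None])
```

When the string is serialized as JSON, each newline character appears as the escape sequence \n inside the quoted value.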
In this example replacement for the Custom Categorical generator, the values to use are Red, Yellow, Blue, and White. The generator is not linked.
Consistency is disabled.
The Character Substitution generator performs a random character replacement that preserves formatting (spaces, capitalization, and punctuation).
Characters are replaced with other characters from within the same Unicode Block.
The Character Substitution generator is implicitly consistent. You cannot configure consistency or differential privacy. There is no generator-specific configuration.
The following example replacement assigns the Character Substitution generator to a column.
The Cross Table Sum generator sets the value of the column to the sum of the values of another column aggregated across rows that have a foreign key value that matches the primary key in the current record.
For example, in a users table, a total_transactions value is obtained from the transactions table by combining all of the transaction_amount values from rows that have a user_id value that matches the primary key value for the current users record.
The Cross Table Sum generator does not support linking or consistency. You cannot configure differential privacy.
The metadata object is populated from the CrossTableAggregateMetadata object.
The generator-specific configuration includes:
The schema and table that contain the column to sum against.
The foreign key column to compare against the primary key for the current table.
The column that contains the values to sum.
The primary key column in the current table.
In the following example replacement for the Cross Table Sum generator, the value of total_transactions in the users table is set to the sum of the values of the amount column in the transactions table for rows where user_id has the same value as the id column in the current users table row.
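The aggregation in this example can be sketched in Python with in-memory rows. This only illustrates the arithmetic. The function name cross_table_sum and the sample data are hypothetical, and using 0 for users that have no matching transactions is an assumption.

```python
from collections import defaultdict


def cross_table_sum(parent_rows, child_rows, primary_key,
                    foreign_key, sum_column, target_column):
    """Set target_column on each parent row to the sum of matching child rows."""
    totals = defaultdict(int)
    for row in child_rows:
        totals[row[foreign_key]] += row[sum_column]
    # Assumption: parents without matching child rows get 0.
    return [{**row, target_column: totals.get(row[primary_key], 0)}
            for row in parent_rows]


users = [{"id": 1}, {"id": 2}, {"id": 3}]
transactions = [
    {"user_id": 1, "amount": 20},
    {"user_id": 1, "amount": 5},
    {"user_id": 2, "amount": 7},
]
result = cross_table_sum(users, transactions, "id",
                         "user_id", "amount", "total_transactions")
```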
The CSV Mask generator allows you to assign specific generators to specific indexes. You can also use the generator that is assigned to a specific index as the default. This applies the generator to every index that does not have an assigned generator.
For the CSV Mask generator, there is a link object for each index to assign a generator to.
The generator does not itself support consistency or differential privacy.
The metadata object is populated from the CsvMaskMetadata object, and includes:
pathExpression, which is the index to apply the sub-generator to.
The delimiter used to separate the CSV values.
Whether to apply that generator to indexes that are not assigned a generator.
The subGeneratorMetadata object, which identifies and configures the sub-generator.
Here is the basic structure of a link object for a CSV Mask sub-generator.
This example replacement for the CSV Mask generator assigns generators to index 0 and index 1 of the column value. The delimiter is a comma.
For index 0, the Address generator is assigned, with an address type of City and consistency disabled.
For index 1, the Company Name generator is assigned, with consistency disabled.
Neither sub-generator is assigned as the default generator for other indexes.
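The index-based assignment can be pictured with the following Python sketch. It is a simplified illustration. The function name csv_mask is hypothetical, plain callables stand in for sub-generators, and the value is split naively on the delimiter without handling quoted fields.

```python
def csv_mask(value, delimiter, index_generators, default_generator=None):
    """Apply a callable per index, with an optional default for other indexes."""
    masked = []
    for index, field in enumerate(value.split(delimiter)):
        generator = index_generators.get(index, default_generator)
        # Fields without an assigned or default generator pass through.
        masked.append(generator(field) if generator else field)
    return delimiter.join(masked)


assigned = {0: lambda _field: "CITY", 1: lambda _field: "COMPANY"}
no_default = csv_mask("Boston,Acme,42", ",", assigned)
with_default = csv_mask("Boston,Acme,42", ",", assigned, lambda _field: "X")
```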
The Event Timestamps generator generates timestamps that fit an event distribution. The source timestamp must include a date. It cannot be a time-only value.
You can link columns to create a sequence of events across multiple columns. This generator can be partitioned by other columns.
The Event Timestamps generator does not support consistency. You cannot configure differential privacy.
The metadata object is populated from the EventMetadata object. You use eventOrder to specify the sequence of the generated datetime values in the linked columns.
The Event Timestamps generator does support partitioning, which is configured in the partitions object outside of the links object.
In this replacement example for the Event Timestamps generator, the date_event1 and date_event2 columns are linked. date_event1 occurs first, and date_event2 occurs second. The values are not partitioned.
The Geo generator is used to mask latitude or longitude values.
The Geo generator supports linking. Typically, the Geo generator is assigned to a latitude column and a longitude column, and then the columns are linked.
The Geo generator does not support consistency. You cannot configure differential privacy.
The metadata object is populated from the GeoMetadata object. geoType indicates the type of value (latitude or longitude) that is in the column.
In this example replacement for the Geo generator, the lat and long columns are assigned the Geo generator and linked.
The FNR generator transforms Norwegian national identity numbers.
The metadata object is populated from the FnrMetadata object.
preserveDate indicates whether to preserve the birthdate values from the source database in the destination database. If the birthdate values are not preserved, the destination values are still within the same range as the source values.
preserveGender indicates whether the destination value should reflect the same gender as the source value.
In the following example replacement for the FNR generator, the birthdate values in the source database are not preserved in the destination database.
The destination values use the same gender as the source values.
The generator is consistent with the name column.
The Email generator scrambles the characters in an email address. It preserves formatting and keeps the @ and . characters.
The Email generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the EmailMetadata object. You can configure:
The domain to use for all of the email addresses in the destination database.
Domains for which to keep the email addresses as is in the destination database.
Whether to replace invalid email addresses with valid ones.
In the following example replacement for the Email generator, all of the destination email addresses use gmail.com as the domain. Source email addresses from yahoo.com are not changed. Invalid email addresses are replaced. The generator is not consistent.
The File Name generator scrambles characters while preserving formatting and keeping the file extension intact.
The File Name generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the BaseMetadata object.
There is no generator-specific configuration.
In this example replacement for the File Name generator, consistency is enabled.
The Find and Replace generator replaces all instances of a specified find string with a specified replace string.
The Find and Replace generator does not support linking or consistency. You cannot configure differential privacy.
The metadata object is populated from the FindAndReplaceMetadata object. The generator-specific configuration includes:
The find string
Whether the find string is a regular expression
The replace string
In this example replacement for the Find and Replace generator, the value yes is replaced by the value no. The find string is not a regular expression.
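The difference between a literal find string and a regular expression find string can be sketched in Python as follows. The function name find_and_replace and its parameters are hypothetical, and Python regular expression syntax may differ from the syntax that the generator accepts.

```python
import re


def find_and_replace(value, find, replace, is_regex=False):
    """Replace every occurrence of find, literally or as a regex."""
    if is_regex:
        return re.sub(find, replace, value)
    return value.replace(find, replace)


literal = find_and_replace("yes, yes", "yes", "no")
# As a literal, the dot matches only a dot, so nothing is replaced here.
dot_literal = find_and_replace("yes", "y.s", "no")
# As a regex, the dot matches any character.
dot_regex = find_and_replace("yes", "y.s", "no", is_regex=True)
```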
The HStore Mask generator runs selected generators on specified key values in an HStore column in a PostgreSQL database. HStore columns contain a set of key-value pairs.
For the HStore Mask generator, there is a link object for each path expression value to assign a generator to.
The generator does not itself support consistency or differential privacy.
The metadata object is populated from the HStoreMaskMetadata object. It includes:
pathExpression, which is the path expression that identifies the value to apply the sub-generator to.
The subGeneratorMetadata object, which identifies and configures the sub-generator.
In the following example replacement for the HStore Mask generator:
The Random Integer generator is assigned to the value of the pages path expression. The generator uses values between 300 and 500.
The Character Scramble generator is assigned to the value of the title path expression. Consistency is disabled.
The Hostname generator generates random host names, based on the English language.
The Hostname generator does not support linking. It can be either self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the BaseMetadata object.
There is no generator-specific configuration.
In the following example replacement for the Hostname generator, consistency is disabled.
The HTML Mask generator masks text columns by parsing the contents as HTML, and applying sub-generators to specified path expressions.
If applying a sub-generator fails because of an error, the generator selected as the fallback generator is applied instead.
For the HTML Mask generator, there is a link object for each XPath expression value to assign a generator to.
The generator does not itself support consistency or differential privacy.
The metadata object is populated from the object. It includes:
pathExpression, which is the XPath expression that identifies the value to apply the sub-generator to.
The subGeneratorMetadata object, which identifies and configures the sub-generator.
In the following example replacement for the HTML Mask generator:
The Character Scramble generator is assigned to the value of the XPath expression //p. Consistency is disabled.
The Company Name generator is assigned to the value of the XPath expression //p/@data. Consistency is disabled.
In the case of an error applying either of those generators, the fallback generator is the Constant generator, which sets the value to 10.
The International Address generator can generate the following international address values:
Canadian street name
Canadian postal code
United Kingdom (UK) postal code
The International Address generator can be self-consistent. You cannot configure differential privacy. It cannot be linked to other columns.
The metadata object is populated from .
For the International Address generator, you specify the country and the type of address value that is in the source column.
The following example replacement shows a column that is assigned the built-in generator preset for the International Address generator. The column contains a Canadian postal code. The fallback value is K1A.
The column has consistency disabled.
The HIPAA Address generator can be used to generate cities, states, zip codes, and latitude/longitude values that follow HIPAA guidelines for safe harbor.
The HIPAA Address generator can be linked. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object, which includes:
The type of address value that is in the column.
How to generate zip codes. You can generate zip codes that replace the last two digits with zeros, or use a real zip code from the same state.
The following example replacement for the HIPAA Address generator contains a single, unlinked column that contains a zip code value. The generator is configured to be consistent, and to not use zeros in the generated zip code values.
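The zero-replacement option for zip codes amounts to the following string operation, shown here as a minimal Python sketch. The helper name zero_last_two_digits is hypothetical, and the sketch assumes that the zip code is held as a string of at least two characters.

```python
def zero_last_two_digits(zip_code):
    """Replace the last two digits of a zip code with zeros."""
    if len(zip_code) < 2:
        raise ValueError("zip code must have at least two characters")
    return zip_code[:-2] + "00"


generalized = zero_last_two_digits("90210")
```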
The Integer Key generator generates integer values that are between 0 and 2^32 - 1. The input values must be in the range 0 to 2^31 - 1.
The Integer Key generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object, which includes:
The minimum value
The maximum value
The underlying data type for the source values (for MySQL and MongoDB)
In the following example replacement for the Integer Key generator, the generator produces a value between 10 and 20. The original values are Int64. Consistency is enabled.
The IP Address generator generates a random string that is formatted as an IP address.
The IP Address generator does not support linking. It can be self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object. The ratio field specifies, as a decimal value, the percentage of values to format as IPv4. The remaining values are formatted as IPv6.
In the following example replacement for the IP Address generator, 90% of the generated addresses are IPv4. Consistency is disabled.
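The effect of the ratio value can be illustrated with a short Python sketch that uses the standard ipaddress module. This is not the generator's implementation. The helper name random_ip is hypothetical, and addresses are drawn uniformly from the whole address space, which is an assumption.

```python
import ipaddress
import random


def random_ip(ratio, rng=None):
    """Return an IPv4 string with probability ratio, otherwise an IPv6 string."""
    rng = rng or random.Random()
    if rng.random() < ratio:
        return str(ipaddress.IPv4Address(rng.getrandbits(32)))
    return str(ipaddress.IPv6Address(rng.getrandbits(128)))


# With ratio 0.9, roughly 90% of the generated values are IPv4.
sample = [random_ip(0.9) for _ in range(1000)]
ipv4_count = sum(1 for address in sample
                 if ipaddress.ip_address(address).version == 4)
```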
The MAC Address generator generates a string that is formatted as a MAC address.
The MAC Address generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object. bytesPreserved specifies the number of bytes to preserve in the generated address.
In the following example replacement for the MAC Address generator, the generated values preserve 4 bytes. Consistency is disabled.
The JSON Mask generator runs a selected generator on values that match a specified JSONPath.
For the JSON Mask generator, you provide a link object for each sub-generator configuration.
The generator does not itself support consistency or differential privacy.
The metadata object is populated from the object, and includes:
pathExpression, which is the JSONPath that identifies the value to apply the sub-generator to.
The types of values to apply the sub-generator to.
The subGeneratorMetadata object, which identifies and configures the sub-generator.
Here is the basic structure of a link object for a JSON Mask sub-generator.
In the following example replacement for the JSON Mask generator:
The Date Truncation generator is applied to all values of the JSONPath expression $[*].start. The value is truncated to the year, and the birthdate flag is off.
The Email generator is applied to all values of the JSONPath expression $[0].email. The generated email addresses all use gmail.com as the domain, and no domains are excluded. Invalid email addresses are replaced. Consistency is disabled.
If there is an error applying those generators, then the fallback generator is the Null generator.
The Name generator generates a random name string from a dictionary of first and last names.
The Name generator cannot be linked. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object, which includes:
The type of name value.
Whether to preserve the capitalization from the source value.
In the following example replacement for the Name generator, the name format is Last, First (Smith, John). Capitalization is preserved. Consistency is disabled.
The ObjectId Key generator generates values to de-identify fields that contain MongoDB ObjectId values. The column value must be 12 bytes long.
The ObjectId Key generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object. preserveTimetampAndCounter indicates whether to only change the random value portion of the identifier, but keep the timestamp and incremented counter portions.
There is no generator-specific configuration.
In the following example replacement for the Mongo ObjectId Key generator, consistency is disabled. Only the random value portion of the identifier is changed.
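A MongoDB ObjectId consists of a 4-byte timestamp, a 5-byte random value, and a 3-byte incrementing counter. The following Python sketch shows what changing only the random value portion looks like at the byte level. It is an illustration, not the generator's algorithm. The helper name mask_object_id and its parameter are hypothetical.

```python
import os


def mask_object_id(object_id_hex, preserve_timestamp_and_counter=True):
    """Return a new 12-byte ObjectId as a 24-character hex string."""
    raw = bytes.fromhex(object_id_hex)
    if len(raw) != 12:
        raise ValueError("an ObjectId must be exactly 12 bytes long")
    if preserve_timestamp_and_counter:
        # Bytes 0-3: timestamp, bytes 4-8: random value, bytes 9-11: counter.
        masked = raw[:4] + os.urandom(5) + raw[9:]
    else:
        masked = os.urandom(12)
    return masked.hex()


original_id = "507f1f77bcf86cd799439011"
masked_id = mask_object_id(original_id)
```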
The Noise Generator masks values in numeric columns. It either adds random noise to the original value or multiplies the original value by random noise.
The Noise Generator does not support linking. It can be either self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the NoiseMetadata object. The generator configuration includes:
Whether to use additive or multiplicative noise.
For additive noise, the percentage of the underlying value to scale the noise to.
For multiplicative noise, the minimum and maximum value for the scaling factor.
In this example replacement for the Noise Generator, the additive noise strategy is used. It scales the noise to 10% of the underlying value. The generator is consistent with the name column.
The Numeric String Key generator generates unique numeric strings of the same length as the input value.
The Numeric String Key generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the BaseMetadata object.
There is no generator-specific configuration.
In the following example replacement for the Numeric String Key generator, consistency is disabled.
The Phone generator generates a random telephone number that matches the country or region of the input telephone number while maintaining the format.
The Phone generator does not support linking. It can be made self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the PhoneNumberMetadata object, which includes a setting to indicate whether to replace invalid telephone numbers with valid telephone numbers.
In the following replacement for the Phone generator, invalid phone numbers are replaced. Consistency is disabled.
The Passthrough generator is the default. It passes through the value from the source database to the destination database without masking it.
You do not usually retrieve or provide a replacement that assigns the Passthrough generator to a column. You might specifically assign the Passthrough generator as a sub-generator for a composite generator.
When you use the GET api/Workspace/{workspace ID}/replacements/{schema}/{table} endpoint to get the column configuration for a table, columns that are assigned Passthrough are not included in the results.
For the PUT /api/Workspace/{workspaceId}/update_replacements/{schema}/{table} endpoint, which replaces the configuration for an entire table, any column that is not included in the message body is automatically assigned Passthrough.
To revert an individual column to Passthrough, you use the DELETE api/Workspace/{workspace ID}/replacement/{replacement ID} endpoint to remove the replacement that contains the column configuration.
The Random Double generator generates a random double number between the specified minimum (inclusive) and maximum (exclusive).
The Random Double generator does not support linking or consistency. You cannot configure differential privacy.
The metadata object is populated from the ContinuousDistributionMetadata object. You specify the minimum and maximum values.
In this example replacement for the Random Double generator, the generator is configured to produce numbers between 2.5 and 10.75.
The Random Integer generator returns a random integer between a specified minimum (inclusive) and maximum (exclusive).
The Random Integer generator does not support linking or consistency. You cannot configure differential privacy.
The metadata object is populated from DiscreteDistributionMetadata, which includes the minimum and maximum values.
In this example replacement for the Random Integer generator, the returned value is between 0 and 5. Because max is exclusive, the highest possible value is 4.
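The inclusive minimum and exclusive maximum behave like the bounds of Python's random.randrange, as this small sketch shows. The helper name random_integer is hypothetical and only mirrors the described bounds.

```python
import random


def random_integer(minimum, maximum, rng=None):
    """Return an integer n with minimum <= n < maximum."""
    rng = rng or random.Random()
    return rng.randrange(minimum, maximum)


# With min 0 and max 5, only the values 0, 1, 2, 3, and 4 can occur.
values = [random_integer(0, 5) for _ in range(1000)]
```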
The Null generator generates NULL values to fill the rows of the specified column.
The Null generator does not support linking or consistency. You cannot configure differential privacy.
There is no generator-specific configuration.
The following example replacement applies the Null generator to a column.
The Random Boolean generator assigns a random boolean value.
The Random Boolean generator does not support linking or consistency. You cannot configure differential privacy.
The metadata object is populated from the RatioMetadata object. The ratio field indicates the percentage (as a decimal value between 0 and 1.0) of values to set to true.
In the following example replacement for the Random Boolean generator, 40% of the destination values are true, and 60% are false.
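One way to read the ratio field is as the probability that any single value is true. The Python sketch below makes that assumption. The function name is hypothetical, and the real generator may distribute values differently.

```python
import random


def random_boolean(ratio, rng=random):
    """Return True with probability `ratio` (a decimal between 0 and 1.0)."""
    return rng.random() < ratio


rng = random.Random(42)
values = [random_boolean(0.4, rng) for _ in range(10_000)]
true_share = sum(values) / len(values)
print(true_share)
```

Over a large number of rows, the share of true values settles near the configured ratio.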
The Random Timestamp generator generates random dates, times, and timestamps that fall within a specified range.
The Random Timestamp generator does not support linking or consistency. You cannot configure differential privacy.
The metadata object is populated from the TimestampRangeMetadata object, which includes:
For text columns, the format of the timestamp in the original data.
For integer columns, the unit to use for the value. The generator produces a Unix timestamp.
If the timestamp format includes dates, the minimum and maximum timestamps. Uses the format yyyy-MM-ddTHH:MM:SSZ.
If the timestamp format is time-only, the minimum and maximum timestamps. The date part of the timestamp is ignored. Uses the format yyyy-MM-ddTHH:MM:SSZ.
In the following example replacement for the Random Timestamp generator, the source column is a text column. The format for the timestamps in the original data is yyyy-MM-dd HH:mm:ss. The timestamps occur between November 20, 2022 at 8:57:14 PM and November 21, 2022 at 8:57:14 PM.
The following example uses the same range as the previous example, but the column is an integer column. The generator is configured to produce Unix timestamps in seconds.
The following example generates time-only values between 8:00 AM and 5:00 PM. The date part of the minTime and maxTime values is ignored.
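The three output shapes described above (text, Unix timestamp, and time-only) can be sketched in Python. This is an illustration under stated assumptions: the bounds are treated as UTC and as inclusive, the resolution is whole seconds, and the function name is hypothetical.

```python
import random
from datetime import datetime, timedelta, timezone


def random_timestamp(min_time, max_time, rng=random):
    """Pick a random whole-second timestamp between the two bounds."""
    span_seconds = int((max_time - min_time).total_seconds())
    return min_time + timedelta(seconds=rng.randint(0, span_seconds))


# Same range as the text-column example (assumed to be UTC).
min_time = datetime(2022, 11, 20, 20, 57, 14, tzinfo=timezone.utc)
max_time = datetime(2022, 11, 21, 20, 57, 14, tzinfo=timezone.utc)

rng = random.Random(1)
value = random_timestamp(min_time, max_time, rng)

as_text = value.strftime("%Y-%m-%d %H:%M:%S")  # text column, yyyy-MM-dd HH:mm:ss
as_unix_seconds = int(value.timestamp())       # integer column, unit of seconds
time_only = value.time()                       # time-only value, date part dropped
print(as_text, as_unix_seconds, time_only)
```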
The Random Hash generator generates a random hash string.
The Random Hash generator does not support linking or consistency. You cannot configure differential privacy.
There is no generator-specific configuration.
Here is an example replacement for the Random Hash generator.
The Random UUID generator generates a random UUID-like string.
The Random UUID generator does not support linking or consistency. You cannot configure differential privacy.
There is no generator-specific configuration.
The following example replacement applies the Random UUID generator to a column.
The Regex Mask generator uses regular expressions to parse strings and replace specified substrings with the output of specified generators. The parts of the string to replace are specified inside unnamed top-level capture groups.
The Regex Mask generator does not in itself support linking or consistency or allow you to configure differential privacy.
Each link object identifies a regular expression and the generators to apply to the resulting capture groups.
The metadata object for each link object is populated from the RegexMaskMetadata object, which includes:
Whether to replace all matches or only the first match.
The regular expression used to identify the capture groups to replace.
The list of generator types to apply to each capture group. The first sub-generator is applied to the first capture group, the second sub-generator to the second group, and so on.
In the captureGroupMetadata object, the configuration for each generator.
The following example replacement for the Regex Mask generator provides two expressions.
For the first expression, the generator is configured to only replace the first match. There are two capture groups. For the first capture group, the Address generator applies a country value, and consistency is disabled. For the second capture group, the Passthrough generator is applied.
For the second expression, the generator is configured to replace all of the matching values. The Business Name generator is applied with consistency disabled.
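The capture-group mechanics can be sketched with Python's re module. This is a minimal illustration and not the generator's implementation. It assumes top-level, non-nested capture groups, and the sub-generator functions are hypothetical stand-ins.

```python
import re


def regex_mask(value, pattern, sub_generators, replace_all=True):
    """Replace each capture group's text with the output of its sub-generator.

    sub_generators[i] handles capture group i + 1. Text outside the groups is kept.
    """
    def rebuild(match):
        out = []
        cursor = match.start()
        for index, generator in enumerate(sub_generators, start=1):
            start, end = match.span(index)
            if start == -1:  # the group did not participate in the match
                continue
            out.append(match.string[cursor:start])  # untouched text before the group
            out.append(generator(match.group(index)))
            cursor = end
        out.append(match.string[cursor:match.end()])  # untouched tail of the match
        return "".join(out)

    count = 0 if replace_all else 1  # for re.sub, count=0 replaces every match
    return re.sub(pattern, rebuild, value, count=count)


# Hypothetical stand-ins for real sub-generators.
def fixed_country(_original):
    return "Canada"


def passthrough(original):
    return original


text = "Ships to France via DHL; ships to Spain via UPS"
first_only = regex_mask(text, r"to (\w+) via (\w+)", [fixed_country, passthrough],
                        replace_all=False)
every_match = regex_mask(text, r"to (\w+) via (\w+)", [fixed_country, passthrough])
print(first_only)
print(every_match)
```

The first sub-generator handles the first capture group and the second handles the second group. The replace_all flag decides whether later matches are also rewritten.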
The Sequential Integer generator returns integer values that increment by 1 for each row in the destination database.
The Sequential Integer generator can be linked. You provide a link object for each linked column. The generator does not support consistency. You cannot configure differential privacy.
The metadata object is populated from the UniqueIntegerMetadata object. The startingPoint field provides the first integer to apply.
The following example replacement for the Sequential Integer generator configures a single unlinked column. The values start with 4.
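The numbering behavior is simple to picture in Python. The sketch below assumes a starting point of 4 and three illustrative rows. The row values are made up for the example.

```python
from itertools import count

starting_point = 4
rows = ["alice", "bob", "carol"]

# count(4) yields 4, 5, 6, ... so each destination row gets the next integer.
sequence = count(starting_point)
assigned = [(next(sequence), row) for row in rows]
print(assigned)
```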
The Timestamp Shift generator shifts timestamps by a random amount of a specific unit of time within a set range.
The Timestamp Shift generator does not support linking. It can be self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the TimestampShiftMetadata object, which includes:
For text source columns, the format of the datetime values in the original data.
For integer source columns (Unix timestamps), the unit to use.
The part of the timestamp to shift.
The minimum amount to shift the value by. Use negative numbers to move the value earlier.
The maximum amount to shift the value by.
The following example replacement for the Timestamp Shift generator updates text timestamps in the format yyyy-MM-dd. The generator shifts the day anywhere from 3 days before the current day to 3 days after the current day. The generator is consistent with the order column.
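A shift of this kind, including consistency with another column, can be sketched as follows. The approach of seeding a random generator from a hash of the other column's value is an assumption made for illustration and is not how the product is known to work. The function name and the key value are hypothetical, and both bounds are treated as inclusive.

```python
import hashlib
import random
from datetime import datetime, timedelta


def shift_date(value, min_shift, max_shift, consistency_key=None, fmt="%Y-%m-%d"):
    """Shift a text date by a whole number of days in [min_shift, max_shift]."""
    if consistency_key is None:
        rng = random.Random()
    else:
        # Seed from the key so that equal keys always produce the same shift.
        digest = hashlib.sha256(str(consistency_key).encode("utf-8")).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
    days = rng.randint(min_shift, max_shift)
    shifted = datetime.strptime(value, fmt) + timedelta(days=days)
    return shifted.strftime(fmt)


first = shift_date("2022-11-20", -3, 3, consistency_key="order-1001")
second = shift_date("2022-11-20", -3, 3, consistency_key="order-1001")
print(first, second)
```

Because both calls use the same key, they return the same shifted date. Without a key, each call draws an independent shift.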
The Shipping Container generator generates ISO 6346-compliant shipping container codes. All generated codes are in the freight category ("U").
The Shipping Container generator does not support linking. It can be self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the BaseMetadata object.
There is no generator-specific configuration.
In the following example of a replacement for the Shipping Container generator, consistency is disabled.
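For readers unfamiliar with ISO 6346, the following Python sketch builds a code in the freight category and computes its check digit. It shows the standard's check-digit rule and not the generator's internals. The function names are hypothetical, and the owner prefix and serial number are chosen at random.

```python
import random
import string


def _letter_values():
    """ISO 6346 letter values: A=10 and upward, skipping multiples of 11."""
    values, current = {}, 10
    for letter in string.ascii_uppercase:
        if current % 11 == 0:
            current += 1
        values[letter] = current
        current += 1
    return values


LETTER_VALUES = _letter_values()


def check_digit(first_ten):
    """Compute the ISO 6346 check digit for the first 10 characters."""
    total = 0
    for position, char in enumerate(first_ten):
        value = LETTER_VALUES[char] if char.isalpha() else int(char)
        total += value * (2 ** position)  # weights are 1, 2, 4, ..., 512
    return total % 11 % 10


def random_container_code(rng=random):
    owner = "".join(rng.choice(string.ascii_uppercase) for _ in range(3))
    serial = "".join(rng.choice(string.digits) for _ in range(6))
    body = owner + "U" + serial  # "U" is the freight category
    return body + str(check_digit(body))


code = random_container_code(random.Random(7))
print(code)
```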
The Unique Email generator generates unique email addresses. It replaces the username with a randomly generated GUID, and either uses a specified domain or masks the domain with a character scramble.
The Unique Email generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the EmailMetadata object. You can configure:
The domain to use for all of the email addresses in the destination database. If not specified, a character scramble is applied to the domains.
Domains for which to keep the email addresses as is in the destination database.
Whether to replace invalid email addresses with valid ones.
In the following example replacement for the Unique Email generator, consistency is enabled. The domain tonic.ai is used for all of the email addresses, and invalid email addresses are not replaced.
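The username and domain handling can be approximated in Python. This sketch is an assumption-laden illustration: the function and parameter names are hypothetical, the GUID is rendered as 32 hexadecimal characters, and neither consistency nor the handling of invalid addresses is modeled.

```python
import random
import string
import uuid


def scramble(text, rng=random):
    """Replace letters with random letters and digits with random digits."""
    out = []
    for char in text:
        if char.isalpha():
            out.append(rng.choice(string.ascii_lowercase))
        elif char.isdigit():
            out.append(rng.choice(string.digits))
        else:
            out.append(char)  # keep dots and other punctuation
    return "".join(out)


def unique_email(original, domain=None, excluded_domains=()):
    _, _, original_domain = original.rpartition("@")
    if original_domain in excluded_domains:
        return original  # addresses in excluded domains are kept as is
    new_domain = domain if domain else scramble(original_domain)
    return f"{uuid.uuid4().hex}@{new_domain}"


fixed = unique_email("jane@example.com", domain="tonic.ai")
scrambled = unique_email("jane@example.com")
kept = unique_email("jane@example.com", excluded_domains=("example.com",))
print(fixed, scrambled, kept)
```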
The SSN generator generates a new valid United States Social Security Number.
The SSN generator does not support linking. It can be self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the RatioMetadata object. For the SSN generator, ratio indicates the percentage of values to format with dashes (123-45-6789). The percentage is provided as a decimal value between 0 and 1.0. The remaining values are formatted as 123456789.
In the following example replacement for the SSN generator, the generator is consistent with the name column. None of the values are configured with dashes.
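The role of ratio in formatting can be sketched in Python. The validity rules used here (no area number of 000, 666, or 900 to 999, no group of 00, and no serial of 0000) are the commonly documented Social Security Administration constraints. Whether the generator applies exactly these rules is an assumption, and the function name is hypothetical.

```python
import random


def random_ssn(ratio, rng=random):
    """Generate an SSN; `ratio` is the share of values formatted with dashes."""
    # Area numbers 000, 666, and 900-999 are not issued.
    area = rng.choice([n for n in range(1, 900) if n != 666])
    group = rng.randint(1, 99)     # 00 is not used
    serial = rng.randint(1, 9999)  # 0000 is not used
    if rng.random() < ratio:
        return f"{area:03d}-{group:02d}-{serial:04d}"
    return f"{area:03d}{group:02d}{serial:04d}"


rng = random.Random(3)
no_dashes = [random_ssn(0.0, rng) for _ in range(100)]
all_dashes = [random_ssn(1.0, rng) for _ in range(100)]
print(no_dashes[0], all_dashes[0])
```

A ratio of 0 matches the example above, where none of the values contain dashes.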
The Struct Mask generator applies selected generators to specific StructFields within a StructType in a Spark database.
For the Struct Mask generator, there is a link object for each path expression value to assign a sub-generator to.
The generator does not itself support consistency or differential privacy.
The metadata object is populated from the JsonMaskMetadata object, and includes:
pathExpression, which is the expression that identifies the value to apply the sub-generator to.
The types of values to apply the sub-generator to.
The subGeneratorMetadata object, which identifies and configures the selected sub-generator.
In the following example replacement for the Struct Mask generator:
The value at the path expression $.address.city is assigned the Address generator. The generator is configured to produce a city value. Consistency is disabled.
The value at the path expression $.address.zip is also assigned the Address generator. The generator is configured to produce a zip code value. Consistency is disabled.
The SIN generator generates a new valid Canadian Social Insurance Number that preserves the formatting of the original value.
The SIN generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
There is no generator-specific configuration.
In the following example replacement for the SIN generator, consistency is disabled.
The XML Mask generator runs a selected generator on values that match a user-specified path expression.
For the XML Mask generator, there is a link object for each path expression value to assign a sub-generator to.
The generator does not itself support consistency or differential privacy.
The metadata object is populated from the object. It includes:
pathExpression, which is the expression that identifies the value to apply the sub-generator to.
The subGeneratorMetadata object, which identifies and configures the sub-generator.
In the following example replacement for the XML Mask generator:
The Name generator is assigned to the path expression //view/item-descriptor//@display-name. The value is in the format First Name Last Name (John Smith), and capitalization is not preserved. Consistency is disabled.
The Constant generator is assigned to the path expression //view//object-class. The constant value is object-class.
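The effect of the two path expressions can be approximated with Python's standard xml.etree.ElementTree module. This is a sketch only. ElementTree supports a subset of XPath, so the paths are written relative to the root element and the attribute is reached through its owning element. The sample document and the fixed replacement name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
document = """
<root>
  <view>
    <item-descriptor display-name="Jane Doe">
      <object-class>com.example.Customer</object-class>
    </item-descriptor>
  </view>
</root>
""".strip()

root = ET.fromstring(document)

# Stand-in for the Constant sub-generator on //view//object-class.
for element in root.findall(".//view//object-class"):
    element.text = "object-class"

# Stand-in for the Name sub-generator on //view/item-descriptor//@display-name.
for descriptor in root.findall(".//view/item-descriptor"):
    for element in descriptor.iter():  # the element itself and all descendants
        if "display-name" in element.attrib:
            element.set("display-name", "John Smith")

print(ET.tostring(root, encoding="unicode"))
```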
The URL generator is a format-preserving substitution cipher that keeps the URL scheme and top-level domain intact.
The URL generator does not support linking or consistency. You cannot configure differential privacy.
There is no generator-specific configuration.
Here is an example replacement that assigns the URL generator to a column.
The UUID Key generator generates UUID values. It can be used on primary key columns.
The UUID Key generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object.
There is no generator-specific configuration.
In the following example replacement for the UUID Key generator, the version and variant are not preserved, and consistency is disabled.