Requires the Advanced API. The Advanced API requires an Enterprise license.
To get the current generator configuration, use:
GET /api/Workspace/{workspace ID}/replacements
The message body contains a set of replacement objects for columns in the specified table that have an assigned generator other than Passthrough. Columns that are assigned the Passthrough generator are not included in the results.
By default, columns are assigned the Passthrough generator, which copies the data as is from the source database to the destination database.
To specify and configure the assigned generators for columns in a specified table, use:
PUT /api/Workspace/{workspaceId}/update_replacements
Note that when you use this endpoint, you must always specify the configuration for all of the columns in the specified table for which to override the default Passthrough generator.
The request replaces all of the current column configuration in the specified table with the configuration that is in the request.
For columns that are not in the request, the assigned generator reverts to Passthrough.
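The replace-all behavior described above can be modeled in a few lines. This is a minimal sketch and not Structural code: the `apply_update` helper and the column-to-generator dictionaries are hypothetical stand-ins for the real replacement objects, and `"Passthrough"` is used only as a label.

```python
PASSTHROUGH = "Passthrough"  # label used for illustration only, not an API value


def apply_update(table_columns, request):
    """Model the outcome of a full-table update.

    table_columns: every column name in the table.
    request: hypothetical mapping of column name -> generator name that
             is sent in the update request.
    Columns that are missing from the request revert to Passthrough.
    """
    return {col: request.get(col, PASSTHROUGH) for col in table_columns}


current = {"first_name": "Name", "email": "Email", "notes": PASSTHROUGH}

# The request mentions only "email", so "first_name" loses its generator.
result = apply_update(current.keys(), {"email": "Email"})
print(result)
```

In practice this means that a caller would typically read the current configuration first, change it, and send the complete set back.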
To update a single generator configuration, use:
PUT /api/Workspace/{workspace ID}/replacement
The message body is a single replacement object. You must provide the entire replacement.
For linked columns, the replacement includes the configuration for all of the columns.
For a composite generator, the replacement includes the link objects for all of the sub-generators.
When you remove a replacement, the column reverts to the Passthrough generator. To remove a replacement, use:
DELETE /api/Workspace/{workspace ID}/replacement/{replacement ID}
If the replacement contains linked columns, then all of those columns revert to the Passthrough generator. To restore the configuration for any of the columns, you must create a new replacement.
The Business Name generator generates a random company name-like string.
The Business Name generator does not support linking. It can be self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the BaseMetadata object.
There is no generator-specific configuration.
In the following replacement, the Business Name generator is applied to a company column, and is consistent with a name column.
The JSON Mask generator runs a selected generator on values that match a specified JSONPath.
For the JSON Mask generator, you provide a link object for each sub-generator configuration.
The generator does not itself support consistency or differential privacy.
The metadata object is populated from the JsonMaskMetadata object, and includes:
pathExpression, which is the JSONPath that identifies the value to apply the sub-generator to.
The types of values to apply the sub-generator to.
The subGeneratorMetadata object, which identifies and configures the sub-generator.
Here is the basic structure of a link object for a JSON Mask sub-generator.
In the following example replacement for the JSON Mask generator:
The Date Truncation generator is applied to all values of the JSONPath expression $[*].start. The value is truncated to the year, and the birthdate flag is off.
The Email generator is applied to all values of the JSONPath expression $[0].email. The generated email addresses all use gmail.com as the domain, and no domains are excluded. Invalid email addresses are replaced. Consistency is disabled.
If there is an error applying those generators, then the fallback generator is the Null generator.
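To make the two JSONPath expressions concrete, the following sketch shows which values they select from a small document. It uses plain Python indexing instead of a JSONPath library, and the sample document is invented for illustration.

```python
import json

# Hypothetical column value: a JSON array of objects.
doc = json.loads("""
[
  {"start": "2021-03-04", "email": "a@example.com"},
  {"start": "2019-07-08", "email": "b@example.com"}
]
""")

# $[*].start selects the "start" value of every element in the top-level array.
all_starts = [item["start"] for item in doc if "start" in item]

# $[0].email selects the "email" value of the first element only.
first_email = [doc[0]["email"]] if doc and "email" in doc[0] else []

print(all_starts)
print(first_email)
```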
The Random Hash generator generates a random hash string.
The Random Hash generator does not support linking or consistency. You cannot configure differential privacy.
There is no generator-specific configuration.
Here is an example replacement for the Random Hash generator.
In this reference, each generator is identified by its name and, in parentheses, its generator ID. You use the generator ID to identify the generator in the API.
For each generator, this reference shows the structure of a link object, and provides an example of a replacement object.
Additional resources:
Get generator IDs and available metadata
Retrieve information about generators and their configuration options.
Update generator configuration
Change the generator configuration for a table or column
Structure of a generator assignment
How the Structural API represents a generator assignment and configuration.
Generator API reference
Details about the replacement and link structure for each generator.
Requires the Advanced API. The Advanced API requires an Enterprise license.
When using the API to assign generators, you use the generator identifier.
To retrieve the list of generators, use:
In the results, the message body is an array of GeneratorMetadataResponseModel objects.
The information for each generator includes the generator ID. It also specifies whether the generator supports configuration options such as linking, consistency, differential privacy configuration, and partitioning.
The AI Synthesizer uses deep neural networks to learn models of your data, which can be sampled to generate new synthetic rows that faithfully mimic the statistical properties of your data.
You assign the AI Synthesizer generator to the columns that you want to use to generate the model, then provide the configuration details for the model.
By default, the AI Synthesizer is not available. To enable the AI Synthesizer, set the environment setting TONIC_NN_GENERATOR_ENABLED to true. See Configuring environment settings.
For the AI Synthesizer, the replacement contains a link object for each column that is in the model.
The metadata object is populated from the NnMetadata object. The model configuration is provided in the nnModelConfig object outside of the links object.
For each column, you specify the type of data in that column (Categorical, Numeric, Location).
The following example replacement configures a model based on three columns. Two contain categorical data, and the third contains numeric data.
The model does not contain event data.
The Random UUID generator generates a random UUID-like string.
The Random UUID generator does not support linking or consistency. You cannot configure differential privacy.
There is no generator-specific configuration.
The following example replacement applies the Random UUID generator to a column.
The Random Timestamp generator generates random dates, times, and timestamps that fall within a specified range.
The Random Timestamp generator does not support linking or consistency. You cannot configure differential privacy.
The metadata object is populated from the TimestampRangeMetadata object, which includes:
For text columns, the format of the timestamp in the original data.
For integer columns, the unit to use for the value. The generator produces a Unix timestamp.
If the timestamp format includes dates, the minimum and maximum timestamps. Uses the format yyyy-MM-ddTHH:MM:SSZ.
If the timestamp format is time-only, the minimum and maximum timestamps. The date part of the timestamp is ignored. Uses the format yyyy-MM-ddTHH:MM:SSZ.
In the following example replacement for the Random Timestamp generator, the source column is a text column. The format for the timestamps in the original data is yyyy-MM-dd HH:mm:ss. The timestamps occur between November 20, 2022 at 8:57:14 PM and November 21, 2022 at 8:57:14 PM.
The following example uses the same range as the previous example, but the column is an integer column. The generator is configured to produce Unix timestamps in seconds.
The following example generates time-only values between 8:00 AM and 5:00 PM. The date part of the minTime and maxTime values is ignored.
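The range values can be produced programmatically. The following sketch formats the example range in the yyyy-MM-ddTHH:MM:SSZ form and also derives Unix timestamps in seconds. It assumes that the times are in UTC, which the text above does not state. The helper names are hypothetical; only minTime and maxTime come from this page.

```python
from datetime import datetime, timezone


def to_range_string(value):
    """Format a timezone-aware datetime as yyyy-MM-ddTHH:MM:SSZ (UTC assumed)."""
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_unix_seconds(value):
    """Return the Unix timestamp, in whole seconds, for a datetime."""
    return int(value.timestamp())


min_time = datetime(2022, 11, 20, 20, 57, 14, tzinfo=timezone.utc)
max_time = datetime(2022, 11, 21, 20, 57, 14, tzinfo=timezone.utc)

# Only the two range fields are shown; the rest of the metadata is omitted.
time_range = {
    "minTime": to_range_string(min_time),
    "maxTime": to_range_string(max_time),
}
print(time_range)
print(to_unix_seconds(min_time), to_unix_seconds(max_time))
```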
In the Tonic Structural API, a generator assignment is referred to as a replacement.
A group of replacements makes up the message body for the response to get generator configuration details, and a request to update generator configuration details.
For details and examples of replacements for each Structural generator, go to Generator API reference.
At a very high level, the structure of a replacement object is:
Each Replacement object contains:
The name of the replacement
The schema and table where the configured columns are located
Link objects for generator and sub-generator configurations
Columns to use for partitioning
Model configuration for the AI Synthesizer
Within a replacement, each link object contains the generator or sub-generator configuration for a single column.
For fallBackLinks, the link object contains the generator configuration for the fallback generator.
In the link object, to identify the column, you provide the schema name, table name, and column name.
The schema and table values in the link object must match the schema and table values for the replacement.
For MongoDB, you also provide the data type.
Note that even if there isn't a schema (for example, for the Databricks data connector), you must still provide an empty value for schema.
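A client-side check can catch mismatched link objects before a request is sent. The following is a rough sketch: it assumes that a replacement is a dictionary with schema, table, and a list under links, which this page implies but does not show in full. The `mismatched_links` helper is hypothetical.

```python
def mismatched_links(replacement):
    """Return link objects whose schema or table differs from the replacement's.

    A link that omits the schema key is also reported, because an empty
    schema value must still be provided.
    """
    expected = (replacement["schema"], replacement["table"])
    return [
        link
        for link in replacement.get("links", [])
        if (link.get("schema"), link.get("table")) != expected
    ]


# Hypothetical replacement for a connector without schemas.
replacement = {
    "schema": "",
    "table": "customers",
    "links": [
        {"schema": "", "table": "customers"},  # matches
        {"table": "customers"},                # schema omitted: reported
        {"schema": "", "table": "orders"},     # wrong table: reported
    ],
}

problems = mismatched_links(replacement)
print(len(problems))
```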
In the link object, the metadata object identifies the generator and generator preset, and provides the generator configuration.
In the metadata object, presetId identifies the applied generator preset configuration. generatorId identifies the type of generator. generatorId must match the generator type for presetId.
Generator presets require an Enterprise license. For Basic and Professional licenses, only generatorId is provided.
For the built-in preset for a generator, presetId and generatorId are the same. If during configuration the generator preset specified by presetId is not available (for example, because the generator preset was deleted), then the baseline version of the generator specified by generatorId is applied.
For the generator configuration, metadata contains fields from BaseMetadata, which provides the fields to configure consistency and differential privacy.
metadata can also contain additional objects and fields from generator-specific metadata objects.
In the metadata object, if Structural data encryption is enabled for the instance, then to indicate to use the configured data encryption, set encryptionProcessor to x-on.
In the metadata object, you can specify a custom value processor to apply to the generator (customValueProcessor).
In the metadata object, for composite generators other than Array Regex, Regex, or Conditional, pathExpression identifies the value within the column to apply a sub-generator to.
The subGeneratorMetadata object then identifies and configures the generator to apply to that value:
Within subGeneratorMetadata:
presetId identifies the generator preset to apply.
generatorId identifies the type of generator.
customValueProcessor identifies the custom value processor to apply.
subGeneratorMetadata also contains any other fields used to configure the selected sub-generator.
The Continuous and Event Timestamps generators allow partitioning.
In the replacement object, the partitions field contains a comma-separated list of columns to partition by.
In the replacement object, the nnModelConfig object is used for the AI Synthesizer. It provides the model configuration.
modelType indicates whether the model contains event data. If the model contains event data, modelType is RNN_VAE. If the model does not contain event data, modelType is VAE.
Several configuration fields apply to both types of models. Other fields are specific to one type of model.
For models that contain event data, the nnModelConfig structure is:
For models that do not contain event data, the nnModelConfig structure is:
The Address generator replaces the source value with a random string based on the type of address data that the column contains.
The Address generator can be self-consistent or consistent with another column. You cannot configure differential privacy. It can be linked to other columns.
The metadata object is populated from .
For the Address generator, you specify the type of address value that is in the source column. Here is the basic structure of a link object for the Address generator.
The following example replacement shows two linked columns that are assigned the built-in generator preset for the Address generator. One column contains city names, and the other contains zip codes.
Both columns have consistency disabled.
The Algebraic generator identifies the algebraic relationship between three or more numeric values and generates new values to match. At least one of the values must be a non-integer.
The Algebraic generator must be linked to at least two other columns.
The Algebraic generator does not support consistency. You cannot configure differential privacy.
There is no generator-specific configuration.
The following example replacement contains three linked columns that are assigned the Algebraic generator.
The Array JSON Mask generator is a composite generator. It is a version of the JSON Mask generator that can be used for array values. It runs a selected generator on values that match a specified JSONPath.
For the Array JSON Mask generator, you provide a link object for each sub-generator configuration.
The generator does not itself support consistency or differential privacy.
The metadata object is populated from the object. For the Array JSON Mask generator, metadata includes:
pathExpression, which is the path expression that identifies the value to apply the sub-generator to.
The types of values to apply the sub-generator to.
The subGeneratorMetadata object, which identifies and configures the sub-generator.
Here is the basic structure of a link object for an Array JSON Mask sub-generator.
The following example replacement applies the built-in generator preset for the Geo generator to the value at the specified path expression.
The configuration for the Geo generator indicates that it is a latitude value.
The Alphanumeric String Key generator can be applied to primary key columns. It generates unique alphanumeric strings of the same length as the input. For example, for the origin value ABC123, the output value is a six-character alphanumeric string such as D24N05.
The Alphanumeric String Key generator can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object.
There is no generator-specific configuration.
The following example replacement configures a column to use the built-in generator preset for the Alphanumeric String Key generator. The generator is not consistent.
The Array Regex Mask generator is a version of the Regex Mask generator that can be used for array values.
It uses regular expressions to parse strings and replace specified substrings with the output of specified generators. The parts of the string to replace are specified inside unnamed top-level capture groups.
In the Array Regex Mask generator, each link object identifies a regular expression and the generators to apply to the resulting capture groups.
The generator does not in itself support consistency or allow you to configure differential privacy.
The metadata object for each link object is populated from the object, and includes:
Whether to replace all matches or only the first match.
The regular expression used to identify the capture groups to replace.
The list of generator types to apply to each capture group. The first sub-generator is applied to the first capture group, the second generator to the second group, and so on.
In the captureGroupMetadata object, the configuration for each generator in captureGroupSubGenerators. The sequence of the entries in captureGroupMetadata must match the sequence of the generators in captureGroupSubGenerators.
The following example provides a regex pattern that produces a single capture group.
For that capture group, the Constant generator is applied. The capture group value is replaced with test_value.
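The capture-group behavior can be illustrated with Python's `re` module. This is a conceptual sketch only: the regex flavor that Structural uses may differ from Python's, the `mask_capture_groups` helper is hypothetical, and each "sub-generator" is reduced to a fixed replacement string, as with the Constant generator above. It assumes that the groups are not nested.

```python
import re


def mask_capture_groups(pattern, text, replacements, replace_all=True):
    """Replace the text of each capture group with its paired replacement.

    replacements[0] is used for the first group, replacements[1] for the
    second, and so on. Text outside the groups is kept unchanged.
    """
    regex = re.compile(pattern)
    # regex.groups counts every capture group in the pattern.
    if regex.groups != len(replacements):
        raise ValueError("one replacement is required per capture group")

    pieces = []
    position = 0
    for match in regex.finditer(text):
        for index, replacement in enumerate(replacements, start=1):
            start, end = match.span(index)
            if start == -1:  # the group did not take part in this match
                continue
            pieces.append(text[position:start])
            pieces.append(replacement)
            position = end
        if not replace_all:
            break  # only the first match is masked
    pieces.append(text[position:])
    return "".join(pieces)


masked_all = mask_capture_groups(r"id=(\d+)", "id=123;id=456", ["test_value"])
masked_first = mask_capture_groups(
    r"id=(\d+)", "id=123;id=456", ["test_value"], replace_all=False
)
print(masked_all)
print(masked_first)
```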
The URL generator is a substitution cipher that preserves formatting and keeps the URL scheme and top-level domain intact.
The URL generator does not support linking or consistency. You cannot configure differential privacy.
There is no generator-specific configuration.
Here is an example replacement that assigns the URL generator to a column.
The Array Character Scramble generator is intended for array values. It replaces letters with random other letters, and numbers with random other numbers. It preserves punctuation and whitespace from the original value.
The Array Character Scramble generator can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the BaseMetadata object.
There is no generator-specific configuration.
The following example replacement configures a column to use the built-in generator preset for the Array Character Scramble generator. The generator is not consistent.
The Character Substitution generator performs a random character replacement that preserves formatting (spaces, capitalization, and punctuation).
Characters are replaced with other characters from within the same Unicode Block.
The Character Substitution generator is implicitly consistent. You cannot configure consistency or differential privacy. There is no generator-specific configuration.
The following example replacement assigns the Character Substitution generator to a column.
The ASCII Key generator can be applied to primary key columns. It generates unique alphanumeric strings based on any printable ASCII characters.
The ASCII Key generator can be configured to be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object.
There is no generator-specific configuration.
In the following example replacement for the ASCII Key generator, consistency is disabled. The output values do not include lowercase letters.
The Categorical generator creates values at the same frequency and using the same values, including NULL values, as the underlying data. In other words, it shuffles the existing values within a field.
The Categorical generator does not support consistency. You can configure differential privacy. You can link columns.
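The shuffling behavior, without differential privacy, can be pictured with a short sketch. This is not the Structural implementation. It only demonstrates that shuffling a column keeps the same values at the same frequencies, including NULL (represented here as `None`).

```python
import random
from collections import Counter


def shuffle_column(values, seed=None):
    """Return the column values in a random order; the input is not modified."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


source = ["A", "B", "A", None, "C", "A"]
masked = shuffle_column(source, seed=7)

# The value frequencies are identical before and after the shuffle.
print(Counter(source) == Counter(masked))
```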
The metadata object is populated from the CategoricalMetadata object. It contains the epsilon field, which provides the privacy budget for differential privacy.
The following example replacement shows a single, un-linked column. Differential privacy is enabled, and epsilon is set to 1.
The Constant generator uses a single value to mask all of the values in the column.
The Constant generator does not support linking or consistency. You cannot configure differential privacy.
The metadata object is populated from the ConstantMetadata object.
The constant field specifies the value to use to populate the column.
In the following example replacement, the education-num column value is replaced by the value 10.
The Continuous generator generates a continuous distribution to fit the underlying data.
The Continuous generator supports linking. It cannot be made consistent, but you can configure differential privacy.
The metadata object is populated from the BaseMetadata object.
The Continuous generator does support partitioning, which is configured in the partitions object outside of the links object.
There is no generator-specific configuration.
In this example replacement for the Continuous generator, differential privacy is enabled. The capital-gain column is partitioned by the native-country and income columns.
The Cross Table Sum generator sets the value of the column to the sum of the values of another column aggregated across rows that have a foreign key value that matches the primary key in the current record.
For example, in a users table, a total_transactions value is obtained from the transactions table by combining all of the transaction_amount values from rows that have a user_id value that matches the primary key value for the current users record.
The Cross Table Sum generator does not support linking or consistency. You cannot configure differential privacy.
The metadata object is populated from the CrossTableAggregateMetadata object.
The generator-specific configuration includes:
The schema and table that contain the column to sum against.
The foreign key column to compare against the primary key for the current table.
The column that contains the values to sum.
The primary key column in the current table.
In the following example replacement for the Cross Table Sum generator, the value of total_transactions in the users table is set to the sum of the values of the amount column in the transactions table for rows where user_id has the same value as the id column in the current users table row.
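The aggregation that this example describes is equivalent to the following in-memory sketch. The sample rows are invented, and the result for a user without any transactions (0.0 here) is an assumption. This page does not say what the generator writes in that case.

```python
from collections import defaultdict

users = [{"id": 1}, {"id": 2}, {"id": 3}]
transactions = [
    {"user_id": 1, "amount": 10.0},
    {"user_id": 1, "amount": 5.5},
    {"user_id": 2, "amount": 7.0},
]

# Sum the amount column per foreign key value.
totals = defaultdict(float)
for row in transactions:
    totals[row["user_id"]] += row["amount"]

# Write each sum to the row whose primary key matches the foreign key.
for user in users:
    user["total_transactions"] = totals.get(user["id"], 0.0)

print(users)
```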
The Conditional generator applies different generators to the value conditionally based on any value in the table.
You do not configure consistency or differential privacy for the Conditional generator.
The metadata object is populated from the ConditionalMetadata object.
The defaultGenerator object specifies the generator to apply if none of the conditions are met. It includes a condition object with an empty list of conditions.
The conditionalGenerators object contains the generators to apply based on one or more conditions. For each entry in conditionalGenerators, you identify and configure the generator, and provide the conditions to meet in order to apply that generator. The conditions can be joined by AND or OR.
Each condition identifies the column for which to check the value, the type of check, the value to check, and the type of data in the column that is checked.
In the following example replacement for the Conditional generator, the default generator is the Address generator, which is configured with the zip code as the address type.
The column being configured is column1.
If column1 contains the value VALUE and column2 is not NULL, then the Random Integer generator is applied to column1. It applies a value between 0 and 10.
If column4 contains a value that matches the regular expression .*, then the Categorical generator is applied to column1. epsilon is 1, and differential privacy is disabled.
The Character Scramble generator replaces letters with random other letters and numbers with random other numbers. It preserves punctuation, whitespace, and mathematical symbols.
You can configure the Character Scramble generator to be self-consistent, but not consistent with another column. You cannot configure differential privacy.
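The character-level behavior can be approximated as follows. This is an illustrative sketch, not the Structural algorithm. It handles only ASCII letters and digits, and preserving letter case is an assumption that this page does not confirm.

```python
import random
import string


def scramble(value, seed=None):
    """Replace letters with random letters and digits with random digits.

    Punctuation, whitespace, and symbols are copied through unchanged.
    """
    rng = random.Random(seed)
    result = []
    for ch in value:
        if ch.isascii() and ch.isalpha():
            # Assumption: keep the case of the original letter.
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            result.append(rng.choice(pool))
        elif ch.isascii() and ch.isdigit():
            result.append(rng.choice(string.digits))
        else:
            result.append(ch)
    return "".join(result)


scrambled = scramble("Ab-12 c+d", seed=1)
print(scrambled)
```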
The metadata object is populated from the BaseMetadata object.
There is no generator-specific configuration.
The following replacement for a Character Scramble generator has consistency disabled.
The Company Name generator is deprecated. Use the Business Name generator instead.
The Company Name generator generates a random company name-like string.
The Company Name generator does not support linking. It can be self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the BaseMetadata object.
There is no generator-specific configuration.
In the following replacement, the Company Name generator is applied to a company column, and is consistent with a name column.
The Custom Categorical generator is a version of the Categorical generator that selects from values that you provide instead of shuffling the original values.
The Custom Categorical generator supports linking. It can be made self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the CustomCategoricalMetadata object. You use the customCategories field to provide a list of the values to use for the column in the destination database. The values are provided on a single line, separated with newline characters (\n). For example, "Small\nMedium\nLarge". To include NULL as an available value, use {NULL}.
In this example replacement for the Custom Categorical generator, the values to use are Red, Yellow, Blue, and White. The generator is not linked.
Consistency is disabled.
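The customCategories value for a list of values can be assembled as shown in the following sketch. The `build_custom_categories` helper is hypothetical. It joins the values with newline characters and writes {NULL} for a missing value.

```python
def build_custom_categories(values):
    """Join the values with newlines; None becomes the {NULL} marker."""
    return "\n".join("{NULL}" if value is None else str(value) for value in values)


custom_categories = build_custom_categories(["Red", "Yellow", "Blue", "White"])
with_null = build_custom_categories(["Small", "Medium", "Large", None])

# repr() shows the embedded newline characters as \n.
print(repr(custom_categories))
print(repr(with_null))
```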
The Email generator scrambles the characters in an email address. It preserves formatting and keeps the @ and . characters.
The Email generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the EmailMetadata object. You can configure:
The domain to use for all of the email addresses in the destination database.
Domains for which to keep the email addresses as is in the destination database.
Whether to replace invalid email addresses with valid ones.
In the following example replacement for the Email generator, all of the destination email addresses use gmail.com as the domain. Source email addresses from yahoo.com are not changed. Invalid email addresses are replaced. The generator is not consistent.
The Date Truncation generator truncates a date value or a timestamp to a specific part. For a date or a timestamp, you can truncate to the year, month, or day. For a timestamp, you can also truncate to the hour, minute, or second.
The Date Truncation generator does not support linking or consistency. You cannot configure differential privacy.
The metadata object is populated from the DateTruncationMetadata object. The generator-specific configuration includes the part of the datetime value to truncate to, and whether to change all dates that are more than 90 years before the generation date to a date exactly 90 years before the generation date.
In the following example replacement for the Date Truncation generator, the values are truncated to the year. Date values that are older than 90 years before the generation date are not changed.
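Truncation to a part of a timestamp can be sketched as follows. The sketch assumes that every field below the chosen part is reset to its lowest value (month and day to 1, time fields to 0). The `truncate` helper and the part names are hypothetical, and the 90-year birthdate rule is not modeled.

```python
from datetime import datetime

PARTS = ["year", "month", "day", "hour", "minute", "second"]


def truncate(value, part):
    """Keep the fields down to `part` and reset the smaller fields."""
    keep = PARTS.index(part) + 1
    fields = [value.year, value.month, value.day,
              value.hour, value.minute, value.second]
    lowest = [1, 1, 1, 0, 0, 0]
    return datetime(*(fields[:keep] + lowest[keep:]))


original = datetime(2022, 11, 20, 20, 57, 14)
print(truncate(original, "year"))
print(truncate(original, "hour"))
```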
The CSV Mask generator allows you to assign specific generators to specific indexes. You can also use the generator that is assigned to a specific index as the default. This applies the generator to every index that does not have an assigned generator.
For the CSV Mask generator, there is a link object for each index to assign a generator to.
The generator does not itself support consistency or differential privacy.
The metadata object is populated from the CsvMaskMetadata object, and includes:
pathExpression, which is the index to apply the sub-generator to.
The delimiter used to separate the CSV values.
Whether to apply that generator to indexes that are not assigned a generator.
The subGeneratorMetadata object, which identifies and configures the sub-generator.
Here is the basic structure of a link object for a CSV Mask sub-generator.
This example replacement for the CSV Mask generator assigns generators to index 0 and index 1 of the column value. The delimiter is a comma.
For index 0, the Address generator is assigned, with an address type of City and consistency disabled.
For index 1, the Company Name generator is assigned, with consistency disabled.
Neither sub-generator is assigned as the default generator for other indexes.
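The per-index assignment can be pictured with the following sketch. It is a simplification under stated assumptions: the value is split naively on the delimiter (quoted fields are not handled), each sub-generator is represented by a plain Python function, and the `mask_csv` helper is hypothetical.

```python
def mask_csv(value, delimiter, generators, default=None):
    """Apply generators[index] to each delimited part of the value.

    Parts without an assigned generator use `default` if one is given.
    Otherwise they are left unchanged.
    """
    masked = []
    for index, part in enumerate(value.split(delimiter)):
        generator = generators.get(index, default)
        masked.append(generator(part) if generator else part)
    return delimiter.join(masked)


# Stand-ins for the Address (city) and Company Name sub-generators.
generators = {0: lambda _: "CITY", 1: lambda _: "COMPANY"}

no_default = mask_csv("Boston,Acme Corp,keep", ",", generators)
with_default = mask_csv("Boston,Acme Corp,keep", ",", generators, default=generators[1])
print(no_default)
print(with_default)
```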
The Event Timestamps generator generates timestamps that fit an event distribution. The source timestamp must include a date. It cannot be a time-only value.
You can link columns to create a sequence of events across multiple columns. This generator can be partitioned by other columns.
The Event Timestamps generator does not support consistency. You cannot configure differential privacy.
The metadata object is populated from the EventMetadata object. You use eventOrder to specify the sequence of the generated datetime values in the linked columns.
The Event Timestamps generator does support partitioning, which is configured in the partitions object outside of the links object.
In this replacement example for the Event Timestamps generator, the date_event1 and date_event2 columns are linked. date_event1 occurs first, and date_event2 occurs second. The values are not partitioned.
The File Name generator scrambles characters while preserving formatting and keeping the file extension intact.
The File Name generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the BaseMetadata object.
There is no generator-specific configuration.
In this example replacement for the File Name generator, consistency is enabled.
The Find and Replace generator replaces all instances of a specified find string with a specified replace string.
The Find and Replace generator does not support linking or consistency. You cannot configure differential privacy.
The metadata object is populated from the FindAndReplaceMetadata object. The generator-specific configuration includes:
The find string
Whether the find string is a regular expression
The replace string
In this example replacement for the Find and Replace generator, the value yes is replaced by the value no. The find string is not a regular expression.
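The effect of the regular expression setting can be sketched in Python. The `find_and_replace` helper is hypothetical, and Python's `re` syntax might not match the regex flavor that Structural uses.

```python
import re


def find_and_replace(value, find, replace, is_regex=False):
    """Replace every occurrence of `find`, as a literal string or as a pattern."""
    if is_regex:
        return re.sub(find, replace, value)
    return value.replace(find, replace)


literal = find_and_replace("yes, yes", "yes", "no")
pattern = find_and_replace("a1b22", r"\d+", "#", is_regex=True)
print(literal)
print(pattern)
```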
The Geo generator is used to mask latitude or longitude values.
The Geo generator supports linking. Typically, the Geo generator is assigned to a latitude and longitude column and then the columns are linked.
The Geo generator does not support consistency. You cannot configure differential privacy.
The metadata object is populated from the GeoMetadata object. geoType indicates the type of value (latitude or longitude) that is in the column.
In this example replacement for the Geo generator, the lat and long columns are assigned the Geo generator and linked.
The FNR generator transforms Norwegian national identity numbers.
The metadata object is populated from the FnrMetadata object.
preserveDate indicates whether to preserve the birthdate values from the source database in the destination database. If the birthdate values are not preserved, the destination values are still within the same range as the source values.
preserveGender indicates whether the destination value should reflect the same gender as the source value.
In the following example replacement for the FNR generator, the birthdate values in the source database are not preserved in the destination database.
The destination values use the same gender as the source values.
The generator is consistent with the name column.
The HIPAA Address generator can be used to generate cities, states, zip codes, and latitude/longitude values that follow HIPAA guidelines for safe harbor.
The HIPAA Address generator can be linked. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the HipaaAddressMetadata object, which includes:
The type of address value that is in the column
How to generate zip codes. You can generate zip codes that replace the last two digits with zeros, or use a real zip code from the same state.
The following example replacement for the HIPAA Address generator contains a single, unlinked column that contains a zip code value. The generator is configured to be consistent, and to not use zeros in the generated zip code values.
The Integer Key generator generates integer values that are between 0 and 2^32 - 1. The input values must be in the range 0 to 2^31 - 1.
The Integer Key generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object, which includes:
The minimum value
The maximum value
The underlying data type for the source values (for MySQL and MongoDB)
In the following example replacement for the Integer Key generator, the generator produces a value between 10 and 20. The original values are Int64. Consistency is enabled.
The Hostname generator generates random host names, based on the English language.
The Hostname generator does not support linking. It can be either self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object.
There is no generator-specific configuration.
In the following example replacement for the Hostname generator, consistency is disabled.
The International Address generator can generate the following international address values:
Canadian street name
Canadian postal code
United Kingdom (UK) postal code
The International Address generator can be self-consistent. You cannot configure differential privacy. It cannot be linked to other columns.
The metadata object is populated from .
For the International Address generator, you specify the country and the type of address value that is in the source column.
The following example replacement shows a column that is assigned the built-in generator preset for the International Address generator. The column contains a Canadian postal code. The fallback value is K1A.
The column has consistency disabled.
The IP Address generator generates a random string that is formatted as an IP address.
The IP Address generator does not support linking. It can be self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object. The ratio field specifies, as a decimal value, the percentage of values to format as IPv4. The remaining values are formatted as IPv6.
In the following example replacement for the IP Address generator, 90% of the generated addresses are IPv4. Consistency is disabled.
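To make the effect of a ratio of 0.9 concrete, here is a minimal sketch that uses only the Python standard library. It is an illustration of the ratio semantics, not the generator's implementation. The function name random_ip is hypothetical.

```python
import ipaddress
import random


def random_ip(ratio: float, rng: random.Random) -> str:
    """Return an IPv4 string with probability `ratio`, otherwise an IPv6 string."""
    if rng.random() < ratio:
        return str(ipaddress.IPv4Address(rng.getrandbits(32)))
    return str(ipaddress.IPv6Address(rng.getrandbits(128)))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = [random_ip(0.9, rng) for _ in range(2000)]
ipv4_share = sum(ipaddress.ip_address(s).version == 4 for s in sample) / len(sample)
print(round(ipv4_share, 2))  # close to 0.9
```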
The HStore Mask generator runs selected generators on specified key values in an HStore column in a PostgreSQL database. HStore columns contain a set of key-value pairs.
For the HStore Mask generator, there is a link object for each path expression value to assign a generator to.
The generator does not itself support consistency or differential privacy.
The metadata object is populated from the object. It includes:
pathExpression, which is the path expression that identifies the value to apply the sub-generator to.
The subGeneratorMetadata object, which identifies and configures the sub-generator.
In the following example replacement for the HStore Mask generator:
The Random Integer generator is assigned to the value of the pages path expression. The generator uses values between 300 and 500.
The Character Scramble generator is assigned to the value of the title path expression. Consistency is disabled.
The HTML Mask generator masks text columns by parsing the contents as HTML, and applying sub-generators to specified path expressions.
If applying a sub-generator fails because of an error, the generator selected as the fallback generator is applied instead.
For the HTML Mask generator, there is a link object for each XPath expression value to assign a generator to.
The generator does not itself support consistency or differential privacy.
The metadata object is populated from the object. It includes:
pathExpression, which is the XPath expression that identifies the value to apply the sub-generator to.
The subGeneratorMetadata object, which identifies and configures the sub-generator.
In the following example replacement for the HTML Mask generator:
The Character Scramble generator is assigned to the value of the XPath expression //p. Consistency is disabled.
The Company Name generator is assigned to the value of the XPath expression //p/@data. Consistency is disabled.
In the case of an error applying either of those generators, the fallback generator is the Constant generator, which sets the value to 10.
The Null generator generates NULL values to fill the rows of the specified column.
The Null generator does not support linking or consistency. You cannot configure differential privacy.
There is no generator-specific configuration.
The following example replacement applies the Null generator to a column.
The Mongo ObjectId Key generator generates values to de-identify fields that contain MongoDB ObjectId values. The column value must be 12 bytes long.
The Mongo ObjectId Key generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the ObjectIdPkMetadata object. preserveTimetampAndCounter indicates whether to only change the random value portion of the identifier, but keep the timestamp and incremented counter portions.
There is no generator-specific configuration.
In the following example replacement for the Mongo ObjectId Key generator, consistency is disabled. Only the random value portion of the identifier is changed.
The Noise Generator masks values in numeric columns. It adds or multiplies the original value by random noise.
The Noise Generator does not support linking. It can be either self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the NoiseMetadata object. The generator configuration includes:
Whether to use additive or multiplicative noise.
For additive noise, the percentage of the underlying value to scale the noise to.
For multiplicative noise, the minimum and maximum value for the scaling factor.
In this example replacement for the Noise Generator, the additive noise strategy is used. It scales the noise to 10% of the underlying value. The generator is consistent with the name column.
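The following sketch shows one way to read "scales the noise to 10% of the underlying value". The source does not state the noise distribution, so the uniform draw here is an assumption. The function name additive_noise is hypothetical.

```python
import random


def additive_noise(value: float, relative_scale: float, rng: random.Random) -> float:
    """Add uniform noise whose magnitude is at most relative_scale * |value|."""
    # Assumption: noise is drawn uniformly from [-scale, +scale].
    return value + rng.uniform(-1.0, 1.0) * relative_scale * abs(value)


rng = random.Random(42)
noisy = additive_noise(200.0, 0.10, rng)
print(noisy)  # some value between 180.0 and 220.0
```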
The Name generator generates a random name string from a dictionary of first and last names.
The Name generator cannot be linked. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the NameClassifierMetadata object, which includes:
The type of name value.
Whether to preserve the capitalization from the source value.
In the following example replacement for the Name generator, the name format is Last, First (Smith, John). Capitalization is preserved. Consistency is disabled.
The MAC Address generator generates a string that is formatted as a MAC address.
The MAC Address generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the MacAddressMetadata object. bytesPreserved specifies the number of bytes to preserve in the generated address.
In the following example replacement for the MAC Address generator, the generated values preserve 4 bytes. Consistency is disabled.
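As a rough illustration of preserving 4 bytes, the sketch below keeps the leading octets and randomizes the rest. The source does not say which bytes are preserved, so treating them as the leading bytes is an assumption. The function name mask_mac is also hypothetical.

```python
import random


def mask_mac(mac: str, bytes_preserved: int, rng: random.Random) -> str:
    """Keep the first `bytes_preserved` octets and randomize the remaining ones."""
    octets = mac.split(":")
    if len(octets) != 6 or not 0 <= bytes_preserved <= 6:
        raise ValueError("expected a colon-separated 6-octet address and 0..6 preserved bytes")
    kept = octets[:bytes_preserved]
    fresh = [f"{rng.randrange(256):02x}" for _ in range(6 - bytes_preserved)]
    return ":".join(kept + fresh)


masked = mask_mac("00:1a:2b:3c:4d:5e", 4, random.Random(7))
print(masked)  # starts with 00:1a:2b:3c
```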
The Numeric String Key generator generates unique numeric strings of the same length as the input value.
The Numeric String Key generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the BaseMetadata object.
There is no generator-specific configuration.
In the following example replacement for the Numeric String Key generator, consistency is disabled.
The Passthrough generator is the default. It passes through the value from the source database to the destination database without masking it.
You do not usually retrieve or provide a replacement that assigns the Passthrough generator to a column. You might specifically assign Passthrough as a sub-generator for a composite generator.
When you use the GET /api/Workspace/{workspace ID}/replacements/{schema}/{table} endpoint to get the column configuration for a table, columns that are assigned Passthrough are not included in the results.
For the PUT /api/Workspace/{workspaceId}/update_replacements/{schema}/{table} endpoint, which replaces the configuration for an entire table, any column that is not included in the message body is automatically assigned Passthrough.
To revert an individual column to Passthrough, use the DELETE /api/Workspace/{workspace ID}/replacement/{replacement ID} endpoint to remove the replacement that contains the column configuration.
The Phone generator generates a random telephone number that matches the country or region of the input telephone number while maintaining the format.
The Phone generator does not support linking. It can be made self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the PhoneNumberMetadata object, which includes a setting to indicate whether to replace invalid telephone numbers with valid telephone numbers.
In the following example replacement for the Phone generator, invalid phone numbers are replaced. Consistency is disabled.
The Random Integer generator returns a random integer between a specified minimum (inclusive) and maximum (exclusive).
The Random Integer generator does not support linking or consistency. You cannot configure differential privacy.
The metadata object is populated from , which includes the minimum and maximum values.
In this example replacement for the Random Integer generator, the returned value is between 0 and 5. Because max is exclusive, the highest possible value is 4.
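The inclusive minimum and exclusive maximum follow the same convention as Python's random.randrange, which makes for a quick way to check the boundary behavior. This is only an analogy for the range semantics, not the generator's code.

```python
import random

rng = random.Random(1)
# randrange(0, 5) includes 0 and excludes 5, mirroring an inclusive min and exclusive max.
values = [rng.randrange(0, 5) for _ in range(1000)]
print(min(values), max(values))
```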
The Sequential Integer generator returns integer values that increment by 1 for each row in the destination database.
The Sequential Integer generator can be linked. You provide a link object for each linked column. The generator does not support consistency. You cannot configure differential privacy.
The metadata object is populated from the UniqueIntegerMetadata object. startingPoint provides the first integer to apply.
The following example replacement for the Sequential Integer generator configures a single unlinked column. The values start with 4.
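A starting point of 4 with an increment of 1 per row behaves like a simple counter. The sketch below is an illustration only. The variable names are hypothetical.

```python
from itertools import count, islice

starting_point = 4  # plays the role of startingPoint in the example above
sequence = count(starting_point)  # yields 4, 5, 6, ... one value per row
first_rows = list(islice(sequence, 5))
print(first_rows)  # [4, 5, 6, 7, 8]
```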
The Random Double generator generates a random double number between the specified minimum (inclusive) and maximum (exclusive).
The Random Double generator does not support linking or consistency. You cannot configure differential privacy.
The metadata object is populated from the ContinuousDistributionMetadata object. You specify the minimum and maximum values.
In this example replacement for the Random Double generator, the generator is configured to produce numbers between 2.5 and 10.75.
The Random Boolean generator assigns a random boolean value.
The Random Boolean generator does not support linking or consistency. You cannot configure differential privacy.
The metadata object is populated from the RatioMetadata object. The ratio field indicates the percentage (as a decimal value between 0 and 1.0) of values to set to true.
In the following example replacement for the Random Boolean generator, 40% of the destination values are true, and 60% are false.
The Shipping Container generator generates values of ISO 6346 compliant shipping container codes. All generated codes are in the freight category ("U").
The Shipping Container generator does not support linking. It can be self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the BaseMetadata object.
There is no generator-specific configuration.
In the following example of a replacement for the Shipping Container generator, consistency is disabled.
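For readers unfamiliar with ISO 6346, the sketch below builds a code in the freight category ("U") and computes its check digit with the standard's published algorithm. It is not Tonic's implementation. The function names are hypothetical.

```python
import random
import string


def letter_values() -> dict:
    """Map A-Z to the ISO 6346 values 10..38, skipping multiples of 11."""
    values, current = {}, 10
    for letter in string.ascii_uppercase:
        if current % 11 == 0:
            current += 1
        values[letter] = current
        current += 1
    return values


LETTER_VALUES = letter_values()


def check_digit(first_ten: str) -> int:
    """Compute the check digit for the first ten characters of a container code."""
    total = 0
    for position, char in enumerate(first_ten):
        value = LETTER_VALUES[char] if char.isalpha() else int(char)
        total += value * (2 ** position)
    return total % 11 % 10


def random_container_code(rng: random.Random) -> str:
    """Build owner code (3 letters) + category U + 6-digit serial + check digit."""
    owner = "".join(rng.choice(string.ascii_uppercase) for _ in range(3))
    serial = f"{rng.randrange(10**6):06d}"
    body = owner + "U" + serial
    return body + str(check_digit(body))


print(check_digit("CSQU305438"))  # 3
```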
The Regex Mask generator uses regular expressions to parse strings and replace specified substrings with the output of specified generators. The parts of the string to replace are specified inside unnamed top-level capture groups.
The Regex Mask generator does not itself support linking or consistency, and you cannot configure differential privacy.
Each link object identifies a regular expression and the generators to apply to the resulting capture groups.
The metadata object for each link object is populated from the RegexMaskMetadata object, which includes:
Whether to replace all matches or only the first match.
The regular expression used to identify the capture groups to replace.
The list of generator types to apply to each capture group. The first sub-generator is applied to the first capture group, the second sub-generator to the second group, and so on.
In the captureGroupMetadata object, the configuration for each generator.
The following example replacement for the Regex Mask generator provides two expressions.
For the first expression, the generator is configured to only replace the first match. There are two capture groups. For the first capture group, the Address generator applies a country value, and consistency is disabled. For the second capture group, the Passthrough generator is applied.
For the second expression, the generator is configured to replace all of the matching values. The Business Name generator is applied with consistency disabled.
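The mapping from capture groups to sub-generators can be sketched with Python's re module. This is a simplified illustration of the idea, not the generator itself. The function regex_mask and the stand-in sub-generators scramble and passthrough are hypothetical. It assumes one sub-generator per unnamed, non-nested capture group.

```python
import re


def regex_mask(text, pattern, sub_generators, replace_all=True):
    """Replace capture group N with the output of sub-generator N."""
    def rebuild(match):
        pieces, cursor = [], match.start()
        for index, generate in enumerate(sub_generators, start=1):
            start, end = match.span(index)
            if start == -1:  # this group did not take part in the match
                continue
            pieces.append(match.string[cursor:start])  # text between groups is kept
            pieces.append(generate(match.group(index)))
            cursor = end
        pieces.append(match.string[cursor:match.end()])
        return "".join(pieces)

    # count=0 replaces every match; count=1 replaces only the first one.
    return re.sub(pattern, rebuild, text, count=0 if replace_all else 1)


def scramble(value):  # stand-in for a masking sub-generator
    return "x" * len(value)


def passthrough(value):  # leaves the captured text unchanged
    return value


text = "alice@example.com, bob@example.com"
pattern = r"(\w+)@(\w+\.com)"
print(regex_mask(text, pattern, [scramble, passthrough], replace_all=False))
# xxxxx@example.com, bob@example.com
```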
The SIN generator generates a new valid Canadian Social Insurance Number that preserves the formatting of the original value.
The SIN generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
There is no generator-specific configuration.
In the following example replacement for the SIN generator, consistency is disabled.
The SSN generator generates a new valid United States Social Security Number.
The SSN generator does not support linking. It can be self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object. For the SSN generator, ratio indicates the percentage of values to format with dashes (123-45-6789). The percentage is provided as a decimal value between 0 and 1.0. The remaining values are formatted as 123456789.
In the following example replacement for the SSN generator, the generator is consistent with the name column. None of the values are formatted with dashes.
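The effect of the ratio on formatting can be sketched as follows. The function name format_ssn is hypothetical, and the sketch only covers the dash formatting, not the generation of a valid number.

```python
import random


def format_ssn(digits: str, ratio: float, rng: random.Random) -> str:
    """Format nine digits with dashes with probability `ratio`, else leave them plain."""
    if rng.random() < ratio:
        return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"
    return digits


rng = random.Random(0)
print(format_ssn("123456789", 0.0, rng))  # 123456789
print(format_ssn("123456789", 1.0, rng))  # 123-45-6789
```

A ratio of 0 never adds dashes and a ratio of 1.0 always does, because random() returns values in the range [0.0, 1.0).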
The UUID Key generator generates UUID values. It can be used on primary key columns.
The UUID Key generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object.
There is no generator-specific configuration.
In the following example replacement for the UUID Key generator, the version and variant are not preserved, and consistency is disabled.
The Unique Email generator generates unique email addresses. It replaces the username with a randomly generated GUID, and either uses a specified domain or masks the domain with a character scramble.
The Unique Email generator does not support linking. It can be self-consistent, but not consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the object. You can configure:
The domain to use for all of the email addresses in the destination database. If not specified, a character scramble is applied to the domains.
Domains for which to keep the email addresses as is in the destination database.
Whether to replace invalid email addresses with valid ones.
In the following example replacement for the Unique Email generator, consistency is enabled. tonic.ai is used as the domain for all of the email addresses, and invalid email addresses are not replaced.
The Struct Mask generator applies selected generators to specific StructFields within a StructType in a Spark database.
For the Struct Mask generator, there is a link object for each path expression value to assign a sub-generator to.
The generator does not itself support consistency or differential privacy.
The metadata object is populated from the object, and includes:
pathExpression, which is the expression that identifies the value to apply the sub-generator to.
The types of values to apply the sub-generator to.
The subGeneratorMetadata object, which identifies and configures the selected sub-generator.
In the following example replacement for the Struct Mask generator:
The value at the path expression $.address.city is assigned the Address generator. The generator is configured to produce a city value. Consistency is disabled.
The value at the path expression $.address.zip is also assigned the Address generator. The generator is configured to produce a zip code value. Consistency is disabled.
The Timestamp Shift generator shifts timestamps by a random amount of a specific unit of time within a set range.
The Timestamp Shift generator does not support linking. It can be self-consistent or consistent with another column. You cannot configure differential privacy.
The metadata object is populated from the TimestampShiftMetadata object, which includes:
For text source columns, the format of the datetime values in the original data.
For integer source columns (Unix timestamps), the unit to use.
The part of the timestamp to shift.
The minimum amount to shift the value by. Use negative numbers to move the value earlier.
The maximum amount to shift the value by.
The following example replacement for the Timestamp Shift generator updates text timestamps in the format yyyy-MM-dd. The generator shifts the day anywhere from 3 days before the current day to 3 days after the current day. The generator is consistent with the order column.
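A shift of -3 to 3 days on a yyyy-MM-dd text value can be sketched with the standard datetime module. The sketch assumes that the shift is relative to the source value and that both bounds are inclusive, which the source does not confirm. The function name shift_days is hypothetical, and consistency is not modeled.

```python
import random
from datetime import datetime, timedelta


def shift_days(value: str, min_shift: int, max_shift: int, rng: random.Random) -> str:
    """Shift a yyyy-MM-dd date string by a random whole number of days."""
    parsed = datetime.strptime(value, "%Y-%m-%d")  # %Y-%m-%d matches yyyy-MM-dd
    offset = rng.randint(min_shift, max_shift)  # negative offsets move the date earlier
    return (parsed + timedelta(days=offset)).strftime("%Y-%m-%d")


shifted = shift_days("2024-03-01", -3, 3, random.Random(5))
print(shifted)  # a date from 2024-02-27 through 2024-03-04
```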
The XML Mask generator runs a selected generator on values that match a user-specified path expression.
For the XML Mask generator, there is a link object for each path expression value to assign a sub-generator to.
The generator does not itself support consistency or differential privacy.
The metadata object is populated from the XmlMaskMetadata object. It includes:
pathExpression, which is the expression that identifies the value to apply the sub-generator to.
The subGeneratorMetadata object, which identifies and configures the sub-generator.
In the following example replacement for the XML Mask generator:
The Name generator is assigned to the path expression //view/item-descriptor//@display-name. The value is in the format First Name Last Name (John Smith), and capitalization is not preserved. Consistency is disabled.
The Constant generator is assigned to the path expression //view//object-class. The constant value is object-class.
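The Constant generator case can be approximated with Python's xml.etree.ElementTree. This is a sketch under stated assumptions: the sample document and the function apply_constant are hypothetical, and ElementTree supports only a limited XPath subset, so the attribute expression //view/item-descriptor//@display-name is not covered here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
document = """
<root>
  <view>
    <item-descriptor>
      <object-class>com.example.Widget</object-class>
    </item-descriptor>
  </view>
</root>
"""


def apply_constant(root: ET.Element, path: str, constant: str) -> int:
    """Set the text of every element that matches `path`; return the match count."""
    matches = root.findall(path)
    for element in matches:
        element.text = constant
    return len(matches)


root = ET.fromstring(document)
# ElementTree paths are relative, so //view//object-class becomes .//view//object-class.
replaced = apply_constant(root, ".//view//object-class", "object-class")
print(replaced, root.find(".//object-class").text)  # 1 object-class
```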