The following table summarizes the available generators. The table includes generator characteristics that you might take into account when you select the generator to use for a column.
Generator hints and tips also provides some suggestions for generators to use for specific use cases.
Generator | Description | Supported features |
---|---|---|
Address API: AddressGenerator
Generates replacement values for U.S. mailing addresses. You select the address component or format for the replacement values. For example, the column might only contain a street address or a postal code, or it might contain a full address.
Consistency - Self and other. Linkable. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent, 4 if consistent.
Identifies the algebraic relationship between 3 or more numeric values, including at least one non-integer. Based on the relationship, generates new values to match. If there is no relationship, uses the Categorical generator.
Linkable - linking is required. Privacy ranking: 3
Generates unique alphanumeric strings of the same length as the input.
For example, for the origin value ABC123, the output value is a six-character alphanumeric string such as D24N05.
Consistency - Self only. Primary key generator. Unique columns allowed. Format-preserving encryption (FPE). Privacy ranking: 3 if not consistent, 4 if consistent.
Within an array, replaces letters with random other letters, and numbers with random other numbers. Preserves punctuation and whitespace.
Consistency - Self only. Privacy ranking: 3 if not consistent, 4 if consistent.
Used to transform array values in JSON.
To identify values to transform, you provide a list of JSONPaths. For each JSONPath, you assign a sub-generator to apply to matching values.
Composite generator. Feature support is based on the sub-generators. Privacy ranking: 5
Used to transform values in an array. To identify values to transform, you provide a regular expression. For each capture group in an expression, you assign a sub-generator to apply to matching values.
Composite generator. Feature support is based on the sub-generators. Privacy ranking: 5
Generates unique alphanumeric strings based on any printable ASCII characters. You can optionally exclude lowercase letters from the generated values. The replacement value does not preserve the length of the original value.
Consistency - Self only. Primary key generator. Unique columns allowed. Format-preserving encryption (FPE). Privacy ranking: 3 if not consistent, 4 if consistent.
Generates a random company name-like string.
Consistency - Self or other. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent, 4 if consistent.
Shuffles the original values for a column to different rows. Maintains the overall frequency of each value.
For example, a column contains the values Small (3 times), Medium (4 times), and Large (5 times).
In the transformed data, each value appears the same number of times, but the values are shuffled to different rows.
Linkable. Differential privacy is configurable. Privacy ranking: 2 with differential privacy, 3 without differential privacy.
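The shuffling behavior described above can be illustrated with a minimal Python sketch. This is not the product's implementation; the function name `shuffle_column` and the `seed` parameter are hypothetical, and differential privacy is not modeled.

```python
import random
from collections import Counter

def shuffle_column(values, seed=None):
    """Return the same values in a random row order (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(values)   # copy so the source column is left untouched
    rng.shuffle(shuffled)     # reorder rows; no value is added or removed
    return shuffled

sizes = ["Small"] * 3 + ["Medium"] * 4 + ["Large"] * 5
shuffled_sizes = shuffle_column(sizes, seed=7)

# The frequency of each value is unchanged by the shuffle.
print(Counter(shuffled_sizes) == Counter(sizes))
```

Because the output is a permutation of the input, every value keeps its original count; only the row each value lands on changes.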
Replaces letters with random other letters and numbers with random other numbers. Preserves punctuation, whitespace, and mathematical symbols.
Consistency - Self only. Privacy ranking: 3 if not consistent, 4 if consistent.
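As a rough illustration of this kind of character scramble, the following Python sketch replaces ASCII letters with other letters and digits with other digits, and leaves every other character alone. The `scramble` function is hypothetical, handles only ASCII, and additionally preserves letter case, which is an assumption rather than something stated above.

```python
import random
import string

def _other(ch, alphabet, rng):
    # Pick a random character from the alphabet that differs from the source.
    return rng.choice([c for c in alphabet if c != ch])

def scramble(text, rng=None):
    """Scramble ASCII letters and digits; keep all other characters (sketch)."""
    if rng is None:
        rng = random.Random()
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(_other(ch, string.ascii_lowercase, rng))
        elif ch in string.ascii_uppercase:
            out.append(_other(ch, string.ascii_uppercase, rng))
        elif ch in string.digits:
            out.append(_other(ch, string.digits, rng))
        else:
            out.append(ch)  # punctuation, whitespace, and symbols pass through
    return "".join(out)

print(scramble("Order #A-42: 3+4=7"))
```

The output has the same length and the same punctuation, whitespace, and mathematical symbols as the input; only the letters and digits differ.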
Replaces characters with other random characters. Preserves punctuation, capitalization, and whitespace.
A replacement character is always from within the same Unicode Block as the source character.
A source character is always mapped to the same destination character. For example, M might always map to V.
Always self-consistent. Unique columns allowed. Privacy ranking: 4
Company Name (Deprecated) API: CompanyNameGenerator
This generator is deprecated. Use the Business Name generator instead. Generates a random company name-like string.
Consistency - Self or other. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent, 4 if consistent.
Applies different generators to rows conditionally based on the column value. For example, apply the Character Scramble generator for values other than Test. You configure a list of conditions. Each condition performs a check against the column value. For each condition, you assign a sub-generator to apply to matching values.
Unique columns allowed. Composite generator. Other feature support is based on the sub-generators. Privacy ranking: if a fallback generator is selected, the lower of 5 and the fallback generator's ranking; 5 if no fallback generator is selected.
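The condition list described above might be modeled as follows. This is a sketch under stated assumptions: the `apply_conditional` function, the first-match-wins evaluation order, and the stand-in masking sub-generator are all hypothetical and are not taken from the product.

```python
def apply_conditional(value, conditions, fallback=None):
    """Apply the sub-generator of the first matching condition (assumed order)."""
    for predicate, sub_generator in conditions:
        if predicate(value):
            return sub_generator(value)
    # No condition matched: use the fallback if one is set, else keep the value.
    return fallback(value) if fallback is not None else value

# Mask every value other than "Test" with a stand-in sub-generator.
conditions = [
    (lambda v: v != "Test", lambda v: "*" * len(v)),
]

print(apply_conditional("Alice", conditions))  # masked
print(apply_conditional("Test", conditions))   # left unchanged
```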
Uses a single specified value to replace all of the values in the column. The replacement value must be compatible with the column data type.
Differential privacy. Data-free. Privacy ranking: 1
Generates a continuous distribution to fit the underlying data. Can link to other columns to create multivariate distributions. Can also be partitioned by other columns.
Linkable. Differential privacy is configurable. Privacy ranking: 2 with differential privacy, 3 without differential privacy.
Populates the column using the sum of values from a column in another table. To select the rows to use, uses a foreign key value that matches the primary key value for the current row. For example, to transform the Total_Sales column in the Customers table, from the Transactions table, use the sum of the Amount values for rows where the Customer_ID value matches the primary key value for the current customer.
Privacy ranking: 3
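The Total_Sales example can be sketched in Python with in-memory rows. The table and column names come from the example above; the `fill_total_sales` function and the sample amounts are hypothetical.

```python
from collections import defaultdict

transactions = [
    {"Customer_ID": 1, "Amount": 20.0},
    {"Customer_ID": 1, "Amount": 5.5},
    {"Customer_ID": 2, "Amount": 10.0},
]
customers = [
    {"Customer_ID": 1, "Total_Sales": None},
    {"Customer_ID": 2, "Total_Sales": None},
    {"Customer_ID": 3, "Total_Sales": None},
]

def fill_total_sales(customers, transactions):
    """Set Total_Sales to the sum of Amount for each customer's transactions."""
    totals = defaultdict(float)
    for row in transactions:
        # The foreign key in Transactions points at the Customers primary key.
        totals[row["Customer_ID"]] += row["Amount"]
    return [
        {**c, "Total_Sales": totals.get(c["Customer_ID"], 0.0)}
        for c in customers
    ]

updated_customers = fill_total_sales(customers, transactions)
print(updated_customers)
```

In this sketch a customer with no matching transactions gets a total of 0.0; how the product treats that case is not stated above.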
CSV Mask API: CsvMaskGenerator
Used to mask text in a delimited format.
Parses the text as a row where the columns are delimited by a specified character. For each index, you assign a sub-generator to apply to the index value.
Composite generator. Feature support is based on the sub-generators. Privacy ranking: 5
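A minimal sketch of per-index masking, using Python's standard `csv` module to parse and rebuild the row. The `mask_delimited` function and the stand-in sub-generators are hypothetical.

```python
import csv
import io

def mask_delimited(text, sub_generators, delimiter=","):
    """Apply a sub-generator to selected column indexes of one delimited row."""
    row = next(csv.reader([text], delimiter=delimiter))
    masked = [
        sub_generators[i](value) if i in sub_generators else value
        for i, value in enumerate(row)
    ]
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter).writerow(masked)
    return buffer.getvalue().rstrip("\r\n")  # drop the writer's line ending

# Index 0 is upper-cased and index 1 is zeroed; index 2 is left as-is.
print(mask_delimited("alice,42,NY", {0: str.upper, 1: lambda v: "0" * len(v)}))
```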
Replaces the original column value with a value from a list of values that you provide.
Consistency - Self and other. Linkable. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent, 4 if consistent.
Truncates dates or timestamps to a specific date or time component. For example, you might truncate a date value to the month or a timestamp to the hour.
Privacy ranking: 5
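Truncation to a date or time component can be sketched with `datetime.replace`, which resets every finer-grained field. The `truncate` function and its `unit` names are hypothetical.

```python
from datetime import datetime

_FIELDS = ["year", "month", "day", "hour", "minute", "second", "microsecond"]
# Value each field takes once it is truncated away.
_RESET = {"month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0, "microsecond": 0}

def truncate(ts, unit):
    """Keep fields down to `unit` and reset all finer-grained fields."""
    keep = _FIELDS.index(unit) + 1  # raises ValueError for an unknown unit
    return ts.replace(**{name: _RESET[name] for name in _FIELDS[keep:]})

ts = datetime(2024, 3, 17, 14, 35, 9)
print(truncate(ts, "month"))  # 2024-03-01 00:00:00
print(truncate(ts, "hour"))   # 2024-03-17 14:00:00
```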
Email API: EmailGenerator
Scrambles characters in an email address.
Preserves the formatting and keeps the @ and . characters.
You can identify specific email domains to not scramble.
Consistency - Self only. Privacy ranking: 3 if not consistent, 4 if consistent.
Generates timestamps that fit an event distribution. You can link columns to create a sequence of events across multiple columns. You can also partition the generator by other columns.
Linkable. Privacy ranking: 3
Scrambles characters in a file name.
Preserves the formatting and the file extension.
Consistency - Self only. Privacy ranking: 3 if not consistent, 4 if consistent.
Replaces all instances of the find string with the replace string. For the find string, you can optionally provide a regular expression.
Privacy ranking: 5
FNR API: FnrGenerator
Transforms Norwegian national identity numbers. You can optionally preserve the gender and birthdate portions of the identifier values.
Consistency - Self and other. Unique columns allowed. Privacy ranking: 3 if not consistent, 4 if consistent.
Geo API: GeoGenerator
Used to transform columns that contain latitude and longitude values.
Linkable. Unique columns allowed. Privacy ranking: 3
Can be used to generate cities, states, zip codes, and latitude/longitude values that follow HIPAA guidelines for safe harbor.
Consistency - Self only. Privacy ranking: 3 if not consistent, 4 if consistent.
Generates random host names, based on the English language.
Consistency - Self and other. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent, 4 if consistent.
Used to transform values in an HStore column in a PostgreSQL database. You specify a list of keys for which to transform the values. For each key, you assign a generator to apply to the key value.
Composite generator. Feature support is based on the sub-generators. Privacy ranking: 5
Used to transform columns that contain HTML content. To identify the values to transform, you provide a list of path expressions. For each path expression, you assign a generator to apply to the matching value.
Composite generator. Feature support is based on the sub-generators. Privacy ranking: 5
Generates unique integer values.
By default, the generated values are within the range of the column’s data type.
You can also specify a range for the generated values. The source values must be within that range.
Differential privacy if not consistent. Data-free if not consistent. Primary key generator. Unique columns allowed. Format-preserving encryption (FPE). Privacy ranking: 1 if not consistent, 4 if consistent.
For Canadian mailing addresses, can generate:
Street name
Postal code
For United Kingdom (UK) mailing addresses, can generate postal codes.
Consistency - Self only. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent, 4 if consistent.
Generates a random IP address-formatted string. You specify the percentage of IPv4 addresses. The remaining addresses are IPv6.
Consistency - Self or other. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent, 4 if consistent.
Used to transform values in JSON columns. To identify values to transform, you provide a list of JSONPaths.
For each JSONPath, you assign a sub-generator to apply to matching values.
Composite generator. Feature support is based on the sub-generators. Privacy ranking: 5
Generates a random MAC address-formatted string.
Consistency - Self only. Differential privacy if not consistent. Data-free if not consistent. Format-preserving encryption (FPE). Privacy ranking: 1 if not consistent, 4 if consistent.
Generates unique MongoDB ObjectId values. Can be assigned to text columns that contain MongoDB ObjectId values. The column value must be 12 bytes long.
Consistency - Self only. Privacy ranking: 3 if not consistent, 4 if consistent.
Name API: NameGenerator
Generates a random name string from a dictionary of first and last names. You specify the name format. For example, a column might contain only a first name, or a full name that is last name first.
Consistency - Self or other. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent, 4 if consistent.
Masks values in numeric columns.
Either adds random noise to the original value or multiplies the original value by random noise.
Consistency - Self or other. Privacy ranking: 3 if not consistent, 4 if consistent.
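The two noise modes can be sketched as below. The noise distribution is not specified above, so this sketch assumes uniform noise; the `add_noise` function and its `mode` and `scale` parameters are hypothetical.

```python
import random

def add_noise(value, mode="additive", scale=0.1, rng=None):
    """Perturb a number with uniform noise (the distribution is an assumption)."""
    if rng is None:
        rng = random.Random()
    noise = rng.uniform(-scale, scale)
    if mode == "additive":
        return value + noise          # shift by an absolute amount
    if mode == "multiplicative":
        return value * (1 + noise)    # scale by a relative factor
    raise ValueError(f"unknown mode: {mode}")

print(add_noise(100.0, mode="additive", scale=5))          # within 95..105
print(add_noise(100.0, mode="multiplicative", scale=0.1))  # within 90..110
```

Additive noise moves every value by a similar absolute amount, while multiplicative noise keeps the perturbation proportional to the size of the original value.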
Null API: NullGenerator
Replaces all of the column values with NULL values.
Differential privacy. Data-free. Unique columns allowed. Privacy ranking: 1
Generates unique numeric strings of the same length as the input numeric string.
Consistency - Self only. Primary key generator. Unique columns allowed. Format-preserving encryption (FPE). Privacy ranking: 3 if not consistent, 4 if consistent.
Default generator. Does not perform any transformation on the source data.
Unique columns allowed. Privacy ranking: 6
Generates a random phone number that matches the country or region and format of the input phone number. For invalid phone numbers, either replaces individual numbers or generates a valid replacement number.
Consistency - Self only. Privacy ranking: 3
Generates a random boolean value. You specify the percentage of true values. The remaining values are false.
Differential privacy. Data-free. Privacy ranking: 1
Generates a random double number that is between the specified minimum (inclusive) and maximum (exclusive) values.
Differential privacy. Data-free. Privacy ranking: 1
Generates a random hash string.
Differential privacy. Data-free. Privacy ranking: 1
Returns a random integer that is between the specified minimum (inclusive) and maximum (exclusive) values.
Differential privacy. Data-free. Privacy ranking: 1
Generates random dates, times, and timestamps that fall within a specified range.
Differential privacy. Data-free. Privacy ranking: 1
Random UUID API: UUIDGenerator
Generates a random new UUID string.
Differential privacy. Data-free. Unique columns allowed. Privacy ranking: 1
To identify values to transform, you provide a regular expression.
For each capture group in an expression, you assign a sub-generator to apply to matching values.
Unique columns allowed. Composite generator. Other feature support is based on the sub-generators. Privacy ranking: 5
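Assigning a sub-generator to each capture group can be sketched with Python's `re` module. The `regex_mask` function is hypothetical; it transforms only the first match and assumes that the capture groups do not nest or overlap.

```python
import re

def regex_mask(text, pattern, sub_generators):
    """Replace each mapped capture group with its sub-generator's output."""
    match = re.search(pattern, text)
    if match is None:
        return text  # nothing to transform
    pieces, cursor = [], 0
    # Walk the mapped groups left to right so the untouched text stays in place.
    for index in sorted(sub_generators, key=match.start):
        start, end = match.span(index)
        if start == -1:
            continue  # the group did not take part in the match
        pieces.append(text[cursor:start])
        pieces.append(sub_generators[index](match.group(index)))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)

masked = regex_mask(
    "user-1234@example.com",
    r"(\w+)-(\d+)@",
    {1: str.upper, 2: lambda digits: "9" * len(digits)},
)
print(masked)  # USER-9999@example.com
```

Text outside the capture groups, including the parts of the match that are not captured, is copied through unchanged.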
Generates a column of unique integer values that start with a specified value, and then increment by 1 for each processed row.
Linkable. Unique columns allowed. Privacy ranking: 3
Generates ISO 6346-compliant shipping container codes. The codes are all in the freight ("U") category.
Consistency - Self or other. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent, 4 if consistent.
SIN API: SINGenerator
Generates a new valid Canadian Social Insurance Number. Preserves the formatting from the original value.
Consistency - Self only. Data-free if not consistent. Unique columns allowed. Format-preserving encryption (FPE). Privacy ranking: 1 if not consistent, 4 if consistent.
SSN API: SsnGenerator
Generates a new valid United States Social Security Number. For numeric columns, the dashes (xxx-xx-xxxx) are always excluded. Otherwise, you can specify the percentage of values for which to include the dashes.
Consistency - Self or other. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent, 4 if consistent.
Used to transform StructFields within a StructType in Spark databases (Databricks and Amazon EMR). To identify the StructField value to transform, you provide a path expression. For each path expression, you assign a sub-generator to apply to the matching values.
Composite generator. Feature support is based on the sub-generators. Privacy ranking: 5
Shifts timestamps by a random amount of a specific unit of time, within a set range. The range can start before the original value.
Consistency - Self or other. Privacy ranking: 3 if not consistent, 4 if consistent.
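A timestamp shift within a range that starts before the original value might look like the following sketch. The `shift_timestamp` function, its parameter names, and the choice of whole-number shifts are assumptions for illustration.

```python
import random
from datetime import datetime, timedelta

def shift_timestamp(ts, unit="days", low=-3, high=3, rng=None):
    """Shift ts by a random whole number of `unit` within [low, high]."""
    if rng is None:
        rng = random.Random()
    amount = rng.randint(low, high)  # a negative amount shifts into the past
    # `unit` must be a timedelta keyword such as "days", "hours", or "minutes".
    return ts + timedelta(**{unit: amount})

original = datetime(2024, 3, 17, 14, 35)
shifted = shift_timestamp(original, unit="days", low=-3, high=3)
print(shifted)
```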
Generates unique email addresses.
Replaces the username with a randomly generated GUID, and masks the domain with a character scramble.
Consistency - Self only. Unique columns allowed. Privacy ranking: 3 if not consistent, 4 if consistent.
URL API: UrlGenerator
Used to transform URLs. Preserves the formatting. Keeps the URL scheme and top-level domain intact.
Unique columns allowed. Privacy ranking: 3
UUID Key API: UuidPkGenerator
Generates UUIDs.
Consistency - Self only. Primary key generator. Unique columns allowed. Format-preserving encryption (FPE). Privacy ranking: 3 if not consistent, 4 if consistent.
XML Mask API: XmlMaskGenerator
Used to transform values in XML columns. To identify the values to transform, you provide XPaths. For each XPath, you assign a sub-generator to apply to the matching values.
Composite generator. Feature support is based on the sub-generators. Privacy ranking: 5