Enabling consistency

Last updated 25 days ago


Consistency is an option for some generators. When consistency is turned on, the generator maps the same input value to the same output value across an entire database.

Consistency can also be maintained across multiple databases, even of different types. For example, if consistency is turned on for a name generator, then the generator always maps the same input name (for example, Albert Einstein) to the same output name (for example, Richard Feynman).

You can also view this video overview of consistency.

Why use consistency?

The primary reasons for using consistency are to:

  • Enable joining on columns that don't have explicit database constraints in the schema. This is often seen with values such as email addresses. With consistency, you can completely anonymize an email address and still use it in a join.

  • Preserve the approximate cardinality of a column. For example, a city column contains 50 different cities. To randomize this column but still have approximately 50 cities, you can use consistency to maintain the approximate cardinality. Because consistency does not guarantee uniqueness, the cardinality might change. However, it is guaranteed not to increase. If unique one-to-one mappings are required, use a primary key generator instead.

  • Match duplicated data across one or more databases. For example, you have a user database that contains a username in both a column and a JSON blob, and another database that contains website activity identified by the same username values. To anonymize the username, but still have the username be the same in all locations and databases, use consistency.
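As a rough mental model, you can think of a consistent generator as a seeded, deterministic mapping from input values to output values. The sketch below illustrates that idea in Python; it is not Structural's actual implementation, and the name pool and seed are invented for the example.

```python
# Minimal sketch of a consistent mapping (illustrative only, not
# Structural's implementation): a keyed hash of the input value
# deterministically selects an output value from a fixed pool.
import hashlib
import hmac

OUTPUT_NAMES = ["Michael", "Susan", "Gregory", "Rosella", "Walton"]

def consistent_name(value: str, seed: str = "42") -> str:
    digest = hmac.new(seed.encode(), value.encode(), hashlib.sha256).digest()
    return OUTPUT_NAMES[int.from_bytes(digest[:4], "big") % len(OUTPUT_NAMES)]

# Equal inputs always yield equal outputs, so a fully anonymized
# column can still be used in an equality join.
assert consistent_name("Melissa") == consistent_name("Melissa")
```

Because the output depends only on the seed and the input value, two tables masked this way can still be joined on the masked column, which is exactly the first use case above.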

Types of consistency

Self-consistency

Self-consistency indicates that the value in the destination database is consistent with the value of the same column in the source database.

For example, a column contains a first name. You make the assigned generator self-consistent. A given first name in the source database is always replaced by the same first name in the destination database. For example, the first name value John is always replaced by the value Michael.

Consistency with another column

Consistency with another column indicates that the value in the destination database is consistent with the value of a different column in the source database.

For example, a column contains an IP address. You make the assigned generator consistent with the username column. Every row that has the username User1 in the input database has the same IP address in the destination database.
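Continuing the same hash-based mental model (again illustrative, with invented column names): to make a column consistent with another column, the output is derived from the other column's value rather than from the column's own value.

```python
# Sketch: consistency with another column. The masked ip_address is
# derived from the row's username, so every row with username "User1"
# receives the same output IP address, regardless of its source IP.
import hashlib
import hmac

def ip_from_username(username: str, seed: str = "42") -> str:
    digest = hmac.new(seed.encode(), username.encode(), hashlib.sha256).digest()
    return ".".join(str(b) for b in digest[:4])

rows = [
    {"username": "User1", "ip_address": "10.0.0.1"},
    {"username": "User2", "ip_address": "10.0.0.2"},
    {"username": "User1", "ip_address": "172.16.0.9"},
]
masked = [{**row, "ip_address": ip_from_username(row["username"])} for row in rows]

# Rows 0 and 2 share a username, so they share a masked IP address.
assert masked[0]["ip_address"] == masked[2]["ip_address"]
```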

Enabling consistency

To enable consistency, on the generator configuration panel, toggle the Consistency switch.

Not all generators support consistency.

Consistency is a function of both the data type and the value.

For example, a numeric field contains the value 123, and a string/varchar field contains the value "123". Even if both fields have consistent generators applied, the output is not consistent between the two fields, because the underlying data types differ.
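One illustrative way to see why the type matters (using a hypothetical serialization, not Structural's actual one): if typed values are serialized before hashing, the numeric 123 and the string "123" produce different inputs to the hash, and therefore different outputs.

```python
# Sketch of type-sensitive consistency (illustrative serialization):
# repr() keeps the type visible, so 123 and "123" hash differently.
import hashlib
import hmac

def consistent_token(value, seed: str = "42") -> str:
    data = repr(value).encode()  # repr(123) == "123", repr("123") == "'123'"
    return hmac.new(seed.encode(), data, hashlib.sha256).hexdigest()

# Same value, same type: consistent. Same digits, different type: not.
assert consistent_token(123) == consistent_token(123)
assert consistent_token(123) != consistent_token("123")
```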

Consistency example

To demonstrate the effect of consistency on the output, we'll use a column that contains a first name, and that uses the Name generator.

Here is the sample input and output when consistency is not enabled:

[Image: sample input and output for a first name field with consistency disabled for the Name generator]

In this sample data, the first name Melissa appears twice, but is mapped to Walton the first time and Linn the second time.

Here is the sample input and output when consistency is enabled:

[Image: sample input and output for a first name field with consistency enabled for the Name generator]

In this case, the first name Melissa is mapped to Rosella both times.

Consistency considerations

Consistency does not imply uniqueness

A consistent generator ensures that the same input value always produces the same output value.

It does not guarantee that two different input values produce two different output values.

Consistent generators are not 1:1 mappings.

Consistency can reduce data privacy

Consistency can reduce the privacy of your data, because it reveals something about the frequency of the data values.

For example, if someone is familiar with the source data values and their frequencies, they might be able to connect the source and destination values. Suppose they know that Jane appears 20 times and Michael appears 3 times in the source. When they see 20 instances of Susan and 3 instances of John in the destination, they might infer that Susan is mapped from Jane and John from Michael.

However, this risk does require some knowledge of the source data. Tonic Structural does not store mappings of the source data to the destination data. In other words, someone can see that in the destination data the name Susan appears 20 times and the name John appears 3 times. But without any knowledge of the source data, they cannot determine that Susan is mapped from Jane and John is mapped from Michael.
The mapping of source values to destination values is also not guaranteed to be unique. In this example, both Jane and Michael could be mapped to John. In that case, there would be 23 instances of John, which would not match the frequency of any single source value. To guarantee unique values, use a primary key generator.

Consistency is across an entire database

Any column, regardless of which table it resides in, is consistent with any other column that uses the same consistent generator.

For example, your database includes a Customers table and an Employees table. Each table contains a column for the first name of the customer or employee. You assign the Name generator to both columns to generate a first name, and make the generators consistent. The same first name value in either column is mapped to the same destination value. For example, the first name John is always mapped to Michael, whether the name John appears in the Customers table or the Employees table.

However, by default, consistency is not guaranteed between data generation runs, even if the run is on the same database.

Enabling consistency across runs or multiple databases

By default, consistency is only guaranteed within a single data generation run for a single workspace.

For example, for a column that contains a first name value, you assign the Name generator and configure the generator to be consistent. The first time you run data generation, all instances of the name John might be replaced with Michael. The next time you run data generation, all instances of the name John might instead be replaced with Gregory.

You can enable consistency across runs and workspaces so that, for example, every time you run a data generation, John is always replaced with Michael.

To do this, you configure a seed value. You can either:

  • Configure the Structural seed value, which is the TONIC_STATISTICS_SEED environment setting. This ensures consistency across all data generation runs and all workspaces.

  • Configure a seed value for a workspace. This ensures consistency across all data generation runs for that workspace, as well as across other workspaces that have the same seed value.

  • Disable cross-data generation consistency for a workspace. The workspace then does not maintain consistency across data generation runs or with other workspaces.
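In the hash-based sketch used earlier, the seed plays exactly this role: the mapping depends only on the pair (seed, input value), so two independent runs that share a seed agree, while runs with different seeds generally do not. Illustrative code, not Structural's implementation:

```python
# Sketch: a fixed seed makes the mapping reproducible across runs,
# because the output depends only on (seed, value).
import hashlib
import hmac

OUTPUT_NAMES = ["Michael", "Susan", "Gregory", "Rosella"]

def generate(value: str, seed: int) -> str:
    digest = hmac.new(str(seed).encode(), value.encode(), hashlib.sha256).digest()
    return OUTPUT_NAMES[int.from_bytes(digest[:4], "big") % len(OUTPUT_NAMES)]

# Two separate "runs" that share a seed produce the same mapping.
run1 = generate("John", seed=12345)
run2 = generate("John", seed=12345)
assert run1 == run2
# With a different seed, the mapping may (and usually does) change.
```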

Configuring a Structural seed value

To ensure consistency across all data generations and workspaces, add the following environment setting to the Structural worker and web server containers:

TONIC_STATISTICS_SEED: <ANY 32-BIT SIGNED INTEGER>

When you configure a value for this environment setting, consistency applies across all data generations for all workspaces that do not either:

  • Have a workspace seed value configured.

  • Have disabled consistency across data generations.

Overriding the Structural seed value for a workspace

For an individual workspace, you can override the Structural seed value. When you override the Structural seed value, you can either:

  • Disable consistency across data generation runs for the workspace.

  • Provide a seed value for the workspace.

When a workspace has a configured seed value, consistency is maintained across all data generation runs for that workspace, as well as across the data generation runs of any other workspace that has the same seed value.

In the Advanced Workspace Overrides section of the workspace details view, to override the Structural seed value:

  1. Toggle Override Statistics Seed to the on position.

  2. To disable consistency across data generations, click Don't use consistency.

  3. To provide a seed value for the workspace:

    1. Click Consistency value.

    2. In the field, enter the seed value. It must be a 32-bit signed integer. The value defaults to the current value of TONIC_STATISTICS_SEED.

Generators that can only be made self-consistent

The following generators can be made consistent to themselves. This means that the same input value in the column always produces the same output value.

  • Alphanumeric String Key
  • Array Character Scramble
  • ASCII Key
  • Character Scramble
  • Email
  • File Name
  • HIPAA Address
  • Integer Key
  • International Address
  • MAC Address
  • Mongo ObjectId Key
  • Numeric String Key
  • Phone
  • SIN
  • Unique Email
  • UUID Key

Generators that can be made self-consistent and to other columns

The following generators can be made consistent either to themselves or to other columns.

  • Address
  • Business Name
  • Company Name
  • Custom Categorical
  • FNR
  • Hostname
  • IP Address
  • Name
  • Noise Generator
  • Shipping Container
  • SSN
  • Timestamp Shift Generator

When a column is consistent to another column, the output value is based on the value of the other column.

For example, a column contains a company name. You assign the Company Name generator and make it consistent with the username column. Every row that has the username User1 in the source database has the same company name in the destination database.

When you select a generator as the sub-generator for a composite generator, in most cases you cannot configure the generator to be consistent with another column. Only the Conditional generator and the Regex Mask generator allow a sub-generator to be consistent with another column.

Note that consistency with another column cannot be configured in a generator preset. You can only configure it when you configure an individual column.

If you also use Tonic Textual, and configure a Textual statistics seed that matches the Structural statistics seed, then some generated values, such as telephone numbers and locations, can also be consistent between Textual and Structural. For example, in both Textual and Structural, a source address value 123 Main Street might be replaced with 234 Oak Avenue. For more information, go to Enable Structural consistency in the Textual documentation.