The Tonic Structural synthetic data platform combines sensitive data detection and data transformation to allow users to create safe, secure, and compliant datasets.
Common Structural use cases include creating staging and development environments and trying out a new cloud provider without complex data agreements. Structural allows you to reduce bug counts, shorten testing life cycles, and share data with partners, all while helping to ensure security and compliance with the latest regulations, from GDPR to CCPA.
You can use Structural APIs to integrate with CI/CD pipelines or to create automated processes that ensure that the generated data is available on demand.
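For example, a CI/CD step might call the API to trigger data generation and wait for the job to finish. The following shell sketch is illustrative only: the endpoint paths, query parameter, and response fields are assumptions rather than the documented API surface, so consult the Structural API reference for the actual routes.

```sh
# Illustrative CI step: trigger a Structural data generation job and wait
# for it to finish. Endpoint paths and response fields are assumptions;
# check the Structural API reference for the real routes.
TONIC_URL="https://app.tonic.ai"
WORKSPACE_ID="your-workspace-id"   # placeholder

# Start a generation job (the token comes from User Settings > User API Tokens).
JOB_ID=$(curl -s -X POST \
  -H "Authorization: Apikey $STRUCTURAL_API_TOKEN" \
  "$TONIC_URL/api/generatedata/start?workspaceId=$WORKSPACE_ID" | jq -r '.id')

# Poll until the job completes; fail the pipeline if the job fails.
while true; do
  STATUS=$(curl -s -H "Authorization: Apikey $STRUCTURAL_API_TOKEN" \
    "$TONIC_URL/api/job/$JOB_ID" | jq -r '.status')
  echo "Job $JOB_ID status: $STATUS"
  case "$STATUS" in
    Completed) break ;;
    Failed|Canceled) exit 1 ;;
  esac
  sleep 30
done
```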
Structural data generation workflow
Overview of the Structural steps to generate de-identified data
Structural deployment types
You can use Structural Cloud or set up a self-hosted Structural instance
Structural implementation roles
Functions that participate in a Structural implementation
Structural license plans
View the license options and their available features
Tonic Structural data generation combines sensitive data detection and data transformation to create safe, secure, and compliant datasets.
The Structural data generation workflow involves the following steps:
You can also view this video overview of the Structural data generation workflow.
To get started, you create a workspace. When you create a workspace, you identify the type of source data, such as PostgreSQL or MySQL, and establish the connections to the source database and the destination location. The source database contains the original data that you want to synthesize. The destination location is where Structural stores the synthesized data. It might be a database, a storage location, a container repository, or an Ephemeral database snapshot.
Next, you analyze the results of the initial sensitivity scan. The sensitivity scan identifies columns that contain sensitive data. These columns need to be protected by a generator.
Based on the sensitivity scan results, you configure the data generation. The configuration includes:
Assigning table modes to tables. The table mode controls the number of rows and columns that are copied to the destination database.
Indicating column sensitivity. You can make adjustments to the initial sensitivity assignments. For example, you can mark additional columns as sensitive that the initial scan did not identify as sensitive.
Assigning and configuring column generators. To protect the data in a column, especially a sensitive column, you assign a generator to it. The generator replaces the source value with a different value in the destination database. For example, the generator might scramble the characters or assign a random value of the same type.
After you complete the configuration, you run the data generation job. The data generation job uses the configured table modes and generators to transform the data from the source database and write the transformed data to the destination location. You can track the job progress and view the job results.
A Tonic Structural implementation can involve the following roles - from those who set up the Structural environment to the consumers of the data that Structural processes.
Note that these roles are not related to role-based access control (RBAC) within Structural, which is managed using permission sets.
For self-hosted instances of Structural.
Infrastructure engineers set up the Structural application and its relevant dependencies. They are typically DevOps, Site Reliability Engineering (SRE), or Kubernetes cluster administrators.
Infrastructure engineers perform the following Structural-related tasks:
Ensure that the proper infrastructure is ready for Structural installation based on the deployment requirements.
Install Structural. Work with Tonic.ai support as needed.
Perform routine maintenance of Structural and the Structural environment. Update Structural and its dependencies as needed.
Create Structural-processed data pipelines for development and testing workflows.
For both self-hosted instances of Structural and Structural Cloud.
Database administrators integrate Structural into your data architecture.
They ensure that source databases are available to Structural, and that Structural can write to destination databases.
Database administrators perform the following Structural-related tasks:
Set up the required Structural access to source databases.
Set up destination databases for Structural to write transformed data to.
Structural users are the actual users of the Structural application.
Depending on the use case, Structural users might be compliance analysts, DevOps, or data engineers.
Structural users perform the following Structural-related tasks:
Work with data consumers to produce usable data.
Data consumers are the end users of transformed destination data.
They are typically QA testers, developers, or analysts.
Data consumers perform the following Structural-related tasks:
Validate the usability of the destination data.
Provide guidance on application-specific requirements for data.
Security and compliance specialists ensure and validate that the data that Structural produces meets expectations, and that Structural complies with other security-related processes.
Security and compliance specialists perform the following Structural-related tasks:
Provide guidance on what data is sensitive.
Sign off on proposed approaches to mask sensitive data.
Approve data access and permissions.
You use the Structural application to configure the logic used to transform source data and to generate the transformed data.
You can deploy a self-hosted, on-premises instance of Tonic Structural.
For a self-hosted instance, Structural provides administrator tools that allow you to monitor Structural services and manage Structural users.
You can configure Structural environment settings to customize your instance.
On a self-hosted instance, based on your license plan, you have access to the full set of supported data connectors.
Structural Cloud is our secure hosted environment. On Structural Cloud, Tonic handles monitoring Structural services and updating Structural.
Structural Cloud does not include:
Environment setting configuration. Structural Cloud uses a single configuration.
Access to certain data connectors.
Structural Cloud also supports a pay-as-you-go plan, where free trial users can move on to set up a monthly subscription. For more information, go to Setting up and managing a Structural Cloud pay-as-you-go subscription.
Each Structural Cloud user belongs to a Structural Cloud organization, which is determined either by the user's email domain or by a workspace invitation. Structural Cloud users do not have any access to workspaces or users from other organizations.
Each free trial user is in a separate organization, along with any users that they invite to have access to a free trial workspace.
For information about Structural Cloud organizations, go to Structural organizations.
The Account Admin permission set allows a Structural Cloud user to manage organization users and workspaces. For information about granting access to the Account Admin permission set, go to Granting Account Admin access for a Structural Cloud organization.
The Tonic Structural platform creates safe, realistic datasets to use in staging environments or for local development. It includes a web application and API that can be used by engineers, data analysts, or security experts.
Structural connects to source databases that contain sensitive data such as personally identifiable information (PII) or protected health information (PHI). To protect that data, Structural transforms the sensitive values and writes the transformed data to a destination location.
New to Structural? Review the Tonic Structural workflow overview. Go to Getting started with the Structural free trial for information on how to create a Structural account and start a Structural free trial.
Want to know what's in the latest Structural releases? Go to the Tonic Structural release notes.
The Structural application heading includes a feature updates icon, which displays a summary of the newest features, including a link to the Structural release notes.
Need help with Structural? Contact support@tonic.ai.
Tonic Structural provides different license plans to accommodate organizations of different sizes and with data architectures of varying complexity.
The Basic license is designed for very small organizations that have a very simple data architecture. It provides access to Structural's core de-identification and data generation features.
The Basic license allows access for a single user, with an option to purchase an additional two users.
There is no access to .
With a Basic license, you can create workspaces for one data connector type, which must be either PostgreSQL or MySQL.
With a Basic license, your Structural instance can have only one Structural worker. This means that only one sensitivity scan or data generation job can run at the same time.
With a Basic license, you can create and configure workspaces, and run data generation for those workspaces.
The Basic license does NOT provide access to the following features:
Custom generators
With a Basic license, you only have access to the basic version of the Structural API.
You cannot use the basic Structural API to perform the API tasks that require the advanced API.
The Professional license is designed for larger organizations that have more complex data architectures. The organization might have a larger team that supports multiple databases.
The Professional license provides access to a larger set of Structural features than the Basic license.
The Professional license allows up to 10 users. You can purchase access for unlimited users as an add-on.
With a Professional license, you can create workspaces for up to two types of data connectors. You can purchase one additional data connector type as an add-on.
With a Professional license, your Structural instance can have more than one Structural worker.
This means that you can run multiple jobs from different workspaces at the same time. You can never run multiple jobs from the same workspace at the same time.
With a Professional license, you can do the following:
Create and configure workspaces, and run data generation for those workspaces
The Professional license does NOT provide access to the features that are exclusive to the Enterprise license.
With a Professional license, you only have access to the basic version of the Structural API.
You cannot use the basic Structural API to perform the API tasks that require the advanced API.
The Enterprise license is ideal for very large organizations that have multiple teams that support very large and complex data structures, and that might have more requirements related to scale and compliance.
It provides full access to all Structural features.
An Enterprise instance does not limit the number of users.
You can use any number of any of the available data connectors.
Several features are exclusive to the Enterprise license, including the advanced Structural API.
The advanced Structural API provides access to all of the available API tasks, including tasks that are not available in the basic API.
The following table compares the available features for the Structural license plans.
The comparison covers the following features:

Workspaces - A workspace contains the data connections and data generation configuration.
Data connectors - Each data connector allows Structural to read from and write to a specific type of data source.
Privacy Hub - View and update the current protection status based on the sensitivity scan and workspace configuration.
Database View - Configure transformation options for tables and columns.
Generators - A generator is assigned to a column and performs a data transformation.
Subsetting - Configure a subset of source data to include in the transformed destination data.
Generate data - Run the data generation process to produce transformed destination data.
Schema changes - Review and address changes to the source data schema.
User access - Manage who has access to your instance.
Monitoring and logging - Monitor Structural services and share logs with Tonic.ai.
Updating Structural - Upgrade to the latest version of Structural.

| Feature | Basic | Professional | Enterprise |
| --- | --- | --- | --- |
| Number of users | 1 (2 additional users available as add-ons) | 10 (unlimited users available as an add-on) | Unlimited |
| Data connectors | 1 data connector type (PostgreSQL or MySQL) | 2 data connector types (1 additional available as an add-on; any type except Oracle or Db2 for LUW) | Unlimited number from any available data connector |
| Workspace permission sets | Manager | Manager, Editor | Manager, Editor, Auditor, Viewer |
| Custom generators | Not available | Available for purchase | 2 included; additional ones available for purchase |
| Concurrent jobs (more than 1 worker) | | ✓ | ✓ |
| Structural API | Basic only | Basic only | Basic and advanced |

Note the following about specific plans and features:

Use Privacy Hub to view the current sensitivity status based on the current workspace configuration.
You can view foreign keys from the data, but cannot add virtual foreign keys.
With a Professional license, the data connectors can be of any type except for Oracle and Db2 for LUW.
The Professional license does not allow you to assign the built-in Viewer and Auditor permission sets.
Comments on columns can trigger email notifications.
Use subsetting to generate a smaller destination database.
Use upsert to add destination database records and update existing destination database records, but keep unchanged destination database records in place. The Professional license does not allow you to connect to migration scripts.
Use the Schema Changes view to view and address both conflicting and non-conflicting changes to the source data schema.
Use Structural data encryption to have Structural decrypt source data, encrypt destination data, or both. Custom generators, which are primarily developed to preserve encryption that can't be managed using Structural data encryption, are also available for purchase.
The Enterprise license provides exclusive access to the Oracle and Db2 for LUW data connectors.

When you go to Tonic Structural for the first time, you create an account. How you create an account depends on the type of user you are.

A new Structural user can be one of the following:

A completely new user who is starting a Structural 14-day free trial. Free trial users use Structural Cloud to explore and experiment with Structural before they decide whether to purchase it.

A new user on a self-hosted Structural instance. Self-hosted instances are installed on-premises. The customer administers the Structural users.

A new user in an existing Structural Cloud organization. New users are added to existing organizations based on their email domain.
From the User Settings view, you can manage settings for your individual Tonic Structural account.
To display the User Settings view:
Click your user image at the top right.
In the menu, click User Settings.
The User Settings view includes options to:
Change your Structural password (if your Structural instance does not use SSO).
You can select an image to associate with your account. The image is displayed next to your name and email address throughout Structural.
If your instance uses Google or Azure single sign-on (SSO) to manage Structural users, then by default your Structural account image is the image from the SSO.
Otherwise, the default image displays your initials.
To change your user image, click Upload, then select the image file.
Below your user image file name is the identifier of the organization that your account belongs to.
To copy the identifier, click the copy icon.
Required license: Professional or Enterprise
Structural allows users to provide comments on columns. You can do this from Privacy Hub and Database View.
From the Comment Notification Settings section of User Settings, you can configure when to receive email notifications for comments.
The available options are:
I am an owner, editor, auditor, or am being replied to - This is the default option. You receive email notifications when comments are made on columns in a workspace that you are an owner, editor, or auditor for. You also receive an email notification when someone replies to a comment that you made.
I am @ mentioned - You only receive an email notification if someone specifically mentions you in a comment.
Never - You never receive email notifications for column comments.
Before you can use the Structural API, you must create an API token. From the User API Tokens section of the User Settings view, you can create and manage API tokens.
To create an API token:
Click Create Token.
On the Create New Token dialog, enter a name for the new token.
Click Confirm. The token is added to the list.
To copy an API token, click the copy icon for the token.
To revoke a token, click the Revoke option for the token.
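As a sketch of how a token is typically used, you pass it in the request's authorization header. The header format and endpoint below are assumptions for illustration; your instance's API documentation has the authoritative details.

```sh
# Hypothetical example of an authenticated Structural API call.
# The endpoint and header format are assumptions for illustration.
curl -H "Authorization: Apikey $STRUCTURAL_API_TOKEN" \
  "https://app.tonic.ai/api/workspace"
```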
If your Structural account is not managed using SSO, then from User Settings, you can change your Structural password.
If your Structural instance uses SSO to manage users, then your user credentials are managed in the SSO system. You cannot change your user password in Structural.
Under Password Change, to change your Structural password:
In the Old Password field, type your current Structural password.
In the New Password field, type your new Structural password.
In the Repeat New Password field, type your new Structural password again.
Click Confirm.
From User Settings, you can delete your Structural account. If your instance uses SSO to manage users, then deleting your account only affects your access to Structural.
You cannot delete your Structural account if you are the owner of a workspace for which other users are granted access. Before you can delete your Structural account, you must either transfer ownership of those workspaces or delete them.
To delete your Structural account, click Delete Account.
When you delete your account, you are logged out of Structural.
Use these tutorial videos to learn more about how to use Tonic Structural.
Provides an overview of the workflow to use Structural to de-identify and synthesize your data. For more information, go to Structural data generation workflow.
The minimum screen width is 1120 pixels.
If the locally running database that you want to connect to runs in a Docker container:

Run docker inspect <container name>.
In the Networks section of the results, find the Gateway IP address.
Use this IP address as the server address in Structural.

If the locally running database does NOT run in a container, but runs directly on the machine, then:

On Windows or Mac, use host.docker.internal.
On Linux, use 172.17.0.1, which is the IP address of the docker0 interface.
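For the container case, you can also have docker inspect print just the gateway address. This uses standard docker inspect Go-template formatting; the container name is a placeholder.

```sh
# Print the gateway IP for each network the container is attached to.
docker inspect \
  --format '{{range .NetworkSettings.Networks}}{{.Gateway}}{{end}}' \
  my-database-container
```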
If you use Structural Cloud, and your database only allows connections from allowlisted IP addresses, then you need to allowlist Structural static IP addresses.
This is not required for self-hosted instances of Structural.
For the United States-based instance, the static IP addresses are:
54.92.217.68
52.22.13.250
The following IP addresses are used if needed for scaling or failover:
44.215.74.226
3.232.203.148
3.224.2.189
44.230.136.147
44.230.79.194
For the Europe-based instance, the static IP addresses are:
18.159.127.160
3.69.249.144
The following IP addresses are used if needed for scaling or failover:
18.159.179.95
3.120.214.225
3.75.12.1
16.16.71.42
16.170.51.237
The URL https://telemetry.tonic.ai/ is used for our Amplitude telemetry. https://telemetry.tonic.ai/logs is used specifically for log sharing.
Allowlist https://telemetry.tonic.ai/ or the following IP addresses:
75.2.74.76
99.83.246.105
Telemetry sharing is required. These metrics are valuable to us as we debug issues, plan product roadmaps, and determine feature viability.
To support the one-click update option, Structural needs to be able to retrieve information about the latest Structural version.
Click your user image at the top right. The menu includes the Tonic version.
We recommend that you use a static copy of your production database that was restored from a backup.
If that's not possible, consider the following when you connect Structural to your source data:
Structural cannot guarantee referential integrity of the output data if the source database is written to while data is generated. For this reason we recommend that you connect to a static copy of production data.
Read replicas and fast followers can be problematic for Structural because of how long some queries take to run. Read replicas tend to have short query timeout limits, which causes the queries to time out. Read replicas also reflect recent writes, which means that we cannot guarantee the referential integrity of the output.
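As one way to produce such a static copy, here is a sketch using standard PostgreSQL tooling; the host, user, and database names are placeholders. Restore a recent backup into a database that nothing writes to, and point Structural at the copy.

```sh
# Dump production (or use an existing backup), then restore it into a
# separate, static database for Structural to read from.
pg_dump -Fc -h prod-db.example.com -U backup_user production_db > prod.dump
createdb -h staging-db.example.com -U admin production_copy
pg_restore -h staging-db.example.com -U admin -d production_copy prod.dump
```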
Provides an overview of what a Structural workspace is and how to create a new Structural workspace. For more information, go to .
Provides an overview of workspace owners, permissions, and permission sets. Explains how to share and transfer ownership of a workspace. For more information, go to .
Identifies the types of generators and transformations that you can use in Structural, and explains how to assign a generator to a column. For more information, go to .
Provides an overview of generator presets. Includes how to create and update them, and how to track where each generator preset is used. For more information, go to .
Provides an overview of the file connector and how to manage file groups in a file connector workspace. For more information, see .
Provides an overview of the consistency generator property and how it works. For more information, go to .
Provides an overview of subsetting, how it is configured, and how Structural uses the configuration to generate a subset. For more information, go to .
Provides an overview of upsert data generation. Includes how it works and how to enable and run it for a workspace. For more information, go to .
Provides an overview of how to write destination data to a container repository instead of a database server. For more information, go to .
No customer data is included. For more information about the specific telemetry data that we collect, go to .
For more information on how to verify that telemetry is shared, go to .
For details about the types of data that Tonic.ai does and does not collect, go to .
Every workspace includes the following settings to identify the workspace and to select the type of data connector.
All workspaces have the following fields that identify the workspace:
In the Workspace name field, enter the name of the workspace.
In the Workspace description field, provide a brief description of the workspace. The description can contain up to 200 characters.
In the Tags field, provide a comma-separated list of tags to assign to the workspace. For more information on managing tags, go to Assigning tags to a workspace.
Under Connection Type, select the type of data connector to use for the workspace data. You cannot change the connection type on a child workspace.
The Basic and Professional licenses limit the number and type of data connectors you can use.
A Basic instance can only use one data connector type, which can be either PostgreSQL or MySQL. After you create your first workspace, any subsequent workspaces must use the same data connector type.
A Professional instance can use up to two different data connector types, which can be any type other than Oracle or Db2 for LUW. After you create workspaces that use two different data connector types, any subsequent workspaces must use one of those data connector types.
If you don't see the database that you want to connect to, or you want to have different database types for your source and destination database, contact support@tonic.ai.
When you select a connector type, Structural updates the view to display the connection fields used for that connector type. The specific fields vary based on the connector type.
If you are a user who wants to set up an account in an existing Tonic Structural Cloud or self-hosted organization, go to Creating a new account in an existing organization.
The Structural 14-day free trial allows you to explore and experiment in Structural Cloud before you decide whether to purchase Structural.
When you sign up for a free trial, Structural automatically creates a sample workspace for you to use. You can also create a workspace that uses your own database or files.
The free trial provides tools to introduce you to Structural and to guide you through configuring and completing a data generation.
Structural tracks and displays the amount of time remaining in your free trial. You can request a demonstration and contact support.
When the free trial period ends, you can continue to use Structural to configure workspaces. You can no longer generate data or train models. Contact Tonic.ai to discuss purchasing a Structural license, or select the option to start a Structural Cloud pay-as-you-go subscription.
To start a new free trial of Structural:
Go to app.tonic.ai.
Click Create Account.
On the Create your account dialog, to create an account, either:
To use a corporate Google email address to create the account, click Create account using Google.
To create a new Structural account, enter your email address, create and confirm a Structural password, then click Create Account. You cannot use a public email address for a free trial account.
Structural sends an activation link to your email address.
After you activate your account and log in, Structural next prompts you to select the use case that best matches why you are exploring Structural. If none of the provided use cases fits, use the Other option to tell us about your use case.
After you select a use case, click Next. The Create Your Workspace panel displays.
When you sign up for a free trial, Structural automatically creates a sample PostgreSQL workspace that you can use to explore how to configure and run data generation.
You can also choose to create a workspace that uses your own data, either from local files or from a database.
On the Create your workspace panel:
To use the sample workspace, click Use a sample workspace, then click Next. Structural displays Privacy Hub, which summarizes the protection status for the source data. It also displays the Getting Started Guide panel and the quick start checklist.
To create a workspace that uses local files as the source data, click Upload Files, then click Next. Go to #uploading-files.
To create a new workspace that uses your own data, click Bring your own data, then click Next. Go to #connecting-to-a-database.
The Upload files option creates a local files file connector workspace. The source data consists of groups of files selected from a local file system. The files in a file group must have the same type and structure. Each file group becomes a "table" in the source data.
For other workspaces that you create during the free trial, you can also create a file connector workspace that uses files from cloud storage (Amazon S3 or Google Cloud Storage).
After you select Upload files and click Next, you are prompted to provide a name for the workspace.
In the field provided, enter the name to use for the workspace, then click Next.
Structural displays the File Groups view, where you can set up the file groups for the workspace.
It also displays the Getting Started Guide panel with links to resources to help you get started.
After you create at least one file group, you can start to use the other Structural features and functions.
If you choose to create a workspace with your own data, then the first step is to provide a name for the workspace.
In the field provided, enter the name to use for your first workspace, then click Next.
The Invite others to Tonic panel displays.
Under Invite others to Tonic, you can optionally invite other users with the same corporate email domain to start their own Structural free trial. The users that you invite are able to view and edit your workspace.
For example, you might want to invite other users if you don't have access to the connection information for the source data. You can invite a user who does have access. They can then update the workspace configuration to add the connection details.
To continue without inviting other users, click Skip this step.
To invite users:
For each user to invite, enter the email address, then press Enter. The email addresses must have the same corporate email domain as your email address.
After you create the list of users to invite, click Next.
The Add source data connection view displays.
The final step in the workspace creation is to provide the source data to use for your workspace.
Structural provides data connectors that allow you to connect to an existing database. Each data connector allows you to connect to a specific type of database. Structural supports several types of application databases, data warehouses, and Spark data solutions.
For the first workspace that you create using the free trial wizard, you can choose:
For subsequent workspaces that you create from Workspaces view, you can also choose Databricks and Salesforce.
To connect to an existing database, on the Add source data connection panel, click the data connector to use, then click Add connection details.
The panel also includes a Local files option, which creates a local files file connector workspace, the same as the Upload files option.
Use the connection details fields to provide the connection information for your source data. The specific fields depend on the type of data connector that you select.
After you provide the connection details, to test the connection, click Test Connection.
To save your workspace, click Save.
Structural displays Privacy Hub, which summarizes the protection status for the source data.
It also displays the Getting Started Guide panel with links to resources to help you get started.
The Structural free trial includes a couple of resources to introduce you to Structural and to guide you through the tasks for your first data generation.
The Getting Started Guide panel provides access to Structural information and support resources.
The Getting Started Guide panel displays automatically when you first start the free trial. To display the Getting Started Guide panel manually, in the Structural heading, click Getting Started.
The Getting Started Guide panel provides links to Structural instructional videos and this Structural documentation. It also contains links to request a Structural demo, contact Tonic.ai support, and purchase a Structural Cloud pay-as-you-go subscription.
For each free trial workspace, Structural provides access to a workspace checklist.
The checklist displays at the bottom left of the workspace management view. It displays automatically when you display the workspace management view. To hide the checklist, click the minimize icon. To display the checklist again, click the checklist icon.
The checklist provides a basic list of tasks to perform in order to complete a Structural data generation.
Each checklist task is linked to the Structural location where you can complete that task. Structural automatically detects and marks when a task is completed.
The checklist tasks are slightly different based on the type of workspace.
For workspaces that are connected to a database, including the sample PostgreSQL workspace and workspaces that you connect to your own data, the checklist contains:
Connect a source database - Set the connection to the source database. In most cases, you set the source connection when you create the workspace. When you click this step, Structural navigates to the Source Settings section of the workspace details view.
Connect to destination database - Set the location where Structural writes the transformed data. When you click this step, Structural navigates to the Destination Settings section of the workspace details view.
Apply generators to modify dataset - Configure how Structural transforms at least one column in the source data. When you click this step:
If there are available generator recommendations, then Structural navigates to Privacy Hub and displays the generator recommendations panel.
If there are no available generator recommendations, then Structural navigates to Database View.
Generate data - Run the data generation to produce the destination data. When you click this item, Structural navigates to the Confirm Generation panel.
For workspaces that use data from local files, the checklist contains:
Create a file group - Create a file group with files that you upload from a local file system. Each file group becomes a table in the workspace. When you click this step, Structural navigates to the File Groups view for the workspace.
Apply generators to modify dataset - Configure how Structural transforms at least one column in the source files. When you click this step:
If there are available generator recommendations, then Structural navigates to Privacy Hub and displays the generator recommendations panel.
If there are no available generator recommendations, then Structural navigates to Database View.
Generate data - Run the data generation to produce transformed versions of the source files. When you click this step, Structural navigates to the Confirm Generation panel.
Download your dataset - Download the transformed files from the Structural application database.
For workspaces that use data from files in cloud storage (Amazon S3 or Google Cloud Storage), the checklist contains:
Configure output location - Configure the cloud storage location where Structural writes the transformed files. When you click this step, Structural navigates to the Output location section of the workspace details view.
Create a file group - Create a file group that contains files selected from cloud storage. When you click this step, Structural navigates to the File Groups view for the workspace.
Apply generators to modify dataset - Configure how Structural transforms at least one column in the source data. When you click this step:
If there are available generator recommendations, then Structural navigates to Privacy Hub and displays the generator recommendations panel.
If there are no available generator recommendations, then Structural navigates to Database View.
Generate data - Run the data generation to produce transformed versions of the source files. When you click this step, Structural navigates to the Confirm Generation panel.
In addition to the workspace checklists, Structural uses next step hints to help guide you through the workspace configuration and data generation.
When a next step hint is available, it displays as an animated marker next to the suggested next action.
When you hover over the highlighted action, Structural displays a help text popup that explains the recommended action.
When you click the highlighted action, the hint is removed, and the next hint is displayed.
For a file connector workspace, to identify the source data, you create file groups. A file group is a set of files of the same type and with the same structure. Each file group becomes a table in the workspace. For CSV files, each column becomes a table column. For XML and JSON file groups, the table contains a single XML or JSON column.
On the File Groups view, click Create File Group.
For a file connector workspace that uses local files, you can either drag and drop files from your local file system to the file group, or you can search for and select files to add. For more information, go to #adding-files-from-a-local-file-system.
For a file connector workspace that uses cloud storage, you select the files to include in the file group. For more information, go to #adding-files-from-amazon-s3-or-gcs.
For files that contain CSV content, you also configure the delimiters and other file settings. For more information, go to #configuring-delimiters-and-file-settings-for-.csv-files.
To get value out of the data generation process, you assign generators to the data columns.
A generator indicates how to transform the data in a column. For example, for a column that contains a name value, you might assign the Name generator, which indicates how to generate a replacement name in the generation output.
For sensitive columns that Structural detects, Structural can also provide a recommended generator configuration.
When there are recommendations available, Privacy Hub displays a link to review all of the recommendations.
The Recommended Generators by Sensitivity Type panel displays a list of sensitive columns that Structural detected, along with the suggested generators to apply.
After reviewing, to apply all of the suggested generators, click Apply All. For more information about using this panel, go to Reviewing and applying recommended generators.
You can also choose to apply an individual generator manually. You can do this from Privacy Hub, Database View, or Table View.
To display Database View, on the workspace management view, click Database View.
On Database View, in the column list, the Applied Generator column lists the currently assigned generator for each column. For a new workspace, the columns are all assigned the Passthrough generator. The Passthrough generator simply passes the source value through to the destination data without masking it.
Click a column that is marked as Passthrough, and that is not marked as sensitive. For example, in the sample workspace, click the customers.Last_Transaction column. The column configuration panel displays. To select a generator, click the generator dropdown. The list contains generators that can be assigned to the column based on the column data type. For customers.Last_Transaction, the Timestamp Shift generator is a good option.
For Passthrough columns that Structural identified as containing sensitive data, the column includes an icon to indicate that there is a recommended generator.
In Database View, click one of those columns. For example, in the sample workspace, the customers.Email column is marked as containing an email address.
For customers.Email, click the Email label. Instead of the column configuration panel, you see a panel that indicates the recommended generator. For customers.Email, the recommended generator is Email. To assign the Email generator, click Apply. The column configuration panel displays with the generator assigned.
To run a data generation, Structural must have a destination for the transformed data.
For a local files workspace, Structural saves the transformed files to the application database.
For workspaces that use data from a database, and for workspaces that use cloud storage files, you configure where Structural writes the output data.
The destination location for data generation output can be one of the following:
If the data connector supports Tonic Ephemeral, then the default option is to write the output data to Ephemeral.
For database-based data connectors, you can write the transformed data to a destination database.
For some Structural data connectors, Structural can write the transformed data to a data volume in a container repository.
For file connector workspaces that transform files from cloud storage (Amazon S3 or Google Cloud Storage), you configure the cloud storage location where Structural writes the transformed files.
To display the destination configuration for the workspace:
Click the Settings tab.
Scroll to the Destination Settings section or, for a file connector workspace that uses cloud storage files, scroll to the Output location section.
For data connectors that Ephemeral supports, the default option is to write the output to Ephemeral.
For the Ephemeral option, the default configuration is:
Structural writes the output to Ephemeral Cloud. If you do not have an Ephemeral Cloud account, then we create an Ephemeral free trial account for you. If your organization has a self-hosted Ephemeral instance, then you can choose to write the output to that instance. Note that all workspaces in the same organization or for the same self-hosted Structural instance must use the same Ephemeral instance.
Structural uses the output data to create an Ephemeral user snapshot. You can use the user snapshot to create Ephemeral databases.
When Structural creates the user snapshot in Ephemeral, it creates a temporary Ephemeral database to use as the basis for the user snapshot. There is an option to keep that temporary database. For a free trial workspace, this option is enabled by default. The database expires after 48 hours.
To write the data to a destination database, click Database Server. Structural displays the configuration fields for the destination database.
For information on how to configure the destination information for a specific data connector, go to the workspace configuration information for that data connector. The data connector summary contains a list of the available data connectors, and provides a link to the documentation for each data connector.
To write the data to a data volume in a container repository, click Container Repository. Structural displays the configuration fields to select a base image and provide the details about the repository.
For more information, go to Writing output to a container repository.
For a file connector workspace that uses files from cloud storage (Amazon S3 or Google Cloud Storage), you configure the cloud storage output location where Structural writes the transformed files. The configuration includes the required credentials to use.
For more information, go to Configuring the file connector storage type and output options.
After you complete the workspace and generator configuration, you can run your first data generation.
The data generation process uses the assigned generators to transform the source data. It writes the transformed data to the configured destination location.
For a local files workspace, it writes the files to the Structural application database.
The Generate Data option is at the top right of the Tonic heading.
When you click Generate Data, Structural displays the Confirm Generation panel.
The Confirm Generation panel provides access to the current destination configuration, along with other advanced generation options such as subsetting and upsert. It also indicates if there are any issues that prevent you from starting the data generation. For example, if the workspace does not have a configured destination, then Structural cannot run the data generation.
To start the data generation, click Run Generation. For more information about running data generation, go to Running data generation jobs.
For a new Tonic Ephemeral account, the first time that you run data generation, you also receive an activation email message for the account.
To view the job status and details:
Click Jobs.
In the list, click the data generation job.
For a data generation that writes the output to an Ephemeral database, the Data Available in Tonic Ephemeral panel provides access to the database connection information.
To display the connection details, click Connecting to your database.
The connection details include:
The database location and credentials. Each field contains a copy icon to allow you to copy the value.
SSH tunnel information, including instructions on how to create an SSH tunnel from your local machine to the Ephemeral database.
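The exact command is shown in the panel. A generic OpenSSH local port forward looks like the following sketch, where the host names, port, and user are placeholders taken from the displayed connection details.

```sh
# Forward local port 5432 to the Ephemeral database through the SSH
# gateway, then point your client at localhost:5432.
ssh -N -L 5432:<ephemeral-db-host>:5432 <user>@<ssh-gateway-host>
```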
The first time that you complete all of the steps in a checklist, Structural displays a panel with options to chat with our sales team, schedule a demo, or purchase a subscription.
You can also continue to get to know Structural and experiment with other Structural features such as subsetting or using composite generators to mask more complex values such as JSON or XML.
If your free trial has expired, to get an extension, you can reach out to us using either the in-app chat or an email message.
When you create a new workspace, you can either:
Create a copy of an existing workspace. The copy initially uses the configuration from the original workspace. After the copy is created, it is completely independent from the original workspace.
Create a child of an existing workspace. Child workspaces inherit configuration from the parent workspace. They continue to be updated automatically when the parent workspace is updated. For more information, go to About workspace inheritance.
You can also view this video overview of how to create a workspace.
Required global permission: Create workspaces
To create a completely new workspace, on Workspaces view, click Create Workspace > New Workspace.
Required workspace permission: Copy workspace (in the workspace to copy)
Or
Required global permission: Copy any workspace
To create a workspace based on an existing workspace, either:
On the workspace management view of the workspace to copy, from the workspace actions menu, select Duplicate Workspace.
On Workspaces view, click the actions menu for the workspace, then select Duplicate Workspace.
When you create a copy of a workspace, the copy initially inherits the following workspace configuration:
Source and destination database connections
Sensitivity designations, including manual designations that override the sensitivity scan results
Table mode assignments
Generator configuration
Subsetting configuration
Post-job scripts
Required license: Enterprise
Required workspace permission: Create child workspaces (in the parent workspace)
You can create a workspace that is a child of an existing workspace. You cannot create a child workspace of another child workspace.
The parent workspace must have a source database configured. You cannot create a child workspace from a workspace that uses the Databricks, Spark (Amazon EMR or self-managed Spark cluster), or MongoDB data connector.
To create a child workspace, either:
On Workspaces view:
Click Create Workspace > Child Workspace.
Click the actions menu for the parent workspace, then select Create Child Workspace.
On the workspace management view, from the workspace actions menu, select Create Child Workspace.
On the New Workspace view, under Child Workspace, Parent Workspace identifies the parent workspace.
If you used the Create Workspace > Child Workspace option to create the child workspace, then Parent Workspace is not populated. From the Parent Workspace dropdown list, select the parent workspace for the new child workspace.
If you selected the child workspace option for a specific workspace, then Parent Workspace is set to that workspace.
If you originally chose to create a completely new workspace, then on the New Workspace view:
To change to a child workspace, select Create Child Workspace from the Create a child workspace panel at the right. Structural adds the Child Workspace panel to the New Workspace view.
From the Parent Workspace dropdown list, select the parent workspace for the new child workspace.
Required workspace permission: Configure workspace settings
To edit the configuration for an existing workspace, either:
On the workspace management view:
On the workspace navigation bar, click Settings.
From the workspace actions menu, select Settings.
On Workspaces view, click the actions menu for the workspace, then select Settings.
Required workspace permission: Delete workspace
You can delete workspaces that you no longer need.
You cannot delete a parent workspace. You must first delete all of its child workspaces.
To delete a workspace:
On the workspace management view, from the workspace actions menu, select Delete Workspace.
On the Workspaces view, click the actions menu for the workspace, then select Delete.
Workspaces view lists the workspaces that you have access to. To display Workspaces view, in the Tonic Structural heading, click Workspaces.
The workspace list contains:
Workspaces that you own
Workspaces that you are granted access to
If you have the global permission Copy any workspace or Manage user access to Tonic and to any workspace, then you see the complete list of workspaces.
The Permissions column lists the workspace permission sets that you are granted in each workspace. The permission sets include both permission sets that were granted to you directly as a user, and permission sets that were granted to an SSO group that you are a member of.
Child workspaces always display under their parent workspace. You can only see child workspaces that you have access to. If you have access to a child workspace, but not to its parent workspace, then the parent workspace is grayed out. You cannot select it.
You can filter the workspaces based on the following information:
Name - In the filter field, begin to type text that is in the name of the workspaces to display in the list.
Owner - From the Filter by Owner dropdown list, select the owner of the workspaces to display in the list.
Database type - From the Filter by Database Type dropdown list, select the type of database for the workspaces to display in the list.
Generation status - In the Generation Status column heading, click the filter icon. Check the checkbox next to the generation status values for the workspaces to display in the list.
Tags - In the Tags column heading, click the filter icon. By default, the workspaces are not filtered by tag, and all of the checkboxes are unchecked. To only include workspaces that have specific tags, check the checkbox next to each tag to include. To uncheck all of the selected tags, click Reset Tags. When you filter by tag, Structural checks whether each workspace contains any of the selected tags.
Permissions - In the Permissions column heading, click the filter icon. You can check and uncheck checkboxes to include or exclude specific permission sets. For example, you can filter the list to only display workspaces for which the Editor permission set is granted either to you or to an SSO group that you belong to. For users that have the global permission Copy any workspace, the Permissions filter panel also contains an Any permissions checkbox. By default, Any permissions is unchecked, and the list includes workspaces for which you are not assigned any workspace permission sets. To display all of the workspaces for which you have any assigned workspace permission sets, check Any permissions. If you filter the list based on a specific permission set, to clear the filter and show all workspaces for which you have any permission set, check Any permissions. To display all workspaces, including workspaces that you do not have any permissions for, uncheck Any permissions.
You can combine different filters. For example, you can filter the list to only include workspaces that use PostgreSQL and for which the generation status is Canceled or Failed.
Child workspaces always display under their parent workspace, even if the parent workspace does not match the filter.
You can sort the workspace list by name, status, or owner.
By default, the list is sorted alphabetically by name.
To sort by a column, click the column heading. To reverse the order of the sort, click the column heading again.
Child workspaces always display under their parent workspace. The child workspaces are sorted within the parent.
Workspaces view provides the following information about each workspace:
Name - Contains the name and database type for the workspace. To view the workspace description, hover over the name.
Generation status - The status for the most recent generation job. To display the job details for the job, click the job status. To display more details about the date, time, and duration for the job, hover over the generation timestamp. If a job failed recently, you are given additional information about how long this job has been failing (the date of the first failure occurrence among a continuous series of failures).
Schema changes - Indicates whether Structural detected changes to the source database schema. If there are changes, the column shows the number of changes. Hover over the column value to display additional details, and to navigate to the Schema Changes page. See Viewing and resolving schema changes.
Tags - The tags that are assigned to the workspace.
Permissions - The permission sets that are assigned to you for the workspace.
Owner - The name and email address of the workspace owner.
On Workspaces view, when you click the workspace name, the workspace management view for the workspace is displayed. The Privacy Hub tab is selected.
The Name column also provides access to a menu of workspace configuration options. When you select an option, the workspace management view is displayed, open to the view for the selected option.
The last column in the workspaces list provides additional workspace options:
Subsetting icon - Displays the subsetting configuration for the workspace. See Viewing the current subsetting configuration.
Post-job actions icon - Displays the post-job actions for the workspace. For more information, go to Post-job scripts and Webhooks.
Actions menu - Provides access to additional options.
The Actions menu at the top left of the workspaces list allows you to perform bulk actions on multiple workspaces. It is enabled when you check one or more of the checkboxes in the first column of each row. The Actions menu provides options for the selected workspaces.
A Tonic Structural workspace provides a context within which to configure and generate transformed data.
A workspace represents a path between the source data and the transformed output data. For example, from postgres-prod-copy to postgres-staging.
A workspace includes:
Where to find the source data to transform during data generation
Where to write the transformed data
The rules for the transformation
Most workspaces that connect to a database have a Block data generation if schema changes detected toggle. The setting is usually in the Source Settings section.
By default, the option is turned off. When the option is off, Structural only blocks data generation when there are conflicting schema changes. Structural does not block data generation when there are non-conflicting schema changes.
If this option is turned on, then Structural blocks data generation if it detects any changes at all to the schema, until you resolve the schema changes. For more information, go to Viewing and resolving schema changes.
For generators where consistency is enabled, a statistics seed enables consistency across data generation runs. The Structural-wide statistics seed value ensures consistency across both data generation runs and workspaces.
You use the Override Statistics Seed setting to override the Structural-wide statistics seed value. For workspaces that connect to a database, the setting is under Destination Settings. For a file connector workspace, the setting is under Output Location.
You can either disable consistency across data generations, or provide a seed value for the workspace. The workspace seed value ensures consistency across data generation runs for that workspace, and across other workspaces that have the same seed value.
For details about using seed values to ensure consistency across data generation runs and databases, go to .
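On a self-hosted instance, the Structural-wide seed is supplied as an environment setting. As a sketch (the variable name below is an assumption; confirm the exact setting name in the environment settings documentation), fixing the same seed on two instances makes consistency-enabled generators produce the same outputs for the same inputs.

```sh
# Assumed environment setting name; verify against the environment
# settings documentation before relying on it.
export TONIC_STATISTICS_SEED=1234567
```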
After you select the connector type, you configure:
Where to find the source data
Where to write the data generation output
For data connectors that connect to a database, the Source Settings section provides connection information for the source database.
You cannot change the source data configuration for a child workspace.
For information about the source connection fields for a specific data connector, go to the workspace configuration topic for that data connector.
For data connectors that support upsert, the workspace configuration includes an Upsert section to allow you to enable and configure upsert. Upsert adds and updates rows in the destination database, but keeps all other existing rows intact.
If you enable upsert, then you cannot write output to an Ephemeral database or to a container repository. You must write the output to a destination database.
For more information, go to .
For data connectors that connect to a database, the Destination Settings section provides information about where and how Structural writes the output data from data generation.
Depending on the data connector type, you might be able to write to one of the following:
Destination database - Writes the output data to a destination database on a database server.
Ephemeral snapshot - Writes the output data to a Tonic Ephemeral user snapshot.
Container repository - Writes the output data to a data volume in a container repository.
When you write the output to a destination database, the destination database must be of the same type as the source database.
Structural does not create the destination database. It must exist before you generate data.
If available, the Copy Settings from Source option allows you to copy the source connection details to the destination database configuration when both databases are in the same location. Structural does not copy the connection password.
If Ephemeral supports your workspace database type, then you can write the destination data to a snapshot in Ephemeral. For data larger than 10 GB, this option is recommended instead of writing to a container repository.
From Ephemeral, you can use the snapshot to start new Ephemeral databases.
Some data connectors allow you to write the transformed data to a data volume in a container repository instead of to a database server.
You can use the resulting data volume to create a database in Tonic Ephemeral. If you plan to use the data to start an Ephemeral database, and the size of the data is larger than 10 GB, then we recommend that you write the data to an Ephemeral user snapshot instead.
When you provide connection details for a database server, Structural provides a Test Connection button. The test verifies that Structural can use the provided details to reach the database, and indicates whether the connection succeeded or failed. We strongly recommend that you test the connections.
For file connector workspaces, the File Location section indicates where the source files are obtained from - either a local file system or a cloud storage solution (Amazon S3 or Google Cloud Storage).
When the files come from cloud storage, the Output Location section indicates where to write the transformed files. You must also provide the cloud storage connection credentials.
Only available for PostgreSQL, MySQL, SQL Server, and Oracle.
Not compatible with upsert.
Not compatible with Preserve Destination or Incremental table modes.
Tonic Ephemeral is a separate Tonic.ai product that allows you to create temporary databases to use for testing and demos. For more information about Ephemeral, go to the .
If Ephemeral supports your workspace database type, then you can write the destination data to a snapshot in Ephemeral. You can then use the snapshot to start Ephemeral databases.
To write the transformed data to Ephemeral, under Destination Settings, click Ephemeral Database.
Structural can write the data snapshot to either Ephemeral Cloud or to a self-hosted instance of Ephemeral. By default, Structural writes the data snapshot to Ephemeral Cloud.
All workspaces on the same self-hosted Structural instance or in the same Structural Cloud organization must write to the same instance of Ephemeral. When you change the Ephemeral output configuration in one workspace, it is automatically changed in other workspaces that write to Ephemeral.
For Ephemeral Cloud, Structural writes the snapshot to the account for the user who runs the data generation job. If that user has an Ephemeral account on Ephemeral Cloud, then Structural uses that account. If the user does not have an account, then Structural creates a two-week Ephemeral free trial account for the user.
Note that if you are on a self-hosted instance of Ephemeral, then you must always provide an Ephemeral API key.
To write a snapshot to Ephemeral Cloud:
Click Tonic Ephemeral cloud.
If you are on a self-hosted instance of Structural:
In the API Key field, provide an Ephemeral API key from your Ephemeral account.
To test the connection, click Test Connection.
To write the snapshot to a self-hosted instance of Ephemeral:
Click Tonic Ephemeral self-hosted.
In the API Key field, provide an Ephemeral API key from your Ephemeral account. Structural writes the snapshot to the Ephemeral account that is associated with the API key.
In the Tonic Ephemeral URL field, provide the URL to your self-hosted Ephemeral instance.
To test the connection, click Test Connection.
For Oracle, you select the base image that Structural uses when it creates the data snapshot.
If you are writing to Ephemeral Cloud, then you must use the Oracle 23c base image that comes with Ephemeral. This image has the following limitations:
A maximum of 12 GB of user data
A maximum of 2 CPU cores and 2 GB of RAM
If you are writing to a self-hosted instance of Ephemeral, then you can also select a custom image that you created in Ephemeral.
If you do not configure any advanced settings, then:
The snapshot uses the same name as the workspace, and has no description.
The snapshot size allocation is determined by the source data size.
Structural discards the temporary Ephemeral database that is created during the data generation.
To change any of these settings, click Advanced settings.
By default, the snapshot name uses the workspace name.
When you run data generation, if a snapshot with the same name already exists in Ephemeral, then Structural overwrites that snapshot with the new snapshot.
Under Advanced settings:
In the Snapshot name field, provide the name of the snapshot. The snapshot name can use the following placeholder values to help identify the snapshot:
{workspaceName} - Inserts the name of the workspace.
{workspaceId} - Inserts the identifier of the workspace.
{jobId} - Inserts the identifier of the data generation job that created the snapshot.
{timestamp} - Inserts the timestamp when the snapshot was created.
Including the job ID or timestamp ensures that a data generation job does not overwrite a previous snapshot.
Optionally, in the Snapshot description field, provide a longer description of the snapshot.
By default, the resources used for the snapshot are based on the size of the source data.
For source data that is 25 GB or less, Nano is used.
For source data larger than 25 GB, Micro is used.
To select a specific option:
Toggle Custom pod resources to the on position.
From the dropdown list, select the option to use for the combination of vCPUs and memory:
Nano - 0.125 vCPU with 0.5 GB RAM
Micro - 0.5 vCPU with 2 GB RAM
Small - 1 vCPU with 4 GB RAM
Medium - 2 vCPU with 8 GB RAM
Large - 4 vCPU with 16 GB RAM
By default, the Ephemeral size allocation for the snapshot is based on the size of the source data.
To instead provide a custom data size allocation, under Advanced settings:
Toggle Custom data size allocation to the on position.
In the field, enter the size allocation in gigabytes.
When Structural creates the Ephemeral snapshot, it creates a temporary Ephemeral database.
By default, Structural deletes that database when the data generation is complete.
To instead keep the database, under Advanced settings, toggle Keep database active in Tonic Ephemeral after data generation to the on position.
For a MySQL or PostgreSQL workspace, you can provide a customization file that helps to ensure that the temporary Ephemeral database is configured correctly.
To provide the customization details:
Toggle Use custom configuration to the on position.
In the text area, paste the contents of the customization file.
You use the workspace management view to configure and run data generation for an individual workspace.
When you log in to Tonic Structural, it displays the workspace management view for the workspace that was selected when you logged out.
The workspace management view includes the following components.
The top left of the workspace management view provides information about the workspace, including:
The workspace name
When the workspace was last updated
The user who last updated the workspace
The top right of the workspace management view provides general options for working with the workspace, including:
Undo and redo options for configuration changes
The workspace download menu to:
Download sensitivity scan and privacy reports
The workspace actions menu
The workspace navigation bar provides access to workspace configuration options.
To display the workspace management view for a workspace:
On Workspaces view, in the Name column either:
Click the workspace name. The workspace management view opens to Privacy Hub.
Click the dropdown icon, then select a workspace management option.
Click the search field at the top, then type the name of the workspace. As you type, Structural displays a list of matching workspaces. In the list, click the workspace name.
To reduce the amount of vertical space used by the heading of the workspace management view, you can collapse it.
To collapse the heading, click the collapse icon in the Structural heading.
When you collapse the workspace management heading:
The workspace information is hidden. The workspace name is displayed in the search field.
The workspace options are moved up into the Structural heading.
The workspace navigation bar remains visible.
When you collapse the heading, the collapse icon changes to an expand icon. To restore the full heading, click the expand icon.
Required license: Professional or Enterprise
Not compatible with writing output to a container repository or a Tonic Ephemeral snapshot.
By default, Tonic Structural data generation replaces the existing destination database with the transformed data from the current job.
Upsert adds and updates rows in the destination database, but keeps all of the other existing rows intact. For example, you might have a standard set of test records that you do not want to replace every time you generate data in Structural.
If you enable upsert, then you cannot write the destination data to a container repository or to a Tonic Ephemeral snapshot. You must write the data to a database server.
Upsert is currently only supported for the following data connectors:
MySQL
Oracle
PostgreSQL
SQL Server
For an overview of upsert, you can also view the .
When upsert is enabled, the data generation job writes the generated data to an intermediate database. Structural then runs the upsert job to write the new and updated records to the destination database.
The destination database must already exist. Structural cannot run an upsert job to an empty destination database.
The upsert job adds and updates records based on the primary keys.
If the primary key for a record already exists in the destination database, the upsert job updates the record.
If the primary key for a record does not exist in the destination database, the upsert job inserts a new row.
To only update or insert records that Structural creates based on source records, and ignore other records that are already in the destination database, ensure that the primary keys for each set of records operate on different ranges. For example, allocate the integer range 1-1000 for existing destination database records that you add manually. Then ensure that the source database records, and by extension the records that Structural creates during data generation, use a different range.
Also note that when upsert is enabled, the Truncate table mode does not actually truncate the destination table. Instead, it works more like Preserve Destination table mode, which preserves existing records in the destination table.
To enable upsert, in the Upsert section of the workspace details, toggle Enable Upsert to the on position.
When you enable upsert for a workspace, you are prompted to configure the upsert processing and provide the connection details for the intermediate database.
When you enable upsert, Structural displays the following settings to configure the upsert process.
Required license: Enterprise
The intermediate database must have the same schema as the destination database. If the schemas do not match, then the upsert process fails.
To ensure that schema changes are automatically reflected in the intermediate database, you can connect the workspace to your own database migration script or tool. Structural then runs the migration script or tool whenever you run upsert data generation.
When you start an upsert data generation job:
If migration is enabled, Structural calls the endpoint to start the migration.
Structural cannot start the upsert data generation until the migration completes successfully. It regularly calls the status check endpoint to check whether the migration is complete.
When the migration is complete, Structural starts the upsert data generation.
Required. Structural calls this endpoint to start the migration process specified by the provided URL.
The request includes:
Any custom parameter values that you add.
The connection information for the intermediate database.
The request uses the following format:
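As an illustration only, the request body might look like the following sketch. Every field name here is an assumption chosen for readability; the actual request format is defined by Structural.

```json
{
  "parameters": {
    "environment": "staging"
  },
  "connection": {
    "host": "intermediate-db.example.com",
    "port": 5432,
    "database": "tonic_intermediate",
    "username": "tonic",
    "password": "********"
  }
}
```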
The response contains the identifier of the migration task.
The response uses the following format:
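A minimal sketch of the response, assuming the task identifier is returned in a field named id (the field name is an assumption, chosen to match the {id} placeholder that the other endpoints use):

```json
{
  "id": "f3a9c2d1-5b7e-4c8a-9d21-0e6f4b8a1c3d"
}
```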
Required. Structural calls this endpoint to check the current status of the migration process.
The request includes the task identifier that was returned when the migration process started. The request URL must be able to pass the task identifier as either a path or a query parameter.
The response provides the current status of the migration task. The possible status values are:
Unknown
Queued
Running
Canceled
Completed
Failed
The response uses the following format:
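For example, a sketch of a response that reports a running migration, assuming the status is returned in a field named status (an assumption; the values are the documented status values):

```json
{
  "status": "Running"
}
```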
Optional. Structural calls this endpoint to retrieve the log entries for the migration process. It adds the migration logs to the upsert logs.
The request includes the task identifier that was returned when the migration process started. The request URL must be able to pass the task identifier as either a path or a query parameter.
The response content type should be text/plain. The response body contains the raw logs.
Optional. Structural calls this endpoint to cancel the migration process.
The request includes the task identifier that was returned when the migration process started. The request URL must be able to pass the task identifier as either a path or a query parameter.
To enable the migration process, toggle Enable Migration Service to the on position.
When you enable the migration process, you must configure the POST Start Schema Changes and GET Status of Schema Change endpoints.
You can optionally configure the GET Schema Change Logs and DELETE Cancel Schema Changes endpoints.
To configure the endpoints:
To configure the POST Start Schema Changes endpoint:
In the URL field, provide the URL of the migration script.
Optionally, in the Parameters field, provide any additional parameter values that your migration scripts need.
To configure the GET Status of Schema Change endpoint, in the URL field, provide the URL for the status check. The URL must include an {id} placeholder, which is used to pass the identifier that is returned from the Start Schema Changes endpoint.
To configure the GET Schema Change Logs endpoint, in the URL field, provide the URL to use to retrieve the logs. The URL must include an {id} placeholder, which is used to pass the identifier that is returned from the Start Schema Changes endpoint.
To configure the DELETE Cancel Schema Changes endpoint, in the URL field, provide the URL to use for the cancellation. The URL must include an {id} placeholder, which is used to pass the identifier that is returned from the Start Schema Changes endpoint.
When you enable upsert, you must provide the connection information for the intermediate database.
For details, go to the workspace configuration information for the data connector.
Requires Kubernetes.
For self-hosted Docker deployments, you can install and configure a separate Kubernetes cluster to use. For more information, go to .
For information about required Kubernetes permissions, go to .
Not compatible with upsert.
Not compatible with Preserve Destination or Incremental table modes.
Only supported for PostgreSQL, MySQL, and SQL Server.
You can configure a workspace to write destination data to a container repository instead of to a database server.
Under Destination Settings, to indicate to write the destination data to container artifacts, click Container Repository.
You can switch between writing to a database server and writing to a container repository at any time. Structural preserves the configuration details for both options. When you run data generation, it uses the currently selected option for the workspace.
From the Database Image dropdown list, select the image to use to create the container artifacts.
Select an image version that is compatible with the version of the database that is used in the workspace.
For a MySQL workspace, you can provide a customization file that helps to ensure that the temporary destination database is configured correctly.
To provide the customization details:
Toggle Use customization to the on position.
In the text area, paste the contents of the customization file.
To provide the location where Structural publishes the container artifacts:
In the Registry field, type the path to the container registry where Structural publishes the data volume.
In the Repository Path field, provide the path within the registry where Structural publishes the data volume.
You next provide the credentials that Structural uses to read from and write to the registry.
When you provide the registry, Structural detects whether the registry is from Amazon Elastic Container Registry (Amazon ECR), Google Artifact Registry (GAR), or a different container solution, and displays the appropriate fields based on the registry type.
For a registry other than an Amazon ECR or a GAR registry, the credentials can be either a username and access token, or a secret.
The option to use a secret is not available on Structural Cloud.
In general, the credentials must be for a user that has read and write permissions for the registry.
To use a username and access token:
Click Access token.
In the Username field, provide the username.
In the Access Token field, provide the access token.
To use a secret:
Click Secret name.
In the Secret Name field, provide the name of the secret.
For Azure Container Registry (ACR), the provided credentials must be for a service principal that has sufficient permissions on the registry.
Structural only supports Google Artifact Registry (GAR). It does not support Google Container Registry (GCR).
For a GAR registry, you upload a service account file, which is a JSON file that contains credentials that provide access to Google Cloud Platform (GCP).
The associated service account must have the Artifact Registry Writer role.
For Service Account File, to search for and select the file, click Browse.
For an Amazon ECR registry, you can either:
Provide the AWS access and secret key that is associated with the IAM user that will connect to the registry
Provide an assumed role
(Self-hosted only) Use the credentials configured in the Structural environment settings TONIC_AWS_ACCESS_KEY_ID and TONIC_AWS_SECRET_ACCESS_KEY.
(Self-hosted only) If Structural is deployed in Amazon Elastic Kubernetes Service (Amazon EKS), then you can use the AWS credentials that live on the EC2 instance.
To provide an AWS access key and secret key:
Click Access Keys.
In the Access Key field, enter an AWS access key that is associated with an IAM user or role.
In the Secret Key field, enter the secret key that is associated with the access key.
To provide an assumed role:
Click Assume Role.
In the Role ARN field, provide the Amazon Resource Name (ARN) for the role.
In the Session Name field, provide the role session name.
If you do not provide a session name, then Structural automatically generates a default unique value. The generated value begins with TonicStructural.
In the Duration (in seconds) field, provide the maximum length in seconds of the session.
The default is 3600, indicating that the session can be active for up to 1 hour.
The provided value must be less than the maximum session duration that is allowed for the role.
For the assumed role, Structural generates the external ID that is used in the assume role request. Your role’s trust policy must be configured to condition on your unique external ID.
Here is an example trust policy:
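The following sketch uses the standard AWS trust policy shape. The principal ARN is a placeholder for the AWS identity that Structural connects as, and the external ID value is a placeholder for the external ID that Structural generates:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:user/structural"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<external ID generated by Structural>"
        }
      }
    }
  ]
}
```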
On a self-hosted instance, to use the credentials configured in the environment settings, click Environment Variables.
On a self-hosted instance, to use the AWS credentials from the EC2 instance, click Instance Profile.
The IAM user must have permission to list, push, and pull images from the registry. The following example policy includes the required permissions.
For additional security, a repository name filter allows you to limit access to only the repositories that are used in Structural. You need to make sure that the repositories that you create for Structural match the filter.
For example, you could prefix Structural repository names with tonic-. In the policy, you include a filter based on the tonic- prefix:
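As a sketch, an IAM policy along the following lines grants list, push, and pull actions, scoped to repositories that use the tonic- prefix. The exact action list that Structural requires may differ; treat this as an approximation:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ecr:GetAuthorizationToken",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:DescribeRepositories",
        "ecr:ListImages",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchCheckLayerAvailability",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload",
        "ecr:PutImage"
      ],
      "Resource": "arn:aws:ecr:*:*:repository/tonic-*"
    }
  ]
}
```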
In the Tags field, provide the tag values to apply to the container artifacts. You can also change the tag configuration for individual data generation jobs.
Use commas to separate the tags.
A tag cannot contain spaces. Structural provides the following built-in values for you to use in tags:
{workspaceId} - The identifier of the workspace.
{workspaceName} - The name of the workspace.
{timestamp} - The timestamp when the data generation job that created the artifact completed.
{jobId} - The identifier of the data generation job that created the artifact.
For example, the following creates a tag that contains the workspace name, job identifier, and timestamp:
{workspaceName}_{jobId}_{timestamp}
To also tag the artifacts as latest, check the Tag as "latest" in your repository checkbox.
You can also optionally configure custom resource values for the Kubernetes pods. You can specify the ephemeral storage, memory, and CPU millicores.
To provide custom resources:
Toggle Set custom pod resources to the on position.
Under Storage Size:
In the field, provide the number of megabytes or gigabytes of storage.
From the dropdown list, select the unit to use.
The storage can be between 32 MB and 25 GB.
Under Memory Size:
In the field, provide the number of megabytes or gigabytes of RAM.
From the dropdown list, select the unit to use.
The memory can be between 512 MB and 4 GB.
Under Processor Size:
In the field, provide the number of millicores.
From the dropdown list, select the unit.
The processor size can be between 250m and 1000m.
Only available for PostgreSQL and SQL Server. Not available for MySQL.
In the Custom Database Name field, provide the name to use for the destination database.
If you do not provide a custom database name, then the destination database uses the same name as the source database.
In the Custom Password field, provide the password for the destination database user.
If you do not provide a password, then Structural generates a password.
If your Kubernetes nodes are configured with taints, then on a self-hosted instance, you can configure the tolerations that enable the datapacker pods to be scheduled on the nodes. The datapacker pod hosts the temporary database that Structural uses during the data generation.
CONTAINERIZATION_POD_NODE_TOLERATION_KEY - The toleration key value to apply to the datapacker pods. This setting is required. If you do not configure this setting, then Structural ignores the other settings.
CONTAINERIZATION_POD_NODE_TOLERATION_VALUES - A comma-separated list of toleration values to apply to the datapacker pods.
CONTAINERIZATION_POD_NODE_TOLERATION_EFFECT - The toleration effect to apply to the datapacker pods.
CONTAINERIZATION_POD_NODE_TOLERATION_OPERATOR - The toleration operator to apply to the datapacker pods.
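Assuming these settings map onto a standard Kubernetes toleration, example values of dedicated, tonic-datapacker, NoSchedule, and Equal (all hypothetical) would correspond to a toleration like the following on the datapacker pod specification:

```json
{
  "key": "dedicated",
  "operator": "Equal",
  "value": "tonic-datapacker",
  "effect": "NoSchedule"
}
```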
For details about how to configure Structural to write output to Ephemeral, go to Writing output to Tonic Ephemeral. For more information about Ephemeral, go to the .
In Destination Settings, you provide the connection information for the destination database. For information about the destination database connection fields for a specific data connector, go to the workspace configuration topic for that data connector.
Tonic Ephemeral is a separate Tonic.ai product that allows you to create temporary databases to use for testing and demos. For more information about Ephemeral, go to the .
For more information, go to .
For more information, go to .
The TONIC_TEST_CONNECTION_TIMEOUT_IN_SECONDS environment setting determines the number of seconds before a connection test times out. You can configure this setting from the Environment Settings tab on Structural Settings. By default, the connection test times out after 15 seconds.
A file connector workspace uses files as its source data and produces transformed versions of those files as its output.
For more information, go to .
For information about how to create and manage custom images for Oracle, go to .
Whether the workspace is a child workspace
The workspace share icon, to share the workspace with other users
The Generate Data button, to start a data generation job
When Structural writes data generation output to a repository, it writes the destination data to a container volume. From the list of container artifacts, you can copy the volume digest, and download a Docker Compose file that provides connection settings for the database on the volume. Structural generates the Compose file when you make the request to download it. For more information about getting access to the container artifacts, go to .
You can also use the data volume to start a Tonic Ephemeral database. However, if the data is larger than 10 GB, we recommend that you write the data to an Ephemeral user snapshot instead. For information about writing to an Ephemeral snapshot, go to .
For an overview of writing destination data to container artifacts, you can also view the .
For a Structural instance that is deployed on Docker, unless you install and configure a separate Kubernetes cluster, the Container Repository option is hidden.
The secret is the name of a Kubernetes secret that lives on the pod that the Structural worker runs on. The secret type must be kubernetes.io/dockerconfigjson. The Kubernetes documentation provides information on .
For Structural, the service principal must at least have the permissions that are associated with the.
For an overview of taints and tolerations, go to the .
To configure the tolerations, you configure the following . You can add these settings to the Environment Settings list on Structural Settings.
Identification and connection type
Settings to identify the workspace and to select the data connector.
Data connection settings
Connect to source and destination databases or, for the file connector, local or cloud storage files.
Data generation settings
Block data generation on schema changes. Enable cross-run consistency.
Enable and configure upsert
Add new destination records and update changed destination records. Ignore other unchanged destination records.
Write output to Tonic Ephemeral
Use the data generation output to create an Ephemeral user snapshot.
Write output to a container repository
Use the data generation output to populate a container data volume.
Workspaces view
View the list of workspaces that you have access to.
Create, edit, and delete workspaces
Add and remove workspaces, or update a workspace configuration.
Export and import workspace configuration
Save an existing workspace configuration. Apply a saved configuration to a workspace.
Assign workspace tags
Use tags to identify and organize your workspaces.
Workspace settings
Includes identifying information, data connection settings, and data generation settings.
Workspace management view
Provides access to workspace configuration and generation tools.
Workspace inheritance
Create child workspaces that inherit source data and configuration from their parent workspace.
Required permission
Global permission: View organization users. This permission is only required for the Tonic Structural application. It is not needed when you use the Structural API.
Either:
Workspace permission: Transfer workspace ownership
Global permission: Manage access to Tonic Structural and to any workspace
To grant yourself access after the transfer:
Workspace permission: Share workspace access
Every workspace has an owner. The owner is always a user.
The user who creates the workspace is automatically the owner of the workspace.
By default, the workspace owner is assigned the built-in Manager workspace permission set. On Enterprise instances, you can choose a different workspace permission set to assign to all workspace owners.
You cannot remove that permission set from the workspace owner.
You can transfer a workspace to a different owner. The new owner is assigned the owner permission set. If the previous owner was not otherwise granted the owner permission set, then that permission set is removed from them.
To transfer workspace ownership:
To transfer ownership of a single workspace, from the workspace actions menu, select Transfer Ownership.
To transfer ownership of multiple workspaces:
Check the checkbox for each workspace to transfer.
From the Actions menu, select Transfer Ownership.
On the transfer ownership panel, from the User dropdown list, select the new owner.
If you are the current owner of the workspace, then to grant yourself non-owner access after you transfer the ownership:
Toggle Receive access to workspace to the on position.
Select the workspace permission set to assign to yourself.
Click Transfer Ownership.
Required workspace permission: Configure workspace settings
You can associate custom tags with each workspace. Tags help you organize workspaces and provide an at-a-glance view of the workspace configuration.
Tags are visible to every user who has access to the workspace.
Tags are stored in the workspace JSON, and are included in the workspace export. You can also use the API to get access to tags.
You can add and edit tags in the Tags field on the New Workspace and Settings pages.
To add tags, enter a comma-separated list of the tags to add.
To remove a tag, click its delete icon.
You can also manage tags directly from Workspaces view.
To add tags to a workspace that does not currently have tags:
Hover over the Tags column for the workspace.
Click Add Tags.
In the tag input field, type a comma-separated list of tags to apply.
Press Enter.
To edit the assigned tags:
Click the Tags column for the workspace.
In the tag input field, to remove a tag, click its delete icon.
To add tags, type a comma-separated list of the tags to add.
To save the tag changes, press Enter.
Required license: Enterprise
If you have multiple workspaces, then it is likely that many of the workspace components and configurations are the same or similar. It can be difficult to maintain that consistency across separate, independent workspaces.
When you copy a workspace, the new workspace is completely independent of the original workspace. There is no visibility into or inheritance of changes from the original workspace.
Workspace inheritance allows you to create workspaces that are children of a selected workspace. Unlike a copy of a workspace, a child workspace remains tied to its parent workspace.
By default, a child workspace configuration is synchronized with the configuration of the parent. In other words, any changes to the parent workspace are copied to the child workspaces. Child workspaces can also override some of the parent configuration. You can track the child workspaces and how they are customized from the parent workspace.
For example, you might want separate workspaces for different development teams. Each team can make adjustments to suit their specific projects - such as different subsets - but inherit everything else.
By default, a child workspace inherits all of the configuration from the parent workspace, except for the following:
Workspace name - A child workspace has its own name.
Workspace description - A child workspace has its own description.
Tags - A child workspace has its own tags.
Destination database - A child workspace writes output data to its own destination database. You can copy the destination database from the parent workspace.
Intermediate database - For upsert, a child workspace does not inherit the intermediate database.
Webhooks - A child workspace has its own webhooks.
When you change the configuration of a parent workspace, the configuration is also updated in the child workspaces.
The exception is when a child workspace overrides the configuration. If the configuration is overridden, then the child workspace does not inherit the change.
Tonic Structural indicates on both the parent and child workspaces when the configuration is overridden.
A child workspace can override the following configuration items.
Table modes - A child workspace can override the table mode for individual tables. The other tables continue to inherit the table mode that is configured in the parent workspace.
Column generators - A child workspace can override the generator for individual columns. The other columns continue to inherit the generator that is configured in the parent workspace. For linked columns, a change to any of the linked columns overrides the inheritance for all of the columns.
Subsetting - A child workspace can override the subsetting configuration from the parent workspace. Any change in the child workspace means that the child workspace no longer inherits any changes to the subsetting configuration from the parent workspace. For example, if you change the percentage setting on a single target table from 5 to 6, that eliminates the subsetting inheritance. The child workspace keeps the subsetting configuration that it already has, but it is not updated when the parent workspace is updated.
Post-job scripts - A child workspace can override the post-job scripts. Any change to the post-job scripts in the child workspace means that the child workspace no longer inherits any changes to the post-job scripts configuration.
Statistics seed - A child workspace can override the statistics seed configuration.
From each view, you can eliminate the overrides and restore the inheritance.
A child workspace cannot override the following configuration items:
Data connector type and source database - A child workspace always uses the same source data as the parent workspace.
Foreign keys - A child workspace always uses the same foreign key configuration as the parent workspace.
Sensitivity designation for a column - A child workspace cannot change whether a column is marked as sensitive.
For removed tables and columns, when a child workspace overrides the parent workspace configuration for the table or column, you must resolve the change in the child workspace.
If there is a conflicting change for the removed table or column in the parent workspace configuration, then regardless of whether the configuration is inherited, you must resolve that change in the parent workspace before the change is resolved for the child workspace.
For changes to column nullability or data type, you resolve the change separately in the child and parent workspaces.
You also dismiss notifications (new tables and columns) separately in the parent and child workspaces.
Disable Triggers - Indicates whether to disable any user-defined triggers before the upsert job runs. Disabling the triggers prevents duplicate rows from being added to the destination database. By default, this setting is enabled.
Automatically Start Upsert After Successful Data Generation - Indicates whether to immediately run the upsert job after the initial data generation to the intermediate database. By default, this setting is enabled. If you turn it off, then after the initial data generation, you must start the upsert job manually.
Persist Conflicting Data Tables - When an upsert job cannot process rows with unique constraint conflicts, as well as rows that have foreign keys to those rows, this setting indicates whether to preserve the temporary tables that contain those rows. By default, this setting is disabled. Structural only keeps the applicable temporary tables from the most recent upsert job.
Warn on Mismatched Constraints - Indicates whether to treat mismatched foreign key and unique constraints between the source and destination databases as warnings instead of errors, so that the upsert job does not fail. By default, this setting is disabled.
When you create a workspace, you become the owner of the workspace, and by default are assigned the built-in Manager workspace permission set for the workspace. The Manager permission set provides full access to the workspace configuration, data, and results.
With a Professional or Enterprise license, you can also assign workspace permission sets to other users and to SSO groups. You can also transfer a workspace to a different owner.
If you are granted access to any workspace permission set for a workspace, then you can see all of the workspace management views for that workspace. However, you can only perform tasks that you have permission for in that workspace.
Workspace access is managed from the Workspaces view. You cannot assign workspace permission sets from Structural Settings view.
You can also view an overview video tutorial about workspace access.
Required license: Professional or Enterprise
Required permission
Global permission: View organization users. This permission is only required for the Tonic Structural application. It is not needed when you use the Structural API.
Either:
Workspace permission: Share workspace access
Global permission: Manage access to Tonic Structural and to any workspace
Tonic Structural uses workspace permission sets for role-based access (RBAC) of each workspace.
A workspace permission set is a set of workspace permissions. Each permission provides access to a specific workspace feature or function.
Structural provides built-in workspace permission sets. Enterprise instances can also configure custom permission sets.
You can assign workspace permission sets to users and to SSO groups, if you use SSO to manage Structural users. Before you assign a workspace permission set to an SSO group, make sure that you are aware of who is in the group. The permissions that are granted to an SSO group automatically are granted to all of the users in the group. For information on how to configure Structural to filter the allowed SSO groups, go to Synchronizing SSO groups with Tonic Structural.
You cannot remove the owner workspace permission set from the workspace owner. By default, the owner permission set is the built-in Manager permission set.
To change the current access to the workspace:
To manage access to a single workspace, either:
On the workspace management view, in the heading, click the share icon.
On Workspaces view, click the actions menu for the workspace, then select Share.
To manage access for multiple workspaces:
Check the checkbox for each workspace to grant access to.
From the Actions menu, select Share Workspaces.
The workspace access panel contains the current list of users and groups that have access to the workspace. To add a user or group to the list, begin to type the user email address or group name. From the list of matching users or groups, select the user or group to add.
Free trial users can invite other users to start their own free trial. Provide the email addresses of the users to invite. The email addresses must have the same corporate email domain as your email address. When the invited users sign up for the free trial, they are added to the Structural organization for the free trial user that invited them, and they have access to the workspace.
For a user or group, to change the assigned workspace permission sets:
Click Access. The dropdown list is populated with the list of custom and built-in workspace permission sets. If you selected multiple workspaces, then on the initial display of the workspace sharing panel, for each permission set that a user or group currently has access to, the list shows the number of workspaces for which the user or group has that permission set. For example, you select three workspaces. A user currently has Editor access for one workspace and Viewer access for the other two. The Editor permission set has 1 next to it, and the Viewer permission set has 2 next to it.
Under Custom Permission Sets, check the checkbox next to each workspace permission set to assign to the user or group. Uncheck the checkbox next to each workspace permission set to remove from the user or group.
Under Built-In Permission Sets, check the workspace permission set to assign to the user or group. You can only select one built-in permission set to assign. By default, for an added user or group, the Editor permission set is selected. To select a built-in workspace permission set that is lower in access than the currently selected permission set, you must first uncheck the selected permission set. For example, if Editor is currently checked, then to change the selection to Viewer, you must first uncheck Editor.
To remove all access for a user or group, and remove the user or group from the list, click Access, then click Revoke.
To save the new access, click Save.
Required workspace permission: Export and import workspace
You can export a workspace configuration to a JSON file, and import configuration from a workspace configuration JSON file.
For example, you might want to preserve a version of the workspace configuration before you test other changes. You can then use the exported file to restore the original configuration.
Or you might want to use a script to make changes to an exported configuration file. You can then import the updated file to update the workspace configuration.
The workspace JSON configuration file includes the following information, sketched in the example after this list:
Sensitivity designations that you assigned to columns
Assigned table modes
Assigned column generators
Subsetting configuration
Post-job script configuration
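Conceptually, the exported JSON captures those items in a form similar to the following sketch. Every field name and value here is invented for illustration; the actual export format is defined by Structural:

```json
{
  "workspaceName": "postgres-prod-copy",
  "tags": ["staging", "team-a"],
  "sensitiveColumns": ["public.users.first_name"],
  "tableModes": { "public.users": "De-Identify" },
  "columnGenerators": {
    "public.users.first_name": { "generator": "Name", "options": { "nameType": "First" } }
  },
  "subsetting": { "enabled": false },
  "postJobScripts": []
}
```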
To export the workspace configuration, either:
On the workspace management view, from the download menu, select Export Workspace.
On Workspaces view, click the actions menu for the workspace, then select Export.
When you export a child workspace, the exported configuration does not retain any of the inheritance information. The exported information has the same structure for all workspaces.
To import a workspace configuration file:
Select the import option. Either:
On the workspace management view, from the download menu, select Import Workspace.
On Workspaces view, click the actions menu for the workspace, then select Import.
On the Import Workspace dialog, to select the file to import, click Browse.
After you select the file, click Import.
When you import a workspace configuration into a child workspace, Tonic Structural only updates the configuration that can be overridden. If a configuration must be inherited from the parent workspace, then it is not affected by the imported configuration. For more information, go to About workspace inheritance.
Assign workspace permission sets
The assigned permission sets determine the level of access to the workspace.
Transfer ownership of a workspace
Make another user the workspace owner. You can also assign yourself workspace permission sets.
Database View provides a complete view of your source database structure and configuration.
To display Database View, either:
On the workspace management view, in the workspace navigation bar, click Database View.
On Workspaces view, from the dropdown menu in the Name column, select Database View.
Database View consists of:
On the left, the list of tables in the source database.
On the right, the list of columns in those tables.
For an individual column in Database View, you can configure the assigned generator and determine the column sensitivity.
From the column list, to display the generator configuration panel, in the Applied Generator column, click the generator name tag.
Required workspace permission: Configure column sensitivity
The Structural sensitivity scan provides an initial indication of whether a column is sensitive and, if it is sensitive:
The type of sensitive data that is in the column.
The confidence level of the sensitivity detection.
For more information, go to Identifying sensitive data.
In a child workspace, you cannot configure whether a column is sensitive. A child workspace always inherits the sensitivity designations from its parent workspace.
From the Status column, to confirm or change the column sensitivity, click the Status value.
The status panel indicates whether the column is sensitive. It identifies the sensitivity type, and indicates how the sensitivity was determined - by a sensitivity scan or by a user.
For a column that matches a built-in sensitivity type, the first time that you display the panel, the Sensitive data? setting displays Yes and No options for you to confirm or change the sensitivity.
To indicate that the column is sensitive, click Yes.
To indicate that the column is not sensitive, click No.
When you click Yes or No, the Yes and No options change to a simple toggle. When you click Yes, the sensitivity confidence level changes to full.
After that:
To indicate that the column is sensitive, toggle Sensitive data? to the on position.
To indicate that the column is not sensitive, toggle Sensitive data? to the off position.
When a column matches a sensitivity rule, the sensitivity panel indicates that the column matched a sensitivity rule.
You use the Sensitive data? toggle to indicate whether the column is actually sensitive.
When a column does not match a built-in sensitivity type or a custom sensitivity rule, the sensitivity panel indicates that the column is not sensitive.
The Sensitive data? setting displays Yes and No options for you to confirm or change the sensitivity.
To indicate that the column is sensitive, click Yes.
To confirm that the column is not sensitive, click No.
When you click Yes or No, the Yes and No options change to a simple toggle.
If you click Yes:
The panel updates to indicate that a user confirmed that the column is sensitive.
The sensitivity confidence level is set to full confidence.
After that:
To indicate that the column is sensitive, toggle Sensitive data? to the on position.
To indicate that the column is not sensitive, toggle Sensitive data? to the off position.
You can also configure the sensitivity from the generator configuration panel. On the generator configuration panel, the sensitivity setting is at the top right.
To indicate that a column is sensitive, toggle the sensitivity setting to the on position.
To indicate that the column is not sensitive, toggle the sensitivity setting to the off position.
When you change the sensitivity from the generator configuration panel, the Sensitive data? setting on the sensitivity panel also changes from the Yes and No options to the toggle.
Required workspace permission: Configure column generators
When a sensitivity scan identifies a column as containing sensitive data, Structural recommends a generator for the column. For example, when the sensitivity scan identifies a column as a first name, Structural recommends the Name generator, configured to generate a first name value.
In the Assigned Generator column on Database View, columns that do not have an assigned generator, and that have a recommended generator, display the available recommendation icon.
To review and either apply or ignore the recommended generator, click the generator dropdown.
The generator recommendation panel contains the following information:
The sensitivity confidence level
The recommended generator
Sample source and destination values based on the recommended generator
From the panel, you choose whether to assign or ignore the recommended generator for that type.
To assign the recommended generator, click Apply. Structural displays the generator configuration panel with the recommended generator selected. You can then adjust the configuration or select a different generator.
To ignore the recommendation, click Ignore. Structural displays the generator configuration panel to allow you to select the generator to assign to the column.
Required workspace permission: Configure column generators
To change the generator that is assigned to a selected column:
Click the generator name tag for the column.
To assign a different generator to the column, from the Generator Type dropdown list, select the generator.
Configure the generator options.
To reset an assigned generator to Passthrough, which indicates to not transform the data:
Click the generator name tag.
On the generator configuration panel, click the delete icon next to the generator dropdown.
For details about the configuration options for each generator, go to the Generator reference.
For more information about selecting and configuring generators and generator presets, go to Assigning and configuring generators.
Tonic Structural runs the following types of jobs on a workspace:
Sensitivity scans, which analyze the source database to identify sensitive data.
Collection scans, which analyze the source data for a MongoDB workspace to determine the available fields in each collection, the field types, and how prevalent the fields are.
Data generation, data pipeline generation, and containerized generation jobs, which generate the destination data from the source data.
Upsert data generation jobs, which generate the intermediate database from the source database.
Upsert jobs, which use data from the intermediate database to add new rows to and update changed rows in the destination database. If the migration process is enabled, then the migration runs as a step in the upsert job.
SDK table statistics jobs. These jobs only run when you use the SDK to generate data in a Spark workspace, and the assigned generators require the statistics.
You can view a list of jobs that ran on the workspace, and view details for individual jobs.
The Jobs view displays the list of jobs that ran on the workspace. The list includes the 100 most recent jobs.
To display the Jobs view:
On the workspace management view, in the workspace navigation bar, click Jobs.
On Workspaces view, from the dropdown menu in the Name column, select Jobs.
For each job, the job list includes the following information:
Job ID - The identifier of the job. To copy the job ID, click the icon at the left of the row.
Type - The type of job.
Submitted - The date and time when the job was submitted.
Completed - The date and time when the job finished running.
A job can have one of the following statuses:
Queued - The job is queued to run, but has not yet started. A job is queued for one of the following reasons:
Another job is currently running on the same workspace. For example, you cannot run a sensitivity scan and a data generation, or multiple data generations, at the same time on the same workspace. This is true regardless of the number of workers on the instance. On Structural Cloud, there is also a limit on the number of concurrent running jobs for each organization. When that maximum is reached, a new job remains queued until a current running job completes.
There isn't an available worker on the instance to run the job. A Structural instance with one worker can only run one job at a time. If a job from one workspace is currently running, a job from another workspace cannot start until the first job is finished.
To view information about why a job is queued, click the status value.
Running - The job is in progress.
Canceled - The job is canceled.
Completed - The job completed successfully.
Failed - The job failed to complete.
Each of these statuses has a corresponding "with warnings" status, such as Running with warnings or Completed with warnings. A "with warnings" status indicates that the job produced at least one warning.
You can filter the list by either the type or the status.
To filter the list by the job type:
Click the filter icon in the Type column heading. By default, all types are included, and none of the checkboxes are checked.
To only include specific types of jobs, check the checkbox next to each type to include. Checking all of the checkboxes has the same effect as unchecking all of the checkboxes.
To filter the list by the job status:
Click the filter icon in the Status column heading. The status panel displays all of the statuses that are currently in the list. For example, if there are no Queued jobs, then the Queued status is not in the list. By default, all of the statuses are included, and none of the checkboxes are checked.
To only include jobs that have specific statuses, check the checkbox next to each status to include. Checking all of the checkboxes has the same effect as unchecking all of the checkboxes.
You can sort the jobs by either the submission or completion timestamp.
To sort by submission date, click the Submitted column heading. To reverse the sort order, click the heading again.
To sort by completion date, click the Completed column heading. To reverse the sort order, click the heading again.
For jobs other than Queued jobs, you can display details about the workspace and the job progress.
From the Jobs view, to display the details for a job, click the job row.
The left side of the job details view contains the workspace information.
For a sensitivity scan, the workspace information is limited to the owner, database type, and worker version.
For a data generation job, the workspace information also includes:
Whether subsetting, post-job scripts, or webhooks are used.
The number of schemas, tables, and columns in the source database.
The number of schemas, tables, and columns in the destination database.
The Job Log tab shows the start date, start time, and duration of the job, followed by the list of job process steps.
For data generation jobs, the Privacy Report tab displays the number of at-risk, protected, and not sensitive columns in the source database.
At-risk columns contain sensitive data, but still have Passthrough as the assigned generator.
Protected columns have an assigned generator other than Passthrough.
Not sensitive columns have Passthrough as the assigned generator, but do not contain sensitive data.
A workspace can write output to a Tonic Ephemeral snapshot, with an option to preserve the temporary Ephemeral database that is used to create the snapshot.
For data generation jobs that write to Ephemeral, the Data available in Tonic Ephemeral panel displays. It contains a link to Ephemeral, and access to either the snapshot or the database.
When the temporary database is not preserved, the Data available in Tonic Ephemeral panel provides access to the snapshot.
To navigate to Ephemeral and view the details for an Ephemeral snapshot, click View Snapshot in Tonic Ephemeral.
When the temporary database is preserved, the Data available in Tonic Ephemeral panel provides access to the database.
To display the connection details for the Ephemeral database, click View connection info.
For an Ephemeral database, the connection details include:
The database location and credentials. Each field contains a copy icon to allow you to copy the value.
SSH tunnel information, including instructions on how to create an SSH tunnel from your local machine to the Ephemeral database.
The job identifier is a unique identifier for the job. To copy the job ID, click the copy icon at the left of the job row on the Jobs view.
You can cancel Queued or Running jobs.
For jobs with those statuses, the rightmost column in the job list contains a cancel icon.
To cancel the job, click the icon.
For workspaces that are configured to write destination data to container artifacts, the Jobs view also provides access to those artifacts. For more information, go to Viewing and downloading container artifacts.
Required workspace permission: Download job logs
To download diagnostic logs, you must have the Enable diagnostic logging global permission.
For all jobs, the job logs provide detailed information about the job processing. Tonic.ai support might request the job logs to help diagnose issues.
For a failed data generation to Ephemeral, the job logs include the Ephemeral logs and the destination database pod logs.
For upsert jobs where the migration process is enabled, and you configured the GET Schema Change Logs endpoint, the upsert job logs include the migration process logs.
You can download the job logs from the Jobs view or the job details view. The download includes up to 1 MB of log entries.
On the Jobs view, to download the logs for a job, click the download icon in the rightmost column.
On the job details view, to download the logs for a job, click Reports and Logs, then select Job Logs.
By default, Structural redacts sensitive values from the job logs. To help support troubleshooting, you can configure data connectors or an individual data generation job to create unredacted versions of the log files, referred to as diagnostic logs. For more information, go to Redacted and diagnostic (unredacted) logs.
To access diagnostic log files, you must have the Enable diagnostic logging global permission. If you do not have that permission, then you cannot download the diagnostic logs for a job, and the download option is disabled.
Required workspace permission: View and download Privacy Report
From the job details view, you can download a Privacy Report file that provides an overview of the current protection status of the database columns based on the workspace configuration at the time that the job ran.
You can download either:
The Privacy Report .csv file, which provides details about the table columns, the column content, and the current protection configuration.
The Privacy Report PDF file, which provides charts that summarize the privacy ranking scores for the table columns. It also includes the table from the .csv file.
To display the download options, click Reports and Logs. In the menu:
To download the Privacy Report .csv file, click Privacy Report CSV.
To download the Privacy Report PDF file, click Privacy Report PDF.
For more information about the Privacy Report files and their content, go to Using the Privacy Report to verify data protection.
For a workspace that writes the output to a container repository, the job includes the following additional logs:
Database logs - Logs for the database container that is used as the destination.
Datapacker logs - Logs for creating the OCI artifact and uploading it to an OCI registry.
To download these logs for a data generation job, on the job details view, click Reports and Logs, then select Database Logs or Datapacker Logs.
For workspaces that are connected to Amazon Redshift or Snowflake on AWS databases, the data generation job requires multiple calls to a Lambda function. For these data generation jobs, the CloudWatch logs track the progress of the Lambda function calls and record any errors.
To download the CloudWatch logs for a data generation job, on the job details view, click Reports and Logs, then select CloudWatch Logs.
The CloudWatch Logs option only displays for Amazon Redshift and Snowflake on AWS data generation jobs.
Required workspace permission: Download SqlLdr Files
For an Oracle data generation, if both of the following are true:
The data generation job ran SQL Loader (sqlldr).
sqlldr either failed or succeeded with errors.
Then to download the sqlldr log files, click Reports and Logs, then select sqlldr Logs.
For a data generation from a file connector workspace that uses local files, you can download the transformed files for that job.
The download is a .zip file that contains the files for a selected file group.
On the job details view, when files are available to download, the Data available for file groups panel displays.
To download the files for a file group:
Click Download Results.
From the list, select the file group. Use the filter field to filter the list by the file group name.
Required workspace permission: Download job logs
For workspaces that use the newer data generation processing, users can configure a data generation job to also generate performance metrics. This is usually done for troubleshooting purposes.
On the job details view, to download the performance metrics for the job, click Reports and Logs, then click Performance Metrics.
Privacy Hub tracks the current protection status of source data columns based on:
Column sensitivity, either from the most recent sensitivity scan or from manual assignments
Assigned table modes
Assigned generators
To display Privacy Hub, either:
On the workspace management view, in the workspace navigation bar, click Privacy Hub.
On Workspaces view, click the workspace name.
From Privacy Hub, you can:
Review and apply the recommended generators for all detected sensitive columns
View the current protection status of columns
Manually mark columns as sensitive or not sensitive
Configure protection for sensitive columns
Download a preview Privacy Report
Run a new sensitivity scan
You can also track the history of changes to column sensitivity and the assigned column generators.
The sensitivity scan detects specific types of sensitive data.
If your workspace contains any columns that the sensitivity scan identified, and for which you have not either:
Assigned a generator
Marked the column as not sensitive
Then Tonic Structural displays a Sensitivity Recommendations banner that contains a count of those columns.
The count only includes sensitive columns that the sensitivity scan detected. Columns that you manually mark as sensitive are not included in the count.
On the banner, the Review Recommendations option allows you to review the detected columns and the recommended generators for each detected sensitive data type.
You can then apply the recommended generators or ignore the recommendation. When you ignore a recommendation, you either:
Indicate to remove the generator recommendation for the column.
Indicate that the column data is not sensitive.
The protection status panels at the top of Privacy Hub provide an overview of the current protection status of the columns in the source data.
Each panel displays:
The number of columns that are in that category
The estimated percentage of columns that are in that category
The column counts do not include columns that do not have data in the destination database. For example, if a table is assigned Truncate table mode, then Privacy Hub ignores the columns in that table.
The information on these panels updates automatically as you change whether columns are sensitive and assign generators to columns.
The At-Risk Columns panel reflects columns that:
Are populated in the destination database.
Are marked as sensitive.
Have the generator set to Passthrough, which indicates that Structural does not perform any transformation on the data.
For each column, the At-Risk Columns panel also indicates the sensitivity confidence, from full confidence (completely red) to low confidence (a small percentage of red).
The goal is to have 0 at-risk columns.
Click Open in Database View to navigate to Database View. The column list is filtered to show columns that are at risk.
The Protected Columns panel reflects columns that:
Are populated in the destination database.
Are assigned a generator other than Passthrough.
It includes both sensitive and non-sensitive columns.
Note that a column is considered protected based solely on the assigned generator. Some more complex generators, such as JSON Mask or Conditional, allow you to apply different generators to specific portions of a value or based on a specific condition. However, the protection status does not reflect these sub-generators. An applied sub-generator could be Passthrough.
Click Open in Database View to navigate to Database View. The column list is filtered to show all included columns that are protected.
The Not Sensitive Columns panel reflects columns that:
Are populated in the destination database.
Are marked as not sensitive.
Have the generator set to Passthrough.
Click Open in Database View to navigate to Database View. The column list is filtered to show included columns that are not sensitive and are not protected.
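Taken together, the three panels classify every included column by sensitivity and generator assignment. The following sketch summarizes that classification; the field names are hypothetical and are not Structural API fields:

```python
from dataclasses import dataclass

# Hypothetical column record; these fields are illustrative only.
@dataclass
class Column:
    populated_in_destination: bool
    sensitive: bool
    generator: str  # "Passthrough" or the name of an assigned generator

def protection_status(col: Column):
    if not col.populated_in_destination:
        return None  # ignored, for example a column in a Truncate mode table
    if col.generator != "Passthrough":
        return "Protected"  # sensitive or not, a generator is assigned
    return "At-Risk" if col.sensitive else "Not Sensitive"

print(protection_status(Column(True, True, "Passthrough")))  # At-Risk
```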
The Database Tables list shows the protection status for each table in the source database. You can view the number of columns that have each protection status, and update the column configuration.
The list does not include tables where the table mode is Truncated or Preserve Destination. Truncated tables are not populated in the destination database. For Preserve Destination tables, the existing data in the destination database does not change.
For each table, Database Tables provides the following information:
Name - The table name. For a file connector workspace, each table corresponds to a file group.
Privacy Status - Indicates the current protection status of the columns in the table. It provides the same view and configuration options as the protection status panels at the top of Privacy Hub, but is limited to the columns in the table.
At-Risk - The number of at-risk columns in the table. These columns are marked as sensitive, but have Passthrough as the generator. The goal is to have 0 unprotected sensitive columns. Click the value to navigate to Database View, filtered to display the at-risk columns for the table.
Protected - The number of protected columns in the table. Protected columns have an assigned generator. A protected column can be either sensitive or not sensitive. Click the value to navigate to Database View, filtered to display the protected columns for the table.
Not Sensitive - The number of not sensitive columns in the table. Not sensitive columns are not marked as sensitive and have Passthrough as the generator. Click the value to navigate to Database View, filtered to display the not sensitive columns for the table.
You can filter the Database Tables list either by the table name or by the schema.
To filter the list by table name, in the filter field, begin typing text in the table name. As you type, Structural updates the list to only display matching tables.
To filter the list to only include tables that belong to a specific schema:
Click Filter by Schema.
From the schema dropdown list, select the schema.
When you select a schema, Structural adds it to the filter field.
You can sort the Database Tables list by any column except for the Privacy Status column.
To sort by a column, click the column heading. To reverse the sort order, click the heading again.
The Privacy Status column in the Database Tables list indicates the protection status of the columns in the table.
Each protection status panel displays a series of boxes to represent the columns that apply to that status. For example, if the source data contains four columns that are at-risk, then the At-Risk Columns panel displays four boxes, one for each column.
The Privacy Status column in the Database Tables list displays the same set of boxes for the columns in an individual table.
If the number of columns is too large to fit, then the last box shows the number of additional columns that apply. For example, if there are 15 columns that don't fit, then the last box is labeled +15.
When you hover over a box, the column name displays in a tooltip.
When you click a box, the details panel for that column displays.
When you click the box for remaining columns, the details panel for the first column in the remaining columns displays.
You can use the next and previous icons at the bottom right of the details panel to display the details for the next or previous column.
The column details panel opens to the settings view. The settings view contains the following information:
The table and column name.
Whether the column is flagged as sensitive.
The type of PII that the column contains.
The data type for the column data.
The generator that is assigned to the column.
For a child workspace, whether the column configuration is inherited from the parent workspace. For columns that have overrides, you can reset to the parent configuration.
Required workspace permission: Configure column sensitivity
From the settings view of the column details, you can configure the column sensitivity.
As you change the column sensitivity, Structural updates the protection status panels.
To change whether the column is sensitive, toggle the Sensitive option. The column is moved if needed to reflect its new status. However, you remain on the current panel.
For example, from the At-Risk Columns panel, you change a column to be not sensitive. The column is moved to the Not Sensitive Columns panel. When you click the next or previous icons, you view the details for the next or previous column on the At-Risk Columns panel.
You cannot change the sensitivity of columns in a child workspace. A child workspace always inherits the sensitivity from its parent workspace.
Required workspace permission: Configure column generators
From the column details, you can assign and configure the column generator.
When you change the column generator, Structural updates the protection status panels.
If the column generator was previously Passthrough, then the column is moved to the Protected Columns panel. However, you remain on the current panel. For example, you assign a generator to a column that is on the At-Risk Columns panel. The column is moved to the Protected Columns panel, but when you click the next or previous icons, you view the details for the next or previous column on the At-Risk Columns panel.
For sensitive columns that are not protected, Structural displays the recommended generator as a button.
For self-hosted instances that have an Enterprise license, the recommended generator is the built-in generator preset.
To assign the recommended generator to the column, click the button.
Otherwise, select the generator from the Generator Type dropdown list.
If the selected generator requires additional configuration, then below the Generator Type dropdown list is an Edit Generator Options link.
To display the configuration fields for the generator, click Edit Generator Options.
After you configure the generator, to return to the settings view, click Back.
For more information about selecting a generator, go to Assigning and configuring generators.
For information about configuring a selected generator or generator preset, go to the Generator reference.
Required workspace permission:
Source data: Preview source data
Destination data: Preview destination data
From the column details, you can display sample data for the column. The sample data allows you to compare the source and destination versions of the column values.
To display the sample data, click the view sample (magnifying glass) icon.
On the sample data view of the column details:
The Original Data tab shows the values in the source data.
The Protected Output tab shows the values that the generator produced.
Required license: Professional or Enterprise
From the column details, you can view and add comments on the column. You might use a comment to explain why you selected a particular generator or marked a column as sensitive or not sensitive.
From the column details, to display the comments for the column, click the comment icon.
The comments view displays any existing comments on the column. The most recent comment is at the bottom of the list. Each comment includes the name of the user who made the comment.
To add the first comment to a column, type the comment into the comment text area, then click Comment.
To add an additional comment, type the comment into the comment text area, then click Reply.
Required license: Enterprise
The Privacy Report files that you download from Privacy Hub or the workspace download menu provide an overview of the current protection status based on the current configuration.
This is different from the Privacy Report files that you download from the data generation job details, which show the protection status after the data generation.
You can download either:
The Privacy Report .csv file, which provides details about the table columns, the column content, and the current protection configuration.
The Privacy Report PDF file, which provides charts that summarize the privacy ranking scores for the table columns. It also includes the table from the .csv file.
From the workspace management view, click the download icon. In the download menu:
To download the Privacy Report PDF file, click Download Privacy Report PDF.
To download the Privacy Report .csv file, click Download Privacy Report CSV.
From Privacy Hub, click Reports and Logs, then:
To download the Privacy Report .csv file, click Privacy Report CSV.
To download the Privacy Report PDF file, click Privacy Report PDF.
For more information about the Privacy Report files and their content, go to Using the Privacy Report to verify data protection.
Required workspace permission: Run sensitivity scan
Privacy Hub provides an option to manually start a new sensitivity scan. For example, you might want to run a new sensitivity scan when:
You add columns to the source database. The new scan identifies whether the new columns contain sensitive data.
The data in a column changes significantly, and a column that Structural originally marked as not sensitive might now contain sensitive data.
To run a new sensitivity scan, click Run Sensitivity Scan.
When Structural runs a new sensitivity scan:
Structural analyzes and determines the sensitivity of any new columns.
It does not change the sensitivity of existing columns that you marked as sensitive or not sensitive.
For existing columns that you did not change the sensitivity of:
Structural does not change the sensitivity of existing columns that the original scan marked as sensitive.
It can change the sensitivity of existing columns that the original scan marked as not sensitive.
The protection status panels are updated to reflect the results of the new scan.
You cannot run a sensitivity scan on a child workspace. Child workspaces always inherit the sensitivity results from their parent workspace.
The table list at the left of Database View contains the list of tables in the source database. You can filter the table list and assign table modes to the tables.
The table list is grouped by schema. You can expand and collapse the list of tables in each schema. This does not affect the displayed columns.
For a file connector workspace, each table corresponds to a file group.
For each table, the table list includes the following information:
The name of the table.
The number of columns that have an assigned generator (a generator other than Passthrough). The number does not display if none of the table columns has an assigned generator.
The currently assigned table mode. The table mode determines the number of rows and columns in the destination database. The table list only shows the first letter of the table mode:
D = De-identify
S = Scale
T = Truncate
P = Preserve Destination
I = Incremental
For a child workspace, if the selected table mode overrides the parent workspace configuration, then the override icon displays.
To display Table View for a table, click the arrow icon to the right of the table entry.
You can filter the table list by the table name and by the assigned table mode. You can also filter the tables based on whether they have assigned generators.
As you filter the table list, the column list also is filtered to only include the columns for the filtered tables.
To filter the table list by name, in the filter field, begin to type text that is in the table name.
As you type, Tonic Structural filters the list to only display tables with names that contain the filter text.
To filter the table list based on the assigned table mode:
Click Filters.
On the filter panel, check the checkbox next to each table mode to include. By default, the list includes all of the table modes. As you check and uncheck the table mode checkboxes, Structural adds and removes the associated tables from the list.
You can filter the table list to only display tables that have no assigned generators:
Click Filters.
On the filter panel, to only show tables that do not have assigned generators, check the No Generators Applied checkbox.
Required workspace permission: Assign table modes
To change the assigned table mode for a single table:
Click the table mode dropdown next to the table name.
From the table mode dropdown list, select the table mode.
For a child workspace, the table mode selection panel indicates whether the selected table mode is inherited from the parent workspace. If the child workspace currently overrides the parent workspace configuration, then to reset the table mode to the one that is assigned in the parent workspace, click Reset.
To change the assigned table mode for multiple tables:
Check the checkbox for each table to change the table mode for. To select a continuous range of tables, click the first table in the range, then Shift-click the last table in the range. To select all of the tables in a schema, click the schema name.
Click Bulk Edit.
On the panel, click the radio button for the table mode to assign to the selected tables.
View and configure tables
Filter the table list, and assign table modes to tables
View the column list
Apply filters to and sort the list of columns
Configure an individual column
Assign a generator and determine the column sensitivity
Configure multiple columns
Use the bulk edit option to update multiple columns
Comment on columns
Add and respond to column comments
View sample data
View example source and destination data for a column
Required workspace permission:
Source data: Preview source data
Destination data: Preview destination data
For each column on Database View, you can display a sample list of the column values.
For columns that have an assigned generator, the sample shows both the current values and the possible values after the generator is applied.
To display the sample values, in the Column column, click the magnifying glass icon.
If the generator is Passthrough, then the sample data panel contains only Original Data.
If a different generator is assigned, then the sample data panel contains both Original Data and Protected Output.
Required license: Professional or Enterprise
From Database View, you can add comments to columns. For example, you might use a comment to explain why you selected a particular generator or marked a column as sensitive or not sensitive.
If a column does not have any comments, then to add a comment:
In the Applied Generator column, click the comment icon.
In the comment field, type the comment text.
Click Comment.
When a column has existing comments, the comment icon is green. To add comments:
Click the comment icon. The comments panel shows the previous comments. Each comment includes the comment user.
In the comment field, type the comment text.
Click Reply.
The bulk edit option on Database View allows you to configure multiple columns at the same time. From the bulk editing panel, you can:
Mark the selected columns as sensitive or not sensitive.
Assign a generator to the selected columns.
Apply the recommended generator to the selected columns.
Reset the generator configuration to the baseline. Requires that all of the selected columns are assigned the same preset.
Depending on the column selection, you can also create a new sensitivity rule.
To select the columns and display the bulk edit option:
Check the checkbox next to each column to update.
Click Bulk Edit.
Required workspace permission: Configure column sensitivity
On the Bulk Edit panel, under Sensitivity:
To mark the selected columns as sensitive, click Sensitive.
To mark the selected columns as not sensitive, click Not Sensitive.
Required workspace permission: Configure column generators
On the Bulk Edit panel, under Bulk Edit Applied Generator, select and configure the generator to assign to the selected columns.
Required workspace permission: Configure column generators
If any of the selected columns have a recommended generator, then on the Bulk Edit panel, the Generator recommendations found panel displays. The panel indicates the number of selected columns that have a recommendation.
To assign the recommended generators to those columns, click Apply.
Required workspace permission: Configure column generators
For a generator preset, the baseline configuration is the configuration that is saved for that preset. The baseline configuration determines the default configuration that is used when you assign the preset to a column. After you select the preset, you can override the baseline configuration.
If all of the selected columns are assigned the same preset, then to restore the baseline configuration for all of the columns, click Reset to Baseline.
Required license: Enterprise
Required global permission: Create and manage sensitivity rules
You might bulk edit columns that could benefit from a custom sensitivity rule.
For example, in your data, the Widget column is in multiple tables and contains sensitive data that Structural cannot identify. You select all of the Widget columns so that you can mark them as sensitive and apply the Character Scramble generator to them. However, a custom sensitivity rule would ensure that in the future, Widget columns are always marked as sensitive and have the Character Scramble generator recommended.
On the Bulk Edit panel, when all of the selected columns:
Have the same data type
Do not have a generator assigned
Do not have a recommended generator
Then Structural displays the Create a Sensitivity Rule panel, which contains the option to create a new sensitivity rule.
To create a sensitivity rule:
Click Create Custom Rule.
On the Create Custom Rule view, configure the new sensitivity rule. Structural automatically selects a data type based on the selected columns. The current workspace is used as the testing workspace to verify the columns that match the rule configuration. For details, go to the sensitivity rule configuration section.
When you finish configuring the new rule:
To both save the rule and apply the generator preset to all workspace columns that match the rule, click Save and Apply. On the confirmation panel, click Confirm Auto Apply.
To save the rule, but not apply the generator preset to matching columns, click Save.
Structural closes the sensitivity rule configuration view and returns you to Database View. It maintains the previous column selection.
If you did not apply the generator preset, then the sensitivity rule is included in the next sensitivity scan.
The column list on Database View contains information about the sensitivity and generator configuration for each column.
The Column column provides general information about the columns and their content, including:
Table and column name. When you click the column name, Table View for the column table displays.
The name of the schema that contains the table.
The data type for the column.
An indicator that displays when the column is a primary key.
The Column column also contains the option to display sample data for the column.
The Status column provides information about whether the column contains sensitive data and whether it has an assigned generator.
The protection status can be one of the following values:
Protected - The column has an assigned generator.
Not Sensitive - The column is marked as not sensitive.
At Risk - The column is sensitive and does not have an assigned generator.
At the right of the Status column is a confidence indicator. For At Risk columns, the confidence indicator shows how confident Structural is that the column is sensitive and contains values of the displayed sensitivity type.
For more information about how Structural identifies values and assigns the confidence level, go to the section on how the sensitivity scan identifies sensitive values.
From the Status column, you can change whether a column is sensitive.
The Applied Generator column is where you select and configure the generator to assign.
The generator dropdown indicates the currently assigned generator. It also indicates when an unprotected column has a recommended generator.
For foreign key columns, the generator dropdown is disabled and the column is marked as a foreign key. Foreign key columns always inherit the generator that is assigned to the primary key.
In a child workspace, when the generator configuration overrides the parent workspace, the generator dropdown displays the override icon.
The Applied Generator column also contains the option to display and create column comments.
To filter the column list, you can:
Use the table list to filter the displayed columns based on the table that the columns belong to.
Use the filter field to filter the columns by table or column name.
Use the Filters panel to filter the columns based on column attributes and generator configuration.
You can use column filters to quickly find columns that you want to verify or update the configuration for.
To filter the column list to only include columns for specific tables, check the checkbox for each table to display columns for.
To filter the column list by table or column name, in the filter field, begin to type text that is in the table or column name.
As you type, Structural filters the column list.
The Filters panel provides access to column filters other than the table and column name.
To display the Filters panel, click Filters.
To search for a filter or a filter value, in the search field, start to type the value. The search looks for text in the individual settings.
For each filter, the Filters panel indicates the number of matching columns, based on the selected tables and the current filters.
To add a filter, depending on the filter type, either check the checkbox or select a filter option. As you add filters, Structural applies them to the column list. Above the list, Structural displays tags for the selected filters.
To clear all of the currently selected filters, click Clear All.
To only display detected sensitive columns for which there is a recommended generator, on the Filters panel, check Has Generator Recommendation.
An at-risk column:
Is marked as sensitive.
Is included in the destination data.
Is assigned the Passthrough generator.
To only display at-risk columns, on the Filters panel, check At-Risk Column.
When you check At-Risk Column, Structural automatically adjusts the filters under Privacy Settings as follows:
Sets the sensitivity filter to Sensitive
Sets the protection status filter to Not protected
Sets the column inclusion filter to Included
You can filter the columns based on the column sensitivity.
On the Filters panel, under Privacy Settings, the sensitivity filter is by default set to All, which indicates to display both sensitive and non-sensitive columns.
To only display sensitive columns, click Sensitive.
To only display non-sensitive columns, click Not sensitive.
Note that when you check At-Risk Column, Structural automatically selects Sensitive.
You can filter the columns based on whether they have any generator other than Passthrough assigned. To filter the columns based on specific assigned generators, use the Applied Generator filter.
On the Filters panel, under Privacy Settings, the column protection filter is by default set to All, which indicates to display both protected and not protected columns.
To only display columns that have an assigned generator, click Protected.
To only display columns that do not have an assigned generator, click Not protected.
Note that when you check At-Risk Column, Structural automatically selects Not protected.
You can filter the columns based on whether they are populated in the destination database. For example, if a table is truncated, then the columns in that table are not populated.
On the Filters panel, under Privacy Settings, the column inclusion filter is by default set to All, which indicates to display both included and not included columns.
To only display columns that are populated in the destination database, click Included.
To only display columns that are not populated in the destination database, click Not included.
Note that when you check At-Risk Column, Structural automatically selects Included.
To only display columns that are assigned specific generators, on the Filters panel, under Applied Generator, check the checkbox for each generator to include.
The list of generators only includes generators that are assigned to the currently displayed columns and that are compatible with other applied filters.
To search for a specific generator, in the Filters search field, begin to type the generator name.
You can filter the columns by the column data type. For example, you can display only varchar columns, or only columns that contain either numeric or integer values.
To only display columns that have specific data types, on the Filters panel, under Database Data Types, check the checkbox for each data type to include.
The list of data types only includes data types that are present in the currently displayed columns and that are compatible with other applied filters.
To search for a specific data type, in the Filters search field, begin to type the data type.
When the source database schema changes, you might need to update the configuration to reflect those changes. If you do not resolve the schema changes, then the data generation might fail. The data generation fails if there are unresolved conflicting changes, or if you configure Structural to always fail data generation when there are any unresolved changes.
For more information about schema changes, go to Viewing and resolving schema changes.
To only display columns that have unresolved schema changes, on the Filters panel, check Unresolved Schema Changes.
For detected sensitive columns, the sensitivity type indicates the type of data that was detected. Examples of sensitivity types include First Name, Address, and Email.
To only display columns that contain specific sensitivity types, on the Filters panel, under Sensitivity Type, check the checkbox for each sensitivity type to include.
The list of sensitivity types only includes sensitivity types that are present in the currently displayed columns.
To search for a specific sensitivity type, in the Filters search field, type the sensitivity type.
When the Structural sensitivity scan identifies a value as belonging to a sensitivity type, it also determines how confident it is in that determination. The Status column displays the confidence level.
You can filter the columns based on the confidence level.
To only display columns that have a specific confidence level, on the Filters panel, under Sensitivity confidence, check the checkbox next to each confidence level to include.
You can filter the column list based on whether the column is nullable.
On the Filters panel, under Data Attributes, the nullability filter is by default set to All, which indicates to display both nullable and non-nullable columns.
To only display columns that are nullable, click Nullable.
To only display columns that are not nullable, click Non-nullable.
You can filter the column list based on whether the column must be unique.
On the Filters panel, under Data Attributes, the uniqueness filter is by default set to All, which indicates to display both unique and not unique columns.
To only display columns that must be unique, click Unique.
To only display columns that do not require uniqueness, click Not unique.
You can filter the column list to indicate whether to include:
Columns that are not primary or foreign keys.
Columns that are foreign keys.
Columns that are primary keys.
On the Filters panel, under Column Type:
To display columns that are neither a primary key nor a foreign key, check Non-keyed.
To display columns that are primary keys, check Primary key.
To display columns that are foreign keys, check Foreign key.
In a child workspace, to only display columns that override the generator configuration that is in the parent workspace, on the Filters panel, check Overrides Inheritance.
You can enable Structural data encryption, a configuration that allows Structural to:
Decrypt source data before applying the generator
Encrypt generated data before writing it to the destination database
For more information, go to Configuring and using Structural data encryption.
When Structural data encryption is enabled, the generator configuration panel includes an option to use Structural data encryption.
To only display columns that are configured to use Structural data encryption, on the Filters panel, check Uses Data Encryption.
By default, the column list is sorted first by table name, then by column name. The columns for each table display together. Within each table, the columns are in alphabetical order.
You can also sort the column list by column name first, then by table. Columns that have the same name display together. Those columns are sorted by the name of the table.
The button at the right of the Column column heading indicates the current sort order.
T.C indicates that the list is sorted by table, then by column.
C.T indicates that the list is sorted by column, then by table.
To switch the sort order, click the button.
Table View displays source or preview data for a single table. For a file connector workspace, each table corresponds to a file group.
Required workspace permission:
Source data: Preview source data
Destination data: Preview destination data
If you do not have either of these permissions, then you cannot display Table View.
From Table View, you can:
View information about the column data types and protection status.
To display Table View:
On the workspace management view, click Table View.
On Workspaces view, from the dropdown menu in the Name column, select Table View.
You can also display Table View for a table in Database View. To display Table View, either click the arrow icon for the table, or click a row in the table.
When you display Table View from Database View, it displays the data for the selected table.
When you display Table View from the workspace management view or Workspaces view, it displays the most recently displayed table.
If Table View was never displayed before, then it displays the first table in the workspace. To change the selected table, from the Table dropdown list, select the table to view.
Required license: Enterprise
By default, a child workspace inherits the configuration from the parent workspace. You can override the table mode or column generator.
In a child workspace, each Model entry indicates whether the configuration overrides the parent configuration.
When a column overrides the parent configuration, an Overriding label displays above the column.
To filter Table View to only display columns with overrides, toggle Show Overrides Only to the on position.
On the column configuration or Model entry, to reset the configuration to match the parent workspace, click Reset.
Required workspace permission: Assign table modes
To change the table mode that is assigned to the table:
Click the current table mode.
On the table mode panel, from the Table Mode dropdown list, select the new table mode.
When you change the table mode, Tonic Structural updates the preview data as needed. For example, if you change the table mode to Truncate, then the preview data is empty.
For a child workspace, the table mode selection panel indicates whether the selected table mode is inherited from the parent workspace.
If the child workspace currently overrides the parent workspace configuration, then to reset the table mode to the one that is assigned in the parent workspace, click Reset.
The Model section of Table View displays the configured generators for the table columns.
The header for each Model entry is the column name.
Linked columns share an entry. The heading is a comma-separated list of the linked columns.
Each entry contains the following information:
The column and generator, in the format Column Name >> Generator Name. For example, First_Name >> Name indicates that the First_Name column has the Name generator applied. For linked columns, there is a Column Name >> Generator Name entry for each column.
The selected configuration options for the generator.
By default, a child workspace inherits the configuration from its parent workspace. You can also override the configuration. For a child workspace, each Model entry indicates whether the configuration overrides the parent configuration. For configurations that override the parent, to remove the overrides and restore the inheritance, click Reset.
The Model entry also indicates when Structural data encryption is enabled for the column.
To remove the generator from a column, click the delete icon.
The columns section of Table View displays a sample set of data for the table.
The column heading background color indicates the column's protection status.
Red - At risk - The column is marked as sensitive, but the generator is still Passthrough.
Orange - Protected - The column has an assigned generator other than Passthrough. Protected columns might be either sensitive or not sensitive.
Gray - The column is not sensitive and the generator is Passthrough.
The Preview toggle at the top right of Table View allows you to choose whether to display original source data or the transformed data. You can switch back and forth to understand exactly how Structural transforms the data based on the table and column configuration.
By default, the Preview toggle is in the on position, and the displayed data reflects the selected table mode and the assigned generators. For tables that use Truncate mode, the preview data is empty. Truncated tables do not have data in the destination database.
To display the original source data, toggle Preview to the off position.
You can provide a query to filter the source data. The query is always against the source data, not the preview data, regardless of whether the Preview toggle is off or on.
For example, you configure a first name field to use the Name generator and enable consistency. You can then query the source data for a specific first name value to check that the preview data uses the same destination value for all of those records.
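As a minimal sketch of that consistency check (all values hypothetical), rows that share a source value must share a destination value:

```python
# Hypothetical illustration of generator consistency: rows that share a
# source first name must share a destination first name.
source = {"r1": "Emma", "r2": "Emma", "r3": "Liam"}   # source values by row
preview = {"r1": "Ava", "r2": "Ava", "r3": "Noah"}    # generator output by row

emma_rows = [rid for rid, name in source.items() if name == "Emma"]
assert len({preview[rid] for rid in emma_rows}) == 1  # consistent mapping
```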
To apply a query to the source data:
Click the query filter icon, located between the table name and the table mode.
On the Table Filter dialog, provide the where clause for the query.
To apply the query, click Apply.
To close the dialog, click Close.
To clear an applied query, on the Table Filter dialog, click Clear.
If no filter is applied, then the query filter icon has a white background.
If a valid filter is applied, then the query filter icon has a gray background.
If the provided where clause is not valid, then the query filter icon has a red background.
In addition to the column name, the column heading identifies primary keys and foreign keys, and indicates the type of data.
Primary key columns are indicated by a gold key icon.
Foreign key columns are indicated by a black key icon.
For other columns, the icon reflects the type of data that is in the column, such as text, numeric values, or datetime values.
To display the configuration panel for a column, click the dropdown icon.
From the configuration panel, you can configure the column sensitivity and assign the column generator.
Required workspace permission: Configure column sensitivity
On the column configuration panel, the sensitivity toggle at the top right indicates whether the column is marked as sensitive.
To mark a column as sensitive, toggle the setting to the Sensitive position.
To mark a column as not sensitive, toggle the setting to the Not Sensitive position.
In a child workspace, you cannot configure whether a column is sensitive. A child workspace always inherits the sensitivity designation from its parent workspace.
When you copy a workspace, Structural performs a new sensitivity scan on the copy. It does not copy the sensitivity designations from the original workspace.
Required workspace permission: Configure column generators
On the column configuration panel, from the Generator Type dropdown list, select the generator to assign to the column.
When you select a generator, Structural displays the available configuration options for that generator. For details about the configuration options for each generator, go to the Generator reference.
To remove the selected generator or generator preset, and reset the generator to Passthrough, click the delete icon next to the Generator Type dropdown.
For more information about selecting and configuring generators and generator presets, go to Assigning and configuring generators.
You can also manually indicate that a column is sensitive or not sensitive.
For example, the sensitivity scan might incorrectly identify a column as sensitive. Or a column might contain data that you consider sensitive but that does not match a detected sensitivity type.
When you manually change a column from not sensitive to sensitive, Structural marks the sensitivity detection as full confidence.
For information about how to change whether a column is sensitive, go to the sections on Privacy Hub, Database View (for a single column or for multiple selected columns), and Table View. The Structural API also provides options to configure column sensitivity.
Structural identifies the following types of sensitive values. These include information types that are covered by many privacy standards and frameworks, such as HIPAA, GDPR, CCPA, and PCI.
For more information about the HIPAA and Safe Harbor information types that Structural detects, see the Tonic.ai guide.
Names
First
Last
Full
Organization
Location
Street address
ZIP
PO Box
City
State and two-letter abbreviation
Country
Postal code
GPS coordinates
Contact information
Email address
Phone number
User credentials
Username
Password
Financial information
Credit card number
International bank account number (IBAN)
SWIFT code for bank transfers
Money amount
BTC (Bitcoin) address
Identification
Social Security Number
Passport number
Driver's license number
Birth date
Gender
Biometric identifier, such as finger and voice prints
Full face photographic images and similar images
Medical information
ICD-9 and ICD-10 codes (used to identify diseases)
Medical record number
Health plan beneficiary number
Admission date
Discharge date
Date of death
Other personal information
Marital status
Accounts and licenses
Account number
Certificate or license number
Network and web location
IP address
IPv6 address
MAC address
Web URL
International Mobile Equipment Identity (IMEI)
Vehicle information
Vehicle identification number (VIN)
License plate number
Generators transform the data in a source database column. You assign the generators to use. Tonic Structural offers a variety of generators to transform different types of data. For the sensitive columns that it detects, Structural also recommends the generator configuration to use.
For Enterprise instances, generator presets allow you to configure custom configurations of generators that you can then assign to columns.
You can also view this video overview.
Structural runs sensitivity scans automatically based on specific events. You can also run manual sensitivity scans on demand.
On a self-hosted instance, sensitivity scans can also run automatically at the same time each day.
Structural automatically runs a sensitivity scan when you:
Create a completely new workspace and connect a data source
Change the data connection details for the source database
Add a file group to a file connector workspace
A child workspace always inherits the sensitivity designations from its parent workspace.
When you copy a workspace, Structural runs a new sensitivity scan on the copy to identify sensitive columns. However, it keeps the sensitivity designation for columns that you specifically marked as sensitive or not sensitive.
In addition to the automatic scans, from Privacy Hub, you can run a manual sensitivity scan on demand.
On self-hosted instances, Structural can also run scheduled daily sensitivity scans in the background.
The daily scans only run on the 10 workspaces that had the most recent activity. Activity includes:
Data generation jobs
By default, Structural runs the sensitivity scans each day at midnight.
TONIC_ENABLE_SCHEDULED_SENSITIVITY_SCAN - Boolean to indicate whether to enable the scheduled daily sensitivity scans. The default value is true. To disable the scheduled daily scan, set this to false.
TONIC_SENSITIVITY_SCAN_HOUR - When scheduled scans are enabled, the hour at which to run the scans. The setting uses the local time zone. The value is an integer between 0 and 23, where 0 is midnight and 23 is 11:00 PM. For example, a value of 14 indicates to run the scans at 2:00 PM. The default value is 0.
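As an illustration only (this is not Structural code), the two settings and their documented defaults combine as follows:

```python
import os

# Illustrative only: how the documented defaults are interpreted.
enabled = os.environ.get(
    "TONIC_ENABLE_SCHEDULED_SENSITIVITY_SCAN", "true").lower() == "true"
scan_hour = int(os.environ.get("TONIC_SENSITIVITY_SCAN_HOUR", "0"))  # 0-23, local time

if enabled:
    print(f"Daily sensitivity scan runs at {scan_hour:02d}:00 local time")
```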
For improved performance, sensitivity scans can use parallel processing.
For document-based databases such as MongoDB, you use the environment setting TONIC_PII_SCAN_PARALLELISM_DOCUMENTDB. The default value is 1.
The Structural sensitivity scan uses the following rules and processes to:
Identify sensitive columns
Indicate its confidence that an identified column is sensitive and is of the detected sensitivity type
Note that this process cannot guarantee perfect precision and recall. We strongly recommend that a human reviews the sensitivity scan results and the broader dataset to ensure that nothing sensitive was missed.
This part of the sensitivity scan uses regular expression matching and dictionary lookups. It produces high, medium, or low confidence detections.
When this part of the sensitivity scan determines that a column contains sensitive data, it:
Marks the column as sensitive
Assigns the sensitivity type to the column
Recommends the generator configuration for the identified sensitivity type. Note that if the recommended generator is not compatible with the column, then Structural discards the recommendation.
Marks the sensitivity detection as high, medium, or low confidence. The confidence level is based on a calculation of how well the column matched the applicable rules.
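The following is a rough sketch of this kind of rule-based detection with a confidence calculation. The pattern, dictionary, and thresholds are invented for illustration; they are not Structural's actual rules:

```python
import re

# Invented example rules; Structural's real rules are far more extensive.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
FIRST_NAMES = {"alice", "bob", "carol"}  # tiny sample dictionary

def scan_column(values):
    """Return (sensitivity_type, confidence) for a non-empty list of strings."""
    email_ratio = sum(bool(EMAIL_RE.fullmatch(v)) for v in values) / len(values)
    name_ratio = sum(v.lower() in FIRST_NAMES for v in values) / len(values)
    best_type, ratio = max(
        [("Email", email_ratio), ("First Name", name_ratio)], key=lambda t: t[1])
    # Confidence scales with how well the column matches the rules.
    if ratio >= 0.9:
        return best_type, "high"
    if ratio >= 0.5:
        return best_type, "medium"
    if ratio >= 0.1:
        return best_type, "low"
    return None, None

print(scan_column(["a@b.com", "c@d.org", "not an email"]))  # ('Email', 'medium')
```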
The sensitivity scan also looks for any columns that match custom sensitivity types that you define in your custom sensitivity rules.
Custom sensitivity rules always produce full confidence detections.
When a column matches a custom sensitivity rule, Structural:
Marks the column as sensitive
Assigns the sensitivity rule name as the sensitivity type
Recommends the generator preset from the sensitivity rule
Marks the sensitivity detection as full confidence
To identify additional sensitive columns that might not be captured by the other parts of the scan, the sensitivity scan uses an artificial intelligence (AI) model. Note that the model is pre-trained. Structural does not use customer data to train the model or send any customer data externally.
This part of the scan produces medium or low confidence detections for built-in entity types.
The model considers the table and column name. If the combination of table and column name is similar in meaning to a sensitivity type that Structural has a recommended generator for, then Structural:
Marks the column as sensitive
Assigns the sensitivity type to the column
Recommends the generator configuration for that sensitivity type
Uses AI to compare the table name and column name combination to the sensitivity type, and produces a semantic similarity score.
Based on the semantic similarity score, marks the sensitivity detection as either medium or low confidence.
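A minimal sketch of this name-based scoring, assuming an available text-embedding function; the cutoff values are invented for illustration and are not Structural's published thresholds:

```python
import numpy as np

def name_confidence(table, column, sensitivity_type, embed):
    # embed is an assumed function that maps text to a vector.
    a, b = embed(f"{table} {column}"), embed(sensitivity_type)
    # Cosine similarity between the name combination and the type.
    score = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    if score >= 0.8:   # invented cutoff
        return "medium"
    if score >= 0.6:   # invented cutoff
        return "low"
    return None  # no detection from the name-based model
```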
To download the log of the most recent sensitivity scan:
On the workspace management view, from the download menu, select Download Sensitivity Scan Log.
On Privacy Hub, click Reports and Logs, then select Scan Log.
The log tracks the progress of the scan.
Each table is assigned a table mode. The table mode determines at a high level how the table is populated in the destination database.
Required workspace permission: Assign table modes
Both Database View and Table View allow you to view and update the selected table mode for a table.
This is the default table mode for new tables.
In this mode, Tonic Structural copies over all of the rows to the destination database.
For columns that have the generator set to Passthrough, Structural copies the original source data to the destination database.
For columns that are assigned a generator other than Passthrough, Structural uses the generator to replace the column data in the destination database.
This mode drops all data for the table in the destination database.
For data connectors other than Spark-based data connectors, the table schema and any constraints associated with the table are included in the destination database.
Any existing data in the destination database is removed. For example, if you change the table mode to Truncate after an initial data generation, the next data generation clears the table data. For Spark-based data connectors, the table is removed.
If you assign Truncate mode to a table that has a foreign key constraint, the data generation fails. If this is a requirement, contact support@tonic.ai for assistance.
This mode preserves the data in the destination database for this table. It does not add or update any records.
This feature is primarily used for very large tables that don't need to be de-identified during subsequent runs after the data exists in the destination database.
When you assign Preserve Destination mode to a table, Structural locks the generator configuration for the table columns.
The destination database must have the same schema as the source database.
You cannot use Preserve Destination mode when you:
Enable upsert for a workspace.
Write destination data to a container artifact.
Write destination data to an Ephemeral snapshot.
Incremental mode only processes the changes that occurred to the source table since the most recent data generation or other changes in the destination. This can greatly reduce generation time for large tables that don't have a lot of changes.
For Incremental mode to work, the following conditions must be satisfied:
The table must exist in the destination database. Either Structural created the table during data generation, or the table was created and populated in some other way.
A reliable updated date column must be present. When you select Incremental mode for a table, Structural prompts you to select the updated date column to use.
The table must have a primary key.
To maximize performance, we recommend that you have an index on the updated date column.
For tables that use Incremental mode, Structural checks the source database for records that have an updated date that is greater than the maximum date in that column in the destination database.
When identifying records to update, Structural only checks the updated date. It does not check for other updates. Records where the generator configuration is changed are not updated if they do not meet the updated date requirement.
For the identified records, Structural checks for primary key matches between the source and destination databases, then does one of the following:
If the primary key value exists in the destination database, then Structural overwrites the record in the destination database.
If the primary key value does not exist in the destination database, then Structural adds a new record to the destination database.
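Conceptually, the incremental processing for a single table resembles the following sketch; the table structure and names are hypothetical, and Structural performs this work internally:

```python
# Hypothetical sketch of Incremental mode for one table; not Structural code.
def incremental_run(source_rows, dest_rows_by_pk, dest_max_updated_at, transform):
    for row in source_rows:
        if row["updated_at"] <= dest_max_updated_at:
            continue  # not newer than the destination; skipped
        # Apply the assigned generators, then overwrite on a primary key
        # match, or insert when the key does not yet exist.
        dest_rows_by_pk[row["id"]] = transform(row)
```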
This mode currently only updates and adds records. Rows that are deleted from the source database remain in the destination database.
To ensure accurate incremental processing of records, we recommend that you do not directly modify the destination database. A direct modification might cause the maximum updated date in the destination database to be after the date of the last data generation. This could prevent records from being identified for incremental processing.
You cannot use Incremental mode when you:
Enable upsert for a workspace.
Write destination data to a container artifact.
Write destination data to an Ephemeral snapshot.
In this mode, Structural generates an arbitrary number of new rows, as specified by the user, using the generators that are assigned to the table columns.
You can use linking and partitioning to create complex relationships between columns.
Structural generates primary and foreign keys that reflect the distribution (1:1 or 1:many) between the tables in the source database.
You cannot use Scale mode when you enable upsert for a workspace.
For the Databricks data connector, the table mode configuration includes an Error on Overwrite setting. The setting indicates whether to return an error when Structural attempts to write data to a destination table that already contains data. The option is not available when you write destination data to Databricks Delta tables.
To return the error, toggle the setting to the on position.
To not return the error, toggle the setting to the off position.
For workspaces that use certain data connectors, the table mode configuration for De-Identify mode includes an option to apply a filter to the table.
On the table mode configuration panel, you can use the Repartition or Coalesce option to indicate a number of partitions to generate.
By default, the destination database uses the same partitioning as the source database. The partition option is set to Neither.
The Repartition option allows you to provide a specific number of partitions to generate.
To use the Repartition option:
Click Repartition.
In the field, enter the number of partitions.
The Coalesce option allows you to provide a maximum number of partitions to generate. If the source data has fewer partitions than the number that you specify, then Structural only generates that smaller number of partitions. The Coalesce option is generally more efficient than the Repartition option.
To use the Coalesce option:
Click Coalesce.
In the field, enter the number of partitions.
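Repartition and Coalesce correspond to the two standard Spark operations of the same names. The following PySpark sketch, which assumes a local Spark session rather than a Structural workspace, shows why Coalesce is generally cheaper: it merges existing partitions instead of performing a full shuffle, and it never increases the partition count.

```python
# Requires pyspark (pip install pyspark). Illustrates the underlying Spark
# operations only; in Structural you set the option on the table mode panel.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[4]").appName("partitions").getOrCreate()
df = spark.range(1_000_000)  # starts with 4 partitions under local[4]

repartitioned = df.repartition(8)  # full shuffle to exactly 8 partitions
coalesced = df.coalesce(2)         # merges down to 2 partitions, no full shuffle

print(df.rdd.getNumPartitions())             # 4
print(repartitioned.rdd.getNumPartitions())  # 8
print(coalesced.rdd.getNumPartitions())      # 2; coalesce never increases the count

spark.stop()
```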
Required license: Enterprise
Required global permission: Create and manage sensitivity rules
Not available on Structural Cloud
By default, when you run a Structural security scan on a workspace, it looks for the built-in sensitivity types.
You can also define custom sensitivity rules to identify other values and the corresponding recommended generator. Your data might include values that are specific to your organization.
Each custom sensitivity rule specifies:
The data type for matching columns
Text matching criteria for the names of matching columns
The recommended generator preset
To display the current list of sensitivity rules, in the Tonic navigation menu, click Sensitivity Rules.
For each rule, the list includes:
The rule name and description
The recommended generator preset
When the rule was most recently modified
You can filter the rule list by the following:
Rule name
Rule description
Generator preset name
Name of the user who most recently updated the rule
In the filter field, start to type text from any of those values. As you type, the list is filtered to only include matching rules.
Structural applies the rules based on their display order in the list.
If a column matches more than one rule, Structural applies the first matching rule.
To change the display order of a rule, drag and drop it to the new location in the list.
Note that you cannot change the rule sequence when the list is filtered.
To create a sensitivity rule:
On the Sensitivity Rules view, click New Custom Rule.
Click Save.
To change the configuration of a sensitivity rule:
On the Sensitivity Rules view, click the edit icon for the rule.
Click Save.
Note that any changes to a sensitivity rule do not take effect until the next sensitivity scan.
In the Name field, type the name of the sensitivity rule. The rule name becomes the sensitivity type for matching columns. The rule name must be unique, and cannot match the name of a built-in sensitivity type.
Optionally, in the Description field, type a longer description of the sensitivity rule.
From the Data Type dropdown list, select the data type for matching columns. For example, a rule might only be used for columns that contain text.
The available data types are general types that map to specific data types in a given database. The available types are:
Array
Binary
Boolean
Continuous Numerical
Date Range
Datetime
Integer
JSON
MAC Address
Network Address
Text
UUID
XML
Under Column Name Match, provide the criteria to identify matching columns based on the column name.
Note that a matching column must match the data type and the column name criteria.
When you provide a list of text matching conditions, a matching column must match all of the conditions. In other words, the conditions are joined by AND.
To apply the same generator preset to columns that have completely different names, you must create separate sensitivity rules.
To create a list of text matching conditions:
Click Text Match.
To add a column name condition, click Add String Match.
For each condition:
From the comparison type dropdown list, select the type of comparison. For example, Contains, Starts with, Ends with.
In the comparison text field, provide the text to check for.
The comparison text is case insensitive. For example, if you set a condition to match column names that contain the text term, it also matches column names that contain TERM, Term, or tErM.
To remove a column name condition, click its delete icon.
To use a regular expression to identify matching columns based on the column name:
Click Regular Expression.
In the field, provide the regular expression.
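To see how the two kinds of column name criteria behave, consider the following illustrative Python sketch. The condition terms, the regular expression, and the column names are all hypothetical; Structural evaluates the criteria internally.

```python
import re

# Hypothetical criteria. Text matching conditions are case insensitive and
# joined by AND; the regular expression is an alternative to the text list.
conditions = [("Contains", "email"), ("Ends with", "addr")]
pattern = re.compile(r"^user_.*_ssn$", re.IGNORECASE)

def matches_conditions(column_name: str) -> bool:
    name = column_name.lower()
    checks = {
        "Contains": lambda term: term in name,
        "Starts with": lambda term: name.startswith(term),
        "Ends with": lambda term: name.endswith(term),
    }
    return all(checks[kind](term.lower()) for kind, term in conditions)

print(matches_conditions("Email_Addr"))         # True: both conditions hold
print(matches_conditions("email_address"))      # False: fails "Ends with addr"
print(bool(pattern.match("user_primary_SSN")))  # True
```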
From the Recommended Generator Preset dropdown list, select the generator preset that is the recommended generator for matching columns.
To search for a specific preset, begin to type the generator preset name.
Required global permission: Create and manage generator presets
When you configure a sensitivity rule, you can also create a new generator preset or update the configuration of the selected generator preset.
To create a new generator preset, click Create Preset. On the generator preset details panel, provide the generator preset configuration, then click Create.
To edit the selected generator preset, click Edit Current Preset. On the generator preset details panel, update the generator preset configuration, then click Save and Apply.
If you have access to a workspace, then you can use the workspace to preview the sensitivity rule results.
Under Test Results, from the workspace dropdown list, select the workspace to use.
Structural searches the workspace schema for matching columns based on the sensitivity rule configuration.
It displays any matching columns. You can filter the matching columns based on the table or column name.
For each matching column, the list includes:
The column name and table
A sample value from the source data. To see the sample source value, you must have the Preview source data permission for the workspace.
A sample replacement value, based on the selected generator preset for the sensitivity rule. To see the sample replacement value, you must have the Preview destination data permission for the workspace.
To delete a sensitivity rule, on the Sensitivity Rules view, click the delete icon for the rule.
Note that existing generator recommendations for the rule remain in place until the next sensitivity scan.
To enable and configure the daily sensitivity scans, use the following environment settings. You can add these settings to the Environment Settings list on Structural Settings.
For relational databases such as PostgreSQL and SQL Server, to configure parallel processing, you use the TONIC_PII_SCAN_PARALLELISM_RDBMS environment setting. The default value is 4.
Recommend generators for those columns.
To identify that a column contains sensitive information for a built-in sensitivity type, Structural looks at the data type, column name, and column values.
Custom sensitivity rules are based on the column data type and column name. For more information, go to Configure custom sensitivity rules.
For Spark-based data connectors, a table that is assigned Truncate mode is ignored completely.
For the file connector, file groups are treated as tables. When a file group is assigned Truncate mode, the data generation process ignores the files that are in that file group.
When upsert is enabled, the Truncate table mode does not actually truncate the destination table. Instead, it works more like Preserve Destination table mode, which preserves existing records in the destination table.
Incremental mode is currently supported on PostgreSQL, MySQL, and SQL Server. If you want to use this table mode with another database type, contact support.
Table filters provide a way to generate a smaller set of data when a data connector does not support subsetting.
Run the Structural sensitivity scan
Run, configure, and get the results of the sensitivity scan
Set column sensitivity manually
Options to override the sensitivity scan determination of sensitivity
Built-in sensitivity types
Types of sensitive data that the sensitivity scan can identify
Configure custom sensitivity rules
Set up rules to enable the scan to identify other sensitive columns based on the column data types and names
This generator reference provides the details for each of the supported generators in Tonic Structural.
The generators are in alphabetical order by the generator name.
Here are some groupings to help identify generators that are used for different types of values. Generator hints and tips also provides some suggestions for generators to use for specific use cases.
Generates unique alphanumeric strings of the same length as the input.
For example, for the origin value ABC123, the output value is a six-character alphanumeric string such as D24N05.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
A version of the Character Scramble generator that can be used for array values.
This generator replaces letters with random other letters, and numbers with random other numbers. Punctuation and whitespace are preserved.
For example, for the following array value:
["ABC.123", 3, "last week"]
The output might be something like:
["KFR.860", 7, "sdrw mwoc"]
This generator securely masks letters and numbers. There is no way to recover the original data.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
The Algebraic generator identifies the algebraic relationship between three or more numeric values and generates new values to match. At least one of the values must be a non-integer.
If a relationship cannot be found, then the generator defaults to the Categorical generator.
This generator can be linked with other Algebraic generators.
To configure the generator, from the Link To dropdown list, select the columns to link this column to. You can select other columns that are assigned the Algebraic generator.
You must select at least three columns.
The column values must be numeric. At least one of the columns must contain a value other than an integer.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
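As a toy illustration of the idea (this is not Tonic's implementation), suppose the relationship the scan finds is total = price * quantity, with price as the required non-integer. Synthetic rows can then be generated so that the relationship still holds:

```python
import random

# Source rows satisfy total = price * quantity (price is the non-integer).
rows = [(1.25, 4, 5.00), (2.50, 2, 5.00), (0.99, 10, 9.90)]

synthetic = []
for _ in rows:
    price = round(random.uniform(0.50, 5.00), 2)
    quantity = random.randint(1, 12)
    synthetic.append((price, quantity, round(price * quantity, 2)))

print(synthetic)  # new values, same algebraic relationship
```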
The following table summarizes the available generators. The table includes generator characteristics that you might take into account when you select the generator to use for a column.
Generator hints and tips also provides some suggestions for generators to use for specific use cases.
Generates a random address-like string.
You can indicate which part of an address the column contains. For example, the column might contain only the street address or the city, or it might contain the full address.
To configure the generator:
From the Link To dropdown list, select the columns to link this column to. You can link columns that use the Address generator to mask one of the following address components:
City
City State
Country
Country Code
State
State Abbreviation
Zip Code
Latitude
Longitude
Note that when linked to another address column, a country or country code is always the United States.
From the address component dropdown list, select the address component that this column contains. The available options are:
Building Number
Cardinal Direction (North, South, East, West)
City
City Prefix (Examples: North, South, East, West, Port, New)
City Suffix (Examples: land, ville, furt, town)
City with State (Example: Spokane, Washington)
City with State Abbr (Example: Houston, TX)
Country (Examples: Spain, Canada)
Country Code (Uses the 2-character country code. Examples: ES, CA)
County
Direction (Examples: North, Northeast, Southwest, East)
Full Address
Latitude (Examples: 33.51, 41.32)
Longitude (Examples: -84.05, -74.21)
Ordinal Direction (Examples: Northeast, Southwest)
Secondary Address (Examples: Apt 123, Suite 530)
State (Examples: Alabama, Wisconsin)
State Abbr (Examples: AL, WI)
Street Address (Example: 123 Main Street)
Street Name (Examples: Broad, Elm)
Street Suffix (Examples: Way, Hill, Drive)
US Address
US Address with Country
Zip Code (Example: 12345)
Toggle the Consistency setting to indicate whether to make the column consistent. By default, the consistency is disabled.
If consistency is enabled, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.

When the Address generator is consistent with itself, the same value in the source database is always mapped to the same destination value. For example, for a column that contains a state name, Alabama is always mapped to Illinois.

When the Address generator is consistent with another column, the same value in the other column always results in the same destination value for the address column. For example, if the address column is consistent with a name column, then every instance of John Smith in the name column in the source database has the same address value in the destination database.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
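Conceptually, consistency is a deterministic mapping from a key value to a destination value. The sketch below uses a keyed hash as a stand-in for Structural's internal mechanism, which is not documented here; the state list and seed are hypothetical.

```python
import hashlib

STATES = ["Alabama", "Illinois", "Oregon", "Vermont", "Texas"]

def consistent_state(key_value: str, seed: str = "workspace-seed") -> str:
    # Same key in, same value out: that is all consistency guarantees.
    digest = hashlib.sha256((seed + key_value).encode()).digest()
    return STATES[digest[0] % len(STATES)]

# Self-consistent: key off the column's own source value.
print(consistent_state("Alabama") == consistent_state("Alabama"))  # True

# Consistent with another column: key off that column's value instead, so
# every row for "John Smith" gets the same address value.
print(consistent_state("John Smith"))
```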
For the Address generator, Spark workspaces (Amazon EMR, Databricks, and self-managed Spark clusters) only support the following address parts:
Building Number
City
Country
Country Code
Full Address
Latitude
Longitude
State
State Abbr
Street Address
Street Name
Street Suffix
US Address
US Address with Country
Zip Code
Generates a random company name-like string.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.
By default, the generator is not consistent.
If consistency is enabled, then by default it is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.
When the generator is consistent with itself, then a given source value is always mapped to the same destination value. For example, My Business is always mapped to New Business.
When the generator is consistent with another column, then a given source value in that other column always results in the same destination value for the company name column. For example, if the company name column is consistent with a name column, then every instance of John Smith in the name column in the source database has the same company name in the destination database.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | Yes |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | Yes |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | Yes, can be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 |
Generator ID (for the API) |
Generator | Description | Supported features |
---|---|---|
Address (API: AddressGenerator) | Generates replacement values for U.S. mailing addresses. You select the address component or format for the replacement values. For example, the column might only contain a street address or a postal code, or it might contain a full address. | Consistency: self and other. Linkable. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent; 4 if consistent. |
Algebraic | Identifies the algebraic relationship between 3 or more numeric values, including at least one non-integer. Based on the relationship, generates new values to match. If there is no relationship, uses the Categorical generator. | Linkable; linking is required. Privacy ranking: 3. |
Alphanumeric String Key | Generates unique alphanumeric strings of the same length as the input. For example, for the origin value ABC123, the output value is a six-character alphanumeric string such as D24N05. | Consistency: self only. Primary key generator. Unique columns allowed. Format-preserving encryption (FPE). Privacy ranking: 3 if not consistent; 4 if consistent. |
Array Character Scramble | Within an array, replaces letters with random other letters, and numbers with random other numbers. Preserves punctuation and whitespace. | Consistency: self only. Privacy ranking: 3 if not consistent; 4 if consistent. |
Array JSON Mask | Used to transform array values in JSON. To identify values to transform, you provide a list of JSONPaths. For each JSONPath, you assign a sub-generator to apply to matching values. | Composite generator; feature support is based on the sub-generators. Privacy ranking: 5. |
Array Regex Mask | Used to transform values in an array. To identify values to transform, you provide a regular expression. For each capture group in an expression, you assign a sub-generator to apply to matching values. | Composite generator; feature support is based on the sub-generators. Privacy ranking: 5. |
ASCII Key | Generates unique alphanumeric strings based on any printable ASCII characters. You can optionally exclude lowercase letters from the generated values. The replacement value does not preserve the length of the original value. | Consistency: self only. Primary key generator. Unique columns allowed. FPE. Privacy ranking: 3 if not consistent; 4 if consistent. |
Business Name | Generates a random company name-like string. | Consistency: self or other. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent; 4 if consistent. |
Categorical | Shuffles the original values for a column to different rows. Maintains the overall frequency of each value. For example, if a column contains Small 3 times, Medium 4 times, and Large 5 times, each value appears the same number of times in the transformed data, but the values are shuffled to different rows. | Linkable. Differential privacy is configurable. Privacy ranking: 2 with differential privacy; 3 without. |
Character Scramble | Replaces letters with random other letters and numbers with random other numbers. Preserves punctuation, whitespace, and mathematical symbols. | Consistency: self only. Privacy ranking: 3 if not consistent; 4 if consistent. |
Character Substitution | Replaces characters with other random characters. Preserves punctuation, capitalization, and whitespace. A replacement character is always from within the same Unicode block as the source character, and a source character is always mapped to the same destination character. For example, M might always map to V. | Always self-consistent. Unique columns allowed. Privacy ranking: 4. |
Company Name (Deprecated; API: CompanyNameGenerator) | This generator is deprecated. Use the Business Name generator instead. Generates a random company name-like string. | Consistency: self or other. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent; 4 if consistent. |
Conditional | Applies different generators to rows conditionally based on the column value. You configure a list of conditions, and for each condition you assign a sub-generator to apply to matching values. | Unique columns allowed. Composite generator; other feature support is based on the sub-generators. Privacy ranking: the lower of 5 or the fallback generator if a fallback generator is selected; 5 if no fallback generator is selected. |
Constant | Uses a single specified value to replace all of the values in the column. The replacement value must be compatible with the column data type. | Differential privacy. Data-free. Privacy ranking: 1. |
Continuous | Generates a continuous distribution to fit the underlying data. Can link to other columns to create multivariate distributions. Can also be partitioned by other columns. | Linkable. Differential privacy is configurable. Privacy ranking: 2 with differential privacy; 3 without. |
Cross Table Sum | Populates the column using the sum of values from a column in another table, selected using a foreign key value that matches the primary key value for the current row. | Privacy ranking: 3. |
CSV Mask (API: CsvMaskGenerator) | Used to mask text in a delimited format. Parses the text as a row where the columns are delimited by a specified character. For each index, you assign a sub-generator to apply to the index value. | Composite generator; feature support is based on the sub-generators. Privacy ranking: 5. |
Custom Categorical | Replaces the original column value with a value from a list of values that you provide. | Consistency: self and other. Linkable. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent; 4 if consistent. |
Date Truncation | Truncates dates or timestamps to a specific date or time component. For example, you might truncate a date value to the month or a timestamp to the hour. | Privacy ranking: 5. |
Email (API: EmailGenerator) | Scrambles characters in an email address. Preserves the formatting and keeps the @ and . characters. You can identify specific email domains to not scramble. | Consistency: self only. Privacy ranking: 3 if not consistent; 4 if consistent. |
Event Timestamps | Generates timestamps that fit an event distribution. You can link columns to create a sequence of events across multiple columns. You can also partition the generator by other columns. | Linkable. Privacy ranking: 3. |
File Name | Scrambles characters in a file name. Preserves the formatting and the file extension. | Consistency: self only. Privacy ranking: 3 if not consistent; 4 if consistent. |
Find and Replace | Replaces all instances of the find string with the replace string. For the find string, you can optionally provide a regular expression. | Privacy ranking: 5. |
FNR (API: FnrGenerator) | Transforms Norwegian national identity numbers. You can optionally preserve the gender and birthdate portions of the identifier values. | Consistency: self and other. Unique columns allowed. Privacy ranking: 3 if not consistent; 4 if consistent. |
Geo (API: GeoGenerator) | Used to transform columns that contain latitude and longitude values. | Linkable. Unique columns allowed. Privacy ranking: 3. |
HIPAA Address | Can be used to generate cities, states, zip codes, and latitude/longitude values that follow HIPAA guidelines for safe harbor. | Consistency: self only. Privacy ranking: 3 if not consistent; 4 if consistent. |
Hostname | Generates random host names, based on the English language. | Consistency: self and other. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent; 4 if consistent. |
HStore Mask | Used to transform values in an HStore column in a PostgreSQL database. You specify a list of keys for which to transform the values, and assign a generator to each key. | Composite generator; feature support is based on the sub-generators. Privacy ranking: 5. |
HTML Mask | Used to transform columns that contain HTML content. You provide a list of path expressions, and assign a generator to each path expression. | Composite generator; feature support is based on the sub-generators. Privacy ranking: 5. |
Integer Key | Generates unique integer values. By default, the generated values are within the range of the column's data type. You can also specify a range; the source values must be within that range. | Differential privacy if not consistent. Data-free if not consistent. Primary key generator. Unique columns allowed. FPE. Privacy ranking: 1 if not consistent; 4 if consistent. |
International Address | For Canadian mailing addresses, can generate the street name and postal code. For United Kingdom (UK) mailing addresses, can generate postal codes. | Consistency: self only. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent; 4 if consistent. |
IP Address | Generates a random IP address-formatted string. You specify the percentage of IPv4 addresses; the remaining addresses are IPv6. | Consistency: self or other. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent; 4 if consistent. |
JSON Mask | Used to transform values in JSON columns. To identify values to transform, you provide a list of JSONPaths, and assign a sub-generator to each JSONPath. | Composite generator; feature support is based on the sub-generators. Privacy ranking: 5. |
MAC Address | Generates a random MAC address-formatted string. | Consistency: self only. Differential privacy if not consistent. Data-free if not consistent. FPE. Privacy ranking: 1 if not consistent; 4 if consistent. |
Mongo ObjectId Key | Generates unique MongoDB ObjectId values. Can be assigned to text columns that contain MongoDB ObjectId values. The column value must be 12 bytes long. | Consistency: self only. Privacy ranking: 3 if not consistent; 4 if consistent. |
Name (API: NameGenerator) | Generates a random name string from a dictionary of first and last names. You specify the name format. For example, a column might contain only a first name, or a full name that is last name first. | Consistency: self or other. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent; 4 if consistent. |
Noise Generator | Masks values in numeric columns. Either adds or multiplies the original value by random noise. | Consistency: self or other. Privacy ranking: 3 if not consistent; 4 if consistent. |
Null (API: NullGenerator) | Replaces all of the column values with NULL values. | Differential privacy. Data-free. Unique columns allowed. Privacy ranking: 1. |
Numeric String Key | Generates unique numeric strings of the same length as the input numeric string. | Consistency: self only. Primary key generator. Unique columns allowed. FPE. Privacy ranking: 3 if not consistent; 4 if consistent. |
Passthrough | Default generator. Does not perform any transformation on the source data. | Unique columns allowed. Privacy ranking: 6. |
Phone | Generates a random phone number that matches the country or region and format of the input phone number. For invalid phone numbers, either replaces individual numbers or generates a valid replacement number. | Consistency: self only. Privacy ranking: 3. |
Random Boolean | Generates a random boolean value. You specify the percentage of true values; the remaining values are false. | Differential privacy. Data-free. Privacy ranking: 1. |
Random Double | Generates a random double number that is between the specified minimum (inclusive) and maximum (exclusive) values. | Differential privacy. Data-free. Privacy ranking: 1. |
Random Hash | Generates a random hash string. | Differential privacy. Data-free. Privacy ranking: 1. |
Random Integer | Returns a random integer that is between the specified minimum (inclusive) and maximum (exclusive) values. | Differential privacy. Data-free. Privacy ranking: 1. |
Random Timestamp | Generates random dates, times, and timestamps that fall within a specified range. | Differential privacy. Data-free. Privacy ranking: 1. |
Random UUID (API: UUIDGenerator) | Generates a random new UUID string. | Differential privacy. Data-free. Unique columns allowed. Privacy ranking: 1. |
Regex Mask | To identify values to transform, you provide a regular expression. For each capture group in an expression, you assign a sub-generator to apply to matching values. | Unique columns allowed. Composite generator; other feature support is based on the sub-generators. Privacy ranking: 5. |
Sequential Integer | Generates a column of unique integer values that start with a specified value, and then increment by 1 for each processed row. | Linkable. Unique columns allowed. Privacy ranking: 3. |
Shipping Container | Generates ISO 6346-compliant shipping container codes. The codes are all in the freight ("U") category. | Consistency: self or other. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent; 4 if consistent. |
SIN (API: SINGenerator) | Generates a new valid Canadian Social Insurance Number. Preserves the formatting from the original value. | Consistency: self only. Data-free if not consistent. Unique columns allowed. FPE. Privacy ranking: 1 if not consistent; 4 if consistent. |
SSN (API: SsnGenerator) | Generates a new valid United States Social Security Number. For numeric columns, the dashes (xxx-xx-xxxx) are always excluded. Otherwise, you can specify the percentage of values for which to include the dashes. | Consistency: self or other. Differential privacy if not consistent. Data-free if not consistent. Privacy ranking: 1 if not consistent; 4 if consistent. |
Struct Mask | Used to transform StructFields within a StructType in Spark databases (Databricks and Amazon EMR). You provide path expressions, and assign a sub-generator to each path expression. | Composite generator; feature support is based on the sub-generators. Privacy ranking: 5. |
Timestamp Shift | Shifts timestamps by a random amount of a specific unit of time, within a set range. The range can start before the original value. | Consistency: self or other. Privacy ranking: 3 if not consistent; 4 if consistent. |
Unique Email | Generates unique email addresses. Replaces the username with a randomly generated GUID, and masks the domain with a character scramble. | Consistency: self only. Unique columns allowed. Privacy ranking: 3 if not consistent; 4 if consistent. |
URL (API: UrlGenerator) | Used to transform URLs. Preserves the formatting. Keeps the URL scheme and top-level domain intact. | Unique columns allowed. Privacy ranking: 3. |
UUID Key (API: UuidPkGenerator) | Generates UUIDs. | Consistency: self only. Primary key generator. Unique columns allowed. FPE. Privacy ranking: 3 if not consistent; 4 if consistent. |
XML Mask (API: XmlMaskGenerator) | Used to transform values in XML columns. You provide XPaths, and assign a sub-generator to each XPath. | Composite generator; feature support is based on the sub-generators. Privacy ranking: 5. |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | Yes, can be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 if not consistent; 4 if consistent |
Generator ID (for the API) |
Generates unique alphanumeric strings based on any printable ASCII characters. The length of the source string is not preserved. You can choose to exclude lowercase letters from the generated values.
To configure the generator:
To exclude lowercase letters from the generated values, toggle Exclude Lowercase Alphabet to the on position.
Toggle the Consistency setting to indicate whether to make the generator consistent. By default, the generator is not consistent.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
This generator replaces letters with random other letters and numbers with random other numbers. Punctuation, whitespace, and mathematical symbols are preserved.
For example, for the following input string:
ABC.123 123-456-789 Go!
The output would be something like:
PRX.804 296-915-378 Ab!
This generator securely masks letters and numbers. There is no way to recover the original data.
Character Scramble is similar to Character Substitution, with a couple of key differences.
While you can enable consistency for the entire value, Character Scramble does not always replace the same source character with the same destination character. Because there is no guarantee of unique values, you cannot use Character Scramble on unique columns.
Character Substitution, however, does always map the same source character to the same destination character. Character Substitution is always consistent, which makes it less secure than Character Scramble. You can use Character Substitution on unique columns.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
The Categorical generator shuffles the existing values within a field while maintaining the overall frequency of the values. It disassociates the values from other pieces of data. Note that NULL is considered a separate value.
For example, a column contains the values Small, Medium, and Large. Small appears 3 times, Medium appears 4 times, and Large appears 5 times. In the output data, each value still appears the same number of times, but the values are shuffled to different rows.
This generator is optimized for categories with fewer than 10,000 unique values. If your underlying data has more unique values (for example, your field is populated by freeform text entry), we recommend that you use the Character Scramble or Custom Categorical generator.
To configure the generator:
From the Link To dropdown, select the columns to link to the current column. You can select from other columns that use the Categorical generator.
Toggle the Differential Privacy setting to indicate whether to make the output data differentially private. By default, differential privacy is disabled.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
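Setting aside the optional differential privacy, the core behavior is a frequency-preserving shuffle. A minimal sketch:

```python
import random
from collections import Counter

# The multiset of values is unchanged; only the row assignment changes,
# which breaks the association with the rest of each row.
column = ["Small"] * 3 + ["Medium"] * 4 + ["Large"] * 5
shuffled = column[:]
random.shuffle(shuffled)

assert Counter(shuffled) == Counter(column)  # frequencies preserved
print(shuffled)
```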
This is a composite generator.
A version of the Regex Mask generator that can be used for array values.
Uses regular expressions to parse strings and replace specified substrings with the output of specified generators. The parts of the string to replace are specified inside unnamed top-level capture groups.
To add a regular expression:
Click Add Regex. On the configuration panel, Cell Value shows a sample value from the source database. You can use the previous and next options to navigate through the values.
By default, Replace all matches is enabled. To only match the first occurrence of a pattern, toggle Replace all matches to the off position.
In the Pattern field, enter a regular expression. If the expression is valid, then Structural displays the capture groups for the expression.
For each capture group, to select and configure the generator to apply, click the selected generator. You cannot select another composite generator.
To save the configuration and immediately add a generator for another path expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
From the Regexes list:
To edit a regex, click the edit icon.
To remove a regex, click the delete icon.
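The following sketch illustrates the capture group mechanic with a hypothetical pattern, and with a digit scramble standing in for the assigned sub-generator. Only the captured span is replaced; the rest of the match passes through.

```python
import random
import re
import string

# Hypothetical pattern: one unnamed capture group marks the digits to mask.
pattern = re.compile(r"[A-Z]+\.(\d+)")

def mask_group(match: re.Match) -> str:
    # Rebuild the match, replacing only the span of capture group 1.
    g0 = match.group(0)
    rel_start = match.start(1) - match.start()
    rel_end = match.end(1) - match.start()
    masked = "".join(random.choice(string.digits) for _ in match.group(1))
    return g0[:rel_start] + masked + g0[rel_end:]

values = ["ABC.123", 3, "last week"]  # an array value, as in the examples above
print([pattern.sub(mask_group, v) if isinstance(v, str) else v for v in values])
# e.g. ['ABC.407', 3, 'last week']
```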
This is a composite generator.
A version of the JSON Mask generator that can be used for array values.
Runs a selected generator on values that match a user-specified JSONPath.
To assign a generator to a path expression:
Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell JSON field contains a sample value from the source database. You can use the previous and next icons to page through different values.
In the Path Expression field, type the JSONPath expression to identify the value to apply the generator to. To populate a path expression, you can also click a value in the Cell JSON field. Matched JSON Values shows the result from the value in Cell JSON.
By default, the selected generator is applied to any value that matches the expression. To limit the types of values to apply the generator to, from the Type Filter, specify the applicable types. You can select Any, or you can select any combination of String, Number, and Null.
From the Generator Configuration dropdown list, select the generator to apply to the path expression. You cannot select another composite generator.
Configure the selected generator. You cannot configure the selected generator to be consistent with another column.
To save the configuration and immediately add a generator for another path expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
From the Sub-Generators list:
To edit a generator assignment, click the edit icon.
To remove a generator assignment, click the delete icon.
To move a generator assignment up or down in the list, click the up or down arrow.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
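As an illustration of the sub-generator idea, the sketch below hand-rolls one trivial path (every top-level array element) instead of pulling in a JSONPath library, and applies a stand-in character scramble to string values only, mirroring a Type Filter of String.

```python
import json
import random
import string

# Stand-in sub-generator: a character scramble applied to strings only.
def scramble(value: str) -> str:
    return "".join(
        random.choice(string.ascii_lowercase) if c.isalpha()
        else random.choice(string.digits) if c.isdigit()
        else c  # punctuation and whitespace pass through
        for c in value
    )

# Hand-rolled equivalent of the JSONPath "$[*]" for this sketch.
cell = '["ABC.123", 3, "last week"]'
data = json.loads(cell)
masked = [scramble(v) if isinstance(v, str) else v for v in data]
print(json.dumps(masked))  # e.g. ["kfr.860", 3, "sdrw mwoc"]
```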
Performs a random character replacement that preserves formatting (spaces, capitalization, and punctuation).
Characters are replaced with other characters from within the same Unicode Block. A given source character is always mapped to the same destination character. For example, M might always map to V.
For example, for the following input string:
Miami Store #162
The output would be something like:
Vgkjg Gmlvf #681
Note that for a numeric column, when a generated number starts with a 0, the starting 0 is removed. This could result in matching output values in different columns. For example, one column value is changed to 113 and another to 0113, which also becomes 113.
Character Substitution is similar to Character Scramble, with a couple of key differences. Because Character Substitution always maps the same source character to the same destination character, it is always consistent. It also can be used for unique columns.
In Character Scramble, the character mapping is random, which makes Character Scramble slightly more secure. However, Character Scramble cannot be used for unique columns.
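The per-character consistency can be pictured as a single fixed substitution table that is built once and reused, as in this sketch. The seed stands in for whatever internal state Structural keeps, which is not documented here.

```python
import random
import string

# One fixed table, built once and reused: "M" always maps to the same letter.
rng = random.Random("workspace-secret")  # hypothetical seed
substitute = "".join(rng.sample(string.ascii_uppercase, 26))
table = str.maketrans(string.ascii_uppercase, substitute)

print("MIAMI".translate(table))  # same output on every run and every row
print("MIAMI".translate(table))
```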
Generates a continuous distribution to fit the underlying data.
This generator can be linked to other Continuous generators to create multivariate distributions and can be partitioned by other columns.
To configure the generator:
From the Link To drop-down list, select the other Continuous generator columns to link to. The linking creates a multivariate distribution.
From the Partition By drop-down list, select one or more columns to use to partition the data. The selected columns must have the generator set to either Passthrough or Categorical. For more information about partitioning and how it works, go to Partitioning a column.
Toggle the Differential Privacy setting to indicate whether to make the output data differentially private. By default, the generator is not differentially private.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Uses a single value to mask all of the values in the column.
For example, you can replace every value in a string column with the value String1, or replace every value in a numeric column with the value 12345.
To configure the generator, in the Constant Value field, provide the value to use.
The value must be compatible with the field type. For example, you cannot provide a string value for an integer column.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
This generator is deprecated. Use the Business Name generator instead.
Generates a random company name-like string.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.
By default, the generator is not consistent.
If consistency is enabled, then by default it is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.
When the generator is consistent with itself, then a given source value is always mapped to the same destination value. For example, My Company is always mapped to New Company.
When the generator is consistent with another column, then a given source value in that other column always results in the same destination value for the company name column. For example, if the company name column is consistent with a name column, then every instance of John Smith in the name column in the source database has the same company name in the destination database.
Links columns in two tables. This column value is the sum of the values in a column in another table.
This generator does not provide a preview. The sums are not computed until the other table is generated.
For example, a Customers table contains a Total_Sales column. The Transactions table uses a foreign key Customer_ID column to identify the customer who made the transaction, and an Amount column that contains the amount of the sale. The Customer_ID value in the Transactions table is a value from the ID primary key column in the Customers table.
You assign the Cross Table Sum generator to the Total_Sales column. In the generator configuration, you indicate that the value is the sum of the Amount values for the Customer_ID value that matches the primary key ID value for the current row.
For the Customers row for ID 123, the Total_Sales column contains the sum of the Amount column for Transactions rows where Customer_ID is 123.
To configure the generator:
From the Foreign Table dropdown list, select the table that contains the column for which to sum the values.
From the Foreign Key dropdown list, select the foreign key. The foreign key identifies the row from the current table that is referred to in the foreign table.
From the Sum Over dropdown list, select the column for which to sum the values.
From the Primary Key dropdown list, select the primary key for the current table.
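The destination value is equivalent to a correlated SQL aggregate. This runnable sketch uses SQLite and the example tables above to show the semantics; the schema and data are illustrative only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Total_Sales REAL);
    CREATE TABLE Transactions (Customer_ID INTEGER, Amount REAL);
    INSERT INTO Customers (ID, Total_Sales) VALUES (123, 0), (456, 0);
    INSERT INTO Transactions VALUES (123, 10.0), (123, 2.5), (456, 7.0);
""")

# Sum Over = Amount, Foreign Key = Customer_ID, Primary Key = ID.
con.execute("""
    UPDATE Customers
    SET Total_Sales = COALESCE(
        (SELECT SUM(Amount) FROM Transactions
         WHERE Transactions.Customer_ID = Customers.ID), 0)
""")
print(con.execute("SELECT * FROM Customers").fetchall())
# [(123, 12.5), (456, 7.0)]
```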
This is a composite generator.
Applies different generators to the column value conditionally, based on values in the same table row.
For example, a Users table contains Name, Username, and Role columns. For the Username column, you can use a conditional generator to indicate that if the value of Role is something other than Test, then use the Character Scramble generator for the Username value. For Test users, the name is not masked.
The generator consists of a list of options. Each option includes the required conditions and the generator to use if those conditions are met.
The generator always contains a Default option. The Default option is used if the value does not meet any of the conditions. To configure the Default option:
From the Default dropdown list, select the generator to use by default.
Configure the selected generator.
To add a condition option:
Click + Conditional Generator.
To add a condition:
Click + Condition.
From the column list, select the column for which to check the value.
Select the comparison type.
Enter the column value to check for.
To remove a condition, click the delete icon for the condition.
From the Generator dropdown list, select the generator to run on the current column if the conditions are met. You cannot select another composite generator.
Choose the configuration options for the selected generator.
To view details for and edit a condition option, click the expand icon for that option.
To remove a condition option, click the delete icon for the option.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | Yes |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | Yes |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | Yes, can be linked. |
Differential privacy | Configurable |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 2 if differential privacy is enabled; 3 if differential privacy is not enabled |
Generator ID (for the API) |
Consistency | Determined by the selected sub-generators. |
Linking | Determined by the selected sub-generators. |
Differential privacy | Determined by the selected sub-generators. |
Data-free | Determined by the selected sub-generators. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) |
Consistency | Determined by the specified sub-generators. |
Linking | Determined by the specified sub-generators. |
Differential privacy | Determined by the specified sub-generators. |
Data-free | Determined by the specified sub-generators. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) |
Consistency | This generator is implicitly self-consistent. You do not specify whether the generator is consistent. Every occurrence of a character always maps to the same substitute character. Because of this, it can be used to preserve a join between two text columns, such as a join on a name or email. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 4 |
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | Yes, can be linked. |
Differential privacy | Configurable |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 2 if differential privacy is enabled; 3 if differential privacy is not enabled |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 if not consistent; 4 if consistent |
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 |
Generator ID (for the API) |
Consistency | Determined by the selected generators. |
Linking | Determined by the selected generators. |
Differential privacy | Determined by the selected generators. |
Data-free | Determined by the selected generators. |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | The lower of 5 or the fallback generator if a fallback generator is selected; 5 if no fallback generator is selected |
Generator ID (for the API) |
This generator can be used to generate cities, states, zip codes, and latitude/longitude values that follow HIPAA guidelines for safe harbor.
How the HIPAA Address generator handles zip codes is based on whether the Replace zeros in truncated Zip Code toggle in the generator configuration is off or on.
By default, the setting is off. In this case, the last two digits of the zip code in the column are replaced with zeros, unless the zip code is a low population area as designated by the current census. For a low population area, all of the digits in the zip code are replaced with zeros.
If the setting is on, then the generator selects a real zip code that starts with the same three digits as the original zip code. For a low population area, if a state is linked, then the generator selects a random zip code from within that state. Otherwise the generator selects a random zip code from the United States.
When a zip code column is not linked, a random city is chosen in the United States. When a zip code is already added to the link, a city is chosen at random that has at least some overlap with the zip code.
If the original zip code is designated as a low population area, and a State column is linked, then a random city is chosen within the state. If no State column is linked, then a random city within the United States is chosen.
For example, if the original city and zip code were (Atlanta, 30305), the zip code would be replaced with 30300. Many cities contain zip codes that begin with 303, such as Atlanta, Decatur, Chamblee, Hapeville, Dunwoody, and College Park. One of these cities is chosen at random, so the final value might be (Chamblee, 30300).
HIPAA guidelines allow for information at the state level to be kept. Therefore, these values are passed through.
GPS coordinates are randomly generated based on the linked HIPAA address components, in descending order of dependence:
If a zip code is linked and the 3-digit zip code prefix is not designated a low population area, then a random point within the same 3-digit zip code prefix is generated. If the prefix is a low population area, then the linked state is used instead.
If a state is available and a zip code and city are not, or the zip code or city are in a 3-digit zip code prefix that is designated a low population area, then a random GPS coordinate is generated somewhere within the state.
If no zip code, city, or state is linked, or if one or more of them is provided but there was a problem generating a random GPS coordinate within the linked areas, then a GPS coordinate is generated at a random location within the United States.
Note: If the city component of the HIPAA address is linked with latitude and/or longitude, the GPS coordinate components are randomly generated independently of the city.
All other address parts are generated randomly. The output value is not influenced at all by the underlying value in the column.
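For the zip code rule with Replace zeros in truncated Zip Code off, the logic reduces to the following sketch. The low-population prefix set here is a hypothetical sample; the real designation comes from current census data.

```python
LOW_POPULATION_PREFIXES = {"036", "059", "102"}  # hypothetical sample

def hipaa_zip(zip_code: str) -> str:
    prefix = zip_code[:3]
    if prefix in LOW_POPULATION_PREFIXES:
        return "00000"      # low population area: all digits zeroed
    return prefix + "00"    # otherwise: last two digits zeroed

print(hipaa_zip("30305"))  # "30300"
print(hipaa_zip("03601"))  # "00000"
```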
To configure the generator:
From the Link To dropdown list, select the other columns to link to. You can only select columns that are also assigned the HIPAA Address generator.
From the address part dropdown list, select the type of address value that is in the column.
Toggle the Replace zeros in truncated Zip Code setting to indicate how to generate zip codes. If the setting is off, then the last two digits are replaced with zeros. For low population areas, the entire zip code is populated with zeros. If the setting is on, then a real zip code is selected that starts with the first three digits of the original zip code. For low population areas, if a state is linked, a random zip code from the state is used. Otherwise, a random zip code from the United States is used.
Toggle the Consistency setting to indicate whether to make the column self-consistent. By default, consistency is disabled.
For the HIPAA Address generator, Spark workspaces (Amazon EMR, Databricks, and self-managed Spark clusters) only support the following address parts:
City
City with State
City with State Abbr
State
State Abbr
US Address
US Address with Country
Zip Code
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes |
Data-free | Yes |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 |
Generator ID (for the API) |
A version of the Categorical generator that selects from values that you provide instead of shuffling the original values.
To configure the generator:
From the Link To dropdown list, select the columns to link this column to. You can only select other columns that use the Custom Categorical generator.
In the Custom Categories text area, enter the list of values that the generator can choose from.
Put each value on a separate line.
To add a NULL value to the list, use the keyword {NULL}.
Toggle the Consistency setting to indicate whether to make the column consistent. By default, consistency is disabled.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. When a generator is self-consistent, then a given value in the source database is always mapped to the same value in the destination database. When a generator is consistent with another column, then a given source value in that column always results in the same value for the current column in the destination database. For example, a department column is consistent with a username column. For each instance of User1 in the source database, the value in the department column is the same.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates timestamps fitting an event distribution. The source timestamp must include a date. It cannot be a time-only value.
Link columns to create a sequence of events across multiple columns. This generator can be partitioned by other columns.
To configure the generator:
From the Link To dropdown list, select the other Event Timestamps generator columns to link this column to. Linking creates a sequence across multiple columns.
From the Partition drop-down list, select one or more columns to use to partition the data. The selected columns must have their generator set to either Passthrough or Categorical. For more information about partitioning and how it works, go to Partitioning a column.
The Options list displays the current column and linked columns. Use the Up and Down buttons to configure the column sequence.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
This generator scrambles characters while preserving formatting and keeping the file extension intact.
For example, for the following input value:
DataSummary1.pdf
The output value would look something like:
RsnoPwcsrtv5.pdf
This generator securely masks letters and numbers. There is no way to recover the original data.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Scrambles the characters in an email address. It preserves the formatting and keeps the @ and . characters.
For example, for the following input value:
johndoe@company.com
The output value would be something like:
brwomse@xorwxlt.slt
By default, the generator scrambles the domain. You can configure the generator to not mask specific domains. You can also specify a domain to use for all of the output email addresses.
For example, if you configure the generator to not scramble the domain company.com, then the output for johndoe@company.com would look something like:
brwomse@company.com
This generator securely masks letters and numbers. There is no way to recover the original data.
If your email addresses include name values, for example John.Smith@mycompany.com, then you can use the Regex Mask generator to produce email addresses that are tied to name values in the same table. For information on how to do this, go to Generator hints and tips.
To configure the generator:
In the Email Domain field, enter a domain to use for all of the output values. For example, use @mycompany.com for all of the generated values. The generator scrambles the content before the @.
In the Excluded Email Domains field, enter a comma-separated list of domains for which email addresses are not masked in the output values. This allows you, for example, to maintain internal or testing email addresses that are not considered sensitive.
Toggle the Replace invalid emails setting to indicate whether to replace an invalid email address with a generated valid email address. By default, invalid email addresses are not replaced. In the replacement values, the username is generated. If you specify a value for Email Domain, then the email addresses use that domain. Otherwise, the domain is generated.
Toggle the Consistency setting to indicate whether to make the column self-consistent. By default, consistency is disabled.
Truncates a date value or a timestamp to a specific part.
For a date or a timestamp, you can truncate to the year, month, or day.
For a timestamp, you can also truncate to the hour, minute, or second.
To configure the generator:
From the dropdown list, select the part of the date or timestamp to truncate to. For both date and timestamp values, you can truncate to the year, month, or day. When you select one of these options, the time portion of a timestamp is set to 00:00:00. For the date, the values below the selected truncation value are set to 01. For example, when you truncate to month, the day value is set to 01, and the timestamp is set to 00:00:00. For a timestamp value, you also can truncate to the hour, minute, or second. The date values remain the same as the original data. The time values below the selected truncation value are set to 00. For example, when you truncate to minute, the seconds value is set to 00.
Toggle the Birth Date option. When you enable Birth Date, the generator shifts dates that are more than 90 years before the generation date to the date exactly 90 years before the generation date. For example, a generation occurs on January 1, 2023. Any date that occurs before January 1, 1933 is changed to January 1, 1933.
This option is intended mainly for birthdate values: it groups the birthdates of everyone older than 89 into a single year, which supports HIPAA Safe Harbor compliance. A sketch of this logic follows the configuration steps.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
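The following Python sketch illustrates the truncation and Birth Date behavior described above. It is an approximation for illustration; the 90-year cutoff is measured relative to the generation date:

```python
from datetime import datetime

FIELDS = ["year", "month", "day", "hour", "minute", "second"]

def truncate(ts: datetime, part: str) -> datetime:
    """Keep fields down to `part`; lower fields reset to 01 / 00."""
    keep = FIELDS[: FIELDS.index(part) + 1]
    kwargs = {f: getattr(ts, f) for f in keep}
    kwargs.setdefault("month", 1)
    kwargs.setdefault("day", 1)       # hour/minute/second default to 0
    return datetime(**kwargs)

def floor_birth_date(ts: datetime, generation_date: datetime) -> datetime:
    """Shift dates more than 90 years old to exactly 90 years back."""
    cutoff = generation_date.replace(year=generation_date.year - 90)
    return max(ts, cutoff)

truncate(datetime(2021, 12, 20, 13, 42, 55), "month")
# -> datetime(2021, 12, 1, 0, 0)
floor_birth_date(datetime(1920, 5, 1), datetime(2023, 1, 1))
# -> datetime(1933, 1, 1)
```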
Here are examples of date and time values and how the selected truncation affects the output:
This is a composite generator.
Masks text columns by parsing the values as rows whose columns are delimited by a specified character.
You can assign specific generators to specific indexes. You can also use the generator that is assigned to a specific index as the default. This applies the generator to every index that does not have an assigned generator.
The output value maintains the quotes around the index values.
For example, a column contains the following value:
"first","second","third"
You assign the Character Scramble generator to index 0 and assign Passthrough to index 2. You select index 0 as the index to use for the default generator.
In the output, the first and second values are masked by the Character Scramble generator. The third value is not masked. The output looks something like:
"wmcop", "xjorsl", "third"
In the Delimiter field, type the delimiter that is used as a separator for the value. For example, for the value "first","second","third", the delimiter is a comma.
You can configure a generator for any or all of the indexes. To add a sub-generator for an index:
Under Sub-Generators, click Add Generator. On the add generator dialog, the Cell CSV field contains a sample value from the source data. You can use the navigation icons to page through the values.
In the CSV Index field, type the index to assign a generator to. The index numbers start with 0. You cannot use an index that already has an assigned generator. Matched CSV values shows the value at that index for the current sample column value.
Under Generator Configuration, from the Select a Generator dropdown list, select the generator to use for the selected index. You cannot select another composite generator. To remove the selection, click the delete icon.
Configure the selected generator. You cannot configure the selected generator to be consistent with another column.
To save the configuration and immediately add a generator for another index, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
From the Sub-Generators list:
To edit a generator assignment, click the edit icon.
To remove a generator assignment, click the delete icon.
To move a generator assignment up or down in the list, click the up or down arrow.
After you configure a generator for at least one index, the Default Link dropdown list is displayed.
From the Default Link dropdown list, select the index to use to determine how to mask values for indexes that do not have an assigned generator.
For example, you assign the Character Scramble generator to index 2. If you set Default Link to 2, then all indexes that do not have an assigned generator use the Character Scramble generator.
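The following Python sketch approximates how per-index sub-generators and the default link apply. It is illustrative only; mask_csv, scramble, and passthrough are hypothetical helpers, and Structural's actual quote handling is more involved:

```python
import random
import string

def mask_csv(value: str, generators: dict, default_index: int,
             delimiter: str = ",") -> str:
    """Apply a sub-generator per index; unassigned indexes fall back to
    the generator linked at default_index."""
    fields = value.split(delimiter)
    default_gen = generators.get(default_index)
    out = []
    for i, field in enumerate(fields):
        gen = generators.get(i, default_gen)
        out.append(gen(field) if gen else field)
    return delimiter.join(out)

# Hypothetical sub-generators: scramble keeps the surrounding quotes.
scramble = lambda s: '"' + "".join(
    random.choice(string.ascii_lowercase) for _ in s.strip('"')) + '"'
passthrough = lambda s: s

mask_csv('"first","second","third"', {0: scramble, 2: passthrough},
         default_index=0)
# e.g. '"wmcop","xjorsl","third"'
```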
Generates random host names, based on the English language.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.
By default, the generator is not consistent.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from Consistent to, select the column.
When the generator is consistent with itself, then a given value in the source database is mapped to the same value in the destination database. For example, Host123 in the source database always produces MyHostABC in the destination database.
When the generator is consistent with another column, then a given source value in the other column results in the same host name value in the destination database. For example, a host name column is consistent with a department column. Every instance of Sales in the source data is given the same host name in the destination database.
The FNR generator transforms Norwegian national identity numbers. In Norwegian, the term for national identity number abbreviates to FNR.
The first six digits of an FNR reflect the person's birthdate. You can choose to preserve the birthdates from the source values in the destination values. If you do not preserve the source values, the destination values are still within the same date range as the source values.
Another digit in an FNR indicates whether the person is male or female. You can specify whether to preserve in the generated value the gender indicated in the source value.
The last digits in an FNR are a checksum value. The last digits in the destination value are not a checksum - the values are random.
To configure the generator:
To preserve the gender from the source value in the destination value, toggle Preserve Gender to the on position.
To preserve the birthdate from the source value in the destination value, toggle Preserve Birthdate to the on position.
Toggle the Consistency setting to indicate whether to make the generator consistent. By default, consistency is disabled.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. When a generator is self-consistent, then a given value in the source database is always mapped to the same value in the destination database. When a generator is consistent with another column, then a given value for that other column in the source database results in the same value in the destination database. For example, if the FNR column is consistent with a Name column, then every instance of John Smith in the source database results in the same FNR in the destination database.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
This generator provides support for additional address parts in Spark workspaces.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | Yes, can be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 if not consistent; 4 if consistent |
Generator ID (for the API) | |
Consistency | No, cannot be made consistent. |
Linking | Yes, can be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 |
Generator ID (for the API) | |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) | |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) | |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) | |
Option | Date value | Timestamp value |
---|---|---|
Original value | 2021-12-20 | 2021-12-20 13:42:55 |
Truncate to year | 2021-01-01 | 2021-01-01 00:00:00 |
Truncate to month | 2021-12-01 | 2021-12-01 00:00:00 |
Truncate to day | 2021-12-20 | 2021-12-20 00:00:00 |
Truncate to hour | Not applicable | 2021-12-20 13:00:00 |
Truncate to minute | Not applicable | 2021-12-20 13:42:00 |
Truncate to second | Not applicable | 2021-12-20 13:42:55 |
Consistency | Determined by the selected sub-generators. |
Linking | Determined by the selected sub-generators. |
Differential privacy | Determined by the selected sub-generators. |
Data-free | Determined by the selected sub-generators. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) | |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | |
Generator ID (for the API) | |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | No, cannot be linked |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | |
Generator ID (for the API) | |
Consistency | Yes, can be made self-consistent. |
Linking | Yes, can be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | |
Generator ID (for the API) | |
This generator replaces all instances of the find string with the replace string.
For example, you can indicate to replace all instances of abc with 123.
To configure the generator:
In the Find field, type the string to look for in the source column value.
To use a regular expression to identify the source value, check the Use Regex checkbox.
If you use a regular expression, use backslash ( \ ) as the escape character.
In the Replace field, type the string to replace the matching string with.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
This is a composite generator.
Runs selected generators on specified key values in an HStore column in a PostgreSQL database. HStore columns contain a set of key-value pairs.
To assign a generator to a key:
Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell HStore field contains a sample value from the source database. You can use the previous and next icons to page through different values.
Under Enter a key, enter the name of a key from the column value.
For example, for the column value:
"pages"=>"446", "title"=>"The Iliad", "category"=>"mythology"
To apply a generator to the title, you would enter title as the key.
Matched HStore Values shows the result from the value in Cell HStore.
From the Generator Configuration dropdown list, select the generator to apply to the key value. You cannot select another composite generator.
Configure the selected generator. You cannot configure the selected generator to be consistent with another column.
To save the configuration and immediately add a generator for another key, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
From the Sub-Generators list:
To edit a generator assignment, click the edit icon.
To remove a generator assignment, click the delete icon.
To move a generator assignment up or down in the list, click the up or down arrow.
This generator can be used to mask columns of latitude and longitude.
The Geo generator divides the globe into grids that are approximately 4.9 x 4.9 km. It then counts the number of points within each grid.
During data generation, each (latitude, longitude) pair is mapped to its grid.
If the grid contains a sufficient number of points to preserve privacy, then the generator returns a randomly chosen point in that grid.
If the grid does not contain enough points to preserve privacy, then the generator returns a random coordinate from the nearest grid that contains enough points.
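The following Python sketch approximates the grid logic described above. The grid size constant and the K_MIN privacy threshold are illustrative assumptions, not Structural's actual values:

```python
import random

GRID_DEG = 0.044   # ~4.9 km expressed in degrees of latitude (assumed)
K_MIN = 10         # minimum points per cell to preserve privacy (assumed)

def cell(lat: float, lon: float) -> tuple:
    """Snap a coordinate to its grid cell."""
    return (int(lat // GRID_DEG), int(lon // GRID_DEG))

def mask_point(lat: float, lon: float, counts: dict) -> tuple:
    """counts maps each cell to its number of source points."""
    c = cell(lat, lon)
    if counts.get(c, 0) < K_MIN:
        # Fall back to the nearest cell that has enough points.
        # (Assumes at least one qualifying cell exists.)
        c = min((k for k, n in counts.items() if n >= K_MIN),
                key=lambda k: (k[0] - c[0]) ** 2 + (k[1] - c[1]) ** 2)
    # Return a random point inside the chosen cell.
    return ((c[0] + random.random()) * GRID_DEG,
            (c[1] + random.random()) * GRID_DEG)
```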
To configure the generator:
From the Link To dropdown list, select the column to link to this one. You typically assign the Geo generator to both the latitude and longitude column, then link those columns.
From the value type dropdown, select whether this column contains a latitude value or a longitude value.
Generates unique integer values. By default, the generated values are within the range of the column’s data type.
You can also specify a range for the generated values. The source values must be within that range.
This generator cannot be used to transform negative numbers.
To configure the generator:
In the Minimum field, enter the minimum value to use for an output value. The minimum value cannot be larger than any of the values in the source data.
In the Maximum field, enter the maximum value to use for an output value. The maximum value cannot be smaller than any of the values in the source data.
Toggle the Consistency setting to indicate whether to make the column self-consistent. By default, consistency is disabled.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates an address-like string to replace either:
For a Canadian postal address, the street name or postal code
For a United Kingdom (UK) mailing address, the postal code
To replace a Canadian postal code:
The generator selects a real postal code that starts with the same three characters as the original postal code - that is, it has the same Forward Sortation Area (FSA) - but that has a different Local Delivery Unit (LDU).
For a postal code whose FSA is not on the list that the generator uses, you can provide a fallback value to use.
To replace a UK postal code, the generator selects a real postal code.
To configure the generator:
From the Generator Type dropdown list, select International Address.
From the Country dropdown list, select the country (Canada or United Kingdom).
From the Address Component dropdown list, select the address component that this column contains. For Canada, the available options are:
Street Name
Postal Code
For the UK, the only option is to generate a postal code.
For a Canadian postal code, in the Fallback Value field, type the FSA to use if the value in the data does not exist.
For example, the FSA in the data might be new and not yet in the list that Tonic uses, or the FSA might be invalid.
By default, the fallback value is NULL, which means that the postal code value in the destination data is also the string literal "NULL".
Toggle the Consistency setting to indicate whether to make the column consistent. By default, consistency is disabled.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
This is a composite generator.
Masks text columns by parsing the contents as HTML, and applying sub-generators to specified path expressions.
If applying a sub-generator fails because of an error, the generator selected as the fallback generator is applied instead.
Path expressions are defined using XPath syntax.
For example, consider an HTML value such as the following illustrative sample:
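<h1>Title</h1>
<ul><li>First item</li><li>Second item</li></ul>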
To get the value of h1, the expression is //h1/text().
To get the value of the first list item, the expression is //ul/li[1]/text().
To assign a generator to a path expression:
Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell HTML field contains a sample value from the source database. You can use the previous and next icons to page through different values.
In the Path Expression field, type the path expression to identify the value to apply the generator to. Matched HTML Values shows the result from the value in Cell HTML.
From the Generator Configuration dropdown list, select the generator to apply to the path expression. You cannot select another composite generator.
Configure the selected generator. You cannot configure the selected generator to be consistent with another column.
To save the configuration and immediately add a generator for another path expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
From the Sub-Generators list:
To edit a generator assignment, click the edit icon.
To remove a generator assignment, click the delete icon.
To move a generator assignment up or down in the list, click the up or down arrow.
From the Fallback Generator dropdown list, select the generator to use if the assigned generator for a path expression fails.
The options are:
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) | |
Consistency | Determined by the selected sub-generators. |
Linking | Determined by the selected sub-generators. |
Differential privacy | Determined by the selected sub-generators. |
Data-free | Determined by the selected sub-generators. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) | |
Consistency | No, cannot be made consistent. |
Linking | Yes, can be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 |
Generator ID (for the API) |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | Yes |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | Yes |
Privacy ranking | |
Generator ID (for the API) | |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | |
Generator ID (for the API) | |
Consistency | Determined by the selected sub-generators. |
Linking | Determined by the selected sub-generators. |
Differential privacy | Determined by the selected sub-generators. |
Data-free | Determined by the selected sub-generators. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) |
This is a composite generator.
Runs a selected sub-generator on values that match a user-specified JSONPath. You can only search for and apply sub-generators to individual key values. You cannot apply a sub-generator to an object or to an array.
If an error occurs, the selected fallback generator is used for the entirety of the JSON value.
Sub-generators are applied sequentially, from the sub-generator at the top of the list to the sub-generator at the bottom of the list.
If multiple JSONPath expressions point to the same key, the most recently added generator takes priority.
JSON paths can also contain regular expressions and comparison logic, which allows the configured sub-generators to be applied only when there are properties that satisfy the query.
For example, a column contains this JSON:
[ { "file_name": "foo.txt", "b": 10 }, ... ]
The following JSON path only applies to array elements that contain a file_name key for which the value ends in .txt:
$.[?(@.file_name =~ /^.*\.txt$/)]
A JSON path can also be used to point to a key name recursively. For example, a column might contain JSON like the following illustrative value:
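{ "first_name": "John", "manager": { "first_name": "Jane" } }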
The following JSON path applies to all properties for which the key is first_name:
$..first_name
To assign a generator to a path expression:
Under Sub-generators, click Add Generator. On the sub-generator configuration panel, the Cell JSON field contains a sample value from the source database. You can use the previous and next icons to page through different values.
In the Path Expression field, type the path expression to identify the value to apply the generator to. To create a path expression, you can also click the value in Cell JSON that you want the expression to point to. The path expression must identify a key value. You cannot apply sub-generators to an object or to an array. Matched JSON Values shows the result from the value in Cell JSON.
By default, the selected generator is applied to any value that matches the expression. To limit the types of values to apply the generator to, from the Type Filter, specify the applicable types. You can select Any, or you can select any combination of String, Number, Boolean, and Null.
From the Generator Configuration dropdown list, select the generator to apply to the path expression. You cannot select another composite generator.
Configure the selected generator. You cannot configure the selected generator to be consistent with another column.
To save the configuration and immediately add a generator for another path expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
From the Sub-Generators list:
To edit a generator assignment, click the edit icon.
To remove a generator assignment, click the delete icon.
To move a generator assignment up or down in the list, click the up or down arrow.
From the Fallback Generator dropdown list, select the generator to use if the assigned generator for a path expression fails.
The options are:
Generates a random IP address formatted string.
To configure the generator:
In the Percent IPv4 field, type the percentage of output values that are IPv4 addresses.
For example, if you set this to 60, then 60% of the generated IP addresses are IPv4 addresses, and 40% of the generated IP addresses are IPv6 addresses.
If you set this to 100, then all of the generated IP addresses are IPv4 addresses.
If you set this to 0, then all of the generated IP addresses are IPv6 addresses.
Toggle the Consistency setting to indicate whether to make the column consistent. By default, consistency is disabled.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. When a generator is self-consistent, then a given value in the source database is always mapped to the same value in the destination database. When a generator is consistent with another column, then a given source value in that column always results in the same IP address value in the destination database. For example, an IP address column is consistent with a username column. For each instance of User1 in the source database, the value in the IP address column is the same.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates unique object identifiers.
Can be assigned to text columns that contain MongoDB ObjectId values. The column value must be 12 bytes long.
To configure the generator:
A MongoDB object identifier consists of an epoch timestamp, a random value, and an incremented counter. To change only the random value portion of the identifier, and keep the timestamp and counter portions, toggle Preserve Timestamp and Incremental Counter to the on position. A byte-level sketch of this behavior follows this list.
Toggle the Consistency setting to indicate whether to make the generator self-consistent. By default, the generator is not consistent.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
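The following Python sketch shows the byte-level idea behind Preserve Timestamp and Incremental Counter, based on the documented MongoDB ObjectId layout (4-byte timestamp, 5-byte random value, 3-byte counter). It is an illustration, not Structural's implementation:

```python
import os

def mask_object_id(hex_id: str, preserve_ts_and_counter: bool = True) -> str:
    """Randomize the 5 random bytes of a 12-byte ObjectId, optionally
    keeping the 4-byte timestamp prefix and 3-byte counter suffix."""
    raw = bytes.fromhex(hex_id)        # 12 bytes: 4 ts + 5 random + 3 counter
    if preserve_ts_and_counter:
        return (raw[:4] + os.urandom(5) + raw[9:]).hex()
    return os.urandom(12).hex()

mask_object_id("507f1f77bcf86cd799439011")
# -> "507f1f77??????????439011" with the middle 5 bytes randomized
```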
Generates a random MAC address formatted string.
To configure the generator:
In the Bytes Preserved field, enter the number of bytes to preserve in the generated address.
Toggle the Consistency setting to indicate whether to make the column self-consistent. By default, consistency is disabled.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Masks values in numeric columns. Adds or multiplies the original value by random noise.
The additive noise option draws noise from an interval around 0 that is scaled to the magnitude of the original value. By default, the scale is 10% of the underlying value. The larger the value, the larger the amount of noise that is added.
The multiplicative noise generator multiplies the original value by a random scaling factor that falls within a specified range.
You choose either the additive or the multiplicative noise option, and then configure that option's settings.
To use the additive noise generator:
From the dropdown list, choose Additive.
In the Relative noise scale field, type the percentage of the underlying value to scale the noise to. The default value is 10.
Tonic samples the additive noise from the range [-(scale/100) * |value|, (scale/100) * |value|), where scale is the noise scale and value is the original data value.
The lower bound of the range is inclusive, and the upper bound is exclusive.
For example, for the default noise scale of 10 and a data value of 20, the additive noise range is [-0.1 * 20, 0.1 * 20). In other words, between -2 (inclusive) and 2 (exclusive).
To use the multiplicative noise generator:
From the dropdown list, choose Multiplicative.
In the Min field, type the minimum value for the scaling factor. The minimum value is inclusive. The default value is 0.5.
In the Max field, type the maximum value for the scaling factor. The maximum value is exclusive. The default value is 5.
Tonic draws the scaling factor from the range [min, max), where min is the minimum scaling factor and max is the maximum scaling factor.
For example, for the default values of 0.5 and 5, Tonic multiplies the original data value by a factor between 0.5 (inclusive) and 5 (exclusive).
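The following Python sketch mirrors the two noise modes described above (an illustration only, not Tonic's implementation):

```python
import random

def additive_noise(value: float, scale_pct: float = 10.0) -> float:
    # Noise drawn from [-(scale/100) * |value|, (scale/100) * |value|).
    bound = (scale_pct / 100.0) * abs(value)
    return value + random.uniform(-bound, bound)

def multiplicative_noise(value: float, lo: float = 0.5, hi: float = 5.0) -> float:
    # Scaling factor drawn from [lo, hi).
    return value * random.uniform(lo, hi)

additive_noise(20.0)         # somewhere in [18.0, 22.0)
multiplicative_noise(20.0)   # somewhere in [10.0, 100.0)
```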
To configure the generator consistency and data encryption:
Toggle the Consistency setting to indicate whether to make the column consistent. By default, the consistency is disabled.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column. If the generator is self-consistent, then a given value in the source database is masked in exactly the same way to produce the value in the destination database. If the generator is consistent with another column, then for a given value in that other column, the column that is assigned the Noise generator is always masked in exactly the same way in the destination database. For example, a field containing a salary value is assigned the Noise Generator and is consistent with the username field. For each instance of User1, the Noise Generator masks the salary value in exactly the same way.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates a random name string from a dictionary of first and last names.
You specify the name information that is contained in the column. A column might only contain a first name or last name, or might contain a full name. A full name might be first name first or last name first.
For example, a Name column contains a full name in the format Last, First. For the input value Smith, John, the output value would be something like Jones, Mary.
To configure the generator:
From the name format dropdown list, select the type of name value that the column contains:
First. This also is commonly used for standalone middle name fields.
Last
First Last
First Middle Last
First Middle Initial Last
Last, First
Last, First Middle
Middle Initial
Toggle the Preserve Capitalization setting to indicate whether to preserve the capitalization of the column value. By default, the capitalization is not preserved.
Toggle the Consistency setting to indicate whether to make the column consistent. By default, consistency is disabled.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 if not consistent; 4 if consistent |
Generator ID (for the API) | |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) | |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | Yes |
Privacy ranking | 1 if not consistent; 4 if consistent |
Generator ID (for the API) | |
Consistency | Determined by the selected sub-generators. |
Linking | Determined by the selected sub-generators. |
Differential privacy | Determined by the selected sub-generators. |
Data-free | Determined by the selected sub-generators. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) | |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) | |
Consistency | Yes, can be made self-consistent or consistent with another column. Note that all Name generator columns that have the same consistency configuration are automatically consistent with each other. The columns must either be all self-consistent or all consistent with the same other column. For example, you can use this to ensure that a first name and last name column value always match the first name and last name in a full name column. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 if not consistent; 4 if consistent |
Generator ID (for the API) | |
Generates a random boolean value.
To configure the generator, in the Percent True field, enter the percentage of values to set to True in the output.
For example, if you set this to 60, then 60% of the output values are True, and 40% of the output values are False.
If you set this to 100, then all of the output values are True.
If you set this to 0, then all of the output values are False.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates a random phone number that matches the country or region of the input phone number while maintaining the format. For example, (123) 456-7890 or 123-456-7890.
If the input is not a valid phone number, the generator randomly replaces numeric characters. You can also replace invalid numbers with valid numbers.
By default, the numbers are United States phone numbers. Generated numbers pass Google's libphonenumber verification if the input is a valid phone number or if you replace invalid numbers.
To configure the generator:
Toggle the Replace invalid numbers setting to indicate whether to replace invalid input values with a valid output value. By default, the generator does not replace invalid values. It randomly replaces numeric characters.
Toggle the Consistency setting to indicate whether to make the generator self-consistent. By default, consistency is disabled.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates unique numeric strings of the same length as the input value.
For example, for the input value 123456, the output value would be something like 832957.
You can apply this generator only to columns that contain numeric strings.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates a random double number between the specified minimum (inclusive) and maximum (exclusive).
To configure the generator:
In the Minimum field, type the minimum value to use in the output values. The minimum value is inclusive. The output values can be that value or higher.
In the Maximum field, type the maximum value to use in the output values. The maximum value is exclusive. The output values are lower than that value.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 6 |
Generator ID (for the API) | |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes |
Data-free | Yes |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 |
Generator ID (for the API) | |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 |
Generator ID (for the API) | |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | Yes |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | Yes |
Privacy ranking | 3 if not consistent; 4 if consistent |
Generator ID (for the API) | |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes |
Data-free | Yes |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 |
Generator ID (for the API) | |
This is a composite generator.
Uses regular expressions to parse strings and replace specified substrings with the output of specified generators. The parts of the string to replace are specified inside unnamed top-level capture groups.
Defining multiple expressions allows you to attach completely different sets of sub-generators to a given cell, depending on the cell's value.
If multiple regular expressions match a given string, the expressions and their associated generators are evaluated in the order that they are specified. The first defined expression that matches has its sub-generators applied.
With the Replace all matches option, the Regex Mask generator behaves similarly to a traditional regex parser. It matches all occurrences of a pattern before the next pattern is encountered. For example, the pattern ^(a)$ applied to the string aaab matches every occurrence of the letter a, instead of just the first.
Note that for Spark-based data connectors, depending on your environment, there might be slight differences in the regular expression support.
To ensure consistent results across all data connectors, use regular expression patterns that are compatible with both Java and C#.
For more information about regular expressions in C#, go to this reference. For more information about regular expressions in Java, go to this reference.
In a cell that contains the string ProductId:123-BuyerId:234, to mask the substrings 123 and 234, specify the regular expression:
^ProductId:([0-9]{3})-BuyerId:([0-9]{3})$
This captures the two occurrences of three-digit numbers in the pattern ProductId:xxx-BuyerId:xxx. This makes it possible to define a sub-generator on neither, either, or both of these captured substrings.
The following regular expression defines a broader capture that matches more cell values:
^(\w+).(\d+).(\w+).(\d+)$
This captures pairs of words ((\w+)) and numbers ((\d+)) if there is a single character of any value between them, instead of the relatively more specific pattern of the first expression.
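The following Python sketch shows the general idea of replacing only the text inside unnamed capture groups. It uses Python's re module for illustration; Structural's regex engine and sub-generators differ:

```python
import random
import re

def mask_groups(value: str, pattern: str, group_gens: dict) -> str:
    """Replace only the text captured by each top-level group."""
    m = re.match(pattern, value)
    if not m:
        return value                      # no match: value passes through
    out, last = [], 0
    for i in range(1, (m.lastindex or 0) + 1):
        out.append(value[last:m.start(i)])           # text between groups
        gen = group_gens.get(i)
        out.append(gen(m.group(i)) if gen else m.group(i))
        last = m.end(i)
    out.append(value[last:])
    return "".join(out)

digits = lambda s: "".join(random.choice("0123456789") for _ in s)
mask_groups("ProductId:123-BuyerId:234",
            r"^ProductId:([0-9]{3})-BuyerId:([0-9]{3})$",
            {1: digits, 2: digits})
# e.g. "ProductId:482-BuyerId:917"
```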
To add a regular expression:
Click Add Regex. On the configuration panel, Cell Value shows a sample value from the source database. You can use the previous and next options to navigate through the values.
By default, Replace all matches is enabled. To only match the first occurrence of a pattern, toggle Replace all matches to the off position.
In the Pattern field, enter a regular expression. If the expression is valid, then Tonic displays the capture groups for the expression.
For each capture group, to select and configure the generator to apply, click the selected generator. You cannot select another composite generator.
To save the configuration and immediately add another regular expression, click Save and Add Another. To save the configuration and close the add generator panel, click Save.
From the Regexes list:
To edit a regex, click the edit icon.
To remove a regex, click the delete icon.
Returns a random integer between the specified minimum (inclusive) and maximum (exclusive).
For example, for a column that contains a percentage value, you can indicate to use a value between 0 and 101.
To configure the generator:
In the Minimum field, type the minimum value to use in the output values. The minimum value is inclusive. The output values can be that value or higher.
In the Maximum field, type the maximum value to use in the output values. The maximum value is exclusive. The output values are lower than that value.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates random dates, times, and timestamps that fall within a specified range.
For example, you might want the output dates to all fall within a specific year or month.
To configure the generator, in the Range fields, provide the start and end dates, times, or timestamps to use for the output values.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates a random new UUID string.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates a random hash string.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates a new valid Canadian Social Insurance Number that preserves the formatting of the original value.
For example, the original value might be 123456789, 123 456 789, or 123-456-789. The output value uses the same format.
To configure the generator, toggle the Consistency setting to indicate whether to make the generator self-consistent.
By default, the generator is not consistent.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Generates ISO 6346-compliant shipping container codes. All generated codes are in the freight category ("U").
To configure the generator, toggle the Consistency setting to indicate whether to make the generator consistent.
By default, the generator is not consistent.
If you enable consistency, then by default the generator is self-consistent. To make the generator consistent with another column, from the Consistent to dropdown list, select the column.
When the generator is self-consistent, then a given value in the source database is always mapped to the same value in the destination database.
When the generator is consistent with another column, then a given value for the other column in the source database always results in the same shipping container code value in the destination database. For example, a shipping container column is consistent with an owner column. Every instance of an owner column from the source database has the same shipping container value in the destination database.
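For reference, an ISO 6346-compliant code ends in a check digit that is computed from the first 10 characters. The following Python sketch shows the standard ISO 6346 algorithm, not Structural's generation code:

```python
import string

def iso6346_check_digit(code: str) -> int:
    """Standard ISO 6346 check digit for the first 10 characters,
    e.g. owner code + category + serial: 'CSQU305438' -> 3."""
    values, v = {}, 10
    for ch in string.ascii_uppercase:
        while v % 11 == 0:     # letter values skip multiples of 11
            v += 1
        values[ch] = v
        v += 1
    total = sum((int(c) if c.isdigit() else values[c]) * (2 ** i)
                for i, c in enumerate(code[:10]))
    return total % 11 % 10     # a remainder of 10 maps to 0

iso6346_check_digit("CSQU305438")  # -> 3
```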
Generates a column of unique integer values. The values increment by 1.
To configure the generator:
From the Link To dropdown list, select the other columns to link to the current column. You can only select columns that also use the Sequential Integer generator.
In the Starting Point field, type the number to use as the starting point.
By default, the starting point is 0. This means that the column value in the first processed row is 0, and the value in the next processed row is 1. The generator continues to increment the value by 1 for each row that it processes.
If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes |
Data-free | Yes |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 |
Generator ID (for the API) | |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes |
Data-free | Yes |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 |
Generator ID (for the API) | |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes |
Data-free | Yes |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 |
Generator ID (for the API) | |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes |
Data-free | Yes |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 |
Generator ID (for the API) | |
Consistency | Determined by the selected sub-generators. |
Linking | Determined by the selected sub-generators. |
Differential privacy | Determined by the selected sub-generators. |
Data-free | Determined by the selected sub-generators. |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 5 |
Generator ID (for the API) | |
Consistency | Yes, can be made self-consistent or consistent with another column. |
Linking | No, cannot be linked. |
Differential privacy | Yes, if consistency is not enabled. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | No |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | |
Generator ID (for the API) | |
Consistency | Yes, can be made self-consistent. |
Linking | No, cannot be linked. |
Differential privacy | No, cannot be made differentially private. |
Data-free | Yes, if consistency is not enabled. |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | Yes |
Privacy ranking | |
Generator ID (for the API) | |
Consistency | No, cannot be made consistent. |
Linking | No, cannot be linked. |
Differential privacy | Yes |
Data-free | Yes |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 1 |
Generator ID (for the API) |
Consistency | No, cannot be made consistent. |
Linking | Yes, can be linked. |
Differential privacy | No |
Data-free | No |
Allowed for primary keys | No |
Allowed for unique columns | Yes |
Uses format-preserving encryption (FPE) | No |
Privacy ranking | 3 |
Generator ID (for the API) |