Running data generation manually

Last updated 1 month ago


Selecting the data generation option

Required workspace permission: Run data generation

To start the data generation, at the top right of the workspace management view, click Generate Data.

As you configure the data generation options, Structural runs checks to verify that you can use the current configuration to generate data.

If any of these checks fail, then when you click Generate Data, Structural displays information about why you cannot run the data generation job.

If all of the checks pass and there are no warnings, then when you click Generate Data, the Confirm Generation panel displays.

Warning for non-conflicting schema changes

Data generation is always blocked by conflicting schema changes.

The workspace configuration includes whether to block data generation for all schema changes, including non-conflicting changes.

If this setting is turned off and there are non-conflicting schema changes, then a warning displays when you click Generate Data. Non-conflicting schema changes include new tables and columns. If the new columns contain sensitive data, and you do not assign generators before you generate data, then that sensitive data is written to the destination database.

If you are sure that the data in the new tables and columns is not sensitive, then to continue to the Confirm Generation panel, click Continue to Data Generation.

Warning for the Ephemeral Cloud database access configuration

When databases are publicly accessible, the Confirm Generation panel always displays a warning. The warning is a reminder that databases are publicly accessible. To limit access, you must enable IP allowlisting in Ephemeral.

When IP allowlisting is enabled, a warning displays only when the temporary database is preserved. The warning in this case is a reminder to make sure that the IP allowlist includes your IP address, so that you can connect to the database.

Confirming the generation details

The Confirm Generation panel allows you to confirm the details for the data generation.

Indicating whether to use upsert

If upsert is available for the workspace, then you can also determine whether to use upsert for data generation.

If upsert is enabled for the workspace, then by default Use Upsert is in the on position.

If you do not want to use upsert, toggle Use Upsert to the off position. When upsert is turned off, the job is a standard data generation that writes directly to the destination database and replaces the existing data.

Indicating whether to generate a subset

If you configured subsetting, then you can indicate whether to only generate the subset.

To create a subset based on the current subsetting configuration, toggle Use Subsetting to the on position.

The initial setting matches the current setting in the subsetting configuration. If Use subsetting is enabled in Subsetting view, then it is enabled by default on the Confirm Generation panel.

When you change the setting on the Confirm Generation panel, Structural also updates the setting in Subsetting view.

Determining the data generation process to use (Oracle, SQL Server, MySQL only)

Tonic.ai has released an improved version of the data generation process. Several data connectors use the new process automatically. For Oracle, SQL Server, and MySQL, it is optional.

For the new process, the job type is Data Pipeline Generation instead of Data Generation.

By default, Oracle and SQL Server workspaces use the new process, and MySQL workspaces use the previous process.

On the Confirm Generation panel, the Data Pipeline V2 toggle indicates whether to use the new process:

  • When the toggle is in the off position, Structural uses the previous process.

  • When the toggle is in the on position, Structural uses the new process.

Enabling diagnostic logging for the job

Required global permission: Enable diagnostic logging

If the data connector either is not configured to use diagnostic logging, or does not have a diagnostic logging setting, then you can choose whether to enable diagnostic logging for an individual data generation job.

On the Confirm Generation panel, to enable diagnostic logging for the job, toggle Enable Diagnostic Logging to the on position.

Access to diagnostic logs is also controlled by the Enable diagnostic logging global permission. If you do not have this permission, then you cannot download diagnostic logs.

Indicating whether to generate performance metrics

Required workspace permission: Download job logs

To help troubleshoot issues, for workspaces that use the newer data generation process, you can configure the data generation job to also generate performance metrics.

The performance metrics start when a specified table is processed, and continue for a specified length of time.

To enable performance metrics for the data generation job:

  1. Toggle Collect Performance Metrics to the on position.

  2. From the Table Trigger dropdown list, select the table that triggers the performance metrics.

  3. From the Trace Duration dropdown list, select the length of time to run the performance metrics.

Viewing the destination location

The Confirm Generation panel provides the destination information for the workspace. To display the destination database connection details, click Destination Settings.

Depending on the workspace configuration and data connector type, the destination information is either:

  • Connection information for a database server.

  • A storage location such as an S3 bucket.

  • Configuration for an Ephemeral snapshot.

  • Information to create container artifacts.

If the destination information is incorrect, to navigate to the workspace configuration view to make updates, click Edit Destination Settings.

Verifying the intermediate database connection information (for upsert)

When upsert is enabled, the Confirm Generation panel provides access to the connection information for the intermediate database. To display the intermediate database connection details, click Intermediate Upsert Database.

If the intermediate database information is incorrect, to navigate to the workspace configuration view to make updates, click Edit Intermediate.

Viewing generation performance tips

For data generation, assigning Truncate table mode to tables that you don't need data for can improve generation performance.

For subsetting, if an upstream table is very large and its foreign key columns are not indexed, then the subsetting process can run more slowly.

The Want faster generations? message displays at the bottom of the Confirm Generation panel. It displays for all non-subsetting jobs. For subsetting jobs, it only displays if Structural identified columns that you should consider indexing.

To display information about tips for faster generation, click Generation Tips.

Viewing suggested columns to index

On the Generation Tips panel for subsetting jobs, the Add Indexes panel displays the first few columns that you might consider indexing.

To display a panel with a suggested SQL command to add the index, click the information icon next to the column.

On the panel, to copy the command to the clipboard, click Copy SQL to Clipboard.

If there are additional columns that are not listed, then to display the full list of columns to index, click Show all columns.

On the full list, to download the list to a CSV file, click Download list of columns (.csv).
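As an illustration, a suggested index command typically takes the following form. The table, column, and index names here are hypothetical; use the exact command that Structural displays for your schema.

```sql
-- Hypothetical example: index a foreign key column that subsetting
-- traverses, so that lookups against the large upstream table
-- avoid full table scans.
CREATE INDEX ix_orders_customer_id ON orders (customer_id);
```

You run the command against the source database. Note that while an index can speed up subset computation, it also adds some write overhead to that database.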

Hint to truncate tables

On the Generation Tips panel for non-subsetting jobs, the Truncate Tables panel displays the hint to truncate tables that contain data that you do not need in the destination database.

To navigate to Database View to change the current configuration, click Go to Database View.

Starting the generation job

On the Confirm Generation panel, after you confirm the generation details, to start the data generation, click Run Generation.

When upsert is enabled, to start the data generation and upsert jobs:

  1. Click the Run Generation + Upsert button.

  2. In the menu, click Run Generation + Upsert.
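You can also start generation jobs programmatically instead of from the Confirm Generation panel (see Example script: Starting a data generation job in the Structural API documentation). The sketch below is a minimal illustration only; the endpoint path (/api/GenerateData/start), the workspaceId query parameter, and the Apikey authorization scheme are assumptions to verify against the API reference for your instance.

```python
import urllib.parse
import urllib.request

def build_start_generation_request(base_url: str, api_key: str, workspace_id: str):
    """Build the HTTP request to start a data generation job.

    The endpoint path and auth scheme below are assumptions for
    illustration; confirm them against your instance's API reference.
    """
    query = urllib.parse.urlencode({"workspaceId": workspace_id})
    return urllib.request.Request(
        url=f"{base_url}/api/GenerateData/start?{query}",
        method="POST",
        headers={"Authorization": f"Apikey {api_key}"},
    )

def start_generation(base_url: str, api_key: str, workspace_id: str) -> bytes:
    # Send the request; the response body typically identifies the new job.
    req = build_start_generation_request(base_url, api_key, workspace_id)
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Separating request construction from sending makes the sketch easy to adapt to another HTTP client, and easy to inspect before any call is made.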

Starting an upsert job based on the most recent data generation

If upsert is enabled for a workspace, then on the Confirm Generation panel, the more common option is to run both data generation and upsert.

After you run at least one successful data generation to the intermediate database, then you can also choose to run only the upsert process.

For example, if the data generation succeeds but the upsert process fails, then after you address the issues that caused the upsert to fail, you can run the upsert process again.

You also must start the upsert job manually if you turn off Automatically Start Upsert After Successful Data Generation in the workspace settings.

From the Confirm Generation panel, to run upsert only:

  1. Click the Run Generation + Upsert button.

  2. In the menu, click Run Upsert Only.

When you run upsert only, the process uses the results of the most recent data generation.
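After any of these jobs starts, you can track it from Jobs view, or poll its status through the Structural API (see Example script: Polling for a job status in the Structural API documentation). The helper below only sketches the polling loop; the status names are hypothetical, and get_status stands in for whatever API call returns the job's current status on your instance.

```python
import time

# Hypothetical terminal status names; check the actual values that
# your Structural instance's jobs API returns.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}

def is_terminal(status: str) -> bool:
    return status in TERMINAL_STATUSES

def wait_for_job(get_status, job_id: str, poll_seconds: float = 30, max_polls: int = 240) -> str:
    """Poll get_status(job_id) until the job reaches a terminal status.

    get_status is any callable that returns the job's current status
    string, e.g. a small wrapper around the Structural jobs API.
    """
    for _ in range(max_polls):
        status = get_status(job_id)
        if is_terminal(status):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish within {max_polls} polls")
```

Passing the status lookup in as a callable keeps the loop independent of any particular HTTP client and makes it trivial to test.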

When a workspace writes data to a snapshot on Ephemeral Cloud, the Confirm Generation panel can include a warning about database accessibility.

Data generation to Ephemeral includes the creation of a temporary database. By default, the temporary database is removed after the snapshot is created. You can also choose to preserve it.

By default, databases created in Ephemeral Cloud are publicly accessible. To restrict database access to specific IP addresses, an organization can choose to enable IP allowlisting. For more information, go to Configuring an allowlist for Ephemeral Cloud database connections in the Ephemeral documentation.

By default, Structural redacts sensitive values from the logs. To help support troubleshooting, you can configure some Structural data connectors to use diagnostic logging, which generates unredacted versions of the log files. For details, go to Enabling diagnostic logs across a Structural instance for specific data connectors.

For a file connector workspace, if the source files came from a local file system, then the destination files are written to the large file store in the Structural application database. You can download the most recently generated files.

If the destination data is written to a container repository, then from the Confirm Generation panel, you can configure custom tag values to use for the artifacts that the data generation job generates. For information about how to configure the tag values, go to Providing tags for the container artifacts.

Structural displays a notification that the job has started. To track the progress of the data generation job and view the results, click the View Job button on the notification, or go to Jobs view.