Viewing the current subsetting configuration

Last updated 4 months ago

To display Subsetting view, do one of the following:

  • On the workspace management view, in the workspace navigation bar, click Subsetting.

  • On Workspaces view, from the dropdown menu in the Name column, select Subsetting.

  • On Workspaces view, click the subsetting icon for the workspace.

The Configuration tab on Subsetting view shows the current subsetting configuration.

It consists of:

  • Subsetting summary

  • Table View and Graph View. Both views display the source data tables, show the current subsetting configuration, and allow you to update the configuration. Table View displays a tabular list of tables. Graph View displays a diagram that shows the relationships between the tables.

  • Configuration to enable subsetting for data generation.

  • Configuration for handling out-of-subset tables.

  • Results of the most recent subsetting data generation.

Subsetting summary

The panels at the top of the Configuration tab provide a clickable summary of the current subsetting configuration.

The summary includes the following values:

  • Target shows the number of target tables.

  • Lookup shows the number of lookup tables.

  • In Subset shows the number of tables that are in the subset. This includes target tables, lookup tables, and related tables.

  • Out of Subset shows the number of tables that are not in the subset.

When you click a summary panel:

  • On Table View, the table list is filtered to only display matching tables. For example, when you click Target, the list is filtered to only include target tables.

  • On Graph View, the matching tables are highlighted with a shadow behind the table objects.

Subsetting data generation results

After you run data generation with subsetting, on Table View, the Latest Results tab displays on the Configuration tab.

Before you run data generation with subsetting, the Latest Results tab does not display.

The Latest Results tab displays details for the most recent data generation with subsetting. It ignores data generation runs that do not use subsetting.

The subsetting results include:

  • The job status (successful, failed, canceled).

  • The amount of time it took to complete the run.

  • The percentage of the source data that is included in the subset destination data.

  • The volume of data in the source data and the subset destination data.

  • The percent reduction from the original source data to the subset destination data.

  • When the job began and ended.

To display the details for the data generation job, click View Job Details.
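
As a rough illustration of how the two percentages in the results relate to each other, the following sketch derives both from the source and destination data volumes. The function name and the assumption that both values are computed from data volume are hypothetical. The page does not state how Structural calculates them.

```python
def subset_percentages(source_bytes: int, subset_bytes: int) -> tuple[float, float]:
    """Return (percent of source data included, percent reduction).

    Assumption: both values are derived from data volume.
    """
    if source_bytes <= 0:
        raise ValueError("source_bytes must be positive")
    included = 100.0 * subset_bytes / source_bytes
    reduction = 100.0 - included
    return included, reduction


included, reduction = subset_percentages(source_bytes=200_000, subset_bytes=50_000)
print(f"{included:.1f}% included, {reduction:.1f}% reduction")
```

Under this assumption, the two percentages always add up to 100.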

Table View

Information in the table list

The Configuration tab contains the list of tables in the source database. It shows how each table is affected by the most recently completed subsetting configuration.

For each table, the table list includes:

  • Whether the table is a target table, a lookup table, or a related table that is filtered.

  • Whether the table is in or out of the subset. Target and lookup tables are always in the subset. Related tables also are in the subset. Other tables are out of the subset.

  • The number of rows in the table before and after the subset is created. For tables that are in the subset, the list also shows the percentage of the table data that is in the subset. For more information, go to How Structural determines the number of pre-subset and post-subset rows.

Sorting the list

You can sort the list based on values in a selected column. To use a column to sort the list, click the column heading. To switch the sort order, click the column heading again.

The Sort by dropdown list provides the following options to sort the list:

  • Rows pre-subset - Sort by the number of rows in the table before subsetting.

  • Rows post-subset - Sort by the number of rows in the table after subsetting. Before you run a data generation job to create the subset, this value is unknown.

  • Inbound relationships - Sort the list based on the number of inbound relationships.

  • Outbound relationships - Sort the list based on the number of outbound relationships.

  • Total relationships - Sort the list based on the total number of inbound and outbound relationships.

By default, the dropdown sort options sort the table list in descending order. For example, when you select Rows pre-subset, the table that currently has the largest number of rows is at the top of the list. To change the sort direction, select the option again.
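
The sort behavior described above can be sketched as a small state object: selecting a new option sorts in descending order, and selecting the same option again flips the direction. The class and field names are hypothetical, and the treatment of unknown post-subset counts is an assumption. This is not Structural's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TableRow:
    name: str
    rows_pre_subset: int
    rows_post_subset: Optional[int]  # None means "unknown" (no subsetting run yet)
    inbound: int
    outbound: int


# Sort options, named after the entries in the Sort by dropdown list.
SORT_KEYS: Dict[str, Callable[[TableRow], int]] = {
    "Rows pre-subset": lambda t: t.rows_pre_subset,
    # Assumption: unknown post-subset counts sort below every known count.
    "Rows post-subset": lambda t: -1 if t.rows_post_subset is None else t.rows_post_subset,
    "Inbound relationships": lambda t: t.inbound,
    "Outbound relationships": lambda t: t.outbound,
    "Total relationships": lambda t: t.inbound + t.outbound,
}


class SortState:
    """Tracks the selected sort option and its direction."""

    def __init__(self) -> None:
        self.option: Optional[str] = None
        self.descending = True

    def select(self, option: str) -> None:
        if option == self.option:
            # Selecting the same option again flips the direction.
            self.descending = not self.descending
        else:
            # A newly selected option starts in descending order.
            self.option = option
            self.descending = True

    def apply(self, tables: List[TableRow]) -> List[TableRow]:
        if self.option is None:
            return list(tables)
        return sorted(tables, key=SORT_KEYS[self.option], reverse=self.descending)


tables = [
    TableRow("events", 5_000, None, inbound=1, outbound=1),
    TableRow("attendees", 80_000, None, inbound=1, outbound=0),
    TableRow("venues", 1_200, None, inbound=0, outbound=1),
]
state = SortState()
state.select("Rows pre-subset")
print([t.name for t in state.apply(tables)])  # largest table first
state.select("Rows pre-subset")
print([t.name for t in state.apply(tables)])  # smallest table first
```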

Filtering the list

You can filter the list based on:

  • The table name.

  • Whether the table is in or out of the subset.

  • Whether the table is a target or lookup table.

To filter by table name, in the filter field, begin to type text from the table name. As you type, the list is filtered to only include tables whose names contain the filter text.

To filter the list to show only target tables, lookup tables, in-subset tables, or out-of-subset tables, do one of the following:

  • Click the panel at the top of the tab.

  • From the Filter Tables dropdown list, select the filter option.

To remove a table subset status filter, click the delete icon.

You can combine a name filter and a table subset status filter. For example, you can filter the list to show in-subset tables that contain the text "test".

You cannot combine the table subset status filters. When you select a different filter, the current filter is replaced.
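
The filter rules above (a name filter that can be combined with at most one subset status filter) can be sketched as follows. The dictionary keys, status names, and case-insensitive matching are assumptions made for illustration only.

```python
from typing import Dict, List, Optional

# Hypothetical status filter names. Only one can be active at a time.
STATUS_FILTERS = {
    "target": lambda t: t["role"] == "target",
    "lookup": lambda t: t["role"] == "lookup",
    "in_subset": lambda t: t["in_subset"],
    "out_of_subset": lambda t: not t["in_subset"],
}


def filter_tables(
    tables: List[Dict], name_text: str = "", status: Optional[str] = None
) -> List[Dict]:
    """Keep tables whose name contains name_text and that match the status filter."""
    text = name_text.lower()  # assumption: matching ignores case
    result = []
    for table in tables:
        if text and text not in table["name"].lower():
            continue
        if status is not None and not STATUS_FILTERS[status](table):
            continue
        result.append(table)
    return result


tables = [
    {"name": "test_orders", "role": "target", "in_subset": True},
    {"name": "test_logs", "role": None, "in_subset": False},
    {"name": "customers", "role": "related", "in_subset": True},
    {"name": "countries", "role": "lookup", "in_subset": True},
]

# In-subset tables whose names contain "test".
print([t["name"] for t in filter_tables(tables, name_text="test", status="in_subset")])
```

Because `status` is a single value, choosing a different status filter replaces the current one, which mirrors the rule that status filters cannot be combined.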

Graph View

Graph View displays a diagram of the source data tables and the relationships between them. It also indicates:

  • Whether each table is in the subset.

  • Whether the subsetting status for the table changed since the last subsetting data generation.

Information on the Graph View table blocks

Each table block provides the following information about the table:

  • At the top left:

    • The name of the table.

    • The name of the schema that contains the table.

  • At the top right, the status of the table in the context of the subset. A table might be a target table, a lookup table, a related table that is in the subset, or a table that is out of the subset.

  • At the bottom, the number of rows in the table before and after the subset is created. For more information, go to How Structural determines the number of pre-subset and post-subset rows.

How relationships display on Graph View

The Graph View diagram connects tables that are related to each other based on a foreign key relationship. The position of the tables indicates the type of relationship.

  • Tables that have an upstream relationship with another table are displayed above the table.

  • Tables that have a downstream relationship with another table are displayed below the table.

For example, in a schema that contains Attendees, Events, and Venues tables:

  • The Attendees table refers to the event for the attendee. Attendees is upstream of Events, and would display above the Events table in Graph View.

  • The Events table refers to a venue from the Venues table. Venues is downstream of Events, and would display below the Events table in Graph View.
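
To make the upstream and downstream terminology concrete, the following sketch derives both directions from a list of foreign keys. The foreign key pairs and function names are hypothetical and only mirror the Attendees, Events, and Venues example.

```python
from typing import List, Tuple

# Each pair is (referencing table, referenced table).
ForeignKey = Tuple[str, str]

foreign_keys: List[ForeignKey] = [
    ("Attendees", "Events"),  # Attendees refers to an event
    ("Events", "Venues"),     # Events refers to a venue
]


def upstream_of(table: str, fks: List[ForeignKey]) -> List[str]:
    """Tables that refer to `table`. These display above it in Graph View."""
    return sorted({src for src, dst in fks if dst == table})


def downstream_of(table: str, fks: List[ForeignKey]) -> List[str]:
    """Tables that `table` refers to. These display below it in Graph View."""
    return sorted({dst for src, dst in fks if src == table})


print(upstream_of("Events", foreign_keys))    # ['Attendees']
print(downstream_of("Events", foreign_keys))  # ['Venues']
```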

Focusing on a specific table

To find and focus on a specific table:

  1. In the search field, begin to type text from the table name. As you type, Tonic Structural filters the list to display matching tables.

  2. When you see the table that you want, click the table name. Structural highlights the connections to other tables and displays the table details panel.

Other Graph View navigation options

To pan around the Graph View graph, click and drag.

To zoom in and out, use the navigation tools at the bottom left of Graph View.

How Structural determines the number of pre-subset and post-subset rows

For tables that contain fewer than 1,000 rows, the pre-subset number of rows is displayed as <1k.

For tables that are in the subset, the resulting rows are based on the target table and related table configuration.

If the data generation job hasn't run yet, or the details from the job are not yet available, then the number of rows after the subset is marked as unknown.

If you updated the configuration for a table since the most recent data generation, then on Table View, an information icon displays next to the post-subset value.
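
The display rules for row counts can be summarized in a short sketch. Only the `<1k` label and the unknown state come from the description above. The function names and the formatting of counts of 1,000 or more are assumptions.

```python
from typing import Optional


def format_pre_subset(rows: int) -> str:
    """Tables with fewer than 1,000 rows are shown as '<1k'."""
    if rows < 1000:
        return "<1k"
    return f"{rows:,}"  # assumption: larger counts are shown in full


def format_post_subset(rows: Optional[int]) -> str:
    """None stands for 'no job has run yet' or 'job details not yet available'."""
    if rows is None:
        return "unknown"
    return f"{rows:,}"  # assumption: known counts are shown in full


print(format_pre_subset(250))    # <1k
print(format_pre_subset(48213))  # 48,213
print(format_post_subset(None))  # unknown
```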

Viewing table details

When you click a table in either Table View or Graph View, the table details panel displays to the right of the table.

The table details include:

  • Whether the table is in the subset.

  • The number of rows before and after the subsetting. For more information, go to How Structural determines the number of pre-subset and post-subset rows.

  • The number of outbound and inbound relationships.

  • The list of inbound and outbound relationships with other tables. When you click a table name, Structural selects and displays the details for that table.

The relationship counts are the number of direct inbound (downstream) and outbound (upstream) relationships for the table. An inbound relationship means that a primary key from another table is used as a value in the current table. An outbound relationship means that the primary key of the current table is a foreign key in another table. You can filter the upstream records to only include the records that you need. For target tables, the relationships are used to determine the related tables that are included in the subset. The related tables can also include other tables where the relationship is indirect.

For target tables, the table details also include the subset configuration.

Subsetting view also indicates the effect on each table of subset configuration changes that occurred since the most recent subsetting generation. For more information, go to Identifying configuration changes since the most recent subsetting run.

For tables that are not in the subset, the number of rows after subsetting is based on whether you enable Process tables that are out of subset. For more information, go to Determining how to process tables that are not in the subset.

The Attendees, Events, and Venues tables that illustrate Graph View relationships come from the example schema in How Structural creates a subset, where the Events table contains a list of events.