Configuring subsetting


Identifying and configuring target tables

A target table is a table for which you specify a subset of the data to include in the destination database.

Options to specify the data subset

To identify the subset of data to include, you can either:

  • Specify a percentage of the table to include in the destination database. Use this option when you care about the volume of data, but not the specific rows. Tonic Structural converts the percentage to a filter or a WHERE clause, depending on your database type. Depending on how your tables are related, the target tables in the final subset might contain more rows than the percentage that you specified. These additional rows are required to maintain referential integrity. To view the tables that contribute to the additional rows, see the subset creation steps. For additional assistance, reach out to your Tonic.ai contact.

  • Provide a WHERE clause to specify the subset of data to include in the destination database. The WHERE clause allows you to be more specific about the data to include. For example, you might want to only include data for a specific user or date range.

To combine a specified set of records with a random set of the remaining records, use a WHERE clause. For example, to get all users that are from Alabama, plus 5 percent of the other records, use the following WHERE clause:

state = 'Alabama' OR random() < 0.05

Configuring a percentage target table

To identify and configure subsetting for a target table:

  1. In Table View or Graph View, click the table.

  2. On the table details panel, from the Select Table Type dropdown list, select Target Table (Percentage).

  3. In the Target Percentage field, type the percentage of the data to include in the destination database. The default is 5, which includes 5% of the rows in the table. You can specify a decimal value, including a value that is less than 1. For example, you might configure the subset to include 0.5 percent of the rows, or 33.33 percent of the rows.

Configuring a WHERE clause target table

  1. In Table View or Graph View, click the table.

  2. On the table details panel, from the Select Table Type dropdown list, select Target Table (Where Clause).

  3. In the Target Where Clause field, type the WHERE clause to use to identify the data to include in the destination database. For example, if the target table contains a column called event_id, then to select all rows where event_id is greater than 1000, use the following WHERE clause: event_id > 1000

  4. For a more complex WHERE clause, you can display an editor with a larger text area.

    1. Click Open in Editor.

    2. In the text area, enter the WHERE clause.

    3. Click Save.

You can query across tables within the WHERE clause. For example, you configure the customers table as a target table, but you also want to use information from the customers_legacy table to identify the target records in customers.

In the following example query, the matching records in customers have a Customer_Key value that matches the Customer_Key value of a customers_legacy record for which the Occupation value is Detective:

"Customer_Key" IN (
    SELECT "Customer_Key"
    FROM customers_legacy
    WHERE customers_legacy."Customer_Key" = customers."Customer_Key"
    AND customers_legacy."Occupation" = 'Detective'
)
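
Because the IN condition already restricts the match to the same key value, the correlated comparison inside the subquery is not strictly required. A minimal equivalent sketch without it:

"Customer_Key" IN (
    SELECT "Customer_Key"
    FROM customers_legacy
    WHERE "Occupation" = 'Detective'
)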

You can also create a query that selects a random percentage of a specified set of data.

For example, in PostgreSQL, to select 50% of the records that have an identifier that is divisible by 3, you could use the following WHERE clause:

id % 3 = 0 and random() < 0.5
order by id
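
Because random() produces a different result on every run, that subset is not repeatable. If you need a repeatable subset, a deterministic filter can approximate the same selection. A minimal sketch, assuming an integer id column with roughly uniform values; it keeps about half of the qualifying rows:

id % 3 = 0 and id % 10 < 5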

Removing a target table

To remove a target table:

  1. In Table View or Graph View, click the table.

  2. On the table details panel, from the table type dropdown list, select Remove.

Identifying lookup tables

A lookup table contains a list of values that are used to populate columns in other tables, such as a list of states, countries, or currencies. Lookup tables are sometimes referred to as reference tables.

Structural always copies lookup tables to the destination database in their entirety. If you do not configure a table as a lookup table, then Structural treats it as a related table, and copies only the rows that are used in the subset data. Structural also pulls in rows from other tables that refer to the table, even when those rows are not related to the target tables. This can result in an unexpectedly large subset.

For example, suppose that in a Users table, every user record refers to a state in the States table. If you do not identify States as a lookup table, then the subset would include every record in the Users table.
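
For illustration, a minimal sketch of the Users and States relationship described above (table and column names are assumptions):

CREATE TABLE states (
    id   integer PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE users (
    id       integer PRIMARY KEY,
    name     text NOT NULL,
    state_id integer NOT NULL REFERENCES states (id)
);

Marking states as a lookup table copies it in full without also dragging in the users rows that reference it.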

Properties of a lookup table

Here are some typical properties of a lookup table:

  • It is fairly small and rarely updated.

  • Many tables point to the table, but it does not point to another table.

  • The table contains a set of unique values.
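
These properties suggest a way to find candidates. For example, on a PostgreSQL source database, you could look for small tables that other tables reference through foreign keys, but that have no outgoing foreign keys of their own. A minimal sketch that uses the PostgreSQL system catalogs (the public schema name is an assumption; adjust as needed):

SELECT c.relname AS table_name,
       c.reltuples::bigint AS approximate_rows
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind = 'r'
  -- at least one other table references this table
  AND EXISTS (
      SELECT 1 FROM pg_constraint fk
      WHERE fk.contype = 'f' AND fk.confrelid = c.oid
  )
  -- this table references no other table
  AND NOT EXISTS (
      SELECT 1 FROM pg_constraint fk
      WHERE fk.contype = 'f' AND fk.conrelid = c.oid
  )
ORDER BY approximate_rows;

Small tables near the top of the result are likely lookup-table candidates.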

Identifying an individual lookup table

To identify an individual table as a lookup table:

  1. In Table View or Graph View, click the table.

  2. On the table details panel, from the Select Table Type dropdown list, select Lookup Table.

Identifying multiple lookup tables

To identify multiple tables as lookup tables:

  1. On Table View, check the checkbox for each table to identify as a lookup table.

  2. From the Actions dropdown list, select Add Lookup Tables.

Removing lookup tables

To remove the lookup designation for a table:

  1. In Table View or Graph View, click the table.

  2. On the table details panel, from the dropdown list, select Remove.

Filtering optional records

The Salesforce data connector does not support filtering optional records.

Records that reference required subset records are considered upstream records. Unlike downstream records, upstream records are not required for referential integrity. Upstream records are optional.

To reduce the size of the subset, you can apply a filter to these optional records. To filter the records, you can either:

  • Use a date column to specify an amount of time before the current date for which to include records. For example, you can include only records for which the update date is within one week before the current date.

  • Use a WHERE clause to identify the records to include.

You can filter a table that contains both upstream and downstream records. However, the filter only applies to the optional upstream records.

In the table list, when an upstream table is filtered, a Filtered icon displays.

Filtering by date

The date filter allows you to filter optional records based on the value of a date-based column.

To filter an upstream table by date:

  1. In Table View or Graph View, click the table. On the table details panel, under Filter Optional Tables, Type is set by default to No Filter Applied, which indicates that the table is not filtered.

  2. From the Type dropdown list, select Filter By Date Column.

  3. From the Date Column dropdown list, select the date column to use for the filter. To improve performance, select a column that is indexed.

  4. Under Get data from the last, from the time unit dropdown list, select the unit of time to use for the filter. You can filter records based on their age in days, weeks, months, or years.

  5. In the field, enter the number of time units before the current date for which to include the upstream records. For example, if you select days as the unit and set the number to 4, then Structural pulls related records for which the date column value is no more than 4 days before the current date, as illustrated below.
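
The date filter corresponds to a simple date comparison. For example, in PostgreSQL, the 4-day filter above, applied to a hypothetical updated_at column, is roughly equivalent to the following WHERE clause:

updated_at >= CURRENT_DATE - INTERVAL '4 days'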

Filtering with a WHERE clause

To filter the upstream records, you can also use a WHERE clause.

To use a WHERE clause to filter the upstream records:

  1. In Table View or Graph View, click the table. On the table details panel, under Filter Optional Tables, Type is set by default to No Filter Applied, which indicates that the table is not filtered.

  2. From the Type dropdown list, select Filter by Where Clause.

  3. In the Where Clause text area, enter the WHERE clause to use to filter the related records. For a sample filter, see the example after these steps.

    Note that if the WHERE clause evaluates to false, then Structural excludes all of the upstream records from the table. For most data connectors, to exclude all of the upstream records, you can simply set the text of the WHERE clause to false. Otherwise, use an obviously false statement such as 1=0.

  4. For a more complex WHERE clause, you can display an editor with a larger text area.

    1. Click Open in Editor.

    2. In the text area, enter the WHERE clause.

    3. Click Save.

  5. To copy the WHERE clause to the clipboard, click Copy To Clipboard.
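
For example, a hypothetical upstream filter that keeps only optional records that are active and that changed in the last 30 days (both column names are assumptions, and the date arithmetic is PostgreSQL syntax):

"Status" = 'active' AND "UpdatedAt" >= CURRENT_DATE - INTERVAL '30 days'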

Removing a filter

To remove an upstream filter, from the Type dropdown list, select No Filter Applied.

Identifying configuration changes since the most recent subsetting run

As you make changes to the subsetting configuration, Table View and Graph View indicate how the changes affect the next run of the subsetting generation when compared to the most recent subsetting generation.

When a table's inclusion in the subset is affected, on Graph View, a colored marker is added to the bottom of the table box.

On Table View, a colored icon displays next to the table. A tooltip indicates the type of change.

The possible types of changes are:

  • Added to the subset. For example:

    • A new target table

    • A table that is newly included because it is related to a new target table

    • A new lookup table

  • Removed from the subset. For example:

    • A removed target table

    • A table that is removed because it is related to a removed target table

    • A removed lookup table

  • Modified in the subset. This usually reflects a change to a target table configuration. You might:

    • Change the type of target table (percentage or WHERE clause)

    • Change the percentage

    • Change the WHERE clause

    • Change the upstream filter

When you run a subsetting generation, Structural clears the markers.

Determining how to process tables that are not in the subset

Tables other than target tables, lookup tables, and related tables are not in the subset. The subsetting configuration determines how Structural copies these out-of-subset tables to the destination database.

You can either:

  • Use the table modes that are assigned to the out-of-subset tables.

  • Truncate all of the out-of-subset tables. The table schema is preserved, but none of the data is copied to the destination database.

On Table View, on the Configuration tab, you use the Process tables that are out of subset toggle to determine how to handle these tables. After you run subsetting data generation, the toggle is on the Options tab.

By default, the setting is turned off, and Structural truncates the out-of-subset tables.

To use the assigned table mode to process each table, toggle the setting to the on position.

Determining whether to use subsetting during data generation

If you configured subsetting, then when you run a data generation job, you can either generate the entire dataset, or use the subsetting configuration to generate a subset.

On Table View, on the Configuration tab, the Use Subsetting toggle indicates whether to generate a subset. After you run subsetting data generation, the Use Subsetting toggle is on the Options tab.

By default, the toggle is in the off position. When you run a data generation job, it generates the entire destination dataset.

To instead generate a subset, toggle Use Subsetting to the on position.

When you run a data generation job, you are also prompted to confirm whether to generate the entire dataset or a subset. These two toggles are synchronized. If you turn on the Use Subsetting toggle on the Configuration tab, then it is on by default on the generation confirmation panel.

Enabling parallel processing for subsetting

You can sometimes use parallel processing to improve the performance of the subsetting process. Parallel processing allows Structural to process multiple subsetting steps at the same time. Steps that are processed in parallel cannot rely on each other's output.

The effect of subsetting parallelism on performance depends on:

  • Your subsetting configuration.

  • The layout of your schema.

  • The performance characteristics of the machine that runs Structural.

  • The performance characteristics of your databases.

To enable parallel processing for subsetting, set the TONIC_TABLE_PARALLELISM environment setting to a number greater than 1 (the default). You can configure this setting from Structural Settings. The setting determines the maximum number of subsetting steps that Structural can process in parallel. For regular data generation, it also determines the number of tables that Structural operates on at the same time.

We recommend that you start with a relatively small number such as 4, and then run a data generation job to see how it affects performance. If performance improves, you can increase the number incrementally until the performance no longer improves.

The setting only controls the maximum number of steps that can be processed in parallel. Performance should not degrade if your system cannot support parallelism or does not benefit from it.

If you have any other questions, contact support@tonic.ai.