Comment on page
Configuring subsetting
A target table is a table for which you specify a subset of the data to include in the destination database.
To identify the subset of data to include, you can either:
- Specify a percentage of the table to include in the destination database. You can use this option when you care about the specific volume of data, but not the specific rows. Tonic converts the percentage to a filter or a
WHERE
clause, depending on your database type. Depending on how your tables are related, the target tables in the final subset might contain more rows than the percentage that you specified. These additional rows are required to maintain referential integrity. To view the tables that contribute to the additional rows, see the subset steps. For additional assistance, reach out to your Tonic contact.

Configuration for a percentage target table
- Provide a
WHERE
clause to specify the subset of data to include in the destination database. TheWHERE
clause allows you to be more specific about the data to include. For example, you might want to only include data for a specific user or date range.

Configuration for a where clause target table
To combine a specified set of records with a random set of the remaining records, use a
WHERE
clause. For example, to get all users that are from Alabama, and 5 percent of the other records, use the following WHERE
clause:state = "Alabama" OR random() < 0.05
To identify and configure subsetting for a target table:
- 1.In Table View or Graph View, click the table.
- 2.On the table details panel, from the Select Table Type dropdown list, select Target Table (Percentage).
- 3.In the Target Percentage field, type the percentage of the data to include in the destination database. The default is 5, which indicates to use 5% of the rows in the table. You can specify a decimal value, including a value that is less than 1. For example, you might configure the subset to include .5 percent of the rows, or 33.33 percent of the rows.
- 1.In Table View or Graph View, click the table.
- 2.On the table details panel, from the Select Table Type dropdown list, select Target Table (Where Clause).
- 3.In the Target Where Clause field, type the
WHERE
clause to use to identify the data to include in the destination database. For example, the target table contains a column calledevent_id
. To select all rows whereevent_id
is greater than 1000, add the followingWHERE
clause:event_id > 1000
- 4.For a more complex
WHERE
clause, you can display an editor with a larger text area.- 1.Click Open in Editor.
- 2.In the text area, enter the
WHERE
clause. - 3.Click Save.
You can query across tables within the
WHERE
clause.
For example, you configure the customers
table as a target table, but you also want to use information from the customers_legacy
table to identify the target records in customers
.In the following example query, the matching records in
customers
have a Customer_Key
value that matches a CustomerKey
value in customers_legacy
, and where the value of Occupation
in customers_legacy
is Detective
:"Customer_Key" IN (
SELECT "Customer_Key"
FROM customers_legacy
WHERE customers_legacy."Customer_Key"=customers."Customer_Key"
AND customers_legacy."Occupation" = 'Detective'
)
You can also create a query that selects a random percentage of a specified set of data.
For example, in PostgreSQL, to select 50% of the records that have an identifier that is divisible by 3, you could use the following
WHERE
clause:id % 3 = 0 and random() < 0.5
order by id
To remove a target table:
- 1.In Table View or Graph View, click the table.
- 2.On the table details panel, from the table type dropdown list, select Remove.
A lookup table contains a list of values that are used to populate columns in other tables. For example, a list of states, countries, or currencies. Lookup tables are sometimes referred to as reference tables.
Tonic always copies lookup tables to the destination database in their entirety. If you do not configure a table as a lookup table, then Tonic treats the table as a related table, and copies only rows that are used in the subset data. Tonic also pulls in rows from other tables that refer to the table, but that are not necessarily related to the target tables. This could result in an unexpectedly large subset.
For example, in a Users table, every user record refers to a state in the States table. If you do not identify States as a lookup table, then the subset would include every record in the Users table.
Here are some typical properties of a lookup table:
- It is fairly small and rarely updated.
- Many tables point to the table, but it does not point to another table.
- The table contains a set of unique values.
To identify an individual table as a lookup table:
- 1.In Table View or Graph View, click the table.
- 2.On the table details panel, from the Select Table Type dropdown list, select Lookup Table.
To identify multiple tables as lookup tables:
- 1.On Table View, check the checkbox for each table to identify as a lookup table.
- 2.From the Actions dropdown list, select Add Lookup Tables.

Actions menu for selected rows
To remove the lookup designation for a table:
- 1.In Table View or Graph View, click the table.
- 2.On the table details panel, from the dropdown list, select Remove.
Records that reference required subset records are considered upstream records. Unlike downstream records, upstream records are not required for referential integrity. Upstream records are optional.
To reduce the size of the subset, you can apply a filter to these optional records. To filter the records, you can either:
- Use a date column to specify an amount of time before the current date for which to include records. For example, you can only include records for which the update date is one week before the current date.
- Use a
WHERE
clause to identify the records to include.
You can filter a table that contains both upstream and downstream records. However, the filter only applies to the optional upstream records.
In the table list, when an upstream table is filtered, a Filtered icon displays.
The date filter allows you to filter optional records based on the value of a date-based column.

Filter Table fields for date filter
To filter an upstream table by date:
- 1.In Table View or Graph View, click the table. On the table details panel, under Filter Table, Select table filter is set by default to None, which indicates that the table is not filtered.
- 2.From Select table filter, select Filter By Date Column.
- 3.From the Date Column dropdown list, select the date column to use for the filter. To improve performance, select a column that is indexed.
- 4.Under Get data from the last, from the time unit dropdown list, select the unit of time to use for the filter. You can filter records based on their age in days, weeks, months, or years.
- 5.In the field, enter the number of the selected unit before the current date for which to include the upstream records. For example, you select days as the unit, and set the number to 4. Tonic then pulls related records for which the date column value is up to 4 days before the current date.
You can also filter the upstream records using a
WHERE
clause.
Filter Table fields for WHERE clause
To filter the upstream records using a
WHERE
clause:- 1.In Table View or Graph View, click the table. On the table details panel, under Filter Table, Select table filter is set by default to None, which indicates that the table is not filtered.
- 2.From Select table filter, select Filter by Where Clause.
- 3.In the Where Clause text area, enter the
WHERE
clause to use to filter the related records. - 4.For a more complex
WHERE
clause, you can display an editor with a larger text area.- 1.Click Open in Editor.
- 2.In the text area, enter the
WHERE
clause. - 3.Click Save.
- 5.To copy the
WHERE
clause to the clipboard, click Copy To Clipboard.
To remove an upstream filter, from Select table filter, select None.
As you make changes to the subsetting configuration, Table View and Graph View indicates how the changes affect the next run of the subsetting generation when compared to the most recent subsetting generation.
When a table's inclusion in the subset is affected, on Graph View, a colored marker is added to the bottom of the table box.

Graph View table box
marked as changed
On Table View, a colored icon displays next to the table. A tooltip indicates the type of change.

Change marker for a table in the
Table View list
The possible types of changes are:
- Added to the subset. For example:
- A new target table
- A table that is newly included because it is related to a new target table
- A new lookup table
- Removed from the subset. For example:
- A removed target table
- A table that is removed because it is related to a removed target table
- A removed lookup table
- Modified in the subset. This usually reflects a change to a target table configuration. You might:
- Change the type of target table (percentage or
WHERE
clause) - Change the percentage
- Change the
WHERE
clause - Change the upstream filter
When you run a subsetting generation, Tonic clears the markers.
Tables other than target tables, lookup tables, or related tables are not in the subset.
The subsetting configuration includes how to copy all of these tables to the destination database.
You can either:
- Use the table modes that are assigned to the out-of-subset tables.
- Truncate all of the out-of-subset tables. The table schema is preserved, but none of the data is copied to the destination database.
On Table View, on the Configuration tab, you use the Process tables that are out of subset toggle to determine how to handle these tables. After you run subsetting data generation, the toggle is on the Options tab.

Configuration for how to process tables that are not in the subset
By default, the setting is turned off, and Tonic truncates the out-of-subset tables.
To use the assigned table mode to process each table, toggle the setting to the on position.
If you configured subsetting, then when you run a data generation job, you can either generate the entire dataset, or use the subsetting configuration to generate a subset.
On Table View, on the Configuration tab, the Use Subsetting toggle indicates whether to generate a subset. After you run subsetting data generation, the Use Subsetting toggle is on the Options tab.
By default, the toggle is in the off position. When you run a data generation job, it generates the entire destination data dataset.
To instead generate a subset, toggle Use subsetting to the on position.
When you run a data generation job, you are also prompted to confirm whether to generate the entire dataset or a subset. These two toggles are synchronized. If you turn on the Use Subsetting toggle on the Configuration tab, then it is on by default on the generation confirmation panel.
You can sometimes use parallel processing to improve the performance of the subsetting process. Parallel processing allows multiple subsetting steps to be processed at the same time. The steps cannot rely on the output of other steps that are processed in parallel.
To enable parallel processing for subsetting, set the environment setting
TONIC_TABLE_PARALLELISM
to a number greater than 1 (the default). You can configure this setting from Tonic Settings. This setting determines the maximum number of subsetting steps that Tonic can process in parallel. For regular data generation, it also determines the number of tables that Tonic operates on at the same time.The effect of subsetting parallelism on performance depends on your subsetting configuration, the layout of your schema, the performance characteristics of the machine that runs Tonic, and the performance characteristics of your databases.
We recommend that you start with a relatively small number such as 4, and then run a data generation job to see how it affects performance. If performance improves, you can increase the number incrementally until the performance no longer improves.
The environment setting only controls the maximum number of steps that can be processed in parallel. Performance should not degrade if your system cannot support parallelism or won't benefit from using it.
If you have any other questions, contact [email protected].
Last modified 10d ago