# Other subsetting hints and tips

If you are having trouble with your subsetting configuration or results, the following hints and tips might be helpful.

Remember that subsetting is an iterative process. It can take multiple attempts to get the exact subset of data that you want.

## Start with a very small subset

One way to troubleshoot a subsetting configuration is to start with a very small subset. With a small subset, it is easier to verify that the data generation returns the data you are expecting.

For example, start with a [single target table](https://docs.tonic.ai/app/generation/subsetting-configure#subsetting-configure-target-tables), and use a query to limit the subset to a single record. Add the necessary lookup tables.

After you verify the results in this very small subset, gradually increase the subset size in subsequent iterations. Verify the results after each iteration. Continue to increase the subset size until you reach the full subset.

For each iteration, verify that:

* The subset includes the target table records.
* The subset includes all of the lookup table data.
* The subset includes all of the related records - records that each target table record refers to, and records that refer to the target table record.

## Minimize the number of target tables <a href="#subsetting-hints-tips-few-target-tables" id="subsetting-hints-tips-few-target-tables"></a>

An ideal subset contains a small number of [target tables](https://docs.tonic.ai/app/generation/subsetting-about#subsetting-components-target-tables). A target table typically contains an important object that is well connected to everything else in the source data.

Tables that contain static values that are used in multiple other tables should be [lookup tables](https://docs.tonic.ai/app/generation/subsetting-about#subsetting-components-lookup-tables), not target tables.

For [other tables that are related to the target table](https://docs.tonic.ai/app/generation/subsetting-about#related-tables), allow the subsetting process to identify the necessary rows to include in the subset based on the [foreign key configuration](https://docs.tonic.ai/app/generation/foreign-keys). Do not add them as target tables.

Instead of configuring multiple target tables with the same `WHERE` clause, use [virtual foreign keys](https://docs.tonic.ai/app/generation/foreign-keys) to link the relevant tables. This is easier to maintain and helps you to understand the relationships within your database.

## Make circular foreign key columns NULLable <a href="#circular-foreign-keys-nullable" id="circular-foreign-keys-nullable"></a>

When Tonic Structural detects a circular foreign key dependency, to break the dependency, it sets all of the values of one of the columns to `NULL`. For more information, go to [#circular-dependencies](https://docs.tonic.ai/app/generation/subsetting-foreign-keys#circular-dependencies "mention").

If your source data includes circular foreign keys, make sure that at least one of those columns is `NULL`able.

## Ensure unique values for incrementing columns

If your subset includes a table that contains incrementing columns, then to ensure that the output table does not contain duplicate values, add the following [post-job script](https://docs.tonic.ai/app/workflows/scripts) to the workspace.

```
DBCC CHECKIDENT (<column name>, RESEED, <number higher than row count> )
```

In the script:

* Replace `<column name>` with the name of the column.
* Replace `<number higher than row count>` with a number that is greater than the number of rows in the original table.

## Improve subsetting performance <a href="#subsetting-hints-tips-performance" id="subsetting-hints-tips-performance"></a>

Note that subsetting performance does not improve in a linear fashion.

For example, if you subset 10% of a target table, the generation does not take 10% of the time of a standard de-identification run. It might take 90% of the time.

To improve performance:

* Make sure to properly configure all relevant tables as lookup tables. Lookup tables are automatically copied to the subset in their entirety, and do not require processing to identify the rows to include. Structural also does not look for records that are upstream of a lookup table.
* For upstream tables, in particular large upstream tables, add indexes to virtual foreign key columns. If there are no indexes on the foreign key columns, subsetting can be much slower.
* Alternatively, instead of adding indexes, assign Truncate mode to the upstream tables.

## Prune related subset tables <a href="#subset-hints-tips-prune-tables" id="subset-hints-tips-prune-tables"></a>

Here are some other options to further reduce the size of the subset. These options reduce the amount of related data for the included target table records.

### Truncate related tables <a href="#subset-hints-tips-truncate-related-tables" id="subset-hints-tips-truncate-related-tables"></a>

You can truncate related tables in a subset, which removes that entire section of the subset. When you truncate a related table, Structural removes both that table and any tables that are included because of that table.

For example, in this subsetting graph, the `products` table is a related table. For the records in `products`, the subset also includes related records from `wholesale_orders`, `vendors`, and `wo_date`.

<figure><img src="https://3378426797-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LSQCLFQ4bslJ-HYc8c3%2Fuploads%2F3yZWPBkEYTiQAbGTF1lN%2FSubsettingHintGraphBeforeTruncation.png?alt=media&#x26;token=c94e2657-f287-45d3-9442-2e30bbb09b05" alt=""><figcaption><p>Subsetting graph before the products table is truncated</p></figcaption></figure>

After you truncate the `products` table, the subsetting graph indicates that during the next data generation, the `products` table and its connected tables will all be removed from the subset.

<figure><img src="https://3378426797-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LSQCLFQ4bslJ-HYc8c3%2Fuploads%2FscdAP9izEX2rMB0HM2PA%2FSubsettingHintGraphAfterTruncation.png?alt=media&#x26;token=26d5a3a6-24e4-42e8-b589-7783d9f0b642" alt=""><figcaption><p>Subsetting graph after the products table is truncated</p></figcaption></figure>

Note that for a related table that contains primary keys that are used to populate a foreign key column in an included table, you can only truncate the related table if the foreign key column is nullable.

For example, the `product_id` primary key column in the `products` table populates the `purchased_product_id` foreign key column in the `transactions` table. The `transactions` table is included in the subset. If `purchased_product_id` is not nullable, then you cannot truncate `products`.

### Exclude data from upstream tables

Upstream tables contain related data that is not required for referential integrity. To completely remove the table, you can truncate it.

You can also [configure a filter for an upstream table](https://docs.tonic.ai/app/generation/subsetting-configure#subsetting-config-filter-upstream-records) that is based on either a date column or a `WHERE` clause. When you filter the table, the subset only includes related records that match the filter.

To keep the upstream table, but exclude all of the related records, create a `WHERE` clause filter where the `WHERE` clause evaluates to `false`. For many data connectors, you can directly set the `WHERE` clause text to `false`. Otherwise, use an obviously false statement such as `1=0`.

When you exclude all of the records in an upstream table, the subset also excludes all records from its related tables.

For example, in this subsetting graph, before you add an upstream filter to the `wholesale_orders` table, the subset includes records from `wholesale_orders`, `vendors`, and `wo_date`.

<figure><img src="https://3378426797-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LSQCLFQ4bslJ-HYc8c3%2Fuploads%2FuNc162BjUwmT5JTy3N2P%2FSubsettingHintGraphBeforeUpstreamFilter.png?alt=media&#x26;token=65838674-8198-4ae8-aa3f-af5dd8c6ef95" alt=""><figcaption><p>Subsetting graph before an upstream filter is added to wholesale_orders</p></figcaption></figure>

After you add a `WHERE` clause filter that is set to `false` to the `wholesale_orders` table, the subset includes zero records from `wholesale_orders`, `vendors`, and `wo_date`.

<figure><img src="https://3378426797-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LSQCLFQ4bslJ-HYc8c3%2Fuploads%2FHkY9MrqGC6ryujTTNcbB%2FSubsettingHintGraphAfterUpstreamFilter.png?alt=media&#x26;token=1d71c1c3-6f4d-40fb-b9c6-5fc27ca405a9" alt=""><figcaption><p>Subsetting graph after the upstream filter is applied to wholesale_orders</p></figcaption></figure>
