Table modes

Each table is assigned a table mode. The table mode determines at a high level how the table is populated in the destination database.

Selecting the table mode for a table

Required workspace permission: Assign table modes

Both Database View and Table View allow you to view and update the selected table mode for a table.

For Database View, go to Assigning table modes to tables.

For Table View, go to Selecting the table mode.

Available table modes

De-Identify

This is the default table mode for new tables.

In this mode, Tonic Structural copies over all of the rows to the destination database.

For columns that have the generator set to Passthrough, Structural copies the original source data to the destination database.

For columns that are assigned a generator other than Passthrough, Structural uses the generator to replace the column data in the destination database.
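The per-column behavior of De-Identify mode can be sketched as follows. This is an illustrative sketch only, not the Tonic Structural implementation; the `apply_generators` helper and the `mask_email` generator are hypothetical names, and `None` stands in for the Passthrough generator.

```python
def mask_email(value):
    # Hypothetical generator: replace the local part of an email address
    # with a deterministic placeholder, keeping the domain.
    local, _, domain = value.partition("@")
    return "user" + str(abs(hash(local)) % 1000) + "@" + domain

# Column-to-generator assignment; None represents Passthrough.
generators = {"id": None, "email": mask_email}

def apply_generators(row, generators):
    # Passthrough columns copy the original source data; columns with a
    # generator have their data replaced by the generator's output.
    out = {}
    for col, val in row.items():
        gen = generators.get(col)
        out[col] = gen(val) if gen is not None else val
    return out

source_row = {"id": 42, "email": "alice@example.com"}
dest_row = apply_generators(source_row, generators)
```

Every row is copied to the destination; only the column values change, and only for columns whose generator is not Passthrough.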

Truncate

This mode drops all data for the table in the destination database.

For data connectors other than Spark-based data connectors, the table schema and any constraints associated with the table are included in the destination database.

For Spark-based data connectors (Amazon EMR, Databricks, Spark SDK, Spark with Livy), the table is ignored completely.

For the file connector, file groups are treated as tables. When a file group is assigned Truncate mode, the data generation process ignores the files that are in that file group.

Any existing data in the destination database is removed. For example, if you change the table mode to Truncate after an initial data generation, the next data generation clears the table data. For Spark-based data connectors, the table is removed.

If you assign Truncate mode to a table that has a foreign key constraint, data generation fails. If you need to truncate such a table, contact support@tonic.ai for assistance.

Preserve Destination

This mode preserves the existing data in the destination database for this table. Structural does not add or update any records.

This feature is primarily used for very large tables that don't need to be de-identified during subsequent runs after the data exists in the destination database.

When you assign Preserve Destination mode to a table, Structural locks the generator configuration for the table columns.

The destination database must have the same schema as the source database.

You cannot use Preserve Destination mode when you:

  • Enable upsert for a workspace.

  • Write destination data to a container artifact.

Incremental

Incremental mode only processes the changes that occurred in the source table since the most recent data generation or other update to the destination table. This can greatly reduce generation time for large tables that have relatively few changes.

For Incremental mode to work, the following conditions must be satisfied:

  • The table must exist in the destination database. Either Structural created the table during data generation, or the table was created and populated in some other way.

  • A reliable updated date column must be present. When you select Incremental mode for a table, Structural prompts you to select the updated date column to use.

  • The table must have a primary key.

To maximize performance, we recommend that you add an index on the updated date column.

For tables that use Incremental mode, Structural checks the source database for records whose updated date is greater than the maximum updated date in that column in the destination database.

When identifying records to update, Structural only checks the updated date. It does not check for other updates. Records where the generator configuration is changed are not updated if they do not meet the updated date requirement.

For the identified records, Structural checks for primary key matches between the source and destination databases, then does one of the following:

  • If the primary key value exists in the destination database, then Structural overwrites the record in the destination database.

  • If the primary key value does not exist in the destination database, then Structural adds a new record to the destination database.

This mode currently only updates and adds records. Rows that are deleted from the source database remain in the destination database.
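The merge logic described above can be sketched in a few lines. This is an illustrative sketch under simplified assumptions (rows as dictionaries, an in-memory merge); Tonic Structural performs this work inside the data generation job, and the `incremental_merge` function is a hypothetical name, not part of the product.

```python
def incremental_merge(source_rows, dest_rows, pk="id", updated_col="updated_at"):
    # Watermark: the maximum updated date already present in the destination.
    watermark = max((r[updated_col] for r in dest_rows), default=None)

    # Only source records with an updated date past the watermark are processed.
    changed = [
        r for r in source_rows
        if watermark is None or r[updated_col] > watermark
    ]

    dest_by_pk = {r[pk]: r for r in dest_rows}
    for row in changed:
        # Overwrite on a primary key match; insert a new record otherwise.
        dest_by_pk[row[pk]] = row
    # Note: rows deleted from the source are NOT removed from the destination.
    return list(dest_by_pk.values())

source = [
    {"id": 1, "updated_at": "2024-01-05", "name": "new"},
    {"id": 2, "updated_at": "2024-01-01", "name": "old"},
    {"id": 3, "updated_at": "2024-01-06", "name": "added"},
]
dest = [
    {"id": 1, "updated_at": "2024-01-02", "name": "stale"},
    {"id": 2, "updated_at": "2024-01-01", "name": "old"},
]
merged = incremental_merge(source, dest)
```

In this example, the destination watermark is 2024-01-02, so record 1 is overwritten, record 3 is inserted, and record 2 is skipped because its updated date is not past the watermark.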

To ensure accurate incremental processing of records, we recommend that you do not directly modify the destination database. A direct modification might cause the maximum updated date in the destination database to be after the date of the last data generation. This could prevent records from being identified for incremental processing.

Incremental mode is currently supported on PostgreSQL, MySQL, and SQL Server. If you want to use this table mode with another database type, contact support@tonic.ai.

You cannot use Incremental mode when you:

  • Enable upsert for a workspace.

  • Write destination data to a container artifact.

Scale

In this mode, Structural generates a user-specified number of new rows, using the generators that are assigned to the table columns.

You can use linking and partitioning to create complex relationships between columns.

Structural generates primary and foreign keys that reflect the distribution (1:1 or 1:many) between the tables in the source database.
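One simple way to preserve a 1:many distribution between tables is to resample the observed children-per-parent counts from the source. This is a hypothetical sketch of that idea, not Tonic Structural's implementation; `scale_child_counts` is an illustrative name.

```python
import random

def scale_child_counts(source_counts, n_new_parents, seed=0):
    # Sample a child-row count for each new parent from the empirical
    # distribution of children-per-parent observed in the source database,
    # so the scaled data keeps a similar 1:1 or 1:many shape.
    rng = random.Random(seed)
    return [rng.choice(source_counts) for _ in range(n_new_parents)]

# Source parents had 1, 1, 3, and 5 child rows respectively.
counts = scale_child_counts([1, 1, 3, 5], n_new_parents=100)
```

Each new parent row then receives the sampled number of generated foreign key references in the child table.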

You cannot use Scale mode when you enable upsert for a workspace.

Indicating whether to return an error when destination data already exists (Databricks only)

For the Databricks data connector, the table mode configuration includes an Error on Overwrite setting. The setting indicates whether to return an error when Structural attempts to write data to a destination table that already contains data. The option is not available when you write destination data to Databricks Delta tables.

To return the error, toggle the setting to the on position.

To not return the error, toggle the setting to the off position.

Applying a filter to tables

For workspaces that use the following data connectors, the table mode configuration for De-Identify mode includes an option to apply a filter to the table:

Table filters provide a way to generate a smaller set of data when a data connector does not support subsetting. For more information, go to Using table filtering for data warehouses and Spark-based data connectors.

Configuring partitioning for the destination database

This option is only available for workspaces that use the following data connectors:

On the table mode configuration panel, you can use the Repartition or Coalesce option to indicate a number of partitions to generate.

By default, the destination database uses the same partitioning as the source database. The partition option is set to Neither.

The Repartition option allows you to provide a specific number of partitions to generate.

To use the Repartition option:

  1. Click Repartition.

  2. In the field, enter the number of partitions.

The Coalesce option allows you to provide a maximum number of partitions to generate. If the source data has fewer partitions than the number you specify, then Structural only generates that number. The Coalesce option should be more efficient than the Repartition option.

To use the Coalesce option:

  1. Click Coalesce.

  2. In the field, enter the number of partitions.
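The difference between the two options can be summarized as follows. This is a pure-Python illustration of the Spark semantics the options are named after (in Spark itself, these correspond to `DataFrame.repartition(n)` and `DataFrame.coalesce(n)`); the two functions below only model the resulting partition counts.

```python
def repartition(current_partitions, n):
    # Repartition always produces exactly n partitions
    # (a full shuffle in Spark, regardless of the current count).
    return n

def coalesce(current_partitions, n):
    # Coalesce only merges partitions downward; it never increases the
    # count, which avoids a full shuffle and is usually cheaper.
    return min(current_partitions, n)
```

For example, with 4 source partitions and a target of 10, Repartition produces 10 partitions while Coalesce keeps 4; with 16 source partitions and a target of 10, both produce 10.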
