Enabling and configuring upsert

Required license: Professional or Enterprise

Not compatible with writing output to a container repository or a Tonic Ephemeral snapshot.

By default, Tonic Structural data generation replaces the existing destination database with the transformed data from the current job.

Upsert adds and updates rows in the destination database, but keeps all of the other existing rows intact. For example, you might have a standard set of test records that you do not want to replace every time you generate data in Structural.

If you enable upsert, then you cannot write the destination data to a container repository or to a Tonic Ephemeral snapshot. You must write the data to a database server.

Upsert is currently only supported for the following data connectors:

  • MySQL

  • Oracle

  • PostgreSQL

  • SQL Server

For an overview of upsert, you can also view the video tutorial.

About the upsert process

When upsert is enabled, the data generation job writes the generated data to an intermediate database. Structural then runs the upsert job to write the new and updated records to the destination database.


The destination database must already exist. Structural cannot run an upsert job to an empty destination database.

The upsert job adds and updates records based on the primary keys.

  • If the primary key for a record already exists in the destination database, the upsert job updates the record.

  • If the primary key for a record does not exist in the destination database, the upsert job inserts a new row.

To only update or insert records that Structural creates based on source records, and ignore other records that are already in the destination database, ensure that the primary keys for each set of records operate on different ranges. For example, allocate the integer range 1-1000 for existing destination database records that you add manually. Then ensure that the source database records, and by extension the records that Structural creates during data generation, use a different range.
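The add-or-update behavior described above can be illustrated with a plain SQL upsert. This is a minimal sketch using SQLite's `INSERT ... ON CONFLICT` syntax; Structural's actual upsert job is internal, and the table, values, and key ranges here are hypothetical:

```python
import sqlite3

# Hypothetical destination table with a standing test record (id 5)
# in the manually managed range (1-1000), plus one previously
# generated record in the generated range (1001+).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (5, 'standing test user')")
conn.execute("INSERT INTO users VALUES (1001, 'old generated name')")

# Generated records use the 1001+ range, so row 5 is never touched:
# an existing primary key is updated, a new primary key is inserted.
generated = [(1001, "new generated name"), (1002, "another generated name")]
conn.executemany(
    """INSERT INTO users (id, name) VALUES (?, ?)
       ON CONFLICT (id) DO UPDATE SET name = excluded.name""",
    generated,
)

rows = dict(conn.execute("SELECT id, name FROM users ORDER BY id"))
print(rows)
# Row 5 is preserved, row 1001 is updated, and row 1002 is inserted.
```

Because the manually added records and the generated records occupy non-overlapping key ranges, the standing test record survives every upsert run.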

Also note that when upsert is enabled, the Truncate table mode does not actually truncate the destination table. Instead, it works more like Preserve Destination table mode, which preserves existing records in the destination table.

Enabling upsert

To enable upsert, in the Upsert section of the workspace details, toggle Enable Upsert to the on position.

When you enable upsert for a workspace, you are prompted to configure the upsert processing and provide the connection details for the intermediate database.

Configuring upsert processing

When you enable upsert, Structural displays the following settings to configure the upsert process.

Disable Triggers

Indicates whether to disable any user-defined triggers before the upsert job runs. This prevents duplicate rows from being added to the destination database. By default, this is enabled.

Automatically Start Upsert After Successful Data Generation

Indicates whether to immediately run the upsert job after the initial data generation to the intermediate database. By default, this is enabled. If you turn this off, then after the initial data generation, you must start the upsert job manually. For more information, go to Starting an upsert job based on the most recent data generation.

Persist Conflicting Data Tables

Indicates whether to preserve the temporary tables that contain rows that the upsert job could not process because of unique constraint conflicts, as well as rows that have foreign keys to those rows. By default, this is disabled. Structural only keeps the applicable temporary tables from the most recent upsert job.

Warn on Mismatched Constraints

Indicates whether to treat mismatched foreign key and unique constraints between the source and destination databases as warnings instead of errors, so that the upsert job does not fail. By default, this is disabled.

Connecting to migration scripts for schema changes

Required license: Enterprise

The intermediate database must have the same schema as the destination database. If the schemas do not match, then the upsert process fails.

To ensure that schema changes are automatically reflected in the intermediate database, you can connect the workspace to your own database migration script or tool. Structural then runs the migration script or tool whenever you run upsert data generation.

How upsert works with the migration process

When you start an upsert data generation job:

  1. If migration is enabled, Structural calls the endpoint to start the migration.

  2. Structural cannot start the upsert data generation until the migration completes successfully. It regularly calls the status check endpoint to check whether the migration is complete.

  3. When the migration is complete, Structural starts the upsert data generation.

POST Start Schema Changes endpoint

Required. Structural calls this endpoint to start the migration process specified by the provided URL.

The request includes:

  • Any custom parameter values that you add.

  • The connection information for the intermediate database.

The request uses the following format:

{
  "parameters": { /* user-supplied parameters */ },
  "databaseConnectionDetails": {
    "server": "rds.amazon.com",
    "port": "54321",
    "username": "user",
    "password": "password",
    "databaseName": "tonic_upsert",
    "schemaName": "<Oracle schema to use>",
    "sslEnabled": true,
    "trustServerCertificate": false
  }
}

The response contains the identifier of the migration task.

The response uses the following format:

{ "id": "<unique-string-identifier>" }
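A migration service that implements this endpoint only needs to accept the payload above, start its migration asynchronously, and return a task identifier. The following is a minimal sketch of such a handler; the function name and the in-memory task store are hypothetical, and a real service would launch the actual migration tool:

```python
import uuid

# Hypothetical in-memory task registry; a real service would persist this.
tasks = {}

def handle_start_schema_changes(body: dict) -> dict:
    """Accept Structural's start request and return a task identifier."""
    params = body.get("parameters", {})
    conn = body["databaseConnectionDetails"]  # intermediate database to migrate
    task_id = str(uuid.uuid4())
    # A real implementation would start the migration tool against
    # conn["server"] / conn["databaseName"] asynchronously here.
    tasks[task_id] = {"status": "Queued", "parameters": params, "errors": []}
    return {"id": task_id}

response = handle_start_schema_changes({
    "parameters": {"env": "test"},
    "databaseConnectionDetails": {"server": "rds.amazon.com",
                                  "databaseName": "tonic_upsert"},
})
print(response)
```

The returned identifier is what Structural passes back on every subsequent status, log, and cancel request.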

GET Status of Schema Change endpoint

Required. Structural calls this endpoint to check the current status of the migration process.

The request includes the task identifier that was returned when the migration process started. The request URL must be able to pass the request identifier as either a path or a query parameter.

The response provides the current status of the migration task. The possible status values are:

  • Unknown

  • Queued

  • Running

  • Canceled

  • Completed

  • Failed

The response uses the following format:

{
  "id": "a0c5c4c3-a593-4daa-a935-53c45ec255ea",
  "status": "Completed",
  "errors": []
}
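Step 2 of the process above, where Structural regularly polls this endpoint until the migration reaches a terminal status, can be sketched like this. The `fetch_status` callable stands in for the HTTP GET call and is hypothetical:

```python
import time

# Statuses that end the polling loop.
TERMINAL = {"Completed", "Failed", "Canceled"}

def wait_for_migration(fetch_status, task_id, interval=0.0, max_polls=100):
    """Poll the status endpoint until the migration reaches a terminal state."""
    for _ in range(max_polls):
        status = fetch_status(task_id)["status"]
        if status in TERMINAL:
            return status
        time.sleep(interval)  # wait before the next status check
    raise TimeoutError(f"migration {task_id} did not finish")

# Simulated status endpoint: Queued -> Running -> Completed.
responses = iter(["Queued", "Running", "Completed"])
result = wait_for_migration(lambda _id: {"status": next(responses)}, "a0c5c4c3")
print(result)  # Completed
```

Only when the loop returns `Completed` does the upsert data generation start; `Failed` or `Canceled` ends the run.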

GET Schema Change Logs endpoint

Optional. Structural calls this endpoint to retrieve the log entries for the migration process. It adds the migration logs to the upsert logs.

The request includes the task identifier that was returned when the migration process started. The request URL must be able to pass the request identifier as either a path or a query parameter.

The response body should use the text/plain content type. It contains the raw logs.

DELETE Cancel Schema Changes endpoint

Optional. Structural calls this endpoint to cancel the migration process.

The request includes the task identifier that was returned when the migration process started. The request URL must be able to pass the request identifier as either a path or query parameter.

Enabling and configuring the migration process

To enable the migration process, toggle Enable Migration Service to the on position.

When you enable the migration process, you must configure the POST Start Schema Changes and GET Status of Schema Change endpoints.

You can optionally configure the GET Schema Change Logs and DELETE Cancel Schema Changes endpoints.

To configure the endpoints:

  1. To configure the POST Start Schema Changes endpoint:

    1. In the URL field, provide the URL of the migration script.

    2. Optionally, in the Parameters field, provide any additional parameter values that your migration scripts need.

  2. To configure the GET Status of Schema Change endpoint, in the URL field, provide the URL for the status check.

    The URL must include an {id} placeholder. This is used to pass the identifier that is returned from the Start Schema Changes endpoint.

  3. To configure the GET Schema Change Logs endpoint, in the URL field, provide the URL to use to retrieve the logs. The URL must include an {id} placeholder. This is used to pass the identifier that is returned from the Start Schema Changes endpoint.

  4. To configure the DELETE Cancel Schema Changes endpoint, in the URL field, provide the URL to use for the cancellation. The URL must include an {id} placeholder. This is used to pass the identifier that is returned from the Start Schema Changes endpoint.
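In each of these URLs, the `{id}` placeholder is replaced with the identifier that the Start Schema Changes endpoint returned, which is how the identifier ends up as a path or query parameter. A minimal sketch of the substitution (the example URLs are hypothetical):

```python
def build_url(template: str, task_id: str) -> str:
    """Substitute the migration task identifier into a configured endpoint URL."""
    return template.replace("{id}", task_id)

task_id = "a0c5c4c3-a593-4daa-a935-53c45ec255ea"
# {id} as a path parameter:
print(build_url("https://migrations.example.com/status/{id}", task_id))
# {id} as a query parameter:
print(build_url("https://migrations.example.com/status?task={id}", task_id))
```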

Connecting to the intermediate database

When you enable upsert, you must provide the connection information for the intermediate database.

For details, go to the workspace configuration information for the data connector.

How Structural responds to inconsistencies in the source and destination schemas

During upsert data generation, when Structural finds inconsistencies between the source and destination database schemas:

  • Where possible, Structural attempts to address the issue so that the data generation can succeed.

  • Structural does not change the schema of the destination database.

  • For constraint-related schema issues, Structural only attempts to address the issues if Warn on Mismatched Constraints is enabled for the workspace. If the setting is turned off, then the job fails.

Here are some common schema issues that can occur, and how Structural responds to them.

Source column is not in the destination schema

In this case, a column that is present in the source schema is not present in the destination schema.


For example, a new column is added to a production source table, but is not in the schema of the de-identified destination database that is used for testing.

When this occurs, Structural ignores the column. It does not add the column to the destination schema.

Structural adds a warning to the job logs.

Destination column is not in the source schema

In this case, a column that is present in the destination schema is not present in the source schema.


For example, a developer adds a column to the de-identified destination database so that they can test a new feature. The new feature is not yet released, so the source production data doesn't include the column.

When this occurs:

  1. If the destination column is nullable, then Structural sets the value to NULL.

  2. If the destination column is not nullable, but the column has a default value, then Structural sets the destination value to the default.

  3. If the non-nullable destination column does not have a default value, then Structural attempts to set a value based on the column data type. For example, Structural might set an integer column to 0, or a varchar column to an empty string.

  4. If Structural is unable to set a value, then the data generation fails and Structural returns an error.
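The fallback sequence above can be modeled as a small decision function. This is a simplified sketch with hypothetical column metadata; Structural's actual type-based defaults may differ by data connector:

```python
# Hypothetical type defaults; Structural's actual defaults may vary.
TYPE_DEFAULTS = {"integer": 0, "varchar": "", "boolean": False}

def value_for_missing_source_column(column: dict):
    """Pick a value for a destination column that the source schema lacks."""
    if column["nullable"]:
        return None                           # 1. Nullable: use NULL.
    if column.get("default") is not None:
        return column["default"]              # 2. Non-nullable with a default.
    if column["type"] in TYPE_DEFAULTS:
        return TYPE_DEFAULTS[column["type"]]  # 3. Type-based fallback value.
    raise ValueError(                         # 4. No value possible: fail.
        f"cannot derive a value for column {column['name']}")

print(value_for_missing_source_column(
    {"name": "flags", "nullable": False, "type": "integer"}))  # 0
```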

Source and destination columns have different data types

In this case, the same column has different data types in the source and destination schemas.


For example, a column might be a string in the source schema and a timestamp in the destination schema.

When this occurs, for each record:

  1. If possible, Structural converts the values. For example, the source column is a string and contains datetime values. The generator also produces datetime values. In that case, Structural should be able to populate a datetime destination column.

  2. If it cannot convert the value, and the column is nullable, then Structural sets the destination column value to NULL.

  3. If it cannot convert the value, and the column is not nullable, then the record is excluded from the upsert.

For each of these actions, Structural also adds warnings to the job logs.

If Structural cannot perform any of those actions to work around the issue, then the data generation fails and Structural returns an error.
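The per-record handling of a data type mismatch can be sketched as follows. This is a simplified model: the `convert` helper only covers the string-to-timestamp example from the text, and the record shape is hypothetical:

```python
from datetime import datetime

def convert(value, dest_type):
    """Attempt the conversion; only the string-to-timestamp case is modeled."""
    if dest_type == "timestamp" and isinstance(value, str):
        # Raises ValueError if the string is not a valid ISO datetime.
        return datetime.fromisoformat(value)
    raise ValueError(f"cannot convert {value!r} to {dest_type}")

def upsert_value(value, dest_type, nullable):
    """Return (keep_record, converted_value) for one column of one record."""
    try:
        return True, convert(value, dest_type)  # 1. Conversion succeeded.
    except ValueError:
        if nullable:
            return True, None                   # 2. Fall back to NULL.
        return False, None                      # 3. Exclude the record.

print(upsert_value("2024-05-01T12:00:00", "timestamp", nullable=False))
print(upsert_value("not a date", "timestamp", nullable=True))   # (True, None)
print(upsert_value("not a date", "timestamp", nullable=False))  # (False, None)
```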

Source constraint is not in the destination schema

In this case, a constraint on a source column is not present in the destination schema.


For example, a column is required in the source schema but optional in the destination schema.

If Warn On Mismatched Constraints is enabled for the workspace, then Structural does not have to make any changes to the data. It populates the destination column correctly.

Structural also adds a warning to the job logs.

If Warn On Mismatched Constraints is turned off, then the job fails.

Destination constraint is not in the source schema

In this case, a constraint on a destination column is not present in the source schema.


For example, a column has no constraints in the source schema, but has a uniqueness constraint in the destination schema.

When this occurs, if Warn on Mismatched Constraints is enabled for the workspace, Structural removes any records that fail the constraint. For example, for a uniqueness constraint, Structural removes duplicate records.

Structural also adds warnings to the job logs.

If Warn on Mismatched Constraints is turned off, then the job fails.
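For the uniqueness example, removing records that fail a destination-only unique constraint amounts to keeping one row per key. This is a minimal sketch; which duplicate Structural actually keeps is not specified here, and the record shape is hypothetical:

```python
def drop_unique_violations(records, key):
    """Keep the first record for each value of the uniquely constrained column."""
    seen, kept, dropped = set(), [], []
    for record in records:
        if record[key] in seen:
            dropped.append(record)  # would be reported as a job log warning
        else:
            seen.add(record[key])
            kept.append(record)
    return kept, dropped

records = [{"id": 1, "email": "a@x.com"},
           {"id": 2, "email": "b@x.com"},
           {"id": 3, "email": "a@x.com"}]  # duplicate email
kept, dropped = drop_unique_violations(records, "email")
print(len(kept), len(dropped))  # 2 1
```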

Source table is not in the destination schema

In this case, a table in the source schema is not present in the destination schema.


For example, a new table is added to the production source database, but is not yet in the schema of the de-identified destination database that is used for testing.

When this occurs, Structural ignores the table. It does not add the table to the destination schema.

Structural also adds warnings to the job logs.

Destination table is not in the source schema

In this case, a table in the destination schema is not present in the source schema.


For example, a developer adds a table to the de-identified destination database so that they can test a new feature. Because the new feature is not yet released, the source production data doesn't include the table.

When this occurs, Structural ignores the table. It does not attempt to populate the destination table.

Structural also adds warnings to the job logs.

Source or destination table is renamed

Structural cannot detect that a table is renamed.

From Structural's perspective, the original table is removed, and the table with the new name is added.

For example, a source and destination schema both contain a table called Users.

In the source database, the Users table is renamed to People.

Structural would detect the following schema issues:

  • The source table People is not present in the destination schema. Structural ignores the People table.

  • The destination table Users is not present in the source schema. Structural does not attempt to populate the Users table.
