1 of 4

Workspace configuration settings

The workspace details for a new or edited workspace specify information about the workspace and the workspace data.

Common workspace fields

All workspaces have the following fields, used to identify the workspace and indicate the connector type:

In the Workspace name field, enter the name of the workspace.
In the Workspace description field, provide a brief description of the workspace. The description can contain up to 200 characters.
In the Tags field, provide a comma-separated list of tags to assign to the workspace. For more information on managing tags, go to Assigning tags to a workspace.

Workspace type (data generation or data science mode)

Depending on your Tonic Structural license agreement, you can either:

Only create data generation workspaces
Only create data science mode workspaces
Create either data generation or data science mode workspaces

Under Data Science Mode, the Enable Data Science Mode toggle determines whether the workspace is a data generation workspace or a data science mode workspace.

If your instance only supports data generation workspaces, then the toggle is not displayed.
If your instance only supports data science mode workspaces, then the toggle is displayed and locked in the on position.
If your instance supports both data generation and data science mode workspaces, then the toggle is displayed. By default, it is in the off position, indicating to create a data generation workspace. To create a data science mode workspace, toggle Enable Data Science Mode to the on position.

Connection type

Under Connection Type, select the type of database to connect to. You cannot change the connection type on a child workspace.

For data generation, the source and destination databases are always of the same type.

The Basic and Professional licenses limit the number and type of data connectors you can use.

A Basic instance can only use one data connector type, which can be either PostgreSQL or MySQL. After you create your first workspace, any subsequent workspaces must use the same data connector type.
A Professional instance can use up two different data connector types, which can be any type other than Oracle. After you create workspaces that use two different data connector types, any subsequent workspaces must use one of those data connector types.

For a data science mode workspace, there is also a CSV option, which allows you to use uploaded CSV files as the source of your model data.

If you don't see the database that you want to connect to, or you want to have different database types for your source and destination database, contact support@tonic.ai.

When you select a connector type, Structural updates the view to display the connection fields used for that connector type. The specific fields vary based on the connector type.

Source data location

After you select the connector type, you first configure the connection to the source data.

For a workspace that connects to a database, the Source Settings section provides connection information for the source database. For information about the source connection fields for a specific data connector, go to the workspace configuration topic for that connector type.

For a file connector workspace, which uses files for source data, the File Location section indicates where the source files are obtained from - a local file system, Amazon S3, or Google Cloud Storage. For more information, go to Configuring the file connector storage type and output options.

You cannot change the source data configuration for a child workspace.

Destination data location

For a data generation workspace, the Destination Settings section provides information about where and how Structural writes the output data from data generation.

For a data science mode workspace, you do not configure destination information.

For data connectors other than the file connector, depending on the connector type, you can write to either:

Destination database - Writes the output data to a destination database on a database server.
Ephemeral snapshot - Writes the output data to a Tonic Ephemeral user snapshot.
Container repository - Writes the output data to a data volume in a container repository.

For the file connector, you might need to provide a cloud storage location for the transformed files.

Writing to a destination database

When you write the output to a destination database, the destination database must be of the same type as the source database.

Structural does not create the destination database. It must exist before you generate data.

In Destination Settings, you provide the connection information for the destination database. For information about the destination database connection fields for a specific data connector, go to the workspace configuration topic for that connector type.

If available, the Copy Settings from Source allows you to copy the source connection details to the destination database, if both databases are in the same location. Structural does not copy the connection password.

Upsert configuration

For data connectors that support upsert, when you write the output to a destination database, the connection details include an Upsert section to allow you to enable and configure upsert.

Upsert is not available for output to an Ephemeral database or to a container repository.

For more information, go to Enabling and configuring upsert.

Writing to a Tonic Ephemeral snapshot

If Ephemeral supports your workspace database type, then you can choose to write the destination data to a snapshot in Ephemeral. For data larger than 10 GB, this option is recommended instead of writing to a container repository.

From Ephemeral, you can use the snapshot to start new Ephemeral databases.

For more information, go to Writing data generation output to a Tonic Ephemeral snapshot.

Writing to a container repository

Some data connectors allow you to choose to write the transformed data to a data volume in a container repository instead of to a database server.

You can use the resulting data volume to create a database in Tonic Ephemeral. If you do plan to use the data to start an Ephemeral database, and the size of the data is larger than 10 GB, then the recommendation is to write the data to an Ephemeral user snapshot instead.

For more information, go to Writing data generation output to a container repository.

Output cloud storage location for the file connector

For a file connector workspace that transforms files from cloud storage (Amazon S3 or Google Cloud Storage), you provide the output location.

For more information, go to Configuring the file connector storage type and output options.

Testing database connections

Whenever you provide connection details for a database server, Structural provides a Test Connection button to test the connection, and verify that Structural can use the connection details to connect to the database. Structural uses the connection details to try to reach the database, and indicates whether it succeeded or failed. We strongly recommend that you test the connections.

The environment setting TONIC_TEST_CONNECTION_TIMEOUT_IN_SECONDS determines the number of seconds before a connection test times out. You can configure this setting from the Environment Settings tab on Tonic Settings. By default, the connection test times out after 15 seconds.

Blocking data generation for all schema changes

Most data generation workspaces have a Block data generation if schema changes detected toggle. The setting is usually in the Source Settings section.

By default, the option is turned off. When the option is off, Structural only blocks data generation when there are conflicting schema changes. Structural does not block data generation when there are non-conflicting schema changes.

If this option is turned on, then if Structural detects any changes at all to the schema, then data generation is blocked until you resolve the schema changes. For more information, go to Viewing and resolving schema changes.

Workspace statistics seed for cross-run consistency

For generators where consistency is enabled, a statistics seed enables consistency across data generation runs. The Structural-wide statistics seed value ensures consistency across both data generation runs and workspaces.

In the workspace configuration, under Destination Settings, use the Override Statistics Seed setting to override the Structural-wide statistics seed value. You can either disable consistency across data generations, or provide a seed value for the workspace. The workspace seed value ensures consistency across data generation runs for that workspace, and across other workspaces that have the same seed value.

For details about using seed values to ensure consistency across data generation runs and databases, go to #enabling-consistency-across-runs-or-multiple-databases.

Uploading CSV files for data science mode

For a data science mode workspace, instead of connecting to a database, you can upload one or more CSV files that contain the data that you want to use. Each file that you upload becomes a table in your source data. You can then issue model queries against the data.

To indicate to use CSV files to provide the source data, for Connection Type, under Upload your own data, click CSV.

Adding CSV files

Under Add dataset files, to add files to the list, either:

Click Select files to upload, then select the files.
Drag and drop the files from your machine.

You cannot upload a file with the same name as an existing file in the list. To replace the data in an existing file, you must delete the file and then upload the updated file.

Configuring an uploaded file

To configure the options for a file:

If the file includes a heading row, then toggle Treat first row as column header to the on position.
In the Column Delimiter field, provide the character that is used as delimiter. The default is a comma.
In the Escape Character field, provide the character that is used to escape characters. The default is a backslash (\).
In the Quote Character field, provide the character that is used to quote text. The default is the double quote.
In the NULL Character field, provide the text used to indicate a null value. The default is \N.
To display a preview of the data in the file, click Expand.

Removing a file

To remove a file, click Remove.

Writing data generation output to a Tonic Ephemeral snapshot

Only available for PostgreSQL, MySQL, and SQL Server.

Not compatible with upsert.

Not compatible with Preserve Destination or Incremental table modes.

If Ephemeral supports your workspace database type, then you can choose to write the destination data to a snapshot in Ephemeral. You can then use the snapshot to start Ephemeral databases.

To write the transformed data to Ephemeral, under Destination Settings, click Ephemeral Database.

Selecting the Ephemeral instance type

Structural can write the data snapshot to either Ephemeral Cloud or to a self-hosted instance of Ephemeral. By default, Structural writes the data snapshot to Ephemeral Cloud.

For Ephemeral Cloud, Structural writes the snapshot to the account for the user who runs the data generation job. If that user has an Ephemeral account on Ephemeral Cloud, then Structural uses that account. If the user does not have an account, then Structural creates a two-week Ephemeral free trial account for the user.

Note that if you are on a self-hosted instance of Ephemeral, then you must always provide an Ephemeral API key.

To write a snapshot to Ephemeral Cloud:

Click Tonic Ephemeral cloud.
If you are on a self-hosted instance of Structural, in the API Key field, provide an Ephemeral API key from your Ephemeral account.

To write the snapshot to a self-hosted instance of Ephemeral:

Click Tonic Ephemeral self-hosted.
In the API Key field, provide an Ephemeral API key from your Ephemeral account. Structural writes the snapshot to the Ephemeral account that is associated with the API key.
In the Tonic Ephemeral URL field, provide the URL to your self-hosted Ephemeral instance.

Displaying advanced settings for the snapshot

If you do not configure any advanced settings, then:

The snapshot uses the same name as the workspace, and has no description.
The snapshot size allocation is determined by the source data size.
Structural discards the temporary Ephemeral database that is created during the data generation.

To change any of these settings, click Advanced settings.

Providing a snapshot name and description

By default, the snapshot name uses the workspace name.

When you run data generation, if a snapshot with the same name already exists in Ephemeral, then Structural overwrites that snapshot with the new snapshot.

Under Advanced settings:

In the Snapshot name field, provide the name of the snapshot. The snapshot name can use the following placeholder values to help identify the snapshot:
- {workspaceName} - Inserts the name of the workspace.
- {workspaceId} - Inserts the identifier of the workspace.
- {jobId} - Inserts the identifier of the data generation job that created the snapshot.
- {timestamp} - Inserts the timestamp when the snapshot was created.
Including the job ID or timestamp ensures that a data generation job does not overwrite a previous snapshot.
Optionally, in the Snapshot description field, provide a longer description of the snapshot.

Setting the size allocation for the snapshot

By default, the Ephemeral size allocation for the snapshot is based on the size of the source data.

To instead provide a custom data size allocation, under Advanced settings:

Toggle Custom data size allocation to the on position.
In the field, enter the size allocation in gigabytes.

Indicating whether to keep the temporary Ephemeral database

When Structural creates the Ephemeral snapshot, it creates a temporary Ephemeral database.

By default, Structural deletes that database when the data generation is complete.

To instead keep the database, under Advanced settings, toggle Keep database active in Tonic Ephemeral after data generation to the on position.

Providing a customization file for MySQL

For a MySQL workspace, you can provide a customization file that helps to ensure that the temporary Ephemeral database is configured correctly.

To provide the customization details:

Toggle Use custom configuration to the on position.
In the text area, paste the contents of the customization file.

Writing data generation output to a container repository

Requires Kubernetes.

For self-hosted Docker deployments, you can install and configure a separate Kubernetes cluster to use. For more information, go to Setting up a Kubernetes cluster to use to write output data to container artifacts.

For information about required Kubernetes permissions, go to Required access to write destination data to container artifacts.

Not compatible with upsert.

Not compatible with Preserve Destination or Incremental table modes.

Only supported for PostgreSQL, MySQL, and SQL Server.

You can configure a workspace to write destination data to a container repository instead of a database server.

When it writes data generation output to a repository, Structural writes the destination data to a container volume. From the list of container artifacts, you can copy the volume digest, and download a Docker Compose file that provides connection settings for the database on the volume. Structural generates the Compose file when you make the request to download it. For more information about getting access to the container artifacts, go to Viewing and downloading container artifacts.

You can also use the data volume to start a Tonic Ephemeral database. However, if the data is larger than 10 GB, we recommend that you write the data to an Ephemeral user snapshot instead. For information about writing to an Ephemeral snapshot, go to Writing data generation output to a Tonic Ephemeral snapshot.

For an overview of writing destination data to container artifacts, you can also view the video tutorial.

Indicating to write destination data to container artifacts

Under Destination Settings, to indicate to write the destination data to container artifacts, click Container Repository.

For a Structural instance that is deployed on Docker, unless you set up a separate Kubernetes cluster, the Container Repository option is hidden.

You can switch between writing to a database server and writing to a container repository at any time. Structural preserves the configuration details for both options. When you run data generation, it uses the currently selected option for the workspace.

Identifying the base image to use to create the container artifacts

From the Database Image dropdown list, select the image to use to create the container artifacts.

Select an image version that is compatible with the version of the database that is used in the workspace.

Providing a customization file for MySQL

For a MySQL workspace, you can provide a customization file that helps to ensure that the temporary destination database is configured correctly.

To provide the customization details:

Toggle Use customization to the on position.
In the text area, paste the contents of the customization file.

Setting the location for the container artifacts

To provide the location where Structural publishes the container artifacts:

In the Registry field, type the path to the container registry where Structural publishes the data volume.
In the Repository Path field, provide the path within the registry where Structural publishes the data volume.

Providing the credentials to write to the registry

You next provide the credentials that Structural uses to read from and write to the registry.

When you provide the registry, Structural detects whether the registry is from Amazon Elastic Container Registry (Amazon ECR), Google Artifact Registry (GAR), or a different container solution.

It displays the appropriate fields based on the registry type.

Fields for registries other than Amazon ECR or GAR

For a registry other than an Amazon ECR or a GAR registry, the credentials can be either a username and access token, or a secret.

The option to use a secret is not available on Structural Cloud.

In general, the credentials must be for a user that has read and write permissions for the registry.

The secret is the name of a Kubernetes secret that lives on the pod that the Structural worker runs on. The secret type must be kubernetes.io/dockerconfigjson. The Kubernetes documentation provides information on how to create a registry credentials secret.

To use a username and access token:

Click Access token.
In the Username field, provide the username.
In the Access Token field, provide the access token.

To use a secret:

Click Secret name.
In the Secret Name field, provide the name of the secret.

Azure Container Registry (ACR) permission requirements

For ACR, the provided credentials must be for a service principal that has sufficient permissions on the registry.

For Structural, the service principal must at least have the permissions that are associated with the AcrPush role.

Providing a service file for GAR

Structural only supports Google Artifact Registry (GAR). It does not support Google Container Registry (GCR).

For a GAR registry, you upload a service account file, which is a JSON file that contains credentials that provide access to Google Cloud Platform (GCP).

The associated service account must have the Artifact Registry Writer role.

For Service Account File, to search for and select the file, click Browse.

Amazon ECR registries

For an Amazon ECR registry, you can either:

Provide the AWS access and secret key that is associated with the IAM user that will connect to the registry
(Self-hosted only) Use the credentials configured in the Structural environment settings TONIC_AWS_ACCESS_KEY_ID and TONIC_AWS_SECRET_ACCESS_KEY.
(Self-hosted only) If Structural is deployed in Amazon Elastic Kubernetes Service (Amazon EKS), then you can use the AWS credentials that live on the EC2 instance.

On Structural Cloud, you must provide an AWS access key and secret key.

On a self-hosted instance, you can choose the source of the credentials. The default is Access Keys.

To provide an AWS access key and secret key, click Access Keys.
To use the credentials configured in the environment settings, click Environment Variables.
To use the AWS credentials from the EC2 instance, click Instance Profile.

The IAM user must have permission to list, push, and pull images from the registry. The following example policy includes the required permissions.

{
  {
    "Sid": "ManageTonicRepositoryContents",
    "Effect": "Allow",
    "Action": [
      "ecr:DescribeRepositories",
      "ecr:ListImages",
      "ecr:DescribeImages",
      "ecr:BatchGetImage",
      "ecr:BatchCheckLayerAvailability",
      "ecr:InitiateLayerUpload",
      "ecr:UploadLayerPart",
      "ecr:CompleteLayerUpload",
      "ecr:PutImage"
    ],
    "Resource": [
       "arn:aws:ecr:<region>:<account_id>:repository/<optional name filter>"
    ]
  },
  {
    "Sid": "GetAuthorizationToken",
    "Effect": "Allow",
    "Action": [
      "ecr:GetAuthorizationToken"
    ],
    "Resource": "*"
  }
}

For additional security, a repository name filter allows you to limit access to only the repositories that are used in Structural. You need to make sure that the repositories that you create for Structural match the filter.

For example, you could prefix Structural repository names with tonic-. In the policy, you include a filter based on the tonic- prefix:

"Resource": [
  "arn:aws:ecr:<region>:<account_id>:repository/tonic-*"
]

Providing tags for the container artifacts

In the Tags field, provide the tag values to apply to the container artifacts. You can also change the tag configuration for individual data generation jobs.

Use commas to separate the tags.

A tag cannot contain spaces. Structural provides the following built-in values for you to use in tags:

{workspaceId} - The identifier of the workspace.
{workspaceName} - The name of the workspace.
{timestamp} - The timestamp when the data generation job that created the artifact completed.
{jobId} - The identifier of the data generation job that created the artifact.

For example, the following creates a tag that contains the workspace name, job identifier, and timestamp:

{workspaceName}_{jobId}_{timestamp}

To also tag the artifacts as latest, check the Tag as "latest" in your repository checkbox.

Specifying custom resources for the Kubernetes pods

You can also optionally configure custom resource values for the Kubernetes pods. You can specify the ephemeral storage, memory, and CPU millicores.

To provide custom resources:

Toggle Set custom pod resources to the on position.
Under Storage Size:
1. In the field, provide the number of megabytes or gigabytes of storage.
2. From the dropdown list, select the unit to use.
The storage can be between 32MB and 25GB.
Under Memory Size:
1. In the field, provide the number of megabytes or gigabytes of RAM.
2. From the dropdown list, select the unit to use.
The memory can be between 512MB and 4 GB.
Under Processor Size:
1. In the field, provide the number of millicores.
2. From the dropdown list, select the unit.
The processor size can be between 250m and 1000m.

Enabling and configuring upsert

Required license: Professional or Enterprise

Not compatible with writing output to a container repository or a Tonic Ephemeral snapshot.

By default, Tonic Structural data generation replaces the existing destination database with the transformed data from the current job.

Upsert allows you to add and update rows in the destination database, but keep all other existing rows intact. For example, you might have a standard set of test records that you do not want to have to replace every time you generate data in Structural.

If you enable upsert, then you cannot write the destination data to a container repository or to a Tonic Ephemeral snapshot. You must write the data to a database server.

Upsert is currently only supported for the following data connectors:

MySQL
Oracle
PostgreSQL
SQL Server

For an overview of upsert, you can also view the video tutorial.

About the upsert process

When upsert is enabled, the data generation job writes the generated data to an intermediate database. Structural then runs the upsert job to write the new and updated records to the destination database.

The destination database must already exist. Structural cannot run an upsert job to an empty destination database.

The upsert job adds and updates records based on the primary keys.

If the primary key for a record already exists in the destination database, the upsert job updates the record.
If the primary key for a record does not exist in the destination database, the upsert job inserts a new row.

To only update or insert records that Structural creates based on source records, and ignore other records that are already in the destination database, ensure that the primary keys for each set of records operate on different ranges. For example, allocate the integer range 1-1000 for existing destination database records that you add manually. Then ensure that the source database records, and by extension the records that Structural creates during data generation, use a different range.

Also note that when upsert is enabled, the Truncate table mode does not actually truncate the destination table. Instead, it works more like Preserve Destination table mode, which preserves existing records in the destination table.

Enabling upsert

To enable upsert, in the Upsert section of the workspace details, toggle Enable Upsert to the on position.

When you enable upsert for a workspace, you are prompted to configure the upsert processing and provide the connection details for the intermediate database.

Configuring upsert processing

When you enable upsert, Structural displays the following settings to configure the upsert process.

Connecting to migration scripts for schema changes

Required license: Enterprise

The intermediate database must have the same schema as the destination database. If the schemas do not match, then the upsert process fails.

To ensure that schema changes are automatically reflected in the intermediate database, you can connect the workspace to your own database migration script or tool. Structural then runs the migration script or tool whenever you run upsert data generation.

How upsert works with the migration process

When you start an upsert data generation job:

If migration is enabled, Structural calls the endpoint to start the migration.
Structural cannot start the upsert data generation until the migration completes successfully. It regularly calls the status check endpoint to check whether the migration is complete.
When the migration is complete, Structural starts the upsert data generation.

POST Start Schema Changes endpoint

Required. Structural calls this endpoint to start the migration process specified by the provided URL.

The request includes:

Any custom parameter values that you add.
The connection information for the intermediate database.

The request uses the following format:

{ 
  "parameters": {/* user supplied parameters */ },
  "databaseConnectionDetails": {
        "server": "rds.amazon.com",
        "port": "54321",
        "username": "user",
        "password": "password",
        "databaseName": "tonic_upsert",
        "schemaName": "<Oracle schema to use>",
        "sslEnabled": true,
        "trustServerCertificate": false
  }
}

The response contains the identifier of the migration task.

The response uses the following format:

{ "id": "<unique-string-identifier>" }

GET Status of Schema Change endpoint

Required. Structural calls this endpoint to check the current status of the migration process.

The request includes the task identifier that was returned when the migration process started. The request URL must be able to pass the request identifier as either a path or query parameter.

The response provides the current status of the migration task. The possible status values are:

Unknown
Queued
Running
Canceled
Completed
Failed

The response uses the following format:

{
  "id": "a0c5c4c3-a593-4daa-a935-53c45ec255ea",
  "status": "Completed",
  "errors": []
}

GET Schema Change Logs endpoint

Optional. Structural calls this endpoint to retrieve the log entries for the migration process. It adds the migration logs to the upsert logs.

The request includes the task identifier that was returned when the migration process started. The request URL must be able to pass the request identifier as either a path or query parameter

The response body of the request should be 'text/plain'. It contains the raw logs.

DELETE Cancel Schema Changes endpoint

Optional. Structural calls this endpoint to cancel the migration process.

The request includes the task identifier that was returned when the migration process started. The request URL must be able to pass the request identifier as either a path or query parameter.

Enabling and configuring the migration process

To enable the migration process, toggle Enable Migration Service to the on position.

When you enable the migration process, you must configure the POST Start Schema Changes and GET Status of Schema Change endpoints.

You can optionally configure the GET Schema Change Logs and DELETE Cancel Schema Changes endpoints.

To configure the endpoints:

To configure the POST Start Schema Changes endpoint:
1. In the URL field, provide the URL of the migration script.
2. Optionally, in the Parameters field, provide any additional parameter values that your migration scripts need.
To configure the GET Status of Schema Change endpoint, in the URL field, provide the URL for the status check.
The URL must include an {id} placeholder. This is used to pass the identifier that is returned from the Start Schema Changes endpoint.
To configure the GET Schema Change Logs endpoint, in the URL field, provide the URL to use to retrieve the logs. The URL must include an {id} placeholder. This is used to pass the identifier that is returned from the Start Schema Changes endpoint.
To configure the DELETE Cancel Schema Changes endpoint, in the URL field, provide the URL to use for the cancellation. The URL must include an {id} placeholder. This is used to pass the identifier that is returned from the Start Schema Changes endpoint.

Connecting to the intermediate database

When you enable upsert, you must provide the connection information for the intermediate database.

For details, go to the workspace configuration information for the data connector.

Workspace configuration settings

The workspace details for a new or edited workspace specify information about the workspace and the workspace data.

Common workspace fields

All workspaces have the following fields, used to identify the workspace and indicate the connector type:

In the Workspace name field, enter the name of the workspace.
In the Workspace description field, provide a brief description of the workspace. The description can contain up to 200 characters.
In the Tags field, provide a comma-separated list of tags to assign to the workspace. For more information on managing tags, go to Assigning tags to a workspace.

Workspace type (data generation or data science mode)

Depending on your Tonic Structural license agreement, you can either:

Only create data generation workspaces
Only create data science mode workspaces
Create either data generation or data science mode workspaces

Under Data Science Mode, the Enable Data Science Mode toggle determines whether the workspace is a data generation workspace or a data science mode workspace.

If your instance only supports data generation workspaces, then the toggle is not displayed.
If your instance only supports data science mode workspaces, then the toggle is displayed and locked in the on position.
If your instance supports both data generation and data science mode workspaces, then the toggle is displayed. By default, it is in the off position, indicating to create a data generation workspace. To create a data science mode workspace, toggle Enable Data Science Mode to the on position.

Connection type

Under Connection Type, select the type of database to connect to. You cannot change the connection type on a child workspace.

For data generation, the source and destination databases are always of the same type.

The Basic and Professional licenses limit the number and type of data connectors you can use.

A Basic instance can only use one data connector type, which can be either PostgreSQL or MySQL. After you create your first workspace, any subsequent workspaces must use the same data connector type.
A Professional instance can use up two different data connector types, which can be any type other than Oracle. After you create workspaces that use two different data connector types, any subsequent workspaces must use one of those data connector types.

For a data science mode workspace, there is also a CSV option, which allows you to use uploaded CSV files as the source of your model data.

If you don't see the database that you want to connect to, or you want to have different database types for your source and destination database, contact support@tonic.ai.

When you select a connector type, Structural updates the view to display the connection fields used for that connector type. The specific fields vary based on the connector type.

Source data location

After you select the connector type, you first configure the connection to the source data.

You cannot change the source data configuration for a child workspace.

Destination data location

For a data generation workspace, the Destination Settings section provides information about where and how Structural writes the output data from data generation.

For a data science mode workspace, you do not configure destination information.

For data connectors other than the file connector, depending on the connector type, you can write to either:

Destination database - Writes the output data to a destination database on a database server.
Ephemeral snapshot - Writes the output data to a Tonic Ephemeral user snapshot.
Container repository - Writes the output data to a data volume in a container repository.

For the file connector, you might need to provide a cloud storage location for the transformed files.

Writing to a destination database

When you write the output to a destination database, the destination database must be of the same type as the source database.

Structural does not create the destination database. It must exist before you generate data.

Upsert configuration

For data connectors that support upsert, when you write the output to a destination database, the connection details include an Upsert section to allow you to enable and configure upsert.

Upsert is not available for output to an Ephemeral database or to a container repository.

For more information, go to Enabling and configuring upsert.

Writing to a Tonic Ephemeral snapshot

Tonic Ephemeral is a separate Tonic.ai product that allows you to create temporary databases to use for testing and demos. For more information about Tonic Ephemeral, go to the .

From Ephemeral, you can use the snapshot to start new Ephemeral databases.

For more information, go to Writing data generation output to a Tonic Ephemeral snapshot.

Writing to a container repository

Some data connectors allow you to choose to write the transformed data to a data volume in a container repository instead of to a database server.

For more information, go to Writing data generation output to a container repository.

Output cloud storage location for the file connector

For a file connector workspace that transforms files from cloud storage (Amazon S3 or Google Cloud Storage), you provide the output location.

For more information, go to Configuring the file connector storage type and output options.

Testing database connections

Blocking data generation for all schema changes

Most data generation workspaces have a Block data generation if schema changes detected toggle. The setting is usually in the Source Settings section.

Workspace statistics seed for cross-run consistency

For details about using seed values to ensure consistency across data generation runs and databases, go to #enabling-consistency-across-runs-or-multiple-databases.

Uploading CSV files for data science mode

To indicate to use CSV files to provide the source data, for Connection Type, under Upload your own data, click CSV.

Adding CSV files

Under Add dataset files, to add files to the list, either:

Click Select files to upload, then select the files.
Drag and drop the files from your machine.

You cannot upload a file with the same name as an existing file in the list. To replace the data in an existing file, you must delete the file and then upload the updated file.

Configuring an uploaded file

To configure the options for a file:

If the file includes a heading row, then toggle Treat first row as column header to the on position.
In the Column Delimiter field, provide the character that is used as delimiter. The default is a comma.
In the Escape Character field, provide the character that is used to escape characters. The default is a backslash (\).
In the Quote Character field, provide the character that is used to quote text. The default is the double quote.
In the NULL Character field, provide the text used to indicate a null value. The default is \N.
To display a preview of the data in the file, click Expand.

Removing a file

To remove a file, click Remove.

Writing data generation output to a container repository

Requires Kubernetes.

For information about required Kubernetes permissions, go to Required access to write destination data to container artifacts.

Not compatible with upsert.

Not compatible with Preserve Destination or Incremental table modes.

Only supported for PostgreSQL, MySQL, and SQL Server.

You can configure a workspace to write destination data to a container repository instead of a database server.

For an overview of writing destination data to container artifacts, you can also view the video tutorial.

Indicating to write destination data to container artifacts

Under Destination Settings, to indicate to write the destination data to container artifacts, click Container Repository.

For a Structural instance that is deployed on Docker, unless you set up a separate Kubernetes cluster, the Container Repository option is hidden.

Identifying the base image to use to create the container artifacts

From the Database Image dropdown list, select the image to use to create the container artifacts.

Select an image version that is compatible with the version of the database that is used in the workspace.

Providing a customization file for MySQL

For a MySQL workspace, you can provide a customization file that helps to ensure that the temporary destination database is configured correctly.

To provide the customization details:

Toggle Use customization to the on position.
In the text area, paste the contents of the customization file.

Setting the location for the container artifacts

To provide the location where Structural publishes the container artifacts:

In the Registry field, type the path to the container registry where Structural publishes the data volume.
In the Repository Path field, provide the path within the registry where Structural publishes the data volume.

Providing the credentials to write to the registry

You next provide the credentials that Structural uses to read from and write to the registry.

When you provide the registry, Structural detects whether the registry is from Amazon Elastic Container Registry (Amazon ECR), Google Artifact Registry (GAR), or a different container solution.

It displays the appropriate fields based on the registry type.

Fields for registries other than Amazon ECR or GAR

For a registry other than an Amazon ECR or a GAR registry, the credentials can be either a username and access token, or a secret.

The option to use a secret is not available on Structural Cloud.

In general, the credentials must be for a user that has read and write permissions for the registry.

To use a username and access token:

Click Access token.
In the Username field, provide the username.
In the Access Token field, provide the access token.

To use a secret:

Click Secret name.
In the Secret Name field, provide the name of the secret.

Azure Container Registry (ACR) permission requirements

For ACR, the provided credentials must be for a service principal that has sufficient permissions on the registry.

For Structural, the service principal must at least have the permissions that are associated with the AcrPush role.

Providing a service file for GAR

Structural only supports Google Artifact Registry (GAR). It does not support Google Container Registry (GCR).

For a GAR registry, you upload a service account file, which is a JSON file that contains credentials that provide access to Google Cloud Platform (GCP).

The associated service account must have the Artifact Registry Writer role.

For Service Account File, to search for and select the file, click Browse.

Amazon ECR registries

For an Amazon ECR registry, you can either:

Provide the AWS access and secret key that is associated with the IAM user that will connect to the registry
(Self-hosted only) Use the credentials configured in the Structural environment settings TONIC_AWS_ACCESS_KEY_ID and TONIC_AWS_SECRET_ACCESS_KEY.
(Self-hosted only) If Structural is deployed in Amazon Elastic Kubernetes Service (Amazon EKS), then you can use the AWS credentials that live on the EC2 instance.

On Structural Cloud, you must provide an AWS access key and secret key.

On a self-hosted instance, you can choose the source of the credentials. The default is Access Keys.

To provide an AWS access key and secret key, click Access Keys.
To use the credentials configured in the environment settings, click Environment Variables.
To use the AWS credentials from the EC2 instance, click Instance Profile.

The IAM user must have permission to list, push, and pull images from the registry. The following example policy includes the required permissions.

{
  {
    "Sid": "ManageTonicRepositoryContents",
    "Effect": "Allow",
    "Action": [
      "ecr:DescribeRepositories",
      "ecr:ListImages",
      "ecr:DescribeImages",
      "ecr:BatchGetImage",
      "ecr:BatchCheckLayerAvailability",
      "ecr:InitiateLayerUpload",
      "ecr:UploadLayerPart",
      "ecr:CompleteLayerUpload",
      "ecr:PutImage"
    ],
    "Resource": [
       "arn:aws:ecr:<region>:<account_id>:repository/<optional name filter>"
    ]
  },
  {
    "Sid": "GetAuthorizationToken",
    "Effect": "Allow",
    "Action": [
      "ecr:GetAuthorizationToken"
    ],
    "Resource": "*"
  }
}

For example, you could prefix Structural repository names with tonic-. In the policy, you include a filter based on the tonic- prefix:

"Resource": [
  "arn:aws:ecr:<region>:<account_id>:repository/tonic-*"
]

Providing tags for the container artifacts

In the Tags field, provide the tag values to apply to the container artifacts. You can also change the tag configuration for individual data generation jobs.

Use commas to separate the tags.

A tag cannot contain spaces. Structural provides the following built-in values for you to use in tags:

{workspaceId} - The identifier of the workspace.
{workspaceName} - The name of the workspace.
{timestamp} - The timestamp when the data generation job that created the artifact completed.
{jobId} - The identifier of the data generation job that created the artifact.

For example, the following creates a tag that contains the workspace name, job identifier, and timestamp:

{workspaceName}_{jobId}_{timestamp}

To also tag the artifacts as latest, check the Tag as "latest" in your repository checkbox.

Specifying custom resources for the Kubernetes pods

You can also optionally configure custom resource values for the Kubernetes pods. You can specify the ephemeral storage, memory, and CPU millicores.

To provide custom resources:

Toggle Set custom pod resources to the on position.
Under Storage Size:
1. In the field, provide the number of megabytes or gigabytes of storage.
2. From the dropdown list, select the unit to use.
The storage can be between 32MB and 25GB.
Under Memory Size:
1. In the field, provide the number of megabytes or gigabytes of RAM.
2. From the dropdown list, select the unit to use.
The memory can be between 512MB and 4 GB.
Under Processor Size:
1. In the field, provide the number of millicores.
2. From the dropdown list, select the unit.
The processor size can be between 250m and 1000m.

Enabling and configuring upsert

Required license: Professional or Enterprise

Not compatible with writing output to a container repository or a Tonic Ephemeral snapshot.

By default, Tonic Structural data generation replaces the existing destination database with the transformed data from the current job.

If you enable upsert, then you cannot write the destination data to a container repository or to a Tonic Ephemeral snapshot. You must write the data to a database server.

Upsert is currently only supported for the following data connectors:

MySQL
Oracle
PostgreSQL
SQL Server

For an overview of upsert, you can also view the video tutorial.

About the upsert process

The destination database must already exist. Structural cannot run an upsert job to an empty destination database.

The upsert job adds and updates records based on the primary keys.

If the primary key for a record already exists in the destination database, the upsert job updates the record.
If the primary key for a record does not exist in the destination database, the upsert job inserts a new row.

Enabling upsert

To enable upsert, in the Upsert section of the workspace details, toggle Enable Upsert to the on position.

When you enable upsert for a workspace, you are prompted to configure the upsert processing and provide the connection details for the intermediate database.

Configuring upsert processing

When you enable upsert, Structural displays the following settings to configure the upsert process.

Connecting to migration scripts for schema changes

Required license: Enterprise

The intermediate database must have the same schema as the destination database. If the schemas do not match, then the upsert process fails.

How upsert works with the migration process

When you start an upsert data generation job:

If migration is enabled, Structural calls the endpoint to start the migration.
Structural cannot start the upsert data generation until the migration completes successfully. It regularly calls the status check endpoint to check whether the migration is complete.
When the migration is complete, Structural starts the upsert data generation.

POST Start Schema Changes endpoint

Required. Structural calls this endpoint to start the migration process specified by the provided URL.

The request includes:

Any custom parameter values that you add.
The connection information for the intermediate database.

The request uses the following format:

{ 
  "parameters": {/* user supplied parameters */ },
  "databaseConnectionDetails": {
        "server": "rds.amazon.com",
        "port": "54321",
        "username": "user",
        "password": "password",
        "databaseName": "tonic_upsert",
        "schemaName": "<Oracle schema to use>",
        "sslEnabled": true,
        "trustServerCertificate": false
  }
}

The response contains the identifier of the migration task.

The response uses the following format:

{ "id": "<unique-string-identifier>" }

GET Status of Schema Change endpoint

Required. Structural calls this endpoint to check the current status of the migration process.

The request includes the task identifier that was returned when the migration process started. The request URL must be able to pass the request identifier as either a path or query parameter.

The response provides the current status of the migration task. The possible status values are:

Unknown
Queued
Running
Canceled
Completed
Failed

The response uses the following format:

{
  "id": "a0c5c4c3-a593-4daa-a935-53c45ec255ea",
  "status": "Completed",
  "errors": []
}

GET Schema Change Logs endpoint

Optional. Structural calls this endpoint to retrieve the log entries for the migration process. It adds the migration logs to the upsert logs.

The request includes the task identifier that was returned when the migration process started. The request URL must be able to pass the request identifier as either a path or query parameter

The response body of the request should be 'text/plain'. It contains the raw logs.

DELETE Cancel Schema Changes endpoint

Optional. Structural calls this endpoint to cancel the migration process.

The request includes the task identifier that was returned when the migration process started. The request URL must be able to pass the request identifier as either a path or query parameter.

Enabling and configuring the migration process

To enable the migration process, toggle Enable Migration Service to the on position.

When you enable the migration process, you must configure the POST Start Schema Changes and GET Status of Schema Change endpoints.

You can optionally configure the GET Schema Change Logs and DELETE Cancel Schema Changes endpoints.

To configure the endpoints:

To configure the POST Start Schema Changes endpoint:
1. In the URL field, provide the URL of the migration script.
2. Optionally, in the Parameters field, provide any additional parameter values that your migration scripts need.
To configure the GET Status of Schema Change endpoint, in the URL field, provide the URL for the status check.
The URL must include an {id} placeholder. This is used to pass the identifier that is returned from the Start Schema Changes endpoint.
To configure the GET Schema Change Logs endpoint, in the URL field, provide the URL to use to retrieve the logs. The URL must include an {id} placeholder. This is used to pass the identifier that is returned from the Start Schema Changes endpoint.
To configure the DELETE Cancel Schema Changes endpoint, in the URL field, provide the URL to use for the cancellation. The URL must include an {id} placeholder. This is used to pass the identifier that is returned from the Start Schema Changes endpoint.

Connecting to the intermediate database

When you enable upsert, you must provide the connection information for the intermediate database.

For details, go to the workspace configuration information for the data connector.