Workspace configuration settings

The workspace details for a new or edited workspace specify information about the workspace and the workspace data.

Common workspace fields

All workspaces have the following fields, which identify and describe the workspace:

  1. In the Workspace name field, enter the name of the workspace.

  2. In the Workspace description field, provide a brief description of the workspace. The description can contain up to 200 characters.

  3. In the Tags field, provide a comma-separated list of tags to assign to the workspace. For more information on managing tags, go to Assigning tags to a workspace.
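The constraints on these fields can be sketched in Python. This is an illustrative sketch only, not Tonic Structural's actual validation; the function name and return shape are assumptions.

```python
def parse_workspace_fields(name: str, description: str, tags_field: str) -> dict:
    """Validate the common workspace fields described above.
    Illustrative only; not Tonic Structural's actual validation code."""
    if not name.strip():
        raise ValueError("Workspace name is required")
    if len(description) > 200:
        raise ValueError("Workspace description is limited to 200 characters")
    # Tags arrive as a comma-separated list; empty entries are dropped.
    tags = [tag.strip() for tag in tags_field.split(",") if tag.strip()]
    return {"name": name, "description": description, "tags": tags}
```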

Workspace type (data generation or data science mode)

Depending on your Tonic Structural license agreement, a workspace can be a data generation workspace, a data science mode workspace, or either.

Under Data Science Mode, the Enable Data Science Mode toggle determines whether the workspace is a data generation workspace or a data science mode workspace.

  • If your instance only supports data generation workspaces, then the toggle is not displayed.

  • If your instance only supports data science mode workspaces, then the toggle is displayed and locked in the on position.

  • If your instance supports both data generation and data science mode workspaces, then the toggle is displayed. By default, it is in the off position, which creates a data generation workspace. To create a data science mode workspace, set Enable Data Science Mode to the on position.

Connection type

Under Connection Type, select the type of database to connect to. You cannot change the connection type on a child workspace.

For data generation, the source and destination databases are always of the same type.

The Basic and Professional licenses limit the number and type of data connectors you can use.

  • A Basic instance can only use one data connector type, which can be either PostgreSQL or MySQL. After you create your first workspace, any subsequent workspaces must use the same data connector type.

  • A Professional instance can use up to two different data connector types, which can be any type other than Oracle. After you create workspaces that use two different data connector types, any subsequent workspaces must use one of those data connector types.

For a data science mode workspace, there is also a CSV option, which allows you to use uploaded CSV files as the source of your model data.

If you don't see the database that you want to connect to, or you want to use different database types for your source and destination databases, contact support@tonic.ai.

When you select a connector type, Structural updates the view to display the connection fields used for that connector type. The specific fields vary based on the connector type.

Source data location

After you select the connector type, you first configure the connection to the source data.

For a workspace that connects to a database, the Source Settings section provides connection information for the source database. For information about the source connection fields for a specific data connector, go to the workspace configuration topic for that connector type.

For a file connector workspace, which uses files for source data, the File Location section indicates where the source files are obtained from - a local file system, Amazon S3, or Google Cloud Storage. For more information, go to Configuring the file connector storage type and output options.

You cannot change the source data configuration for a child workspace.

Destination data location

For a data generation workspace, the Destination Settings section provides information about where and how Structural writes the output data from data generation.

For a data science mode workspace, you do not configure destination information.

For data connectors other than the file connector, depending on the connector type, you can write to one of the following:

  • Destination database - Writes the output data to a destination database on a database server.

  • Ephemeral snapshot - Writes the output data to a Tonic Ephemeral user snapshot.

  • Container repository - Writes the output data to a data volume in a container repository.

For the file connector, you might need to provide a cloud storage location for the transformed files.

Writing to a destination database

When you write the output to a destination database, the destination database must be of the same type as the source database.

Structural does not create the destination database. It must exist before you generate data.

In Destination Settings, you provide the connection information for the destination database. For information about the destination database connection fields for a specific data connector, go to the workspace configuration topic for that connector type.

If available, the Copy Settings from Source option allows you to copy the source connection details to the destination settings when both databases are in the same location. Structural does not copy the connection password.
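The copy behavior can be illustrated with a short sketch. The field names here are hypothetical; the key point is that every setting is copied except the password, which you must re-enter.

```python
def copy_settings_from_source(source_settings: dict) -> dict:
    """Reuse the source connection details as destination settings,
    omitting the password, which Structural does not copy.
    Illustrative only; the field names are hypothetical."""
    return {key: value for key, value in source_settings.items()
            if key != "password"}
```

For example, copying `{"host": "db.internal", "port": 5432, "password": "s3cret"}` yields the host and port, with no password entry.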

Upsert configuration

For data connectors that support upsert, when you write the output to a destination database, the connection details include an Upsert section to allow you to enable and configure upsert.

Upsert is not available for output to an Ephemeral snapshot or to a container repository.

For more information, go to Enabling and configuring upsert.

Writing to a Tonic Ephemeral snapshot

If Ephemeral supports your workspace database type, then you can choose to write the destination data to a snapshot in Ephemeral. For data larger than 10 GB, this option is recommended instead of writing to a container repository.

From Ephemeral, you can use the snapshot to start new Ephemeral databases.

For more information, go to Writing data generation output to a Tonic Ephemeral snapshot.

Writing to a container repository

Some data connectors allow you to choose to write the transformed data to a data volume in a container repository instead of to a database server.

You can use the resulting data volume to create a database in Tonic Ephemeral. If you plan to use the data to start an Ephemeral database and the data is larger than 10 GB, then we recommend that you write the data to an Ephemeral user snapshot instead.

For more information, go to Writing data generation output to a container repository.

Output cloud storage location for the file connector

For a file connector workspace that transforms files from cloud storage (Amazon S3 or Google Cloud Storage), you provide the output location.

For more information, go to Configuring the file connector storage type and output options.

Testing database connections

Whenever you provide connection details for a database server, Structural provides a Test Connection button to verify that Structural can use those details to connect to the database. Structural attempts to reach the database and indicates whether the attempt succeeded or failed. We strongly recommend that you test every connection.

The environment setting TONIC_TEST_CONNECTION_TIMEOUT_IN_SECONDS determines the number of seconds before a connection test times out. You can configure this setting from the Environment Settings tab on Tonic Settings. By default, the connection test times out after 15 seconds.
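Conceptually, the reachability half of such a test looks like the sketch below, which only checks that a TCP connection can be opened within the timeout. Structural's actual test also verifies credentials and permissions; the function name here is an assumption.

```python
import socket

def can_reach_database(host: str, port: int, timeout_seconds: float = 15.0) -> bool:
    """Try to open a TCP connection to the database server within the
    timeout (15 seconds mirrors the TONIC_TEST_CONNECTION_TIMEOUT_IN_SECONDS
    default). Returns False on refusal or timeout. Illustrative only;
    Structural's real test also validates the credentials."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False
```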

Blocking data generation for all schema changes

Most data generation workspaces have a Block data generation if schema changes detected toggle. The setting is usually in the Source Settings section.

By default, the option is turned off. When the option is off, Structural only blocks data generation when there are conflicting schema changes. Structural does not block data generation when there are non-conflicting schema changes.

If the option is turned on, then Structural blocks data generation whenever it detects any schema changes at all, until you resolve those changes. For more information, go to Viewing and resolving schema changes.

Workspace statistics seed for cross-run consistency

For generators where consistency is enabled, a statistics seed enables consistency across data generation runs. The Structural-wide statistics seed value ensures consistency across both data generation runs and workspaces.

In the workspace configuration, under Destination Settings, use the Override Statistics Seed setting to override the Structural-wide statistics seed value. You can either disable consistency across data generation runs, or provide a seed value for the workspace. The workspace seed value ensures consistency across data generation runs for that workspace, and with other workspaces that have the same seed value.
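The idea behind seed-based consistency can be sketched as follows: deriving each replacement value deterministically from the seed and the original value means that every run, and every workspace sharing the seed, maps the same input to the same output. This is an illustrative sketch only, not Tonic Structural's actual generator.

```python
import hashlib
import random

def seeded_replacement(seed: int, original: str) -> str:
    """Derive a replacement value deterministically from the statistics
    seed and the original value, so the same input maps to the same
    output in every run (and in every workspace that shares the seed).
    Illustrative only; not Tonic Structural's actual generator."""
    digest = hashlib.sha256(f"{seed}:{original}".encode()).hexdigest()
    rng = random.Random(digest)
    return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz")
                   for _ in range(len(original)))
```

With the same seed, repeated calls for the same original value always return the same replacement; a different seed produces a different mapping.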

Uploading CSV files for data science mode

For a data science mode workspace, instead of connecting to a database, you can upload one or more CSV files that contain the data that you want to use. Each file that you upload becomes a table in your source data. You can then issue model queries against the data.

To use CSV files as the source data, for Connection Type, under Upload your own data, click CSV.

Adding CSV files

Under Add dataset files, to add files to the list, either:

  • Click Select files to upload, then select the files.

  • Drag and drop the files from your machine.

You cannot upload a file with the same name as an existing file in the list. To replace the data in an existing file, you must delete the file and then upload the updated file.

Configuring an uploaded file

To configure the options for a file:

  1. If the file includes a header row, then toggle Treat first row as column header to the on position.

  2. In the Column Delimiter field, provide the character that is used as the delimiter. The default is a comma (,).

  3. In the Escape Character field, provide the character that is used to escape special characters. The default is a backslash (\).

  4. In the Quote Character field, provide the character that is used to quote text values. The default is a double quote (").

  5. In the NULL Character field, provide the text used to indicate a null value. The default is \N.

  6. To display a preview of the data in the file, click Expand.
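These options behave like the standard CSV parsing options in most tools. As a rough sketch, using Python's csv module (escape-character handling omitted for brevity; this is not Structural's parser):

```python
import csv
import io

def load_csv(text: str, delimiter: str = ",", quotechar: str = '"',
             null_token: str = r"\N", header_row: bool = True):
    """Parse uploaded CSV text using the delimiter, quote character, and
    NULL token options described above. Cells that equal the NULL token
    become None. Illustrative only; not Tonic Structural's parser."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quotechar=quotechar)
    rows = [[None if cell == null_token else cell for cell in row]
            for row in reader]
    header = rows[0] if header_row else None
    data = rows[1:] if header_row else rows
    return header, data
```

For example, a file containing `bob,\N` yields a row in which the second column is NULL.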

Removing a file

To remove a file, click Remove.
