Tonic.ai recommends that you create separate accounts for the source and destination databases, even if the databases are in the same Snowflake account. This allows each account to have the minimum permissions needed in the source and destination databases.
If you use the same account, then you must combine the relevant permissions for the source and destination databases into a single role. Snowflake only allows each account to have a single active primary role.
The following permissions create a role with the privileges necessary to act as the source database user. A user is then created and assigned to that role.
These permissions grant read-only data access to a specific database, covering all current and future schemas, tables, and sequences.
They also grant the role access to a specified warehouse.
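As a sketch, the grants described above might look like the following. The role, user, warehouse, and database names here are placeholders; substitute your own.

```sql
-- Placeholder names; replace with your own role, user, warehouse, and database.
CREATE ROLE TONIC_SOURCE_ROLE;

-- Warehouse access
GRANT USAGE ON WAREHOUSE TONIC_WAREHOUSE TO ROLE TONIC_SOURCE_ROLE;

-- Read-only access to the source database, including future objects
GRANT USAGE ON DATABASE SOURCE_DB TO ROLE TONIC_SOURCE_ROLE;
GRANT USAGE ON ALL SCHEMAS IN DATABASE SOURCE_DB TO ROLE TONIC_SOURCE_ROLE;
GRANT USAGE ON FUTURE SCHEMAS IN DATABASE SOURCE_DB TO ROLE TONIC_SOURCE_ROLE;
GRANT SELECT ON ALL TABLES IN DATABASE SOURCE_DB TO ROLE TONIC_SOURCE_ROLE;
GRANT SELECT ON FUTURE TABLES IN DATABASE SOURCE_DB TO ROLE TONIC_SOURCE_ROLE;
GRANT USAGE ON ALL SEQUENCES IN DATABASE SOURCE_DB TO ROLE TONIC_SOURCE_ROLE;
GRANT USAGE ON FUTURE SEQUENCES IN DATABASE SOURCE_DB TO ROLE TONIC_SOURCE_ROLE;

-- Create the user and assign it the role
CREATE USER TONIC_SOURCE_USER PASSWORD = '<password>' DEFAULT_ROLE = TONIC_SOURCE_ROLE;
GRANT ROLE TONIC_SOURCE_ROLE TO USER TONIC_SOURCE_USER;
```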
The destination database must exist before Structural can connect to it. The user that you provide to Tonic Structural for connecting to the destination database should be the owner of the database and of all of the objects in the database.
We suggest that you first create a specific Structural destination database user, and then create the destination database from that user's account. If you create the database with another account, such as ACCOUNTADMIN, then you must transfer ownership of the database and all of its objects to the new account.
As part of a Structural data generation job, this user must be able to:
DROP and then create schemas on the output database
Copy data from Azure Blob Storage into tables in the database
The following permissions create a role with the privileges necessary to act as the destination database user. A user is then created and assigned to that role.
These permissions give full access to the destination database. They also grant the role access to a specified warehouse. To accommodate a situation where the database was created by another user, such as ACCOUNTADMIN, this includes the required transfer of ownership to the role.
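As a sketch, the destination-side grants might look like the following. The role, user, warehouse, and database names are placeholders; substitute your own.

```sql
-- Placeholder names; replace with your own role, user, warehouse, and database.
CREATE ROLE TONIC_DEST_ROLE;

-- Warehouse access
GRANT USAGE ON WAREHOUSE TONIC_WAREHOUSE TO ROLE TONIC_DEST_ROLE;

-- Full access to the destination database
GRANT ALL ON DATABASE DEST_DB TO ROLE TONIC_DEST_ROLE;

-- If another user (such as ACCOUNTADMIN) created the database, transfer
-- ownership of the database and its objects to the role.
GRANT OWNERSHIP ON DATABASE DEST_DB TO ROLE TONIC_DEST_ROLE COPY CURRENT GRANTS;
GRANT OWNERSHIP ON ALL SCHEMAS IN DATABASE DEST_DB TO ROLE TONIC_DEST_ROLE COPY CURRENT GRANTS;
GRANT OWNERSHIP ON ALL TABLES IN DATABASE DEST_DB TO ROLE TONIC_DEST_ROLE COPY CURRENT GRANTS;

-- Create the user and assign it the role
CREATE USER TONIC_DEST_USER PASSWORD = '<password>' DEFAULT_ROLE = TONIC_DEST_ROLE;
GRANT ROLE TONIC_DEST_ROLE TO USER TONIC_DEST_USER;
```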
Structural requires that the parameter QUOTED_IDENTIFIERS_IGNORE_CASE is set to false at either the account level or the user level.
To set it at the user level, run:
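A sketch of the command, using a placeholder user name:

```sql
-- Replace TONIC_USER with the user that Structural uses to connect.
ALTER USER TONIC_USER SET QUOTED_IDENTIFIERS_IGNORE_CASE = false;
```

To set the parameter at the account level instead, an administrator can run ALTER ACCOUNT SET QUOTED_IDENTIFIERS_IGNORE_CASE = false.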
Structural uses an Azure storage account to load and unload data to and from Azure Blob Storage. You provide the storage account name in the workspace configuration.
You provide the access key for the storage account as the value of the environment setting TONIC_AZURE_BLOB_STORAGE_ACCOUNT_KEY. See Configuring environment settings.
You can obtain the access key from the storage account portal under Access keys. Structural uses this key to authorize API calls when it unloads and loads data to and from Azure Blob Storage.
By default, during each data generation job, Structural creates the database schema for the destination database tables, then populates the database tables based on the workspace configuration.
If you prefer to manage the destination database schema yourself, then set the environment setting TONIC_SNOWFLAKE_SKIP_CREATE_DB to true. You can add this setting manually to the Environment Settings list on Structural Settings.
The environment setting applies to both Snowflake on AWS and Snowflake on Azure.
When TONIC_SNOWFLAKE_SKIP_CREATE_DB is true, Structural does not create the destination database schema. Before you run data generation, you must create the destination database with the full schema.
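For reference, the setting as it would appear in the Environment Settings list (assuming the usual name=value form for Structural environment settings):

```
TONIC_SNOWFLAKE_SKIP_CREATE_DB=true
```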
During data generation, Structural deletes the data from the destination database tables, except for tables that use Preserve Destination mode. It then populates the tables with the new destination data.
Required license: Professional or Enterprise
Snowflake on Azure workspaces cannot use the following table modes:
Scale
Incremental
Snowflake on Azure workspaces cannot use the following generators:
Algebraic
Array JSON Mask
Array Regex Mask
Cross Table Sum
Current Date
Event Timestamps
Geo
Sequential Integer
Snowflake on Azure workspaces support both subsetting and table filtering.
Table filtering means that for tables that use the De-Identify table mode, you can provide a WHERE clause to filter the table. See Using table filtering for data warehouses and Spark-based data connectors.
Snowflake on Azure workspaces do not support upsert.
For Snowflake on Azure workspaces, you cannot write the destination data to container artifacts.
For Snowflake on Azure workspaces, you cannot write the destination data to an Ephemeral snapshot.
Snowflake is a cloud-based data warehousing platform.
Tonic Structural supports moving data from one database to another within a single Snowflake instance. Structural can also move data between Snowflake instances.
For the Snowflake on Azure data connector, Structural uses Azure Blob Storage as an intermediate stage to host both the original data and the masked data.
The following high-level diagram describes how Tonic Structural orchestrates the processing and moving of data in Snowflake on Azure.
Structural orchestrates the moving and transforming of data between Snowflake databases that are hosted on Azure. Structural uses Azure Blob Storage for interim storage of files that contain the source and destination data.
At a high level, the data generation process is:
Structural copies the table data from the Snowflake database to files in Azure Blob Storage. You specify the container path in the Structural workspace configuration. Structural places the files in an input folder within the container path.
Structural applies the configured generators to the data in the files, then writes the resulting files to an output folder in the container path.
As it finishes processing each file, Structural copies the data from the container path’s output folder into the Snowflake destination database.
Structural process for Snowflake on Azure
How Structural data generation works with Snowflake on Azure
Structural differences and limitations
Features that are unavailable or work differently in Snowflake on Azure workspaces
Required Snowflake configuration
Configure a required environment setting
Configure workspace data connections
Data connection settings for Snowflake on Azure workspaces
In the workspace configuration, under Connection Type, select Snowflake.
In the Source Settings section, under Snowflake Type, click Azure.
In the Source Settings section, provide the details about connecting to the source database.
To connect to the source database, you can either populate the connection fields or use a connection string.
You can also opt to use key pair authentication to connect to the database.
By default, Use connection string is off, and you provide the connection values in the individual fields:
In the Server field, provide the server where the database is located.
You must provide the full path to the server. The https:// prefix is optional.
So the format of the server value can be either:
<account>.<region>.snowflakecomputing.com
https://<account>.<region>.snowflakecomputing.com
For example: abc123456.us-east-1.snowflakecomputing.com or https://abc123456.us-east-1.snowflakecomputing.com
In the Database field, provide the name of the database.
In the Username field, provide the username for the account to use to connect to the database.
In the Password field, provide the password for the specified username.
To use a connection string to connect to the source database:
Toggle Use connection string to the on position.
In the Connection String field, provide the connection string.
In the Password field, provide the password for the user.
To test the connection to the source database, click Test Source Connection.
The connection string uses the following format:
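The exact keys depend on the Snowflake driver that your Structural version uses; a typical semicolon-delimited key=value string, shown here with placeholder values only (the password is entered separately in the Password field), might look like:

```
account=<account>;host=<account>.<region>.snowflakecomputing.com;user=<username>;db=<database>;warehouse=<warehouse>
```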
Instead of providing a password, you can use key pair authentication.
To do this:
Toggle Use Key Pair Authentication to the on position.
Expand the Key Pair Authentication Settings.
For RSA Private Key, click Browse, then select the key file.
If the key is encrypted, then in the Encrypted Key Passphrase field, provide the passphrase to use to decrypt the key.
To trust the server certificate, and ignore the certificate authority's revocation list, toggle Trust Server Certificate to the on position.
This option can be useful when your Tonic Structural instance cannot connect to the certificate authority.
You can use a proxy server to connect to the source database.
When you use a connection string to connect to the source database, Structural automatically adds the configured proxy connection parameters to the connection string.
If you manually include proxy connection parameters in the connection string, and also configure the proxy connection settings, the connection string will have duplicate proxy connection parameters.
We recommend that you use the configuration fields to enable the proxy connection, and do not include proxy connection parameters in the connection string.
To use a proxy server to connect to the source database:
Toggle Enable proxy connection to the on position.
In the Proxy Host field, provide the host name for the proxy connection.
In the Proxy Port field, provide the port for the proxy connection.
Optionally, in the Proxy User field, provide the name of the user for the proxy connection.
If you provide a proxy user, then in the Proxy Password field, provide the password for the specified user.
Optionally, in the Non-Proxy Hosts field, provide the list of hosts for which to bypass the proxy server and connect to directly.
Use a pipe symbol (|) to separate the host names. For example, host1|host2|host3.
You can also use an asterisk (*) as a wildcard. For example, to connect directly to all hosts whose host names start with myhost, use myhost*.
By default, the source database includes all of the schemas. To specify a list of specific schemas to either include or exclude:
Toggle Limit Schemas to the on position.
From the filter option dropdown list, select whether to include or exclude the listed schemas.
In the field, provide the list of schemas to either include or exclude. Use commas or semicolons to separate the schemas.
Do not exclude schemas that are referred to by included schemas, unless you create those schemas manually outside of Structural.
To test the connection to the source database, click Test Source Connection.
By default, data generation is not blocked as long as schema changes do not conflict with your workspace configuration.
To block data generation when there are any schema changes, regardless of whether they conflict with your workspace configuration, toggle Block data generation on schema changes to the on position.
In the Destination Settings section, you specify the connection information for the destination database.
If the destination database is in the same location as the source database, then you can copy the connection and authentication details from the source database. The copied details include the proxy connection configuration.
If the destination database is in a different location, then you can either populate the connection fields or use a connection string.
You can also opt to use key pair authentication to connect to the database.
To copy the connection details from the source database:
Click Copy Settings from Source.
In the Password field, provide the password.
To test the connection to the destination database, click Test Destination Connection.
If you don't copy the details from the source database, then you can either populate the connection fields or use a connection string.
By default, Use connection string is off, and you provide the connection values in the individual fields:
In the Server field, provide the server where the database is located.
You must provide the full path to the server. The https:// prefix is optional.
So the format of the server value can be either:
<account>.<region>.snowflakecomputing.com
https://<account>.<region>.snowflakecomputing.com
For example: abc123456.us-east-1.snowflakecomputing.com or https://abc123456.us-east-1.snowflakecomputing.com
In the Database field, provide the name of the database.
In the Username field, provide the username for the account to use to connect to the database.
In the Password field, provide the password for the specified username.
To use a connection string to connect to the destination database:
Toggle Use connection string to the on position.
In the Connection String field, provide the connection string.
In the Password field, provide the password for the user.
The connection string uses the following format:
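As with the source connection, the exact keys depend on the Snowflake driver; a typical semicolon-delimited key=value string, with placeholder values only (the password is entered separately in the Password field), might look like:

```
account=<account>;host=<account>.<region>.snowflakecomputing.com;user=<username>;db=<database>;warehouse=<warehouse>
```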
To trust the server certificate, and ignore the certificate authority's revocation list, toggle Trust Server Certificate to the on position.
This option can be useful when your Structural instance cannot connect to the certificate authority.
You can use a proxy server to connect to the destination database.
When you use a connection string to connect to the destination database, Structural adds the proxy connection parameters to the connection string.
If you manually include proxy connection parameters in the connection string, and also configure the proxy connection settings, the connection string will have duplicate proxy connection parameters.
We recommend that you use the configuration fields to enable the proxy connection, and do not include proxy connection parameters in the connection string.
To enable and configure the proxy connection:
Toggle Enable proxy connection to the on position.
In the Proxy Host field, provide the host name for the proxy connection.
In the Proxy Port field, provide the port for the proxy connection.
Optionally, in the Proxy User field, provide the name of the user for the proxy connection.
If you provide a proxy user, then in the Proxy Password field, provide the password for the specified user.
Optionally, in the Non-Proxy Hosts field, provide the list of hosts for which to bypass the proxy server and connect to directly.
Use a pipe symbol (|) to separate the host names. For example, host1|host2|host3.
You can also use an asterisk (*) as a wildcard. For example, to connect directly to all hosts whose host names start with myhost, use myhost*.
Instead of providing a password, you can use key pair authentication.
To do this:
Toggle Use Key Pair Authentication to the on position.
Expand the Key Pair Authentication Settings.
For RSA Private Key, click Browse, then select the key file.
If the key is encrypted, then in the Encrypted Key Passphrase field, provide the passphrase to use to decrypt the key.
To test the connection to the destination database, click Test Destination Connection.
During data generation, Structural uses temporary CSV files to load and unload Snowflake tables.
To store these temporary files, you can use either:
Azure Blob Storage
External stages
During data generation, Structural copies CSV files containing the source data to an input folder in the storage location. After the generators are applied, Structural copies CSV files containing the destination data to an output folder.
Under File Storage Options, by default, the Use External Stage toggle is off, which indicates to use Azure Blob Storage for the temporary files.
To instead use external stages, toggle Use External Stage to the on position.
When Use External Stage is off, in the Source Azure Storage Account Name field, provide the name of the account that Structural uses to load and unload Snowflake data in Azure Blob Storage. Do not provide the URI. You only need the account name.
In the Source Azure Blob Storage Path field, provide the container path to the Azure Blob Storage location where Structural places temporary CSV files that it uses to load and unload Snowflake tables.
When Use External Stage is on, in the Source Snowflake External Stage Name field, provide the path to the external stage to use for temporary files.
The format is:
<database>.<schema>.<stage>
Where:
<database> is the name of the database where the stage is located.
<schema> is the name of the schema that contains the stage.
<stage> is the name of the stage.
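For example, for a hypothetical stage named TONIC_STAGE in the PUBLIC schema of a database named STAGING_DB, you would enter:

```
STAGING_DB.PUBLIC.TONIC_STAGE
```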