1 of 9

Snowflake on AWS

Snowflake is a cloud-based data warehousing platform that is built on top of AWS.

Tonic Structural can move data from one database to another within a single Snowflake instance. Structural can also move data between Snowflake instances.

Structural uses an S3 bucket or a Snowflake external stage as an intermediate location to host the source data and destination data.

Structural process overviews for Snowflake on AWS

The following high-level diagrams describe how Tonic Structural orchestrates the processing and moving of data in a Snowflake on AWS.

Structural manages the lifetimes of data and resources used in AWS. It only requires you to assign the necessary permissions to the IAM role that Structural uses.

Default process

By default, Structural uses the following data generation process:

At a high level:

Structural copies the table data into either an S3 bucket or an external stage as CSV files. You specify the S3 bucket path or stage in the Structural workspace configuration. If you use a single location for both source and destination data, Structural copies the data files into an input folder.
Structural applies the configured generators to the data in the files, then writes the resulting files to an S3 bucket or external stage. If you use a single location for both source and destination data, Structural copies the data files into an output folder.
After it processes all of the files, Structural copies the data from the S3 bucket or external stage into the Snowflake destination database.

Lambda process (for extremely high data volumes)

The default process cannot support data generation on data volumes that are hundreds of gigabytes or larger.

For those larger data volumes, the data generation process can use a Lambda function, Amazon S3, and Amazon SQS events:

Structural creates a Lambda function for your version of Structural. This occurs once for each version of Structural. The Lambda function is created when you run your first data generation job after you install or update Structural.
Structural creates an Amazon SQS queue and Amazon S3 event triggers. This occurs once for each job. The resource names are scoped to the specific data generation job.
Structural copies table data into Amazon S3 as CSV files. You specify the S3 bucket path in the Structural workspace configuration. Within the S3 bucket, Structural copies the data files into an input folder.
As files land in Amazon S3, Amazon S3 event notifications place messages in Amazon SQS. Messages in Amazon SQS trigger Lambda function invocations. By default, each file placed in Amazon S3 has a maximum file size of 16MB. Each Lambda invocation processes a single file. Lambda processes each file and writes the files to an output folder in the S3 bucket.
After it processes all of files for a table, Structural copies the data back into Snowflake, into the destination database.
After it processes all of the tables, Structural removes ephemeral AWS components such as the event notifications for Amazon SQS and Amazon S3.

Structural differences and limitations with Snowflake on AWS

Required license: Professional or Enterprise

Table mode limitations

Snowflake on AWS workspaces cannot use the following table modes:

Scale
Incremental

Generator limitations

Snowflake on AWS workspaces cannot use the following generators:

Algebraic
Array JSON Mask
Array Regex Mask
Cross Table Sum
Current Date
Event Timestamps
Geo
Sequential Integer

Support for table filtering

Snowflake on AWS workspaces support both subsetting and table filtering.

Table filtering means that for tables that use the De-Identify table mode, you can provide a WHERE clause to filter the table. For details, go to Using table filtering for data warehouses and Spark-based data connectors.

No upsert

Snowflake on AWS workspaces do not support upsert.

No output to a container repository

For Snowflake on AWS workspaces, you cannot write the destination data to a container repository.

No output to an Ephemeral snapshot

For Snowflake on AWS workspaces, you cannot write the destination data to an Ephemeral snapshot.

Before you create a Snowflake on AWS workspace

Before you create a workspace that uses the Snowflake on AWS data connector, complete the configuration outlined in these topics.

Required AWS instance profile permissions for Snowflake on AWS

When it uses Snowflake, Tonic Structural orchestrates the creation, usage, and deletion of several AWS components. The required permissions to do so are taken from the Instance Profile role of the machine that runs Structural's server. This role needs the below permissions.

For both types of Snowflake on AWS data generation, the instance profile role requires Amazon S3 permissions. For the Lambda-based processing, the instance profile role also requires Amazon SQS and Lambda permissions.

Note that these permissions are starting point. Based on your exact AWS setup, you might need to add other permissions. For example, if you use AWS Key Management Service (KMS) on your S3 buckets, then you might need to grant AWS KMS access.

These example policies allow Structural to properly orchestrate jobs in your AWS infrastructure. It assumes that you use default names for objects in AWS, and that your source and destination S3 bucket names begin with the tonic- prefix.

Required permissions for the default data generation

For the default data generation, the instance profile requires the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::tonic-*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion"
            ],
            "Resource": "arn:aws:s3:::tonic-*/*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::tonic-*",
            "Condition": {
                "StringLike": {
                    "s3:prefix": [
                        "*"
                    ]
                }
            }
        }
    ]
}

Required permissions for the Lambda data generation

The Lambda data generation for Snowflake on AWS requires the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sqs:ListQueues",
                "lambda:CreateEventSourceMapping",
                "lambda:DeleteEventSourceMapping",
                "lambda:GetEventSourceMapping",
                "lambda:ListFunctions"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "sqs:ChangeMessageVisibility",
                "sqs:CreateQueue",
                "sqs:DeleteMessage",
                "sqs:DeleteQueue",
                "sqs:GetQueueAttributes",
                "sqs:GetQueueUrl",
                "sqs:ListDeadLetterSourceQueues",
                "sqs:ListQueueTags",
                "sqs:ReceiveMessage",
                "sqs:SendMessage",
                "sqs:SetQueueAttributes",
                "s3:ListBucket",
                "s3:GetBucketNotification",
                "s3:PutBucketNotification",
                "lambda:CreateFunction",
                "lambda:GetFunctionConfiguration",
                "lambda:UpdateFunctionConfiguration"
            ],
            "Resource": [
                "arn:aws:sqs:*:*:tonic-*",
                "arn:aws:s3:::tonic-*",
                "arn:aws:lambda:*:*:function:tonic-*"

            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": [
                <ARN to Tonic Lambda Role (see below docs)>
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion"
            ],
            "Resource": "arn:aws:s3:::tonic-*/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:DescribeLogStreams",
                "logs:GetQueryResults",
                "logs:GetLogEvents",
                "logs:FilterLogEvents",
                "logs:StartQuery",
                "logs:StopQuery"
            ],
            "Resource": ["arn:aws:lambda:${region}:${account_id}:log-group:/aws/lambda/tonic-*"]
        },
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::tonic-*",
            "Condition": {
                "StringLike": {
                    "s3:prefix": [
                        "*"
                    ]
                }
            }
        }
    ]
}

Other configuration for Lambda processing

For the Lambda version of Snowflake on AWS data generation, you must also configure the AWS Lambda role and related Tonic Structural environment settings.

Setting up the AWS Lambda role for Snowflake on AWS

Creating the role

The AWS Lambda function that Structural sets up requires an AWS role. The name of this role is configured in the following environment setting:

TONIC_LAMBDA_ROLE

The policy for this role should look like this:

{
	"Version": "2012-10-17",
	"Statement": [{
		"Sid": "VisualEditor0",
		"Effect": "Allow",
		"Action": [
			"s3:PutObject",
			"s3:GetObject",
			"s3:ListBucket",
			"sqs:ReceiveMessage",
			"sqs:GetQueueAttributes",
			"sqs:SendMessage",
			"sqs:DeleteMessage",
			"logs:CreateLogGroup",
			"logs:PutLogEvents",
			"logs:CreateLogStream"
		],
		"Resource": [
			"arn:aws:sqs:*:<aws account id>:tonic-*",
			"arn:aws:s3:::tonic-*",
			"arn:aws:logs:*:*:*"
		]
	}]
}

The above policy grants the Lambda function the required access to Amazon SQS, Amazon S3, and CloudWatch.

This policy assumes that the S3 buckets and Amazon SQS queues that are used begin with the tonic- prefix.

Enabling Lambda to assume the role

After you create the role, you must allow the Lambda service to assume the role.

For the role, the Trust relationships in the AWS IAM role should be configured to look like the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "lambda.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Configuring Structural environment settings for Lambda processing

Structural allows you to set the following Snowflake-specific environment settings that make it easier to adapt our Snowflake integration into your specific AWS environment. You configure these settings in the Structural worker container.

# No default value
# This setting is required to be set by user
# ARN of AWS Role to be assumed by Structural's Lambda function
TONIC_LAMBDA_ROLE

# Default value of 30 secs
# Timeout of Lambda used to process data files
# Maximum allowed duration of Lambda function is 15 min
TONIC_LAMBDA_TIMEOUT

# Default value of 1024MB
# Memory limit of Lambda used to process data files
# Maximum allowed memory of Lambda function is 10240 MB
TONIC_LAMBDA_MEMORY_SIZE

# Default value of 30 secs
# Visibility of Amazon SQS, which stores messages sent to Lambda
# Note that this value must be >= TONIC_LAMBDA_TIMEOUT
TONIC_LAMBDA_SQS_VISIBILITY_TIMEOUT

# No default value
# This setting is required to be set by user if using AWS KMS encryption
# AWS KMS Key ID for encrypting messages sent to Amazon SQS
TONIC_LAMBDA_KMS_MASTER_KEY

Source and destination database permissions for Snowflake on AWS

Tonic.ai recommends that you create separate accounts for the source and destination databases, even if the databases are in the same Snowflake account. This allows each account to have the minimum permissions needed in the source and destination databases.

If you use the same account, then you must combine the relevant permissions for the source and destination databases into a single role. .

User permissions on the source database

The below permissions create a role with the necessary permissions to act as the source database user. A user is then created and assigned into that role.

The below permissions give read-only data access to a specific database on all current and future schemas, tables, and sequences.

It also grants the role access to a specified warehouse.

CREATE ROLE TONIC_SOURCE_DATABASE_ROLE;

GRANT USAGE ON WAREHOUSE <warehouse name> TO ROLE TONIC_SOURCE_DATABASE_ROLE;

GRANT USAGE ON DATABASE <source database name> TO ROLE TONIC_SOURCE_DATABASE_ROLE;

GRANT USAGE ON ALL SCHEMAS IN DATABASE <source database name> TO ROLE TONIC_SOURCE_DATABASE_ROLE;
GRANT USAGE ON FUTURE SCHEMAS IN DATABASE <source database name> TO ROLE TONIC_SOURCE_DATABASE_ROLE;

GRANT SELECT ON FUTURE TABLES IN DATABASE <source database name> TO ROLE TONIC_SOURCE_DATABASE_ROLE;
GRANT SELECT ON ALL TABLES IN DATABASE <source database name> TO ROLE TONIC_SOURCE_DATABASE_ROLE;

GRANT USAGE ON FUTURE SEQUENCES IN DATABASE <source database name> TO ROLE TONIC_SOURCE_DATABASE_ROLE;
GRANT USAGE ON ALL SEQUENCES IN DATABASE <source database name> TO ROLE TONIC_SOURCE_DATABASE_ROLE;

CREATE USER <user name> password='<password>' default_role = TONIC_SOURCE_DATABASE_ROLE default_warehouse = <warehouse name>;

GRANT ROLE TONIC_SOURCE_DATABASE_ROLE TO USER <user name>;


-- Structural requires this parameter to be set at either the account or user level.
ALTER USER <user name> set QUOTED_IDENTIFIERS_IGNORE_CASE = false;

-- If using a source external stage
GRANT USAGE ON STAGE <source stage name> TO ROLE TONIC_SOURCE_DATABASE_ROLE;

User permissions on the destination database

The destination database must exist before Structural can connect to it. The user that you provide to Structural for the destination database connection should be the owner of the database and of all of the objects in the database.

We suggest that you first create a specific Structural destination database user. Then create the destination database from that user's account. If you create the database with another account such as ACCOUNTADMIN, then you must transfer ownership of the database and all of its objects to the new account.

As part of a Structural data generation job, this user must be able to:

DROP and then create schemas on the destination database.
Copy data from Amazon S3 into tables in the destination database.

The below permissions create a role that has the necessary permissions to act as the destination database user. A user is then created and assigned into that role.

The following permissions gives full access to the destination database. They also grant the role access to a specified warehouse. To accommodate a situation where the database was created by another user such as ACCOUNTADMIN, this includes the required transfer of ownership to the role.

CREATE ROLE TONIC_DESTINATION_DATABASE_ROLE;

GRANT OWNERSHIP ON DATABASE <destination database name> TO TONIC_DESTINATION_DATABASE_ROLE;
GRANT USAGE ON WAREHOUSE <warehouse name> TO TONIC_DESTINATION_DATABASE_ROLE;

CREATE USER <user name> password='<password>' default_role = TONIC_DESTINATION_DATABASE_ROLE default_warehouse = <warehouse name>;
GRANT ROLE TONIC_DESTINATION_DATABASE_ROLE TO USER <user name>;

-- If the destination database was already created with another user, 
-- existing schemas (including default PUBLIC) must have ownership transferred
-- to the Structural user role.
GRANT OWNERSHIP on SCHEMA <destination database name>.PUBLIC to TONIC_DESTINATION_DATABASE_ROLE;
GRANT OWNERSHIP on SCHEMA <destination database name>.<additional schemas> to TONIC_DESTINATION_DATABASE_ROLE;


-- Structural requires this parameter to be set at either the account or user level.
ALTER USER <user name> set QUOTED_IDENTIFIERS_IGNORE_CASE = false;

-- If using a destination external stage
GRANT USAGE ON STAGE <destination stage name> TO ROLE TONIC_DESTINATION_DATABASE_ROLE;

Additional user configuration

Structural requires that the parameter QUOTED_IDENTIFIERS_IGNORE_CASE = false at either the account or user level.

To set it at the user level, run:

ALTER USER <user name> set QUOTED_IDENTIFIERS_IGNORE_CASE = false;

Configuring whether Structural creates the Snowflake on AWS destination database schema

By default, during each data generation job, Structural creates the database schema for the destination database tables, then populates the database tables based on the workspace configuration.

If you prefer to manage the destination database schema yourself, then set the TONIC_SNOWFLAKE_SKIP_CREATE_DB to true. You can add this setting manually to the Environment Settings list on Structural Settings.

The environment setting applies to both Snowflake on AWS and Snowflake on Azure.

When TONIC_SNOWFLAKE_SKIP_CREATE_DB is true, then Structural does not create the destination database schema. Before you run data generation, you must create the destination database with the full schema.

During data generation, Structural deletes the data from the destination database tables, except for tables that use Preserve Destination mode. It then populates the tables with the new destination data.

Configuring Snowflake on AWS workspace data connections

In the workspace configuration, under Connection Type, select Snowflake.

In the Source Settings section, under Snowflake Type, click AWS.

Connecting to the source database

In the Source Settings section, provide the details for the connection to the source database.

Providing the connection details

To connect to the source database, you can either:

Populate the connection fields.
Use a connection string.

You can also use key pair authentication instead of a password.

Populating the connection fields

By default, Use connection string is off, and you provide the connection values in the individual fields:

In the Server field, provide the server where the database is located. You must provide the full path to the server. https:// is optional. So the format of the server value can be either:
- <account>.<region>.snowflakecomputing.com
- https://<account>.<region>.snowflakecomputing.com
For example: abc123456.us-east-1.snowflakecomputing.com or https://abc123456.us-east-1.snowflakecomputing.com
In the Database field, provide the name of the database.
In the Username field, provide the username for the account to use to connect to the database.
For the user password, you can either specify the password manually, or you can select a secret name from a secrets manager. The selected secret must store a password. The secrets manager option only displays if at least one secrets manager is configured. For information about configuring the available secrets managers, go to Configuring secrets managers for database connections. To enter the password manually:
1. Click Provide Password.
2. In the password field, enter the password.
To use a secret name from a secrets manager:
1. Click Use Secrets Manager.
2. From the secrets manager dropdown list, select the secrets manager. Structural connects to the secrets manager and retrieves a list of available secret names.
3. From the secret name dropdown list, select the secret name.

Using a connection string

To use a connection string to connect to the source database:

Toggle Use connection string to the on position.
In the Connection String field, provide the connection string.
For the password, you can either specify the password manually, or you can select a secret name from a secrets manager. The selected secret must store a password. The secrets manager option only displays if at least one secrets manager is configured. For information about configuring the available secrets managers, go to Configuring secrets managers for database connections. To enter the password manually:
1. Click Provide Password.
2. In the password field, enter the password.
To use a secret name from a secrets manager:
1. Click Use Secrets Manager.
2. From the secrets manager dropdown list, select the secrets manager. Structural connects to the secrets manager and retrieves a list of available secret names.
3. From the secret name dropdown list, select the secret name.

The connection string uses the following format:

account=<account>;host=<account>.<region>.snowflakecomputing.com;user=<username>;db=<database>

Using key pair authentication for the connection

Instead of providing a password, you can use key pair authentication.

To do this:

Toggle Use Key Pair Authentication to the on position.
Expand the Key Pair Authentication Settings.
For RSA Private Key, click Browse, then select the key file.
If the key is encrypted, then in the Encrypted Key Passphrase field, provide the passphrase to use to decrypt the key.

Indicating whether to trust the server certificate

To trust the server certificate, and ignore the certificate authority's revocation list, toggle Trust Server Certificate to the on position.

This option can be useful when your Tonic Structural instance cannot connect to the certificate authority.

Enabling a proxy connection

You can use a proxy server to connect to the source database.

When you use a connection string to connect to the source database, Structural automatically adds the configured proxy connection parameters to the connection string.

If you manually include proxy connection parameters in the connection string, and also configure the proxy connection settings, the connection string will have duplicate proxy connection parameters.

We recommend that you use the configuration fields to enable the proxy connection, and do not include proxy connection parameters in the connection string.

To use a proxy server to connect to the source database:

Toggle Enable proxy connection to the on position.
In the Proxy Host field, provide the host name for the proxy connection.
In the Proxy Port field, provide the port for the proxy connection.
Optionally, in the Proxy User field, provide the name of the user for the proxy connection.
If you provide a proxy user, then in the Proxy Password field, provide the password for the specified user.
Optionally, in the Non-Proxy Hosts field, provide the list of hosts for which to bypass the proxy server and connect to directly.

Use a pipe symbol (|) to separate the host names. For example, host1|host2|host3.

You can also use an asterisk (*) as a wildcard. For example, to connect directly to all hosts with host names that start with myhost, use myhost*.

Limiting the included schemas

By default, the source database includes all of the schemas. To specify a list of specific schemas to either include or exclude:

Toggle Limit Schemas to the on position.
From the filter option dropdown list, select whether to include or exclude the listed schemas.
In the field, provide the list of schemas to either include or exclude. Use commas or semicolons to separate the schemas.

Do not exclude schemas that are referred to by included schemas, unless you create those schemas manually outside of Structural.

Testing the source connection

To test the connection to the source database, click Test Source Connection.

Blocking data generation on all schema changes

By default, data generation is not blocked for schema changes that do not conflict with your workspace configuration.

To block data generation when there are any schema changes, regardless of whether they conflict with your workspace configuration, toggle Block data generation on schema changes to the on position.

Enabling Lambda processing

The default data generation process for Snowflake on AWS cannot scale to extremely large volumes of data. For volumes of hundreds of gigabytes or larger, you must use the Lambda-based processing.

To enable Lambda processing, toggle Enable Lambda generation to the on position.

If you enable Lambda processing, make sure that you grant the required permissions to the IAM role, and complete the other required configuration.

Connecting to the destination database

In the Destination Settings section, you specify the connection information for the destination database.

If the destination database is in the same location as the source database, then you can copy the connection and authentication details from the source database. The copied details include the proxy connection configuration.

If the destination database is in a different location, then you can either:

Populate the connection fields.
Use a connection string.

You can also use key pair authentication instead of a password.

Copying the source database connection details

To copy the connection details from the source database:

Click Copy Settings from Source.
For the user password, you can either specify the password manually, or you can select a secret name from a secrets manager. The selected secret must store a password. The secrets manager option only displays if at least one secrets manager is configured. For information about configuring the available secrets managers, go to Configuring secrets managers for database connections. To enter the password manually:
1. Click Provide Password.
2. In the password field, enter the password.
To use a secret name from a secrets manager:
1. Click Use Secrets Manager.
2. From the secrets manager dropdown list, select the secrets manager. Structural connects to the secrets manager and retrieves a list of available secret names.
3. From the secret name dropdown list, select the secret name.
To test the connection to the destination database, click Test Destination Connection.

Providing destination database connection details

If you do not copy the details from the source database, then you can either populate the connection fields or use a connection string.

Populating the connection fields

By default, Use connection string is off, and you provide the connection values in the individual fields:

In the Server field, provide the server where the database is located. You must provide the full path to the server. The https:// is optional. So the format of the server value can be either:
- <account>.<region>.snowflakecomputing.com
- https://<account>.<region>.snowflakecomputing.com
For example: abc123456.us-east-1.snowflakecomputing.com or https://abc123456.us-east-1.snowflakecomputing.com
In the Database field, provide the name of the database.
In the Username field, provide the username for the account to use to connect to the database.
For the user password, you can either specify the password manually, or you can select a secret name from a secrets manager. The selected secret must store a password. The secrets manager option only displays if at least one secrets manager is configured. For information about configuring the available secrets managers, go to Configuring secrets managers for database connections. To enter the password manually:
1. Click Provide Password.
2. In the password field, enter the password.
To use a secret name from a secrets manager:
1. Click Use Secrets Manager.
2. From the secrets manager dropdown list, select the secrets manager. Structural connects to the secrets manager and retrieves a list of available secret names.
3. From the secret name dropdown list, select the secret name.

Using a connection string

To use a connection string to connect to the destination database:

Toggle Use connection string to the on position.
In the Connection String field, provide the connection string.
For the password, you can either specify the password manually, or you can select a secret name from a secrets manager. The selected secret must store a password. The secrets manager option only displays if at least one secrets manager is configured. For information about configuring the available secrets managers, go to Configuring secrets managers for database connections. To enter the password manually:
1. Click Provide Password.
2. In the password field, enter the password.
To use a secret name from a secrets manager:
1. Click Use Secrets Manager.
2. From the secrets manager dropdown list, select the secrets manager. Structural connects to the secrets manager and retrieves a list of available secret names.
3. From the secret name dropdown list, select the secret name.

The connection string uses the following format:

account=<account>;host=<account>.<region>.snowflakecomputing.com;user=<username>;db=<database>

Using key pair authentication for the connection

Instead of providing a password, you can use key pair authentication.

To do this:

Toggle Use Key Pair Authentication to the on position.
Expand the Key Pair Authentication Settings.
For RSA Private Key, click Browse, then select the key file.
If the key is encrypted, then in the Encrypted Key Passphrase field, provide the passphrase to use to decrypt the key.

Testing the destination database connection

To test the connection to the destination database, click Test Destination Connection.

Indicating whether to trust the server certificate

To trust the server certificate, and ignore the certificate authority's revocation list, toggle Trust Server Certificate to the on position.

This option can be useful when your Structural instance cannot connect to the certificate authority.

Enabling a proxy connection

You can use a proxy server to connect to the destination database.

When you use a connection string to connect to the destination database, Structural adds the proxy connection parameters to the connection string.

If you manually include proxy connection parameters in the connection string, and also configure the proxy connection settings, the connection string will have duplicate proxy connection parameters.

We recommend that you use the configuration fields to enable the proxy connection, and do not include proxy connection parameters in the connection string.

To enable and configure the proxy connection:

Toggle Enable proxy connection to the on position.
In the Proxy Host field, provide the host name for the proxy connection.
In the Proxy Port field, provide the port for the proxy connection.
Optionally, in the Proxy User field, provide the name of the user for the proxy connection.
If you provide a proxy user, then in the Proxy Password field, provide the password for the specified user.
Optionally, in the Non-Proxy Hosts field, provide the list of hosts for which to bypass the proxy server and connect to directly.

Use a pipe symbol (|) to separate the host names. For example, host1|host2|host3.

You can also use an asterisk (*) as a wildcard. For example, to connect directly to all hosts whose host names start with myhost, use myhost*.

Setting the storage location for temporary files

During data generation, Structural uses temporary CSV files to load and unload Snowflake tables.

For Lambda processing, you specify a single S3 bucket path.

If you do not use Lambda processing, then you can either:

Use external stages instead of S3 buckets.
Provide separate paths for the source and destination files.

Setting the type of storage to use

By default, the temporary files are stored in S3 buckets.

To instead use external stages, toggle Use External Stage to the on position.

The Use External Stage toggle does not display if Enable Lambda Generation is on.

Enabling separate paths for source and destination files

By default, you provide a single S3 bucket path or external stage. Within that path:

Structural copies the files that contain the source data into an input folder.
After it applies the generators, Structural copies the files that contain the destination data into an output folder.

To instead provide separate paths for the source and destination files, toggle Use Separate Destination Location to the on position.

The Use Separate Destination Location toggle does not display if Enable Lambda Generation is on.

Setting S3 bucket locations

If Use Separate Destination Location is off, then in the S3 Bucket Path field, specify the S3 bucket.

If Use Separate Destination Location is on, then:

In the Source S3 Bucket field, enter the path to the S3 bucket to use for the source files.
In the Destination S3 Bucket field, enter the path to the S3 bucket to use for the destination files.

Setting external stage locations

If Use External Stage is on, then you provide external stage locations instead of S3 buckets. For each stage, the format is:

<database>.<schema>.<stage>

Where:

<database> is the name of the database where the stage is located.
<schema> is the name of the schema that contains the stage.
<stage> is the name of the stage.

If Use Separate Destination Location is off, then in the Source Snowflake External Stage Name field, enter the external stage.

If Use Separate Destination Location is on, then:

in the Source Snowflake External Stage Name field, enter the external stage to use for the source files.
In the Destination Snowflake External Stage Name field, enter the external stage to use for the destination files.

Providing AWS credentials for the storage locations

For each S3 bucket or external stage, you can optionally provide specific AWS credentials.

If you do not provide credentials in the workspace configuration, then Structural uses either:

The credentials set in the following environment settings:
- TONIC_AWS_ACCESS_KEY_ID - An AWS access key that is associated with an IAM user or role.
- TONIC_AWS_SECRET_ACCESS_KEY - The secret key that is associated with the access key.
- TONIC_AWS_REGION - The AWS Region to send the authentication request to.
The credentials for the IAM role on the host machine.
The credentials in a credentials file.

To provide the credentials:

For the S3 bucket or external stage, to display the credentials field, click AWS Credentials.
In the AWS Access Key field, enter the AWS access key that is associated with an IAM user or role.
In the AWS Secret Key field, enter the secret key that is associated with the access key.
From the AWS Region dropdown list, select the AWS Region to send the authentication request to.

Configuring Snowflake on AWS workspace data connections

In the workspace configuration, under Connection Type, select Snowflake.

In the Source Settings section, under Snowflake Type, click AWS.

Connecting to the source database

In the Source Settings section, provide the details for the connection to the source database.

Providing the connection details

To connect to the source database, you can either:

Populate the connection fields.
Use a connection string.

You can also use key pair authentication instead of a password.

Populating the connection fields

By default, Use connection string is off, and you provide the connection values in the individual fields:

In the Server field, provide the server where the database is located. You must provide the full path to the server. https:// is optional. So the format of the server value can be either:
- <account>.<region>.snowflakecomputing.com
- https://<account>.<region>.snowflakecomputing.com
For example: abc123456.us-east-1.snowflakecomputing.com or https://abc123456.us-east-1.snowflakecomputing.com
In the Database field, provide the name of the database.
In the Username field, provide the username for the account to use to connect to the database.
For the user password, you can either specify the password manually, or you can select a secret name from a secrets manager. The selected secret must store a password. The secrets manager option only displays if at least one secrets manager is configured. For information about configuring the available secrets managers, go to Configuring secrets managers for database connections. To enter the password manually:
1. Click Provide Password.
2. In the password field, enter the password.
To use a secret name from a secrets manager:
1. Click Use Secrets Manager.
2. From the secrets manager dropdown list, select the secrets manager. Structural connects to the secrets manager and retrieves a list of available secret names.
3. From the secret name dropdown list, select the secret name.

Using a connection string

To use a connection string to connect to the source database:

Toggle Use connection string to the on position.
In the Connection String field, provide the connection string.
For the password, you can either specify the password manually, or you can select a secret name from a secrets manager. The selected secret must store a password. The secrets manager option only displays if at least one secrets manager is configured. For information about configuring the available secrets managers, go to Configuring secrets managers for database connections. To enter the password manually:
1. Click Provide Password.
2. In the password field, enter the password.
To use a secret name from a secrets manager:
1. Click Use Secrets Manager.
2. From the secrets manager dropdown list, select the secrets manager. Structural connects to the secrets manager and retrieves a list of available secret names.
3. From the secret name dropdown list, select the secret name.

The connection string uses the following format:

account=<account>;host=<account>.<region>.snowflakecomputing.com;user=<username>;db=<database>

Using key pair authentication for the connection

Instead of providing a password, you can use key pair authentication.

To do this:

Toggle Use Key Pair Authentication to the on position.
Expand the Key Pair Authentication Settings.
For RSA Private Key, click Browse, then select the key file.
If the key is encrypted, then in the Encrypted Key Passphrase field, provide the passphrase to use to decrypt the key.

Indicating whether to trust the server certificate

To trust the server certificate, and ignore the certificate authority's revocation list, toggle Trust Server Certificate to the on position.

This option can be useful when your Tonic Structural instance cannot connect to the certificate authority.

Enabling a proxy connection

You can use a proxy server to connect to the source database.

When you use a connection string to connect to the source database, Structural automatically adds the configured proxy connection parameters to the connection string.

If you manually include proxy connection parameters in the connection string, and also configure the proxy connection settings, the connection string will have duplicate proxy connection parameters.

We recommend that you use the configuration fields to enable the proxy connection, and do not include proxy connection parameters in the connection string.

To use a proxy server to connect to the source database:

Toggle Enable proxy connection to the on position.
In the Proxy Host field, provide the host name for the proxy connection.
In the Proxy Port field, provide the port for the proxy connection.
Optionally, in the Proxy User field, provide the name of the user for the proxy connection.
If you provide a proxy user, then in the Proxy Password field, provide the password for the specified user.
Optionally, in the Non-Proxy Hosts field, provide the list of hosts for which to bypass the proxy server and connect to directly.

Use a pipe symbol (|) to separate the host names. For example, host1|host2|host3.

You can also use an asterisk (*) as a wildcard. For example, to connect directly to all hosts with host names that start with myhost, use myhost*.

Limiting the included schemas

By default, the source database includes all of the schemas. To specify a list of specific schemas to either include or exclude:

Toggle Limit Schemas to the on position.
From the filter option dropdown list, select whether to include or exclude the listed schemas.
In the field, provide the list of schemas to either include or exclude. Use commas or semicolons to separate the schemas.

Do not exclude schemas that are referred to by included schemas, unless you create those schemas manually outside of Structural.

Testing the source connection

To test the connection to the source database, click Test Source Connection.

Blocking data generation on all schema changes

By default, data generation is not blocked for schema changes that do not conflict with your workspace configuration.

Enabling Lambda processing

The default data generation process for Snowflake on AWS cannot scale to extremely large volumes of data. For volumes of hundreds of gigabytes or larger, you must use the Lambda-based processing.

To enable Lambda processing, toggle Enable Lambda generation to the on position.

If you enable Lambda processing, make sure that you grant the required permissions to the IAM role, and complete the other required configuration.

Connecting to the destination database

In the Destination Settings section, you specify the connection information for the destination database.

If the destination database is in a different location, then you can either:

Populate the connection fields.
Use a connection string.

You can also use key pair authentication instead of a password.

Copying the source database connection details

To copy the connection details from the source database:

Click Copy Settings from Source.
For the user password, you can either specify the password manually, or you can select a secret name from a secrets manager. The selected secret must store a password. The secrets manager option only displays if at least one secrets manager is configured. For information about configuring the available secrets managers, go to Configuring secrets managers for database connections. To enter the password manually:
1. Click Provide Password.
2. In the password field, enter the password.
To use a secret name from a secrets manager:
1. Click Use Secrets Manager.
2. From the secrets manager dropdown list, select the secrets manager. Structural connects to the secrets manager and retrieves a list of available secret names.
3. From the secret name dropdown list, select the secret name.
To test the connection to the destination database, click Test Destination Connection.

Providing destination database connection details

If you do not copy the details from the source database, then you can either populate the connection fields or use a connection string.

Populating the connection fields

By default, Use connection string is off, and you provide the connection values in the individual fields:

In the Server field, provide the server where the database is located. You must provide the full path to the server. The https:// is optional. So the format of the server value can be either:
- <account>.<region>.snowflakecomputing.com
- https://<account>.<region>.snowflakecomputing.com
For example: abc123456.us-east-1.snowflakecomputing.com or https://abc123456.us-east-1.snowflakecomputing.com
In the Database field, provide the name of the database.
In the Username field, provide the username for the account to use to connect to the database.
For the user password, you can either specify the password manually, or you can select a secret name from a secrets manager. The selected secret must store a password. The secrets manager option only displays if at least one secrets manager is configured. For information about configuring the available secrets managers, go to Configuring secrets managers for database connections. To enter the password manually:
1. Click Provide Password.
2. In the password field, enter the password.
To use a secret name from a secrets manager:
1. Click Use Secrets Manager.
2. From the secrets manager dropdown list, select the secrets manager. Structural connects to the secrets manager and retrieves a list of available secret names.
3. From the secret name dropdown list, select the secret name.

Using a connection string

To use a connection string to connect to the destination database:

Toggle Use connection string to the on position.
In the Connection String field, provide the connection string.
For the password, you can either specify the password manually, or you can select a secret name from a secrets manager. The selected secret must store a password. The secrets manager option only displays if at least one secrets manager is configured. For information about configuring the available secrets managers, go to Configuring secrets managers for database connections. To enter the password manually:
1. Click Provide Password.
2. In the password field, enter the password.
To use a secret name from a secrets manager:
1. Click Use Secrets Manager.
2. From the secrets manager dropdown list, select the secrets manager. Structural connects to the secrets manager and retrieves a list of available secret names.
3. From the secret name dropdown list, select the secret name.

The connection string uses the following format:

account=<account>;host=<account>.<region>.snowflakecomputing.com;user=<username>;db=<database>

Using key pair authentication for the connection

Instead of providing a password, you can use key pair authentication.

To do this:

Toggle Use Key Pair Authentication to the on position.
Expand the Key Pair Authentication Settings.
For RSA Private Key, click Browse, then select the key file.
If the key is encrypted, then in the Encrypted Key Passphrase field, provide the passphrase to use to decrypt the key.

Testing the destination database connection

To test the connection to the destination database, click Test Destination Connection.

Indicating whether to trust the server certificate

To trust the server certificate, and ignore the certificate authority's revocation list, toggle Trust Server Certificate to the on position.

This option can be useful when your Structural instance cannot connect to the certificate authority.

Enabling a proxy connection

You can use a proxy server to connect to the destination database.

When you use a connection string to connect to the destination database, Structural adds the proxy connection parameters to the connection string.

If you manually include proxy connection parameters in the connection string, and also configure the proxy connection settings, the connection string will have duplicate proxy connection parameters.

We recommend that you use the configuration fields to enable the proxy connection, and do not include proxy connection parameters in the connection string.

To enable and configure the proxy connection:

Toggle Enable proxy connection to the on position.
In the Proxy Host field, provide the host name for the proxy connection.
In the Proxy Port field, provide the port for the proxy connection.
Optionally, in the Proxy User field, provide the name of the user for the proxy connection.
If you provide a proxy user, then in the Proxy Password field, provide the password for the specified user.
Optionally, in the Non-Proxy Hosts field, provide the list of hosts for which to bypass the proxy server and connect to directly.

Use a pipe symbol (|) to separate the host names. For example, host1|host2|host3.

You can also use an asterisk (*) as a wildcard. For example, to connect directly to all hosts whose host names start with myhost, use myhost*.

Setting the storage location for temporary files

During data generation, Structural uses temporary CSV files to load and unload Snowflake tables.

For Lambda processing, you specify a single S3 bucket path.

If you do not use Lambda processing, then you can either:

Use external stages instead of S3 buckets.
Provide separate paths for the source and destination files.

Setting the type of storage to use

By default, the temporary files are stored in S3 buckets.

To instead use external stages, toggle Use External Stage to the on position.

The Use External Stage toggle does not display if Enable Lambda Generation is on.

Enabling separate paths for source and destination files

By default, you provide a single S3 bucket path or external stage. Within that path:

Structural copies the files that contain the source data into an input folder.
After it applies the generators, Structural copies the files that contain the destination data into an output folder.

To instead provide separate paths for the source and destination files, toggle Use Separate Destination Location to the on position.

The Use Separate Destination Location toggle does not display if Enable Lambda Generation is on.

Setting S3 bucket locations

If Use Separate Destination Location is off, then in the S3 Bucket Path field, specify the S3 bucket.

If Use Separate Destination Location is on, then:

In the Source S3 Bucket field, enter the path to the S3 bucket to use for the source files.
In the Destination S3 Bucket field, enter the path to the S3 bucket to use for the destination files.

Setting external stage locations

If Use External Stage is on, then you provide external stage locations instead of S3 buckets. For each stage, the format is:

<database>.<schema>.<stage>

Where:

<database> is the name of the database where the stage is located.
<schema> is the name of the schema that contains the stage.
<stage> is the name of the stage.

If Use Separate Destination Location is off, then in the Source Snowflake External Stage Name field, enter the external stage.

If Use Separate Destination Location is on, then:

in the Source Snowflake External Stage Name field, enter the external stage to use for the source files.
In the Destination Snowflake External Stage Name field, enter the external stage to use for the destination files.

Providing AWS credentials for the storage locations

For each S3 bucket or external stage, you can optionally provide specific AWS credentials.

If you do not provide credentials in the workspace configuration, then Structural uses either:

The credentials set in the following environment settings:
- TONIC_AWS_ACCESS_KEY_ID - An AWS access key that is associated with an IAM user or role.
- TONIC_AWS_SECRET_ACCESS_KEY - The secret key that is associated with the access key.
- TONIC_AWS_REGION - The AWS Region to send the authentication request to.
The credentials for the IAM role on the host machine.
The credentials in a credentials file.

To provide the credentials:

For the S3 bucket or external stage, to display the credentials field, click AWS Credentials.
In the AWS Access Key field, enter the AWS access key that is associated with an IAM user or role.
In the AWS Secret Key field, enter the secret key that is associated with the access key.
From the AWS Region dropdown list, select the AWS Region to send the authentication request to.