Structural process overviews for Snowflake

The following high-level diagrams describe how Tonic Structural orchestrates the processing and movement of data in Snowflake.

Snowflake on AWS

Structural manages the lifetimes of the data and resources that it uses in AWS. You only need to assign the necessary permissions to the IAM role that Structural uses.

Default process

By default, Structural uses the following data generation process:

Default process flow for Snowflake on AWS

At a high level:

  1. Structural copies the table data as CSV files into either an S3 bucket or an external stage. You specify the S3 bucket path or stage in the Structural workspace configuration. If you use a single location for both source and destination data, Structural copies the data files into an input folder.

  2. Structural applies the configured generators to the data in the files, then writes the resulting files to an S3 bucket or external stage. If you use a single location for both source and destination data, Structural writes the resulting files into an output folder.

  3. After it processes all of the files, Structural copies the data from the S3 bucket or external stage into the Snowflake destination database (see the sketch after this list for an illustration of the underlying COPY INTO statements).
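Conceptually, steps 1 and 3 map to Snowflake COPY INTO statements: one unloads the source table to external storage as CSV files, and one loads the transformed files into the destination database. The following is a minimal sketch of that pattern, not Structural's actual implementation; the connection parameters, the stage name my_stage, the table names, and the folder layout are assumptions for illustration.

```python
# Minimal sketch of the unload/transform/load pattern in steps 1 and 3.
# All names (stage, tables, folders, connection parameters) are illustrative
# assumptions; Structural issues the equivalent statements internally.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="structural_user",
    password="...",
    warehouse="my_warehouse",
)
cur = conn.cursor()

# Step 1: unload the source table to the external stage as CSV files,
# under an input folder (used when source and destination share a location).
cur.execute("""
    COPY INTO @my_stage/input/customers/
    FROM source_db.public.customers
    FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"')
""")

# Step 2 happens outside Snowflake: Structural applies the configured
# generators to the staged files and writes the results to an output folder.

# Step 3: load the transformed files into the destination database.
cur.execute("""
    COPY INTO destination_db.public.customers
    FROM @my_stage/output/customers/
    FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"')
""")
```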

Lambda process (for extremely high data volumes)

Lambda-based process flow for Snowflake on AWS

The default process cannot support data generation on data volumes that are hundreds of gigabytes or larger.

For those larger data volumes, the data generation process can use a Lambda function, Amazon S3, and Amazon SQS events:

  1. Structural creates a Lambda function for your version of Structural. This occurs once for each Structural version, when you run your first data generation job after you install or update Structural.

  2. Structural creates an Amazon SQS queue and Amazon S3 event triggers. This occurs once for each job. The resource names are scoped to the specific data generation job (see the first sketch after this list).

  3. Structural copies table data into Amazon S3 as CSV files. You specify the S3 bucket path in the Structural workspace configuration. Within the S3 bucket, Structural copies the data files into an input folder.

  4. As files land in Amazon S3, S3 event notifications place messages in Amazon SQS, and those messages trigger Lambda function invocations. By default, each file placed in Amazon S3 has a maximum size of 16 MB. Each Lambda invocation processes a single file and writes the result to an output folder in the S3 bucket (see the second sketch after this list).

  5. After it processes all of the files for a table, Structural copies the data back into the Snowflake destination database.

  6. After it processes all of the tables, Structural removes the ephemeral AWS components, such as the Amazon SQS queue and the Amazon S3 event notifications.
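As a rough illustration of the per-job wiring in step 2, the sketch below creates an SQS queue, points S3 event notifications for the job's input folder at that queue, and connects the queue to the Lambda function through an event source mapping. Every name here (bucket, queue, function, job ID) is hypothetical; Structural performs the equivalent setup internally.

```python
# Rough sketch of the S3 -> SQS -> Lambda wiring described in step 2.
# All resource names are hypothetical. The queue policy that allows S3 to
# send messages to the queue is omitted for brevity.
import boto3

job_id = "job-1234"  # resource names are scoped to the data generation job
s3 = boto3.client("s3")
sqs = boto3.client("sqs")
lambda_client = boto3.client("lambda")

# Create the per-job SQS queue.
queue_url = sqs.create_queue(QueueName=f"structural-{job_id}")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Send an SQS message for every object created under the job's input folder.
s3.put_bucket_notification_configuration(
    Bucket="my-structural-bucket",
    NotificationConfiguration={
        "QueueConfigurations": [{
            "QueueArn": queue_arn,
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": f"{job_id}/input/"},
            ]}},
        }],
    },
)

# Invoke the per-version Lambda function (from step 1) for each message.
lambda_client.create_event_source_mapping(
    EventSourceArn=queue_arn,
    FunctionName="structural-worker",
)
```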
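Step 4 can then be pictured as a handler along the following lines: each invocation receives an SQS message that wraps an S3 event notification, reads the single input file that it names, applies the transformation, and writes the result to the output folder. This is also a hypothetical sketch; transform_csv stands in for Structural's generator logic, which is not public.

```python
# Hypothetical sketch of the per-file Lambda handler in step 4: one SQS
# message -> one input file -> one transformed output file.
import json

import boto3

s3 = boto3.client("s3")

def transform_csv(data: bytes) -> bytes:
    # Placeholder for the application of the configured generators.
    return data

def handler(event, context):
    for sqs_record in event["Records"]:
        # Each SQS message body is an S3 event notification.
        s3_event = json.loads(sqs_record["body"])
        for s3_record in s3_event["Records"]:
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]  # e.g. job-1234/input/part-0001.csv

            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            s3.put_object(
                Bucket=bucket,
                Key=key.replace("/input/", "/output/", 1),
                Body=transform_csv(body),
            )
```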

Snowflake on Azure

Structural orchestrates the movement and transformation of data between Snowflake databases that are hosted on Azure. Structural uses Azure Blob Storage for interim storage of the files that contain the source and destination data.

Tonic Structural data generation for Snowflake on Azure

At a high level, the data generation process is:

  1. Structural copies the table data from the Snowflake database to files in Azure Blob Storage. You specify the container path in the Structural workspace configuration. Structural places the files in an input folder within the container path.

  2. Structural applies the configured generators to the data in the files, then writes the resulting files to an output folder in the container path.

  3. As it finishes processing each file, Structural copies the data from the container path’s output folder into the Snowflake destination database (see the sketch after this list).
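Here too, the copy operations in steps 1 and 3 can be pictured as Snowflake COPY INTO statements, this time against an azure:// location. The account, container path, table names, and SAS token below are placeholders, not the statements that Structural actually issues.

```python
# Illustrative sketch of the COPY INTO statements behind the Azure flow.
# The account, container path, tables, and SAS token are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="structural_user",
    password="...",
    warehouse="my_warehouse",
)
cur = conn.cursor()

# Step 1: unload the source table as CSV files to the input folder in the
# Azure Blob Storage container path.
cur.execute("""
    COPY INTO 'azure://myaccount.blob.core.windows.net/mycontainer/structural/input/customers/'
    FROM source_db.public.customers
    CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>')
    FILE_FORMAT = (TYPE = CSV)
""")

# Step 3: after the generators run (step 2), load the transformed files
# from the output folder into the destination database.
cur.execute("""
    COPY INTO destination_db.public.customers
    FROM 'azure://myaccount.blob.core.windows.net/mycontainer/structural/output/customers/'
    CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>')
    FILE_FORMAT = (TYPE = CSV)
""")
```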
