Structural process overviews for Snowflake on AWS

The following high-level diagrams describe how Tonic Structural orchestrates the processing and movement of data in Snowflake on AWS.

Structural manages the lifetime of the data and resources that it uses in AWS. You only need to assign the necessary permissions to the IAM role that Structural uses.

Default process

By default, Structural uses the following data generation process. At a high level:

Structural copies the table data into either an S3 bucket or an external stage as CSV files. You specify the S3 bucket path or stage in the Structural workspace configuration. If you use a single location for both source and destination data, Structural copies the data files into an input folder.
Structural applies the configured generators to the data in the files, then writes the resulting files to an S3 bucket or external stage. If you use a single location for both source and destination data, Structural writes the resulting files into an output folder.
After it processes all of the files, Structural copies the data from the S3 bucket or external stage into the Snowflake destination database.
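To make the data flow concrete, the following Python sketch simulates the three default steps in memory. It is not Structural's implementation: dictionaries stand in for the S3 bucket or external stage and for the source and destination databases, and the function names (`run_default_process`, `stage_key`, `to_csv`, `from_csv`) and the `input/` and `output/` key layout are hypothetical illustrations of the single-location case.

```python
import csv
import io
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]
Generator = Callable[[str], str]  # hypothetical: maps one original value to a masked value


def stage_key(table: str, folder: Optional[str]) -> str:
    """Key of a table's CSV file; prefixed with input/ or output/ when one location is shared."""
    name = f"{table}.csv"
    return f"{folder}/{name}" if folder else name


def to_csv(rows: List[Row]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def from_csv(text: str) -> List[Row]:
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def run_default_process(
    source_db: Dict[str, List[Row]],
    generators: Dict[str, Dict[str, Generator]],
    single_location: bool = True,
) -> Dict[str, List[Row]]:
    # Dicts stand in for the staging locations (object key -> CSV text).
    source_stage: Dict[str, str] = {}
    dest_stage: Dict[str, str] = source_stage if single_location else {}
    in_folder, out_folder = ("input", "output") if single_location else (None, None)
    tables = [t for t, rows in source_db.items() if rows]  # skip empty tables

    # 1. Copy each table's data into the source staging location as a CSV file.
    for table in tables:
        source_stage[stage_key(table, in_folder)] = to_csv(source_db[table])

    # 2. Apply the configured generators; write results to the destination staging location.
    for table in tables:
        table_gens = generators.get(table, {})
        masked = [
            {col: table_gens[col](val) if col in table_gens else val for col, val in row.items()}
            for row in from_csv(source_stage[stage_key(table, in_folder)])
        ]
        dest_stage[stage_key(table, out_folder)] = to_csv(masked)

    # 3. After all files are processed, load the output files into the destination database.
    return {table: from_csv(dest_stage[stage_key(table, out_folder)]) for table in tables}
```

For example, passing `{"users": {"email": lambda v: "masked@example.com"}}` as `generators` replaces every `email` value in the `users` table and leaves the other columns unchanged. With `single_location=False`, the input and output files live in separate stand-in locations and need no folder prefixes.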

Lambda process (for extremely high data volumes)

The default process cannot support data generation for data volumes of hundreds of gigabytes or more.

For those larger data volumes, the data generation process can use a Lambda function, Amazon S3, and Amazon SQS events:

Structural creates a Lambda function for your version of Structural. This happens once for each version of Structural, when you run your first data generation job after you install or update Structural.
Structural creates an Amazon SQS queue and Amazon S3 event triggers. This occurs once for each job. The resource names are scoped to the specific data generation job.
Structural copies table data into Amazon S3 as CSV files. You specify the S3 bucket path in the Structural workspace configuration. Within the S3 bucket, Structural copies the data files into an input folder.
As files land in Amazon S3, Amazon S3 event notifications place messages in the Amazon SQS queue, and each message triggers a Lambda function invocation. By default, each file that Structural places in Amazon S3 is at most 16 MB, and each Lambda invocation processes a single file. The Lambda function processes the file and writes the result to an output folder in the S3 bucket.
After it processes all of the files for a table, Structural copies the data back into the Snowflake destination database.
After it processes all of the tables, Structural removes the ephemeral AWS components, such as the Amazon SQS queue and the Amazon S3 event notifications.
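The Lambda step can be sketched as a handler that receives Amazon SQS messages whose bodies are Amazon S3 event notifications. This is a hypothetical sketch, not the function that Structural deploys: the `input/` and `output/` prefixes, the `make_handler` factory, and the `process_file` callback (which would read the CSV file, apply the generators, and write the result) are assumptions. The event shapes follow the AWS formats for SQS-triggered Lambda functions and S3 event notifications.

```python
import json
from typing import Any, Callable, Dict, List
from urllib.parse import unquote_plus

INPUT_PREFIX = "input/"    # hypothetical folder names within the S3 bucket
OUTPUT_PREFIX = "output/"

ProcessFile = Callable[[str, str, str], None]  # (bucket, source key, destination key)


def created_objects(event: Dict[str, Any]) -> List[Dict[str, str]]:
    """Extract the bucket and key of each newly created object from an SQS-triggered event."""
    objects = []
    for message in event.get("Records", []):
        # Each SQS message body is an S3 event notification serialized as JSON.
        notification = json.loads(message["body"])
        # S3 sends an s3:TestEvent message (with no "Records") when notifications are set up.
        for record in notification.get("Records", []):
            if not record.get("eventName", "").startswith("ObjectCreated:"):
                continue
            objects.append({
                "bucket": record["s3"]["bucket"]["name"],
                # Object keys in S3 event notifications are URL-encoded.
                "key": unquote_plus(record["s3"]["object"]["key"]),
            })
    return objects


def output_key(input_key: str) -> str:
    """Map input/<path> to output/<path> in the same bucket."""
    if not input_key.startswith(INPUT_PREFIX):
        raise ValueError(f"key is not under {INPUT_PREFIX}: {input_key}")
    return OUTPUT_PREFIX + input_key[len(INPUT_PREFIX):]


def make_handler(process_file: ProcessFile) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Build a Lambda-style handler(event, context) around a file-processing callback."""
    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        written = []
        for obj in created_objects(event):
            destination = output_key(obj["key"])
            # The callback would read the CSV, apply the generators, and write the result.
            process_file(obj["bucket"], obj["key"], destination)
            written.append(destination)
        return {"written": written}
    return handler
```

In a deployed function, a module-level `lambda_handler = make_handler(...)` could serve as the configured handler. With an SQS batch size of 1, each invocation would see exactly one message and therefore one file, which matches the one-file-per-invocation behavior described in the steps; the loop over records also tolerates larger batches.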

