Structural process overviews for Snowflake on AWS
The following high-level diagrams describe how Tonic Structural orchestrates the processing and movement of data for Snowflake on AWS.
Structural manages the lifetimes of the data and resources that it uses in AWS. You only need to assign the necessary permissions to the IAM role that Structural uses.
By default, Structural uses the following data generation process:
At a high level:
Structural copies the table data into either an S3 bucket or an external stage as CSV files. You specify the S3 bucket path or stage in the Structural workspace configuration.
If you use a single location for both source and destination data, the data files are copied into an input folder.
Structural applies the configured generators to the data in the files, then writes the resulting files to an S3 bucket or external stage.
If you use a single location for both source and destination data, the data files are copied into an output folder.
After it processes all of the files, Structural copies the data from the S3 bucket or external stage into the Snowflake destination database.
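The unload and load steps above can be pictured as a pair of Snowflake COPY INTO statements. The following is a minimal sketch, not the SQL that Structural actually issues: the stage name `my_stage`, the per-table folder layout, and the helper names are assumptions made for illustration.

```python
# Illustrative only: the SQL that Structural runs is not shown in this document.
# 16 MB, which is also Snowflake's documented default MAX_FILE_SIZE for unloads.
MAX_FILE_SIZE_BYTES = 16 * 1024 * 1024


def unload_statement(table: str, stage: str, folder: str = "input") -> str:
    """Build a COPY INTO <location> statement that unloads a table as CSV files."""
    return (
        f"COPY INTO @{stage}/{folder}/{table}/ "
        f"FROM {table} "
        f"FILE_FORMAT = (TYPE = CSV) "
        f"MAX_FILE_SIZE = {MAX_FILE_SIZE_BYTES}"
    )


def load_statement(table: str, stage: str, folder: str = "output") -> str:
    """Build a COPY INTO <table> statement that loads processed CSV files."""
    return (
        f"COPY INTO {table} "
        f"FROM @{stage}/{folder}/{table}/ "
        f"FILE_FORMAT = (TYPE = CSV)"
    )


if __name__ == "__main__":
    # "my_stage" and "CUSTOMERS" are hypothetical names.
    print(unload_statement("CUSTOMERS", "my_stage"))
    print(load_statement("CUSTOMERS", "my_stage"))
```

In this sketch, the input and output folders keep source and processed files apart when a single stage serves both roles.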
The default process cannot support data generation on data volumes that are hundreds of gigabytes or larger.
For those larger data volumes, you can have the data generation process use a Lambda function, Amazon S3, and Amazon SQS events:
Structural creates a Lambda function for your version of Structural. This occurs once for each version of Structural. The Lambda function is created when you run your first data generation job after you install or update Structural.
Structural creates an Amazon SQS queue and Amazon S3 event triggers. This occurs once for each job. The resource names are scoped to the specific data generation job.
Structural copies table data into Amazon S3 as CSV files. You specify the S3 bucket path in the Structural workspace configuration. Within the S3 bucket, the data files are copied into an input folder.
As files land in Amazon S3, Amazon S3 event notifications place messages in Amazon SQS. Messages in Amazon SQS trigger Lambda function invocations.
By default, each file placed in Amazon S3 has a maximum file size of 16 MB. Each Lambda invocation processes a single file and writes the result back to Amazon S3, in an output folder in the S3 bucket.
After it processes all of the files for a table, Structural copies the data back into Snowflake, into the destination database.
After it processes all of the tables, Structural removes ephemeral AWS components such as event notifications for Amazon SQS and Amazon S3.
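The event-driven part of this flow (an S3 notification, an SQS message, then one Lambda invocation per file) can be sketched in a few functions. This is a hedged illustration rather than Structural's Lambda code: the function names, the input/ and output/ key prefixes, and the column-index generator mapping are assumptions. The event shapes follow the standard Amazon S3 notification and SQS-to-Lambda formats.

```python
import csv
import io
import json
from urllib.parse import unquote_plus


def s3_objects_from_sqs_event(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an SQS-triggered Lambda event whose
    message bodies are S3 event notifications."""
    objects = []
    for message in event.get("Records", []):
        notification = json.loads(message["body"])
        # S3 test events carry no "Records" key, so they yield nothing here.
        for record in notification.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys are URL-encoded in S3 event notifications.
            key = unquote_plus(record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects


def output_key_for(input_key: str) -> str:
    """Map a key under input/ to the matching key under output/.
    The prefix layout is a hypothetical simplification."""
    prefix = "input/"
    if not input_key.startswith(prefix):
        raise ValueError(f"unexpected key: {input_key}")
    return "output/" + input_key[len(prefix):]


def transform_csv(text: str, generators: dict) -> str:
    """Apply per-column functions (keyed by column index) to CSV text.
    Columns without an entry pass through unchanged."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow(
            [generators.get(i, lambda v: v)(value) for i, value in enumerate(row)]
        )
    return out.getvalue()
```

A real handler would read each object from Amazon S3, pass its contents through something like `transform_csv`, and write the result to the key that `output_key_for` returns. Those S3 calls are left out so that the sketch runs without AWS credentials.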