Structural process overview for Amazon Redshift

The data generation process for Amazon Redshift differs slightly depending on whether you use the previous data generation process or the newer Data Pipeline V2 process. Data Pipeline V2 is used by default.

Both processes use Amazon S3 to store the data as CSV files during data generation.

Data Pipeline V2 process

The following high-level diagram describes how Structural orchestrates the processing and moving of data in Amazon Redshift during the Data Pipeline V2 data generation process.

Data Pipeline V2 data process flow for Amazon Redshift

This diagram specifically shows the Amazon Redshift data generation process. For the Structural architecture diagram, go to Structural architecture.

Structural orchestrates the moving and transforming of data between Amazon Redshift databases. To do this, Structural uses Amazon S3.

Structural manages the lifetimes of data and resources used in AWS. It only requires you to assign the necessary permissions to the IAM role that Structural uses.
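For illustration only, the following is a minimal sketch of the kind of S3 access such a role might need, attached as a boto3 inline policy. The bucket name, role name, and action list are all assumptions, not the authoritative permission set; see the Structural documentation for the exact permissions to grant.

```python
import json
import boto3

# Hypothetical bucket used as the Structural intermediate store;
# substitute the S3 bucket path from your workspace configuration.
BUCKET = "my-structural-intermediate-bucket"

# An assumed minimal S3 policy for the Structural IAM role: list the
# bucket, and read/write the CSV files that Structural stages in it.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

iam = boto3.client("iam")
iam.put_role_policy(
    RoleName="structural-redshift-role",  # hypothetical role name
    PolicyName="structural-s3-access",
    PolicyDocument=json.dumps(policy_document),
)
```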

At a high level, the process is:

  1. Structural copies the table data into Amazon S3 as CSV files. You specify the S3 bucket path in the Structural workspace configuration. Within the S3 bucket, the data files are copied into an input folder.

  2. After it transforms the data in a file, Structural copies the transformed file to the output folder in the configured S3 bucket.

  3. After it processes all of the files for a table, Structural copies the output data back into Amazon Redshift, into the destination database. The sketch after this list illustrates these steps.
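To make the flow concrete, here is a minimal sketch of the equivalent Redshift and S3 operations, using the Amazon Redshift Data API through boto3. The cluster, database, user, table, bucket, and role names are all hypothetical, and Structural's actual implementation may differ; this only illustrates the shape of steps 1 through 3.

```python
import boto3

client = boto3.client("redshift-data")

BUCKET = "my-structural-intermediate-bucket"  # from workspace configuration
IAM_ROLE = "arn:aws:iam::123456789012:role/structural-redshift-role"

# Step 1: export the source table data to CSV files in the input folder.
client.execute_statement(
    ClusterIdentifier="source-cluster",
    Database="source_db",
    DbUser="structural",
    Sql=f"""
        UNLOAD ('SELECT * FROM public.customers')
        TO 's3://{BUCKET}/input/customers/'
        IAM_ROLE '{IAM_ROLE}'
        CSV;
    """,
)

# Step 2: Structural transforms each CSV file and writes the result to
# the output folder (for example, s3://<bucket>/output/customers/).
# That transformation logic is internal to Structural.

# Step 3: load the transformed files into the destination database.
client.execute_statement(
    ClusterIdentifier="destination-cluster",
    Database="destination_db",
    DbUser="structural",
    Sql=f"""
        COPY public.customers
        FROM 's3://{BUCKET}/output/customers/'
        IAM_ROLE '{IAM_ROLE}'
        CSV;
    """,
)
```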

Previous data generation process

The following high-level diagram describes how Structural orchestrates the processing and moving of data in Amazon Redshift during the previous data generation process.

Previous data process flow for Amazon Redshift data

This diagram specifically shows the Amazon Redshift data generation process. For the Structural architecture diagram, go to Structural architecture.

Structural orchestrates the moving and transforming of data between Amazon Redshift databases. To do this, Structural uses the Amazon S3, Amazon SQS, and AWS Lambda services.

Structural manages the lifetimes of data and resources used in AWS. It only requires you to assign the necessary permissions to the IAM role that Structural uses.
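Because the previous process also creates and removes SQS queues, S3 event notifications, and a Lambda function, the IAM role presumably needs permissions beyond the S3 access sketched earlier. The statements below are an illustrative assumption, not an authoritative list; consult the Structural documentation for the exact requirements.

```python
# Assumed additional policy statements for the previous process, on top
# of the S3 statements shown earlier. Action names are illustrative.
extra_statements = [
    {
        "Effect": "Allow",
        "Action": [
            "sqs:CreateQueue",
            "sqs:DeleteQueue",
            "sqs:GetQueueAttributes",
        ],
        "Resource": "*",
    },
    {
        "Effect": "Allow",
        "Action": [
            "lambda:CreateFunction",
            "lambda:InvokeFunction",
            "lambda:CreateEventSourceMapping",
        ],
        "Resource": "*",
    },
    {
        "Effect": "Allow",
        "Action": ["s3:PutBucketNotification"],
        "Resource": "arn:aws:s3:::my-structural-intermediate-bucket",
    },
]
```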

At a high level, the process is:

  1. Structural creates a Lambda function for your version of Structural. This occurs once per Structural version, when you run your first data generation job after you install or update Structural.

  2. Structural creates an Amazon SQS queue and Amazon S3 event triggers. This occurs once for each data generation job, and the resource names are scoped to that job. The sketch after this list illustrates this wiring.

  3. Structural copies the table data into Amazon S3 as CSV files. You specify the S3 bucket path in the Structural workspace configuration. Within the S3 bucket, the data files are copied into an input folder.

  4. As files land in Amazon S3, S3 event notifications place messages in the Amazon SQS queue, and those messages trigger Lambda function invocations. By default, each file placed in Amazon S3 has a maximum size of 50 MB. Each Lambda invocation processes a single file, then writes the transformed file back to an output folder in the S3 bucket.

  5. After it processes all of the files for a table, Structural copies data back into Amazon Redshift, into the destination database.

  6. After it processes all of the tables, Structural removes the ephemeral AWS components, such as the Amazon SQS queue and the Amazon S3 event notifications.
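The following sketch shows, with boto3, the kind of per-job wiring that steps 2 and 4 describe: an SQS queue scoped to the job, an S3 event notification that feeds the queue, and an event source mapping that invokes the Lambda function once per file. All names are hypothetical, and Structural creates and removes these resources itself; this is not code you need to run.

```python
import boto3

JOB_ID = "job-1234"  # hypothetical generation job identifier
BUCKET = "my-structural-intermediate-bucket"

sqs = boto3.client("sqs")
s3 = boto3.client("s3")
lam = boto3.client("lambda")

# Step 2: create a queue whose name is scoped to this generation job.
queue_url = sqs.create_queue(QueueName=f"structural-{JOB_ID}")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Step 2 (continued): notify the queue when CSV files land in the input
# folder. (The queue policy must also allow S3 to send messages to it;
# that is omitted here for brevity.)
s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": f"input/{JOB_ID}/"}
                        ]
                    }
                },
            }
        ]
    },
)

# Step 4: have queue messages trigger the transformation Lambda, with
# one file processed per invocation.
lam.create_event_source_mapping(
    EventSourceArn=queue_arn,
    FunctionName="structural-transform",  # hypothetical function name
    BatchSize=1,
)
```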
