Amazon Redshift is a cloud-based data warehouse service.
Tonic Structural can move data from one database to another within a single Amazon Redshift instance. Structural can also move data between Amazon Redshift instances.
In both cases, Structural uses Amazon S3 as an intermediate stage to host both the original data and the masked data.
When it uses Amazon Redshift, Tonic Structural orchestrates the creation, usage, and deletion of several AWS components.
Structural takes the required permissions from the instance profile role of the Amazon EC2 instance that runs the Structural server. This role requires the permissions listed below.
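A hedged sketch of such an instance profile policy is shown below, covering the services that Structural orchestrates (Amazon S3, Amazon SQS, AWS Lambda). The action lists and the tonic- resource prefix are illustrative assumptions, not Tonic's exhaustive published policy; adjust them for your deployment.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TonicS3Access",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:PutBucketNotification"
      ],
      "Resource": ["arn:aws:s3:::tonic-*", "arn:aws:s3:::tonic-*/*"]
    },
    {
      "Sid": "TonicSqsAccess",
      "Effect": "Allow",
      "Action": [
        "sqs:CreateQueue",
        "sqs:DeleteQueue",
        "sqs:GetQueueAttributes",
        "sqs:GetQueueUrl"
      ],
      "Resource": "arn:aws:sqs:*:*:tonic-*"
    },
    {
      "Sid": "TonicLambdaAccess",
      "Effect": "Allow",
      "Action": [
        "lambda:CreateFunction",
        "lambda:GetFunction",
        "lambda:CreateEventSourceMapping",
        "lambda:DeleteEventSourceMapping",
        "iam:PassRole"
      ],
      "Resource": "*"
    }
  ]
}
```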
Note that these permissions are a starting point. Based on your exact AWS setup, you might need to add permissions. For example, you might need to grant AWS Key Management Service (AWS KMS) access if you use AWS KMS on your S3 buckets.
The above policy allows Structural to orchestrate jobs in your AWS infrastructure. It assumes that you use default names for objects in AWS, and that your source and destination S3 bucket names begin with the tonic- prefix.
The following is an example of how to create an Amazon Redshift user with the permissions needed to connect to Tonic Structural.
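A minimal sketch of such a setup follows. The user name tonic_user, the password, and the schema public are placeholder assumptions; substitute your own values.

```sql
-- Hypothetical read-only user for the source database (all names are placeholders)
CREATE USER tonic_user PASSWORD 'ExamplePassw0rd';

-- Allow the user to enumerate and read objects in an included schema
GRANT USAGE ON SCHEMA public TO tonic_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO tonic_user;
```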
We recommend that you use a backup as your source database instead of connecting directly to your production environment.
If your database contains additional schemas that are included in the workspace, then you must also run the same commands for those schemas.
The destination database must exist before Structural can connect to it. The user provided to Structural for connecting to the destination database must be a superuser who holds ownership and privileges of all schemas and tables.
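In Amazon Redshift, the CREATEUSER option grants superuser privileges, so a destination user could be sketched as follows. The user name and password are placeholder assumptions.

```sql
-- Hypothetical destination-database user; CREATEUSER makes the user a superuser
CREATE USER tonic_admin PASSWORD 'ExamplePassw0rd' CREATEUSER;
```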
Structural process for Amazon Redshift
How Structural data generation works with Amazon Redshift
Structural differences and limitations
Features that are unavailable or work differently for the Amazon Redshift data connector
Required Amazon Redshift configuration
Required configuration for Amazon Redshift before you create an Amazon Redshift workspace
Configure workspace data connections
Data connection settings for Amazon Redshift workspaces
During workspace creation, under Connection Type, click Redshift.
In the Source Settings section, provide the details about the source database:
To provide the connection details for the source database:
In the Server field, provide the server where the database is located.
In the Database field, provide the name of the database.
In the Port field, provide the port to use to connect to the database.
In the Username field, provide the username for the account to use to connect to the database.
In the Password field, provide the password for the specified username.
In the S3 Bucket Path field, specify the S3 bucket where Tonic Structural places temporary CSV files that it uses to load and unload Amazon Redshift tables.
During data generation, CSV files containing the source data are copied to an input folder in the S3 bucket. After the generators are applied, CSV files containing the destination data are copied to an output folder in the S3 bucket.
To test the connection to the source database, click Test Source Connection.
The Enable SSL/TLS setting indicates whether to encrypt source database authentication data.
By default, the toggle is in the on position. We strongly recommend that you do not turn off this setting.
By default, data generations are not blocked when schema changes do not conflict with your workspace configuration.
To block data generation when there are any schema changes, regardless of whether they conflict with your workspace configuration, toggle Block data generation on schema changes to the on position.
By default, the source database owner relationships, such as schema and table ownership, are preserved in the destination database. The Preserve source database owners in destination database toggle is in the on position.
To instead have the admin user for the destination database gain ownership of the schema and tables, toggle Preserve source database owners in destination database to the off position.
By default, the source database includes all of the schemas. To specify a list of specific schemas to either include or exclude:
Toggle Limit Schemas to the on position.
From the filter option dropdown list, select whether to include or exclude the listed schemas.
In the field, provide the list of schemas to either include or exclude. Use commas or semicolons to separate the schemas.
Do not exclude schemas that are referred to by included schemas, unless you create those schemas manually outside of Structural.
In the Destination Settings section, you specify the connection information for the destination database.
To copy the connection details from the source database:
Click Copy Settings from Source.
In the Password field, provide the password.
To test the connection to the destination database, click Test Destination Connection.
If you do not copy the source connection details, then to specify the connection information for the destination database:
In the Server field, provide the server where the database is located.
In the Database field, provide the name of the database.
In the Port field, provide the port to use to connect to the database.
In the Username field, provide the username for the account to use to connect to the database.
In the Password field, provide the password for the specified username.
To test the connection to the destination database, click Test Destination Connection.
The Enable SSL/TLS setting indicates whether to encrypt authentication data for the destination database.
By default, it is in the on position. We strongly recommend that you do not turn off this setting.
Required license: Professional or Enterprise license.
Not available on Structural Cloud.
Amazon Redshift workspaces cannot use the following table modes:
Scale
Incremental
Amazon Redshift workspaces cannot use the following generators:
Algebraic
Array JSON Mask
Array Regex Mask
Cross Table Sum
Current Date
Event Timestamps
Geo
Sequential Integer
Amazon Redshift workspaces do not support subsetting.
However, for tables that use the De-Identify table mode, you can provide a WHERE clause to filter the table. For details, go to Using table filtering for data warehouses and Spark-based data connectors.
Amazon Redshift workspaces do not support upsert.
For Amazon Redshift workspaces, you cannot write the destination data to container artifacts.
For Amazon Redshift workspaces, you cannot write the destination data to an Ephemeral snapshot.
Tonic Structural allows you to set several Amazon Redshift-specific environment settings that make it easier to adapt the Amazon Redshift integration to your specific AWS environment. You configure these settings in the Structural worker container.
The AWS Lambda function that Tonic Structural sets up requires an AWS role. The name of this role is set by the following environment setting:
The policy for this role should look like this:
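A hedged sketch of the role policy is shown below. The action lists and the tonic- resource prefix are illustrative assumptions; adjust them for your environment.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TonicLambdaSqs",
      "Effect": "Allow",
      "Action": [
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes"
      ],
      "Resource": "arn:aws:sqs:*:*:tonic-*"
    },
    {
      "Sid": "TonicLambdaS3",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::tonic-*/*"
    },
    {
      "Sid": "TonicLambdaLogs",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
```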
The above policy grants the Lambda function the required access to Amazon SQS, Amazon S3, and CloudWatch.
This policy assumes that the S3 buckets and Amazon SQS queues that are used begin with the tonic- prefix.
After you create the role, you must allow the Lambda service to assume the role.
For the role, the Trust relationships in the AWS IAM role should be configured to look like the following:
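This is the standard AWS trust policy that allows the Lambda service to assume a role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```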
Required AWS instance profile permissions
Configure the required permissions for Tonic Structural to work with AWS components
Set up the AWS Lambda role
Configure the required Lambda role for the Structural Lambda function
Required KMS permissions for SQS message encryption
Needed if you use KMS for SQS encryption
Configure Structural environment settings
Environment settings that are specific to Amazon Redshift
Required database permissions
Configure the required permissions for source and destination databases
The following high-level diagram describes how Tonic Structural orchestrates the processing and moving of data in Amazon Redshift.
This diagram is not the same as the Tonic architectural diagram.
Structural orchestrates the moving and transforming of data between Redshift databases. To do this, Structural uses the Amazon S3, Amazon SQS, and AWS Lambda services in AWS.
Structural manages the lifetimes of data and resources used in AWS. It only requires you to assign the necessary permissions to the IAM role that Structural uses.
At a high level, the process is:
Structural creates a Lambda function for your version of Structural. This step is performed once per version of Structural. The Lambda function is created when you run your first data generation job after you install or update Structural.
Structural creates an Amazon SQS queue and Amazon S3 event triggers. This is done once for each data generation job. The resource names are scoped to your specific generation job.
Structural copies the table data into Amazon S3 as CSV files. You specify the S3 bucket path in the Structural workspace configuration. Within the S3 bucket, the data files are copied into an input
folder.
As files land in Amazon S3, Amazon S3 event notifications place messages in Amazon SQS.
Messages in Amazon SQS trigger Lambda function invocations. By default, each file placed in Amazon S3 has a maximum file size of 50MB. Each Lambda invocation processes a single file, then writes it back to Amazon S3, to an output folder in the S3 bucket.
After it processes all of the files for a table, Structural copies data back into Amazon Redshift, into the destination database.
After it processes all of the tables, Structural removes ephemeral AWS components such as Amazon SQS and Amazon S3 event notifications.
If you use AWS KMS for Amazon SQS encryption, make sure that you provided the correct key ID for the Tonic Structural environment setting TONIC_LAMBDA_KMS_MASTER_KEY. Also provide Amazon S3 access under your AWS KMS key policy:
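For example, the key policy can include a statement that lets the Amazon S3 service use the key when it publishes event notifications to the encrypted queue. This is a sketch of the standard AWS pattern for SSE-KMS-encrypted SQS queues:

```json
{
  "Sid": "AllowS3ToUseKeyForSqsNotifications",
  "Effect": "Allow",
  "Principal": { "Service": "s3.amazonaws.com" },
  "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
  "Resource": "*"
}
```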
Additional key permissions must be added to your Amazon EC2 and Lambda roles:
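A sketch of such a statement is shown below; the key ARN is a placeholder that you replace with the ARN of your own KMS key.

```json
{
  "Sid": "AllowSqsMessageEncryption",
  "Effect": "Allow",
  "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
  "Resource": "arn:aws:kms:us-east-1:123456789012:key/example-key-id"
}
```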