Amazon Redshift is a cloud-based data warehouse service.
Tonic Structural can move data from one database to another within a single Amazon Redshift instance. Structural can also move data between Amazon Redshift instances.
In both cases, Structural uses Amazon S3 as an intermediate stage to host both the original data and the masked data.
When it uses Amazon Redshift, Tonic Structural orchestrates the creation, usage, and deletion of several AWS components.
Structural takes the required permissions from the instance profile role of the Amazon EC2 instance that runs the Structural server. This role requires the permissions listed below.
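A hedged sketch of such an instance profile policy is shown below, covering the services that Structural orchestrates (Amazon S3, Amazon SQS, AWS Lambda). The action lists and the tonic- resource prefix are illustrative assumptions, not Tonic's exhaustive published policy; adjust them for your deployment.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TonicS3Access",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:PutBucketNotification"
      ],
      "Resource": ["arn:aws:s3:::tonic-*", "arn:aws:s3:::tonic-*/*"]
    },
    {
      "Sid": "TonicSqsAccess",
      "Effect": "Allow",
      "Action": [
        "sqs:CreateQueue",
        "sqs:DeleteQueue",
        "sqs:GetQueueAttributes",
        "sqs:GetQueueUrl"
      ],
      "Resource": "arn:aws:sqs:*:*:tonic-*"
    },
    {
      "Sid": "TonicLambdaAccess",
      "Effect": "Allow",
      "Action": [
        "lambda:CreateFunction",
        "lambda:GetFunction",
        "lambda:CreateEventSourceMapping",
        "lambda:DeleteEventSourceMapping",
        "iam:PassRole"
      ],
      "Resource": "*"
    }
  ]
}
```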
Note that these permissions are a starting point. Based on your exact AWS setup, you might need to add permissions. For example, you might need to grant AWS Key Management Service (AWS KMS) access if you use AWS KMS on your S3 buckets.
The above policy allows Structural to orchestrate jobs in your AWS infrastructure. It assumes that you use default names for objects in AWS, and that your source and destination S3 bucket names begin with the tonic- prefix.
The following is an example of how to create an Amazon Redshift user with the permissions needed to connect to Tonic Structural.
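A minimal sketch of such a setup follows. The user name tonic_user, the password, and the schema public are placeholder assumptions; substitute your own values.

```sql
-- Hypothetical read-only user for the source database (all names are placeholders)
CREATE USER tonic_user PASSWORD 'ExamplePassw0rd';

-- Allow the user to enumerate and read objects in an included schema
GRANT USAGE ON SCHEMA public TO tonic_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO tonic_user;
```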
We recommend that you use a backup as your source database instead of connecting directly to your production environment.
If your database contains additional schemas that are included in the workspace, then you must also run the same commands for those schemas.
The destination database must exist before Structural can connect to it. The user provided to Structural for connecting to the destination database must be a superuser who holds ownership and privileges of all schemas and tables.
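In Amazon Redshift, the CREATEUSER option grants superuser privileges, so a destination user could be sketched as follows. The user name and password are placeholder assumptions.

```sql
-- Hypothetical destination-database user; CREATEUSER makes the user a superuser
CREATE USER tonic_admin PASSWORD 'ExamplePassw0rd' CREATEUSER;
```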
Structural process for Amazon Redshift
How Structural data generation works with Amazon Redshift
Structural differences and limitations
Features that are unavailable or work differently for the Amazon Redshift data connector
Required Amazon Redshift configuration
Required configuration for Amazon Redshift before you create an Amazon Redshift workspace
Configure workspace data connections
Data connection settings for Amazon Redshift workspaces
During workspace creation, under Connection Type, click Redshift.
In the Source Settings section, provide the details about the source database:
To provide the connection details for the source database:
In the Server field, provide the server where the database is located.
In the Database field, provide the name of the database.
In the Port field, provide the port to use to connect to the database.
In the Username field, provide the username for the account to use to connect to the database.
In the Password field, provide the password for the specified username.
In the S3 Bucket Path field, specify the S3 bucket where Tonic Structural places temporary CSV files that it uses to load and unload Amazon Redshift tables.
During data generation, CSV files containing the source data are copied to an input folder in the S3 bucket. After the generators are applied, CSV files containing the destination data are copied to an output folder in the S3 bucket.
To test the connection to the source database, click Test Source Connection.
The Enable SSL/TLS setting indicates whether to encrypt source database authentication data.
By default, the toggle is in the on position. We strongly recommend that you do not turn off this setting.
By default, data generations are not blocked when schema changes do not conflict with your workspace configuration.
To block data generation when there are any schema changes, regardless of whether they conflict with your workspace configuration, toggle Block data generation on schema changes to the on position.
By default, the source database owner relationships, such as schema and table ownership, are preserved in the destination database. The Preserve source database owners in destination database toggle is in the on position.
To instead have the admin user for the destination database gain ownership of the schema and tables, toggle Preserve source database owners in destination database to the off position.
By default, the source database includes all of the schemas. To specify a list of specific schemas to either include or exclude:
Toggle Limit Schemas to the on position.
From the filter option dropdown list, select whether to include or exclude the listed schemas.
In the field, provide the list of schemas to either include or exclude. Use commas or semicolons to separate the schemas.
Do not exclude schemas that are referred to by included schemas, unless you create those schemas manually outside of Structural.
In the Destination Settings section, you specify the connection information for the destination database.
To copy the connection details from the source database:
Click Copy Settings from Source.
In the Password field, provide the password.
To test the connection to the destination database, click Test Destination Connection.
If you do not copy the source connection details, then to specify the connection information for the destination database:
In the Server field, provide the server where the database is located.
In the Database field, provide the name of the database.
In the Port field, provide the port to use to connect to the database.
In the Username field, provide the username for the account to use to connect to the database.
In the Password field, provide the password for the specified username.
To test the connection to the destination database, click Test Destination Connection.
The Enable SSL/TLS setting indicates whether to encrypt authentication data for the destination database.
By default, it is in the on position. We strongly recommend that you do not turn off this setting.
Required license: Professional or Enterprise license.
Not available on Structural Cloud.
Amazon Redshift workspaces cannot use the following table modes:
Scale
Incremental
Amazon Redshift workspaces cannot use the following generators:
Algebraic
Array JSON Mask
Array Regex Mask
Cross Table Sum
Current Date
Event Timestamps
Geo
Sequential Integer
Amazon Redshift workspaces do not support subsetting.
However, for tables that use the De-Identify table mode, you can provide a WHERE clause to filter the table. For details, go to Using table filtering for data warehouses and Spark-based data connectors.
Amazon Redshift workspaces do not support upsert.
For Amazon Redshift workspaces, you cannot write the destination data to container artifacts.
For Amazon Redshift workspaces, you cannot write the destination data to an Ephemeral snapshot.
Tonic Structural allows you to set several Amazon Redshift-specific environment settings that make it easier to adapt the Amazon Redshift integration to your specific AWS environment. You configure these settings in the Structural worker container.
The AWS Lambda function that Tonic Structural sets up requires an AWS role. The name of this role is set by the following environment setting:
The policy for this role should look like this:
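A hedged sketch of the role policy is shown below. The action lists and the tonic- resource prefix are illustrative assumptions; adjust them for your environment.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TonicLambdaSqs",
      "Effect": "Allow",
      "Action": [
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes"
      ],
      "Resource": "arn:aws:sqs:*:*:tonic-*"
    },
    {
      "Sid": "TonicLambdaS3",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::tonic-*/*"
    },
    {
      "Sid": "TonicLambdaLogs",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
```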
The above policy grants the Lambda function the required access to Amazon SQS, Amazon S3, and CloudWatch.
This policy assumes that the S3 buckets and Amazon SQS queues that are used begin with the tonic- prefix.
After you create the role, you must allow the Lambda service to assume the role.
For the role, the Trust relationships in the AWS IAM role should be configured to look like the following:
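This is the standard AWS trust policy that allows the Lambda service to assume a role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```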
Required AWS instance profile permissions
Configure the required permissions for Tonic Structural to work with AWS components
Set up the AWS Lambda role
Configure the required Lambda role for the Structural Lambda function
Required KMS permissions for SQS message encryption
Needed if you use KMS for SQS encryption
Configure Structural environment settings
Environment settings that are specific to Amazon Redshift
Required database permissions
Configure the required permissions for source and destination databases
The following high-level diagram describes how Tonic Structural orchestrates the processing and moving of data in Amazon Redshift.
This diagram is not the same as the Tonic architectural diagram.
Structural orchestrates the moving and transforming of data between Redshift databases. To do this, Structural uses the Amazon S3, Amazon SQS, and AWS Lambda services in AWS.
Structural manages the lifetimes of data and resources used in AWS. It only requires you to assign the necessary permissions to the IAM role that Structural uses.
At a high level, the process is:
Structural creates a Lambda function for your version of Structural. This step is performed once per version of Structural. The Lambda function is created when you run your first data generation job after you install or update Structural.
Structural creates an Amazon SQS queue and Amazon S3 event triggers. This is done once for each data generation job. The resource names are scoped to your specific generation job.
Structural copies the table data into Amazon S3 as CSV files. You specify the S3 bucket path in the Structural workspace configuration. Within the S3 bucket, the data files are copied into an input
folder.
As files land in Amazon S3, Amazon S3 event notifications place messages in Amazon SQS.
Messages in Amazon SQS trigger Lambda function invocations. By default, each file placed in Amazon S3 has a maximum file size of 50MB. Each Lambda invocation processes a single file, then writes it back to Amazon S3, to an output folder in the S3 bucket.
After it processes all of the files for a table, Structural copies data back into Amazon Redshift, into the destination database.
After it processes all of the tables, Structural removes ephemeral AWS components such as Amazon SQS and Amazon S3 event notifications.
If you use AWS KMS for Amazon SQS encryption, make sure that you provided the correct key ID for the Tonic Structural environment setting TONIC_LAMBDA_KMS_MASTER_KEY. Also provide Amazon S3 access under your AWS KMS key policy:
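For example, the key policy can include a statement that lets the Amazon S3 service use the key when it publishes event notifications to the encrypted queue. This is a sketch of the standard AWS pattern for SSE-KMS-encrypted SQS queues:

```json
{
  "Sid": "AllowS3ToUseKeyForSqsNotifications",
  "Effect": "Allow",
  "Principal": { "Service": "s3.amazonaws.com" },
  "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
  "Resource": "*"
}
```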
Additional key permissions must be added to your Amazon EC2 and Lambda roles:
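A sketch of such a statement is shown below; the key ARN is a placeholder that you replace with the ARN of your own KMS key.

```json
{
  "Sid": "AllowSqsMessageEncryption",
  "Effect": "Allow",
  "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
  "Resource": "arn:aws:kms:us-east-1:123456789012:key/example-key-id"
}
```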