Configuration for cross-account setups
Tonic Structural supports operating on AWS Glue catalogs that are in a different AWS account from the one where Structural and Amazon EMR are configured.
For the instructions in this topic, we'll use the following example:
AWS Account A contains the Amazon EMR cluster, Athena workgroup, and destination S3 bucket.
AWS Account B contains the AWS Glue data catalog and source S3 bucket.
Granting access to the required resources
The account that has Structural and Amazon EMR must be granted access to the resources in the account that has the AWS Glue catalog.
To continue our example, you must first grant Account A access to Account B's resources. To do this, set up the following resource-based policies for Account B's AWS Glue data catalog and source S3 bucket.
Account B Glue data catalog resource policy
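As a rough illustration of the shape of such a resource policy, the following Python sketch builds a cross-account AWS Glue policy document and prints it as JSON. The account IDs, the Region, and the list of read-only Glue actions are placeholder assumptions for this example, not the exact set that Structural requires. The function name glue_catalog_policy is also hypothetical.

```python
import json


def glue_catalog_policy(account_a_id: str, account_b_id: str, region: str) -> dict:
    """Build a resource policy that lets Account A read Account B's Glue catalog.

    The action list below is an illustrative assumption, not an official minimum.
    """
    arn_prefix = f"arn:aws:glue:{region}:{account_b_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Grants access to principals in Account A (placeholder ID).
                "Principal": {"AWS": f"arn:aws:iam::{account_a_id}:root"},
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartition",
                    "glue:GetPartitions",
                ],
                # Catalog, database, and table resources that live in Account B.
                "Resource": [
                    f"{arn_prefix}:catalog",
                    f"{arn_prefix}:database/*",
                    f"{arn_prefix}:table/*/*",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account IDs and Region; replace with real values.
    policy = glue_catalog_policy("111111111111", "222222222222", "us-east-1")
    print(json.dumps(policy, indent=2))
```

The resulting JSON could then be attached to the catalog in Account B, for example through the Glue PutResourcePolicy operation. Narrow the principal and the resources if you only need to share specific databases or tables.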
Account B source S3 bucket bucket policy
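A bucket policy of this kind usually needs one statement for the bucket itself and one for the objects in it. The following Python sketch shows that structure under stated assumptions: the account ID and bucket name are placeholders, the function name source_bucket_policy is hypothetical, and the read-only action list is an illustrative guess rather than the set that Structural documents.

```python
import json


def source_bucket_policy(account_a_id: str, bucket_name: str) -> dict:
    """Build a bucket policy that lets Account A list and read the source bucket.

    The actions are an assumed read-only set for illustration.
    """
    principal = {"AWS": f"arn:aws:iam::{account_a_id}:root"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "AllowListBucket",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Object-level actions apply to the keys under the bucket.
                "Sid": "AllowReadObjects",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values; replace with Account A's ID and the real bucket name.
    print(json.dumps(source_bucket_policy("111111111111", "example-source-bucket"), indent=2))
```

Splitting the bucket-level and object-level statements matters because s3:ListBucket is evaluated against the bucket ARN, while s3:GetObject is evaluated against object ARNs.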
Account A Amazon EMR cluster
When you create your Amazon EMR cluster, make sure to enable the Use AWS Glue Data Catalog for table metadata option. This allows you to set a default catalog ID that points to Account B.
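If you create the cluster through the API or CLI instead of the console, the default catalog ID is typically supplied as a configuration classification. The following is a minimal sketch, assuming the standard Amazon EMR hive-site and spark-hive-site classifications with the hive.metastore.glue.catalogid property, and a placeholder account ID for Account B. The helper name glue_catalog_configurations is hypothetical, and whether you need both classifications depends on the applications that your cluster runs.

```python
import json


def glue_catalog_configurations(catalog_account_id: str) -> list:
    """Return EMR configuration entries that point the metastore at another account's catalog."""
    properties = {"hive.metastore.glue.catalogid": catalog_account_id}
    return [
        # Used by Hive on the cluster.
        {"Classification": "hive-site", "Properties": dict(properties)},
        # Used by Spark SQL on the cluster.
        {"Classification": "spark-hive-site", "Properties": dict(properties)},
    ]


if __name__ == "__main__":
    # "222222222222" is a placeholder for Account B's account ID.
    print(json.dumps(glue_catalog_configurations("222222222222"), indent=2))
```

The printed list has the shape that the cluster creation request expects for its configurations, so it can be saved to a file and passed along when the cluster is created.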
Structural server role
Identifying the profile that has the Structural server role
By default, Structural uses the IAM profile that is attached to the instance where Structural runs.
If you do not want to use that IAM profile, then to identify the profile to use:
Set the environment setting TONIC_AWS_ACCESS_KEY_ID to the AWS access key that is associated with the IAM profile.
Set the environment setting TONIC_AWS_SECRET_ACCESS_KEY to the secret key that is associated with the access key.
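Because the two settings only make sense as a pair, a small pre-flight check can catch a half-configured environment before you start Structural. The sketch below is not Structural's own logic. The function name credential_source, the returned labels, and the choice to treat a single missing setting as an error are all assumptions made for illustration.

```python
import os


def credential_source(env=None) -> str:
    """Report which credentials the two settings imply.

    Returns "access-key" when both settings are present and
    "instance-profile" when neither is. Raises ValueError when only one
    of the pair is set (an assumed policy for this sketch).
    """
    env = os.environ if env is None else env
    has_key = bool(env.get("TONIC_AWS_ACCESS_KEY_ID"))
    has_secret = bool(env.get("TONIC_AWS_SECRET_ACCESS_KEY"))
    if has_key and has_secret:
        return "access-key"
    if not has_key and not has_secret:
        return "instance-profile"
    raise ValueError(
        "Set both TONIC_AWS_ACCESS_KEY_ID and TONIC_AWS_SECRET_ACCESS_KEY, or neither."
    )


if __name__ == "__main__":
    print(credential_source())
```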
Required permissions for the Structural server role
The Structural server role must have the following permissions:
Amazon EC2 instance profile role
Identifying the profile that has the Amazon EC2 instance role
The profile is the Amazon EC2 instance profile that you assigned as the value of EC2 instance profile when you created the Amazon EMR cluster.
Required permissions for the Amazon EC2 instance role
By default, a new Amazon EMR cluster is assigned the role EMR_EC2_DefaultRole, which contains all of the required permissions, plus additional permissions.
However, AWS recommends that you create a custom IAM role for your Amazon EMR cluster's Amazon EC2 instance profile role.
The following permissions reflect the minimum permissions needed for Structural data generation:
For Amazon EMR, the Glue catalog must contain a default database. If the default database does not exist, then Amazon EMR attempts to create it.
Before you run a Structural data generation, you must either:
Ensure that the default database exists
Add glue:CreateDatabase to the list of permissions that are granted to this role
Structural does not otherwise require this permission, and does not explicitly attempt to create a database.
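One way to confirm the first option before a generation run is to look up the default database in the target catalog. The following is a minimal sketch, assuming a boto3 Glue client and a placeholder catalog ID for Account B. The helper name default_database_exists is hypothetical. The client is passed in as a parameter so that the function has no import-time dependency on boto3.

```python
def default_database_exists(glue_client, catalog_id: str) -> bool:
    """Return True if the catalog has a database named "default".

    glue_client is expected to behave like boto3.client("glue").
    Other errors, such as access denied, are deliberately not caught.
    """
    try:
        glue_client.get_database(CatalogId=catalog_id, Name="default")
    except glue_client.exceptions.EntityNotFoundException:
        return False
    return True


# Example usage (requires boto3 and AWS credentials; IDs are placeholders):
#   import boto3
#   glue = boto3.client("glue", region_name="us-east-1")
#   print(default_database_exists(glue, "222222222222"))
```

If the check returns False, either create the database in Account B ahead of time or grant glue:CreateDatabase as described above.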