Tonic Structural can work with AWS Glue catalogs that are in a different AWS account from the one where Structural and Amazon EMR are configured.
For the instructions in this topic, we'll use the following example:
AWS Account A contains the Amazon EMR cluster, Athena workgroup, and destination S3 bucket.
AWS Account B contains the AWS Glue data catalog and source S3 bucket.
The following instructions explain how to set up each required AWS component. For cross-account setups, follow these instructions instead of the instructions in Creating IAM roles for Structural and Amazon EMR.
These instructions assume that both accounts are in the same AWS Region. If your accounts are in different Regions, then see the Amazon documentation for instructions on how to set up a VPC for cross-account access.
Granting access to the required resources
The account that has Structural and Amazon EMR must be granted access to the resources in the account that has the AWS Glue catalog.
To continue our example, you must first grant Account A access to Account B's resources. To do this, set up the following resource-based policies for Account B's AWS Glue data catalog and source S3 bucket.
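As an illustration only, the following Python sketch builds the two kinds of resource-based policy described above and prints them as JSON. The account IDs, Region, and bucket name are hypothetical placeholders, and the action lists are an assumed read-only set, not Structural's documented requirements. The principal is the Account A root, which delegates access to that account as a whole; you may prefer to name a specific role instead.

```python
import json

# Hypothetical placeholders: replace with your own values.
ACCOUNT_A_ID = "111111111111"  # account with Structural and Amazon EMR
ACCOUNT_B_ID = "222222222222"  # account with the AWS Glue catalog and source bucket
REGION = "us-east-1"
SOURCE_BUCKET = "example-source-bucket"


def glue_catalog_policy(principal_account, catalog_account, region):
    """Resource policy for Account B's Glue catalog that grants read access to Account A."""
    base = f"arn:aws:glue:{region}:{catalog_account}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{principal_account}:root"},
                # Assumed read-only action set; adjust to what your workload needs.
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartition",
                    "glue:GetPartitions",
                ],
                "Resource": [
                    f"{base}:catalog",
                    f"{base}:database/*",
                    f"{base}:table/*/*",
                ],
            }
        ],
    }


def source_bucket_policy(principal_account, bucket):
    """Bucket policy for Account B's source bucket that grants list and read access to Account A."""
    principal = {"AWS": f"arn:aws:iam::{principal_account}:root"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",  # bucket-level action
            },
            {
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",  # object-level action
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(glue_catalog_policy(ACCOUNT_A_ID, ACCOUNT_B_ID, REGION), indent=2))
    print(json.dumps(source_bucket_policy(ACCOUNT_A_ID, SOURCE_BUCKET), indent=2))
```

The printed JSON can be pasted into the AWS Glue catalog settings and the S3 bucket permissions in Account B after you review and narrow it.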
When you create your Amazon EMR cluster, make sure to enable the Use AWS Glue Data Catalog for table metadata option. This allows you to set a default catalog ID that points to Account B.
You must set the following configuration for all instance groups in the Amazon EMR cluster:
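A minimal sketch of such a configuration, generated with Python, is shown here. It assumes the standard Amazon EMR `hive-site` and `spark-hive-site` classifications and the `hive.metastore.glue.catalogid` property that AWS documents for pointing a cluster at a Glue catalog in another account. The account ID is a hypothetical placeholder, and your cluster might need only one of the two classifications.

```python
import json


def glue_catalog_configurations(catalog_account_id):
    """Build EMR configuration objects that point Hive and Spark at another account's Glue catalog."""
    properties = {"hive.metastore.glue.catalogid": catalog_account_id}
    return [
        {"Classification": "hive-site", "Properties": dict(properties)},
        {"Classification": "spark-hive-site", "Properties": dict(properties)},
    ]


if __name__ == "__main__":
    # Hypothetical Account B ID; replace with the account that owns the Glue catalog.
    print(json.dumps(glue_catalog_configurations("222222222222"), indent=2))
```

The same JSON would be applied to each instance group in the cluster.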
Identifying the profile that has the Amazon EC2 instance role
The profile is the Amazon EC2 instance profile that you assigned as the value of EC2 instance profile when you created the Amazon EMR cluster.
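If you did not note the profile when you created the cluster, one way to look it up is from the response of the Amazon EMR `DescribeCluster` API, which reports the profile under `Cluster.Ec2InstanceAttributes.IamInstanceProfile`. The helper below is a hypothetical sketch that reads that field from a response dictionary; the sample response is made up for illustration.

```python
def instance_profile_name(describe_cluster_response):
    """Return the EC2 instance profile name from an EMR DescribeCluster response."""
    attributes = describe_cluster_response["Cluster"]["Ec2InstanceAttributes"]
    profile = attributes["IamInstanceProfile"]
    # The value can be a bare name or a full ARN; keep only the last path segment.
    return profile.rsplit("/", 1)[-1]


if __name__ == "__main__":
    # Made-up response fragment that contains only the field used here.
    sample = {"Cluster": {"Ec2InstanceAttributes": {"IamInstanceProfile": "EMR_EC2_DefaultRole"}}}
    print(instance_profile_name(sample))
```

With boto3 installed and credentials for Account A, the response could come from `boto3.client("emr").describe_cluster(ClusterId=...)`, where the cluster ID is your own.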
Required permissions for the Amazon EC2 instance role
By default, a new Amazon EMR cluster is assigned the role EMR_EC2_DefaultRole, which contains all of the required permissions, plus additional permissions.
However, AWS recommends that you create a custom IAM role to use as the Amazon EC2 instance profile role for your Amazon EMR cluster.
The following permissions reflect the minimum permissions needed for Structural data generation: