Configuration for cross-account setups

Tonic can operate on Glue catalogs that are in a different AWS account from the one where Tonic and Amazon EMR are configured.
For the instructions in this topic, we'll use the following example:
[Figure: Tonic cross-account structure]
  • AWS Account A contains the Amazon EMR Cluster, Athena workgroup, and destination S3 bucket.
  • AWS Account B contains the Glue data catalog and source S3 bucket.
The following instructions explain how to set up each required AWS component. For cross-account setups, you use these instructions instead of the instructions in Creating IAM roles for Tonic and Amazon EMR.
These instructions assume that both accounts reside in the same region. If your accounts belong to different regions, then see the Amazon documentation for instructions on how to set up a VPC for cross-account access.

Granting access to the required resources

The account that has Tonic and Amazon EMR must be granted access to the resources in the account that has the Glue catalog.
To continue our example, you must first grant Account A access to Account B's resources. To do this, set up the following resource-based policies for Account B's Glue data catalog and source S3 bucket.

Account B Glue data catalog resource policy

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": [
"arn:aws:iam::<account-A-id>:role/<tonic-role>",
"arn:aws:iam::<account-A-id>:role/<emr-ec2-instance-profile-role>"
]
},
"Action": [
"glue:GetUserDefinedFunctions",
"glue:BatchGetPartition",
"glue:GetDatabase",
"glue:GetDatabases",
"glue:GetPartition",
"glue:GetPartitions",
"glue:GetTable",
"glue:GetTables",
"glue:GetTableVersion",
"glue:GetTableVersions"
],
"Resource": [
"arn:aws:glue:<region>:<account-B-id>:catalog",
"arn:aws:glue:<region>:<account-B-id>:database/*",
"arn:aws:glue:<region>:<account-B-id>:table/*"
]
}
]
}
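The policy above has to be filled in with your own account IDs, Region, and role names. As a minimal sketch, the following Python helper builds the same document from those values. The function name `glue_catalog_policy` and the sample IDs and role names are hypothetical and only for illustration. The action list is copied from the policy above.

```python
import json

# Read-only Glue actions, copied from the resource policy shown above.
GLUE_READ_ACTIONS = [
    "glue:GetUserDefinedFunctions",
    "glue:BatchGetPartition",
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetPartition",
    "glue:GetPartitions",
    "glue:GetTable",
    "glue:GetTables",
    "glue:GetTableVersion",
    "glue:GetTableVersions",
]


def glue_catalog_policy(account_a_id, account_b_id, region, role_names):
    """Build the Account B Glue resource policy as a Python dict.

    role_names are the Account A roles (Tonic role and EMR EC2 instance
    profile role) that should be allowed to read the catalog.
    """
    principals = [f"arn:aws:iam::{account_a_id}:role/{name}" for name in role_names]
    prefix = f"arn:aws:glue:{region}:{account_b_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": GLUE_READ_ACTIONS,
                "Resource": [
                    f"{prefix}:catalog",
                    f"{prefix}:database/*",
                    f"{prefix}:table/*",
                ],
            }
        ],
    }


# Example values only; replace them with your own IDs, Region, and role names.
policy = glue_catalog_policy(
    "111111111111", "222222222222", "us-east-1", ["tonic-role", "emr-ec2-role"]
)
print(json.dumps(policy, indent=2))
```

The printed JSON can then be reviewed and applied to Account B's Glue data catalog in whatever way you normally manage resource policies.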
After you apply the policy, register Account B's Glue data catalog as an Athena data source in Account A's Athena console. For instructions, see the AWS documentation.

Account B source S3 bucket policy

{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Statement1",
"Effect": "Allow",
"Principal": {
"AWS": [
"arn:aws:iam::<account-A-id>:role/<tonic-role>",
"arn:aws:iam::<account-A-id>:role/<emr-ec2-instance-profile-role>"
]
},
"Action": [
"s3:ListBucket",
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::<account-B-source-s3-bucket>",
"arn:aws:s3:::<account-B-source-s3-bucket>/*"
]
}
]
}
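The bucket policy lists two resources because s3:ListBucket applies to the bucket itself, while s3:GetObject applies to the objects under it (the `/*` ARN). Leaving out one of the two is an easy mistake to make. The sketch below is a hypothetical pre-flight check, not part of Tonic. It reports which of the two ARNs a policy document does not mention in any Allow statement.

```python
def missing_bucket_resources(policy, bucket):
    """Return the bucket-level and object-level ARNs that no Allow statement lists.

    This only compares ARN strings literally; it does not evaluate wildcards,
    conditions, or Deny statements.
    """
    needed = {f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"}
    covered = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        resources = statement.get("Resource", [])
        # "Resource" may be a single string or a list of strings.
        if isinstance(resources, str):
            resources = [resources]
        covered.update(resources)
    return sorted(needed - covered)


# Example: a policy that forgot the bucket-level ARN.
incomplete = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetObject"],
            "Resource": "arn:aws:s3:::example-source-bucket/*",
        }
    ],
}
print(missing_bucket_resources(incomplete, "example-source-bucket"))
```

An empty list means that both ARNs are present. It does not prove that the policy grants the right actions on each.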

Account A Amazon EMR cluster

When you create your Amazon EMR cluster, make sure to enable the Use AWS Glue Data Catalog for table metadata option. This allows you to set a default catalog ID that points to Account B.
You must set the following configuration for all instance groups in the EMR cluster:
[
{
"Classification": "spark-hive-site",
"Properties": {
"hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
"hive.metastore.glue.catalogid": "<account-B-id>"
}
}
]
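If you create clusters from a script, the configuration above can be generated instead of edited by hand. The following is a minimal sketch. The function name `glue_catalog_configuration` is hypothetical, and the 12-digit check reflects the standard AWS account ID format.

```python
import json
import re


def glue_catalog_configuration(catalog_account_id):
    """Return the spark-hive-site classification that points Spark at
    the Glue data catalog owned by catalog_account_id (Account B)."""
    # AWS account IDs are 12-digit numbers; reject anything else early.
    if not re.fullmatch(r"\d{12}", catalog_account_id):
        raise ValueError(f"not a 12-digit AWS account ID: {catalog_account_id!r}")
    return [
        {
            "Classification": "spark-hive-site",
            "Properties": {
                "hive.metastore.client.factory.class": (
                    "com.amazonaws.glue.catalog.metastore."
                    "AWSGlueDataCatalogHiveClientFactory"
                ),
                "hive.metastore.glue.catalogid": catalog_account_id,
            },
        }
    ]


# Example account ID only; use Account B's real ID.
print(json.dumps(glue_catalog_configuration("222222222222"), indent=2))
```

Keep the account ID as a string so that any leading zeros are preserved.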

Tonic server role

Identifying the profile that has the Tonic server role

By default, Tonic uses the IAM profile that is attached to the instance where Tonic runs.
If you do not want to use that IAM profile, then identify the profile to use as follows:
  1. Set the environment variable TONIC_AWS_ACCESS_KEY_ID to the AWS access key that is associated with the IAM profile.
  2. Set the environment variable TONIC_AWS_SECRET_ACCESS_KEY to the secret key that is associated with the access key.
For information about how to set Tonic environment variables, see Setting environment variables.
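Because the two variables only make sense as a pair, it can help to check them before you start Tonic. The sketch below is a hypothetical pre-flight check. It is not Tonic's own logic, and how Tonic behaves when only one variable is set is not stated here. The function name and the returned labels are assumptions for illustration.

```python
import os


def tonic_credential_source(env=None):
    """Report which credentials the settings above would select.

    Returns "environment" when both variables are set, "instance-profile"
    when neither is set, and raises ValueError when only one is set.
    """
    env = os.environ if env is None else env
    key_id = env.get("TONIC_AWS_ACCESS_KEY_ID")
    secret = env.get("TONIC_AWS_SECRET_ACCESS_KEY")
    if key_id and secret:
        return "environment"
    if key_id or secret:
        raise ValueError(
            "Set both TONIC_AWS_ACCESS_KEY_ID and TONIC_AWS_SECRET_ACCESS_KEY, or neither."
        )
    return "instance-profile"


# Example with an explicit mapping instead of the real environment.
print(tonic_credential_source({}))
```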

Required permissions for the Tonic server role

The Tonic server role must have the following permissions:
{
"Sid": "EmrListClustersPerms",
"Effect": "Allow",
"Action": "elasticmapreduce:ListClusters",
"Resource": "*"
},
{
"Sid": "EmrPerms",
"Effect": "Allow",
"Action": [
"elasticmapreduce:DescribeStep",
"elasticmapreduce:AddJobFlowSteps",
"elasticmapreduce:DescribeCluster"
],
"Resource": [
"arn:aws:elasticmapreduce:<region>:<account-A-id>:cluster/<cluster-id>"
]
},
{
"Sid": "CrossAccountGluePerms",
"Effect": "Allow",
"Action": [
"glue:GetUserDefinedFunctions",
"glue:BatchGetPartition",
"glue:GetDatabase",
"glue:GetDatabases",
"glue:GetPartition",
"glue:GetPartitions",
"glue:GetTable",
"glue:GetTables",
"glue:GetTableVersion",
"glue:GetTableVersions"
],
"Resource": [
"arn:aws:glue:<region>:<account-B-id>:catalog",
"arn:aws:glue:<region>:<account-B-id>:database/*",
"arn:aws:glue:<region>:<account-B-id>:table/*"
]
},
{
"Sid": "CrossAccountS3SourcePerms",
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::<account-B-source-s3-bucket>",
"arn:aws:s3:::<account-B-source-s3-bucket>/*"
]
},
{
"Sid": "S3DestPerms",
"Effect": "Allow",
"Action": [
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::<destination-s3-bucket>/*"
]
},
{
"Sid": "AthenaPerms",
"Effect": "Allow",
"Action": [
"athena:StartQueryExecution",
"athena:GetQueryExecution",
"athena:GetQueryResults"
],
"Resource": "arn:aws:athena:<region>:<account-A-id>:workgroup/tonic-emr-workgroup"
},
{
"Sid": "CrossAccountAthenaPerms",
"Effect": "Allow",
"Action": [
"athena:GetDataCatalog"
],
"Resource": "arn:aws:athena:<region>:<account-A-id>:datacatalog/<catalog-name>"
},
{
"Sid": "AthenaQueryResultPerms",
"Effect": "Allow",
"Action": [
"s3:GetBucketLocation",
"s3:GetObject",
"s3:ListBucket",
"s3:ListBucketMultipartUploads",
"s3:ListMultipartUploadParts",
"s3:AbortMultipartUpload",
"s3:CreateBucket",
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::<athena-query-results-bucket>",
"arn:aws:s3:::<athena-query-results-bucket>/*"
]
}
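The statements above are fragments. To attach them to a role, they need to sit inside a policy document that has a "Version" and a "Statement" array. As a minimal sketch, the hypothetical helper below wraps a list of statements and rejects duplicate Sid values, which must be unique within a policy.

```python
import json


def wrap_statements(statements):
    """Wrap permission statements in a complete IAM policy document."""
    sids = [s["Sid"] for s in statements if "Sid" in s]
    duplicates = sorted({sid for sid in sids if sids.count(sid) > 1})
    if duplicates:
        raise ValueError(f"duplicate Sid values: {duplicates}")
    return {"Version": "2012-10-17", "Statement": list(statements)}


# Example with the first statement from the list above; add the others the same way.
statements = [
    {
        "Sid": "EmrListClustersPerms",
        "Effect": "Allow",
        "Action": "elasticmapreduce:ListClusters",
        "Resource": "*",
    },
]
print(json.dumps(wrap_statements(statements), indent=2))
```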

EC2 instance profile role

Identifying the profile that has the EC2 instance role

The profile is the one that you selected for the EC2 instance profile setting when you created the Amazon EMR cluster.

Required permissions for the EC2 instance role

By default, a new EMR cluster is assigned the role EMR_EC2_DefaultRole, which contains all of the required permissions, plus additional permissions.
However, AWS recommends that you create a custom IAM role for your EMR cluster's EC2 instance profile role.
The following permissions reflect the minimum permissions needed for Tonic data generation:
{
"Sid": "GluePerms",
"Effect": "Allow",
"Action": [
"glue:CreateDatabase",
"glue:UpdateDatabase",
"glue:DeleteDatabase",
"glue:GetDatabase",
"glue:GetDatabases",
"glue:CreateTable",
"glue:UpdateTable",
"glue:DeleteTable",
"glue:GetTable",
"glue:GetTables",
"glue:GetTableVersions",
"glue:CreatePartition",
"glue:BatchCreatePartition",
"glue:UpdatePartition",
"glue:DeletePartition",
"glue:BatchDeletePartition",
"glue:GetPartition",
"glue:GetPartitions",
"glue:BatchGetPartition",
"glue:CreateUserDefinedFunction",
"glue:UpdateUserDefinedFunction",
"glue:DeleteUserDefinedFunction",
"glue:GetUserDefinedFunction",
"glue:GetUserDefinedFunctions"
],
"Resource": "*"
},
{
"Sid": "CrossAccountS3SourceBucketPerms",
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::<account-B-source-s3-bucket>",
"arn:aws:s3:::<account-B-source-s3-bucket>/*"
]
},
{
"Sid": "S3DestBucketPerms",
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetObject",
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::<destination-s3-bucket>",
"arn:aws:s3:::<destination-s3-bucket>/*"
]
},
{
"Sid": "S3EmrLogBucketPerms",
"Effect": "Allow",
"Action": "s3:PutObject",
"Resource": [
"arn:aws:s3:::<s3-emr-log-bucket>/*"
]
}
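All of the policies in this topic use angle-bracket placeholders such as <region> and <destination-s3-bucket>. A placeholder that is left unfilled produces an ARN that matches nothing. The following sketch shows one way to substitute the values and fail loudly when one is missing. The function name `fill_placeholders` and the sample values are hypothetical.

```python
import re

# Matches placeholders such as <region> or <account-B-source-s3-bucket>.
PLACEHOLDER = re.compile(r"<([A-Za-z0-9-]+)>")


def fill_placeholders(template, values):
    """Replace every <name> placeholder in template with values[name].

    Raises KeyError with the placeholder name if no value was supplied.
    """
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(name)
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


# Example values only.
arn = fill_placeholders(
    "arn:aws:s3:::<destination-s3-bucket>/*",
    {"destination-s3-bucket": "example-destination-bucket"},
)
print(arn)
```

The same function can be run over a whole policy file that is read as text, before the result is parsed as JSON.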