This is the recommended method to grant the cluster access to the S3 buckets.
Modifications to the Databricks instructions
Instance profile for separate source and destination S3 buckets
The instance profile definition that Databricks provides assumes that the cluster reads from and writes to the same S3 bucket.
If your source and destination S3 buckets are different, you can use an instance profile similar to the following, which separates the read and write permissions.
Replace <source-bucket> and <destination-bucket> with your S3 bucket names.
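For example, a split definition might look similar to the following sketch. The action lists are illustrative (a typical read-only set for the source bucket and a read-write set for the destination bucket); keep the actions from the Databricks-provided definition and change only which bucket each statement applies to.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadSourceBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": ["arn:aws:s3:::<source-bucket>"]
    },
    {
      "Sid": "ReadSourceObjects",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::<source-bucket>/*"]
    },
    {
      "Sid": "ListDestinationBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": ["arn:aws:s3:::<destination-bucket>"]
    },
    {
      "Sid": "ReadWriteDestinationObjects",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:PutObjectAcl"],
      "Resource": ["arn:aws:s3:::<destination-bucket>/*"]
    }
  ]
}
```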
If your S3 buckets are owned by the same account in which the Databricks cluster is provisioned, you do not need an S3 bucket policy.
If your S3 buckets are in a different account, then to allow the cluster to access them, you must create an S3 bucket policy that acts as a cross-account trust relationship.
As with the instance profile, if you use separate source and destination S3 buckets, you can split the Databricks-provided policy definition into a source policy and a destination policy, as shown in the following examples.
Source S3 bucket policy
This policy limits the instance profile to read-only (Get, List) access to the source S3 bucket.
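A sketch of such a policy is shown below. The account ID and role name placeholders are illustrative; replace them with the AWS account and IAM role behind your instance profile, and keep the actions consistent with the Databricks-provided policy.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadSourceBucket",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<databricks-aws-account-id>:role/<instance-profile-role>"
      },
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::<source-bucket>"
    },
    {
      "Sid": "ReadSourceObjects",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<databricks-aws-account-id>:role/<instance-profile-role>"
      },
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::<source-bucket>/*"
    }
  ]
}
```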
Destination S3 bucket policy
This policy grants the instance profile both read and write access to the destination S3 bucket.
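A sketch of the corresponding destination policy follows, with the same illustrative placeholders. The write actions shown are a common set; match them to the Databricks-provided definition.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListDestinationBucket",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<databricks-aws-account-id>:role/<instance-profile-role>"
      },
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::<destination-bucket>"
    },
    {
      "Sid": "ReadWriteDestinationObjects",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<databricks-aws-account-id>:role/<instance-profile-role>"
      },
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:PutObjectAcl"],
      "Resource": "arn:aws:s3:::<destination-bucket>/*"
    }
  ]
}
```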
Alternatives to the instance profile
If you cannot or do not want to configure an instance profile, you can instead directly grant the cluster access to the S3 bucket.
To do this, you use your AWS Access Key and AWS Secret Access Key to set the following Spark configuration properties and values.
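For example, assuming you use the Hadoop S3A connector properties for key-based access (a common approach; confirm the exact property names against the method you follow), the cluster Spark configuration entries might look like this, with the placeholder values replaced by your keys:

```
spark.hadoop.fs.s3a.access.key <aws-access-key-id>
spark.hadoop.fs.s3a.secret.key <aws-secret-access-key>
```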
For some of the methods, you must set various Spark configuration properties. The Databricks documentation provides Python examples that use spark.conf.set(<property>, <value>).
For Structural, you must provide these in the cluster configuration. Several of the methods recommend the use of secrets. To reference a secret, follow these instructions. You enter the Spark configuration parameters when you set up your cluster.
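For example, if you store the keys in a Databricks secret scope, you can reference them in the cluster Spark configuration with the {{secrets/<scope-name>/<secret-name>}} syntax. The property, scope, and secret names below are placeholders based on the S3A example above; substitute the names you actually use.

```
spark.hadoop.fs.s3a.access.key {{secrets/<scope-name>/<access-key-id-secret>}}
spark.hadoop.fs.s3a.secret.key {{secrets/<scope-name>/<secret-access-key-secret>}}
```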