System requirements for Amazon EMR
Supported versions of Spark and Amazon EMR
Tonic Structural supports Spark 2.4.x (except Spark 2.4.2), Spark 3.0.0, Spark 3.0.1, and Spark 3.2.0.
We suggest using EMR-6.1.0 with Spark 3.0.0, or EMR-6.2.0 with Spark 3.0.1. Any EMR release between 5.2.8 and 6.0.2 should work.
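If you want to check a cluster's versions before you connect it to Structural, the support rules above can be encoded in a small helper. This is a minimal sketch, not part of Structural. The names `is_supported_spark` and `recommended_emr_release` are hypothetical. The sketch assumes that version strings look like `X.Y.Z`, optionally followed by a vendor suffix such as `-amzn-0`.

```python
from typing import Optional, Tuple

# Support rules as stated above: any Spark 2.4.x except 2.4.2, plus 3.0.0, 3.0.1, and 3.2.0.
UNSUPPORTED = {(2, 4, 2)}
SUPPORTED_3X = {(3, 0, 0), (3, 0, 1), (3, 2, 0)}

# Suggested EMR release label for each Spark 3.0.x version, per the text above.
RECOMMENDED_EMR = {(3, 0, 0): "emr-6.1.0", (3, 0, 1): "emr-6.2.0"}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'X.Y.Z' (with an optional suffix such as '-amzn-0') into a tuple of ints."""
    core = version.split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split(".")[:3])
    return major, minor, patch


def is_supported_spark(version: str) -> bool:
    """Return True if the Spark version is on the support list above."""
    v = parse_version(version)
    if v in UNSUPPORTED:
        return False
    return v[:2] == (2, 4) or v in SUPPORTED_3X


def recommended_emr_release(spark_version: str) -> Optional[str]:
    """Return the suggested EMR release label, or None if no suggestion is given."""
    return RECOMMENDED_EMR.get(parse_version(spark_version))
```

For example, `is_supported_spark("2.4.2")` returns `False`, and `recommended_emr_release("3.0.1")` returns `"emr-6.2.0"`.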
Supported providers
Structural supports the following data providers:
Source Provider | Output Provider |
---|---|
Parquet | Parquet |
CSV | CSV |
Avro | Avro |
JSON | JSON |
ORC | ORC |
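Each provider in the table corresponds to a Spark data source short name. A source or output can be validated against these lists before a job runs, as in the following minimal sketch. The function name `validate_providers` is hypothetical. The table lists each format in both columns, but it does not say whether a source can be paired with a different output format. For that reason, the sketch checks each side independently and does not test the pairing. Note that Spark's `avro` source requires the external spark-avro module.

```python
from typing import Tuple

# Provider names from the table above, mapped to Spark DataFrameReader/Writer short names.
SOURCE_PROVIDERS = {"parquet": "parquet", "csv": "csv", "avro": "avro", "json": "json", "orc": "orc"}
OUTPUT_PROVIDERS = {"parquet": "parquet", "csv": "csv", "avro": "avro", "json": "json", "orc": "orc"}


def validate_providers(source: str, output: str) -> Tuple[str, str]:
    """Return the Spark format names for a source/output pair, or raise ValueError.

    Each side is checked on its own; this does not assert that mixed pairs are supported.
    """
    src_key, out_key = source.strip().lower(), output.strip().lower()
    if src_key not in SOURCE_PROVIDERS:
        raise ValueError(f"Unsupported source provider: {source!r}")
    if out_key not in OUTPUT_PROVIDERS:
        raise ValueError(f"Unsupported output provider: {output!r}")
    return SOURCE_PROVIDERS[src_key], OUTPUT_PROVIDERS[out_key]
```

In PySpark, the returned names can be passed to `spark.read.format(...)` and `df.write.format(...)`.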
Metadata catalog
Structural requires a metadata catalog to connect to your data. When you work with Amazon EMR, AWS Glue is currently the only supported catalog.
Structural writes data to Amazon S3 only. Structural does not write output data back into a catalog.
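On Amazon EMR, Spark is usually pointed at the AWS Glue Data Catalog through the `spark-hive-site` configuration classification. This sets the Hive metastore client factory to the Glue implementation. The following sketch builds that classification as a Python structure that can be passed as the `Configurations` list when a cluster is created. It also checks that an output path is on S3. This is a hedged example of standard EMR configuration, not Structural's own setup procedure. The helper names are hypothetical, and the accepted URI schemes (`s3://`, `s3a://`) are an assumption.

```python
import json
from typing import Any, Dict, List

# Hive metastore client factory that makes Spark on EMR use the AWS Glue Data Catalog.
GLUE_FACTORY = "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"


def glue_catalog_configurations() -> List[Dict[str, Any]]:
    """Return an EMR Configurations list that points Spark SQL at the Glue Data Catalog."""
    return [
        {
            "Classification": "spark-hive-site",
            "Properties": {"hive.metastore.client.factory.class": GLUE_FACTORY},
        }
    ]


def is_s3_output_path(path: str) -> bool:
    """Structural writes output only to S3; assume s3:// or s3a:// URIs (hypothetical check)."""
    return path.startswith(("s3://", "s3a://"))


if __name__ == "__main__":
    print(json.dumps(glue_catalog_configurations(), indent=2))
```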
Amazon S3 server-side encryption requirements
If your S3 buckets have server-side encryption enabled through AWS KMS (SSE-KMS), your Spark cluster must have Hadoop 2.8.1 or later installed.
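The following minimal sketch shows how to check this requirement. It assumes that the bucket's default encryption settings are available as a dict shaped like the response of the boto3 S3 client's `get_bucket_encryption()` call. The function names are hypothetical, and the sketch considers only bucket default encryption, not per-object encryption settings.

```python
from typing import Any, Dict

MIN_HADOOP_FOR_KMS = (2, 8, 1)


def uses_kms(encryption_response: Dict[str, Any]) -> bool:
    """Return True if any default-encryption rule uses an AWS KMS algorithm (aws:kms...)."""
    config = encryption_response.get("ServerSideEncryptionConfiguration", {})
    for rule in config.get("Rules", []):
        algorithm = rule.get("ApplyServerSideEncryptionByDefault", {}).get("SSEAlgorithm", "")
        if algorithm.startswith("aws:kms"):
            return True
    return False


def hadoop_version_ok(hadoop_version: str, encryption_response: Dict[str, Any]) -> bool:
    """Apply the Hadoop 2.8.1+ requirement only when the bucket uses SSE-KMS."""
    if not uses_kms(encryption_response):
        return True
    core = hadoop_version.split("-", 1)[0]  # drop suffixes such as '-amzn-1'
    parts = tuple(int(p) for p in core.split(".")[:3])
    return parts >= MIN_HADOOP_FOR_KMS
```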