System requirements for Amazon EMR

Supported versions of Spark and Amazon EMR

Tonic Structural supports Spark 2.4.x, Spark 3.0.0, Spark 3.0.1, and Spark 3.2.0. The one exception is Spark 2.4.2, which is not supported.

We suggest using EMR-6.1.0 with Spark 3.0.0, or EMR-6.2.0 with Spark 3.0.1. Any EMR version between 5.2.8 and 6.0.2 should also work.
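If you want to validate a cluster's Spark version before you connect it, the support rules above can be expressed as a small check. This is a minimal sketch: the helper name `is_supported_spark` and the `SUGGESTED_EMR` table are hypothetical and are not part of Tonic Structural. The check encodes only the versions listed in this section.

```python
# Hypothetical helper (not part of Tonic Structural) that encodes the
# Spark support rules listed above.

SUPPORTED_EXACT = {"3.0.0", "3.0.1", "3.2.0"}
UNSUPPORTED_EXACT = {"2.4.2"}

# Suggested EMR release for each Spark version, as stated above.
SUGGESTED_EMR = {"3.0.0": "emr-6.1.0", "3.0.1": "emr-6.2.0"}


def is_supported_spark(version: str) -> bool:
    """Return True if the Spark version is on the documented support list."""
    if version in UNSUPPORTED_EXACT:
        return False
    if version in SUPPORTED_EXACT:
        return True
    # Any other 2.4.x patch release is supported.
    parts = version.split(".")
    return (
        len(parts) == 3
        and parts[0] == "2"
        and parts[1] == "4"
        and parts[2].isdigit()
    )


if __name__ == "__main__":
    for v in ["2.4.4", "2.4.2", "3.0.1", "3.1.1"]:
        print(v, is_supported_spark(v), SUGGESTED_EMR.get(v, "-"))
```

Versions that this section does not mention, such as 3.1.1, are treated as unsupported rather than guessed at.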

Supported providers

Structural supports the following data providers:

Source provider    Output provider
Parquet            Parquet
CSV                CSV
Avro               Avro
JSON               JSON
ORC                ORC
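Each provider in the table corresponds to a Spark data source format. The following sketch maps the provider names to the lowercase short names that Spark's reader and writer accept, such as `spark.read.format("parquet")`. This mapping is an assumption made for illustration, and `spark_format` is a hypothetical helper. Structural configures the format for you. In open-source Spark, Avro support comes from the separate `spark-avro` module.

```python
# Hypothetical mapping from the provider names in the table above to
# Spark data source short names. Structural handles this internally.
SPARK_FORMATS = {
    "Parquet": "parquet",
    "CSV": "csv",
    "Avro": "avro",  # provided by the separate spark-avro module in open-source Spark
    "JSON": "json",
    "ORC": "orc",
}


def spark_format(provider: str) -> str:
    """Return the Spark format name for a supported provider."""
    try:
        return SPARK_FORMATS[provider]
    except KeyError:
        raise ValueError(f"unsupported provider: {provider!r}") from None
```

The helper only confirms that a format appears in the table. It makes no claim about converting data from one format to another.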

Metadata catalog

Structural requires a metadata catalog when it connects to your data. Currently, AWS Glue is the only supported catalog when you work with Amazon EMR.

Structural writes data to Amazon S3 only. Structural does not write output data back into a catalog.
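On Amazon EMR, Spark SQL uses the AWS Glue Data Catalog as its metastore when the cluster is launched with the `spark-hive-site` configuration classification shown below. That classification and factory class come from the Amazon EMR documentation. The `is_s3_output` helper is hypothetical. It reflects the rule above that output goes only to S3. This sketch shows one possible way to prepare a cluster, and Structural's own setup steps may differ.

```python
import json

# EMR configuration classification that makes Spark SQL use the
# AWS Glue Data Catalog as its Hive metastore.
GLUE_CATALOG_CONFIG = [
    {
        "Classification": "spark-hive-site",
        "Properties": {
            "hive.metastore.client.factory.class": (
                "com.amazonaws.glue.catalog.metastore."
                "AWSGlueDataCatalogHiveClientFactory"
            )
        },
    }
]


def is_s3_output(uri: str) -> bool:
    """Hypothetical check: Structural writes output only to Amazon S3."""
    return uri.startswith(("s3://", "s3a://"))


if __name__ == "__main__":
    # Print the configuration to pass when you create the EMR cluster.
    print(json.dumps(GLUE_CATALOG_CONFIG, indent=2))
```

Because Structural does not write back into the catalog, the Glue catalog is only used to read table metadata for the source data.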

Amazon S3 server-side encryption requirements

If your S3 buckets have server-side encryption with AWS KMS (SSE-KMS) enabled, your Spark cluster must have Hadoop 2.8.1 or later installed.
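You can check the Hadoop requirement by comparing version numbers numerically rather than as strings. For example, Hadoop 2.10.0 is newer than 2.8.1 even though "2.10.0" sorts before "2.8.1" as text. The following sketch assumes that you already have the version string, for example from the output of `hadoop version`, and that it may carry a vendor suffix such as `-amzn-1`. The function names are hypothetical.

```python
from typing import Tuple

# Minimum Hadoop version required when S3 buckets use SSE-KMS.
MIN_HADOOP_FOR_SSE_KMS = (2, 8, 1)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric part, e.g. "3.2.1-amzn-1" -> (3, 2, 1)."""
    nums = []
    for part in version.split("-")[0].split("."):
        if not part.isdigit():
            break
        nums.append(int(part))
    return tuple(nums)


def hadoop_supports_sse_kms(version: str) -> bool:
    """Return True if the Hadoop version is 2.8.1 or later."""
    # Tuple comparison orders (2, 10, 0) after (2, 8, 1).
    return parse_version(version) >= MIN_HADOOP_FOR_SSE_KMS
```

Comparing integer tuples avoids the string-ordering pitfall without pulling in a third-party version library.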
