Tonic Structural supports Spark 2.4.x (except Spark 2.4.2, which is not supported), Spark 3.0.0, Spark 3.0.1, and Spark 3.2.0.
We recommend EMR-6.1.0 with Spark 3.0.0, or EMR-6.2.0 with Spark 3.0.1. Any EMR version between 5.2.8 and 6.0.2 should also work.
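As a quick pre-flight check, the Spark version rules above can be expressed as a small helper. This is only a sketch, not part of Structural: the function name `is_supported_spark_version` is hypothetical, and ignoring a vendor suffix such as `-amzn-0` is an assumption about how the cluster reports its version.

```python
import re

# Versions named explicitly in the support statement above.
SUPPORTED_EXACT = {"3.0.0", "3.0.1", "3.2.0"}
UNSUPPORTED = {"2.4.2"}


def is_supported_spark_version(version: str) -> bool:
    """Return True if `version` falls within the stated support list.

    Hypothetical helper. Only the leading major.minor.patch part is
    examined; any trailing vendor suffix is ignored (an assumption).
    """
    match = re.match(r"(\d+\.\d+)\.(\d+)", version.strip())
    if match is None:
        return False
    major_minor, patch = match.group(1), match.group(2)
    base = f"{major_minor}.{patch}"
    if base in UNSUPPORTED:
        return False
    if major_minor == "2.4":
        # Any other 2.4.x release is covered by "Spark 2.4.x".
        return True
    return base in SUPPORTED_EXACT
```

In a PySpark session, the string to check is available as `spark.version`.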
Structural requires a metadata catalog when connecting to your data. Currently, AWS Glue is the only supported catalog when working with Amazon EMR.
Structural writes output data to Amazon S3 only. It does not write output data back into a catalog.
If your S3 buckets have server-side encryption enabled through AWS KMS, then your Spark cluster must have Hadoop 2.8.1 or later installed.
Structural supports the following data providers:
Source Provider | Output Provider |
---|---|
Parquet | Parquet |
CSV | CSV |
Avro | Avro |
JSON | JSON |
ORC | ORC |
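
For the Hadoop requirement that applies to KMS-encrypted S3 buckets (Hadoop 2.8.1 or later), a numeric comparison avoids the pitfall of comparing version strings, where `2.10.1` would sort before `2.8.1`. The sketch below is illustrative only: the helper name `hadoop_meets_kms_minimum` is hypothetical, and it assumes that the version string starts with `major.minor.patch`, optionally followed by a vendor suffix.

```python
import re

# Minimum Hadoop version stated above for S3 buckets encrypted with AWS KMS.
MIN_HADOOP_FOR_KMS = (2, 8, 1)


def hadoop_meets_kms_minimum(version: str) -> bool:
    """Return True if `version` is Hadoop 2.8.1 or later.

    Hypothetical helper. Compares the numeric major.minor.patch parts
    as a tuple and ignores any trailing suffix (an assumption).
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized Hadoop version: {version!r}")
    parsed = tuple(int(part) for part in match.groups())
    return parsed >= MIN_HADOOP_FOR_KMS
```

On a cluster node, the `hadoop version` command prints the installed version to pass to this check.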