Tonic Structural supports Spark 2.4.x (except Spark 2.4.2), Spark 3.0.0, Spark 3.0.1, and Spark 3.2.0.
We recommend EMR-6.1.0 with Spark 3.0.0, or EMR-6.2.0 with Spark 3.0.1. Any EMR version between 5.2.8 and 6.0.2 should work.
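As a quick pre-flight check, the Spark support list above can be encoded in a few lines of Python. This is a minimal sketch, not part of Structural: the function name `is_supported_spark` and the constants are hypothetical, and the rules are copied only from the version statement above.

```python
import re

# Exact versions named in the support statement above.
SUPPORTED_EXACT = {"3.0.0", "3.0.1", "3.2.0"}
# The one 2.4.x release that is called out as unsupported.
UNSUPPORTED_EXACT = {"2.4.2"}


def is_supported_spark(version: str) -> bool:
    """Return True if `version` (e.g. "3.0.1") is on the support list above.

    Hypothetical helper for illustration; it only accepts plain
    MAJOR.MINOR.PATCH strings and rejects anything else.
    """
    version = version.strip()
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        return False
    if version in UNSUPPORTED_EXACT:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    # Any 2.4.x release other than 2.4.2 is accepted.
    if (major, minor) == (2, 4):
        return True
    return version in SUPPORTED_EXACT
```

For example, `is_supported_spark("2.4.8")` is accepted under the 2.4.x rule, while `is_supported_spark("2.4.2")` and `is_supported_spark("3.1.2")` are rejected.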
Structural supports the following data providers:

Source Provider | Output Provider |
---|---|
Parquet | Parquet |
CSV | CSV |
Avro | Avro |
JSON | JSON |
ORC | ORC |

Structural requires a metadata catalog when connecting to your data. Currently, only AWS Glue is supported when working with Amazon EMR.
Structural writes data to Amazon S3 only. Structural does not write output data back into a catalog.
If your S3 buckets have server-side encryption enabled via AWS KMS, then your Spark cluster must have Hadoop 2.8.1 or later installed.
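The Hadoop 2.8.1 minimum for KMS-encrypted buckets can be checked with a short version comparison. The sketch below is illustrative only: `hadoop_meets_kms_requirement` is a hypothetical helper, and it assumes the version string (for example, as reported by the `hadoop version` command) may carry a vendor suffix after a hyphen, which it ignores.

```python
def hadoop_meets_kms_requirement(version: str, minimum=(2, 8, 1)) -> bool:
    """Return True if `version` is at least `minimum` (2.8.1 by default).

    Hypothetical helper for illustration. Assumes a dotted numeric
    version, optionally followed by a hyphenated suffix that is ignored.
    """
    # Drop any suffix after the first hyphen, e.g. "3.2.1-amzn-1" -> "3.2.1".
    core = version.strip().split("-", 1)[0]
    # Compare at most three numeric components, padding short versions with zeros.
    parts = tuple(int(piece) for piece in core.split(".")[:3])
    parts += (0,) * (3 - len(parts))
    return parts >= minimum
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "2.10.0" as older than "2.8.1".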