Databricks

Tonic supports running Spark jobs through Databricks on AWS. To run Tonic on Databricks on Azure, please reach out to [email protected]

Supported versions of Databricks

Tonic supports Spark 2.4.x and Spark 3+, with the exception of Spark 2.4.2, which is not supported. Any version of Databricks running one of these Spark versions should be compatible; however, Tonic has only been tested against Databricks v7.4.
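The version rule above can be expressed as a small check. This is only a sketch of the stated rule, not part of Tonic; the function names `parse_spark_version` and `is_supported_spark_version` are hypothetical, and a version string with no patch number is assumed to mean patch 0.

```python
import re
from typing import Tuple


def parse_spark_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a string such as '3.0.1'."""
    match = re.match(r"^(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized Spark version: {version!r}")
    major, minor, patch = match.groups()
    # Assumption: a missing patch component is treated as 0.
    return int(major), int(minor), int(patch) if patch is not None else 0


def is_supported_spark_version(version: str) -> bool:
    """Apply the rule from the text: 2.4.x (except 2.4.2) or 3+."""
    major, minor, patch = parse_spark_version(version)
    if major >= 3:
        return True
    if (major, minor) == (2, 4):
        return patch != 2
    return False
```

On a cluster, the version string would typically come from the `version` attribute of the active Spark session. Passing this check only means the version falls in the supported range; as noted, testing has been limited to Databricks v7.4.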

Supported Providers

Tonic supports the following data providers:

| Source Provider | Output Provider |
| --- | --- |
| Parquet | Parquet |
| Avro | Avro |
| JSON | Parquet |

Note that source data stored as JSON is written to Parquet files in the output.
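For scripts that need to predict the output format of a job, the provider mapping can be captured in a lookup table. This is an illustrative sketch only; `OUTPUT_FORMAT_BY_SOURCE` and `output_format_for` are hypothetical names, not part of Tonic.

```python
# Source format -> output format, as listed in the provider table.
OUTPUT_FORMAT_BY_SOURCE = {
    "parquet": "parquet",
    "avro": "avro",
    "json": "parquet",  # JSON sources are written out as Parquet
}


def output_format_for(source_format: str) -> str:
    """Return the output format for a given source format (case-insensitive)."""
    key = source_format.strip().lower()
    if key not in OUTPUT_FORMAT_BY_SOURCE:
        raise ValueError(f"Unsupported source provider: {source_format!r}")
    return OUTPUT_FORMAT_BY_SOURCE[key]
```

For example, `output_format_for("JSON")` returns `"parquet"`, while an unlisted format such as CSV raises a `ValueError`.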

Supported Table Types

Databricks supports both MANAGED and EXTERNAL tables. MANAGED tables store all of their data within Databricks, whereas EXTERNAL tables store their data on a separate file system (often S3). Tonic can read from both table types, but it writes output data only to EXTERNAL tables.
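One way to see which type a given table has is to inspect the output of Spark SQL's `DESCRIBE TABLE EXTENDED`, which includes a `Type` row in its detailed table information. The sketch below assumes the rows have already been collected into `(col_name, data_type, comment)` tuples; the helper names and the sample table name and location are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# One row of DESCRIBE TABLE EXTENDED output: (col_name, data_type, comment).
Row = Tuple[str, str, Optional[str]]

DETAIL_MARKER = "# detailed table information"


def table_type_from_describe(rows: Iterable[Row]) -> Optional[str]:
    """Return 'MANAGED' or 'EXTERNAL' from the detail section, if present."""
    in_detail_section = False
    for col_name, data_type, _comment in rows:
        name = col_name.strip().lower()
        if name == DETAIL_MARKER:
            in_detail_section = True
        elif in_detail_section and name == "type":
            # Skips any ordinary column that happens to be called "type".
            return data_type.strip().upper()
    return None


def usable_as_output(table_type: Optional[str]) -> bool:
    """Per the text, output data is written only to EXTERNAL tables."""
    return table_type == "EXTERNAL"


# Hypothetical sample rows, shaped like DESCRIBE TABLE EXTENDED output.
sample_rows = [
    ("id", "bigint", None),
    ("type", "string", None),
    ("", "", ""),
    ("# Detailed Table Information", "", ""),
    ("Name", "default.users_out", ""),
    ("Type", "EXTERNAL", ""),
    ("Location", "s3://example-bucket/users_out", ""),
]
```

With these sample rows, `table_type_from_describe(sample_rows)` yields `"EXTERNAL"`, so `usable_as_output` returns `True`. A MANAGED table would remain readable as a source but would not qualify as an output destination.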
