Configuring Databricks workspace data connections
During workspace creation, under Connection Type, select Databricks.
In the Source Server section, in the Database Name field, provide the name of the source database.
For Databricks workspaces, you can provide WHERE clauses to filter tables. See Applying a filter to tables.
The Enable partition filter validation toggle indicates whether Tonic should validate those filters when you create them.
By default, the setting is in the on position, and Tonic validates the filters. To disable the validation, toggle Enable partition filter validation to the off position.
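For example, a filter clause might limit a large table to recent partitions, such as `event_date >= date_sub(current_date(), 90)`. If you want to sanity-check a clause outside of Tonic's own validation, the following is a minimal sketch that runs it through the databricks-sql-connector Python package; the host, token, table, and filter values are placeholders, not values from this documentation.

```python
# Minimal sketch: confirm that a candidate table filter is valid Spark SQL
# before entering it in Tonic. Requires `pip install databricks-sql-connector`.
# The host, HTTP path, token, table, and filter below are placeholders.
from databricks import sql

filter_clause = "event_date >= date_sub(current_date(), 90)"  # example filter

with sql.connect(
    server_hostname="<workspace-host>",   # same host URL as in the Databricks Cluster section below
    http_path="<cluster-http-path>",
    access_token="<api-token>",
) as conn:
    with conn.cursor() as cur:
        # A cheap COUNT(*) fails fast if the WHERE clause references a missing
        # column or contains a syntax error.
        cur.execute(f"SELECT COUNT(*) FROM my_catalog.my_schema.events WHERE {filter_clause}")
        print(cur.fetchone()[0])
```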
By default, data generation is not blocked as long as schema changes do not conflict with your workspace configuration.
To block data generation when there are any schema changes, regardless of whether they conflict with your workspace configuration, toggle Block data generation on schema changes to the on position.
In the Databricks Cluster section, you provide the connection information for the cluster.
1. Under Databricks Type, select whether to use Databricks on AWS or Azure Databricks.
2. In the API Token field, provide the API token for Databricks. For information on how to generate an API token, see the Databricks documentation.
3. In the Host URL field, provide the URL for the cluster host.
4. In the HTTP Path field, provide the path to the cluster.
5. In the Port field, provide the port to use to access the cluster.
6. By default, data generation jobs run on the specified cluster. To instead run data generation jobs on an ephemeral Databricks job cluster:
   1. Toggle Use Databricks Job Cluster to the on position.
   2. In the Cluster Information text area, provide the details for the job cluster (see the sketch after this list).
7. To test the connection to the cluster, click Test Cluster Connection.
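The exact format that the Cluster Information text area expects is defined by Tonic; as a rough illustration only, the sketch below assumes it takes a JSON cluster specification along the lines of the Databricks Jobs API new_cluster object. All field values are placeholders; check the Tonic and Databricks documentation for the fields that apply to your deployment.

```python
# Hypothetical job-cluster specification, modeled on the Databricks Jobs API
# `new_cluster` object. The exact fields Tonic expects may differ.
import json

job_cluster = {
    "spark_version": "13.3.x-scala2.12",  # placeholder runtime version
    "node_type_id": "i3.xlarge",          # placeholder node type (AWS example)
    "num_workers": 2,
}

print(json.dumps(job_cluster, indent=2))  # JSON to paste into Cluster Information
```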
In the Destination Server section, you specify where Tonic writes the destination database:
1. Under Output Storage Type, indicate whether to store the destination data in Amazon S3 or Azure Data Lake Storage Gen2.
2. In the Output Location field, provide the location in either S3 or Azure for the destination data.
3. By default, each output table is written in the format used by the corresponding input table. To instead write all output tables to a single format:
   1. Toggle Write all output to a specific type to the on position.
   2. From the Select output type dropdown list, select the output format to use. The options are:
      - Avro
      - JSON
      - Parquet
      - Delta
      - CSV
      - ORC
   3. If you select CSV, you also configure the file format (see the PySpark sketch after this list):
      1. To treat the first row as a column header, check Treat first row as a column header. The box is checked by default.
      2. In the Column Delimiter field, type the character to use to separate the columns. The default is a comma (,).
      3. In the Escape Character field, type the character to use to escape special characters. The default is a backslash (\).
      4. In the Quoting Character field, type the character to use to quote text values. The default is a double quote (").
      5. In the NULL Value Replacement String field, type the string to use to represent null values. The default is an empty string.
4. By default, Tonic writes the results of each data generation to a different folder. To create the folder, it appends a GUID to the end of the output location. To instead always write the results to the specified output location and overwrite the results of the previous job, toggle Create job specific destination folder to the off position.
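To make the CSV settings above concrete, here is a minimal PySpark sketch that reads the generated files back using the standard Spark reader options that correspond to each field; the output path is a placeholder, and the option values simply mirror the defaults listed above.

```python
# Minimal sketch: read Tonic's CSV output back with PySpark, using reader options
# that mirror the file-format defaults described above. The path is a placeholder.
from pyspark.sql import SparkSession

# In a Databricks notebook, `spark` is already defined; getOrCreate() also works locally.
spark = SparkSession.builder.getOrCreate()

df = (
    spark.read
    .option("header", True)     # Treat first row as a column header
    .option("sep", ",")         # Column Delimiter
    .option("escape", "\\")     # Escape Character
    .option("quote", '"')       # Quoting Character
    .option("nullValue", "")    # NULL Value Replacement String
    .csv("s3://<output-bucket>/<output-location>/<job-guid>/my_table/")
)
df.show(5)
```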
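If you keep the default job-specific folders and write to S3, one way to see the GUID-suffixed folders that successive runs create is to list prefixes under the output location with boto3. This is a hedged sketch; the bucket and prefix are placeholders, and the exact way the GUID is appended may differ in your bucket.

```python
# Sketch: list the job-specific folders that Tonic creates under an S3 output
# location. Bucket and prefix are placeholders; adjust the prefix to match how
# the GUID is appended in your bucket.
import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(
    Bucket="<output-bucket>",
    Prefix="<output-location>",
    Delimiter="/",
)
for prefix in resp.get("CommonPrefixes", []):
    print(prefix["Prefix"])  # one entry per data generation job
```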