Configuring the workspace data connections

During workspace creation, under Connection Type, select Databricks.

Connecting to the source server

In the Source Server section:
  1. In the Database Name field, provide the name of the database.
  2. For Databricks workspaces, you can provide WHERE clauses to filter tables. See Applying a filter to tables; a brief example also follows this list. The Enable partition filter validation toggle indicates whether Tonic should validate those filters when you create them. By default, the setting is in the on position, and Tonic validates the filters. To disable the validation, toggle Enable partition filter validation to the off position.
  3. By default, data generation is not blocked as long as schema changes do not conflict with your workspace configuration. To block data generation when there are any schema changes, regardless of whether they conflict with your workspace configuration, toggle Block data generation on schema changes to the on position.
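A table filter is a SQL predicate over the table's columns. As a rough illustration only, using hypothetical column names, a filter that limits a table to recent rows for a single region might look like the following; see Applying a filter to tables for the exact syntax that Tonic expects.

```
transaction_date >= '2024-01-01' AND region = 'EMEA'
```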

Connecting to the Databricks cluster

In the Databricks Cluster section, you provide the connection information for the cluster.
  1. Under Databricks Type, select whether to use Databricks on AWS or Azure Databricks.
  2. In the API Token field, provide the API token for Databricks. For information on how to generate an API token, see the Databricks documentation.
  3. In the Host URL field, provide the URL for the cluster host.
  4. In the HTTP Path field, provide the path to the cluster.
  5. In the Port field, provide the port to use to access the cluster.
  6. By default, data generation jobs run on the specified cluster. To instead run data generation jobs on an ephemeral Databricks job cluster:
     1. Toggle Use Databricks Job Cluster to the on position.
     2. In the Cluster Information text area, provide the details for the job cluster.
  7. To test the connection to the cluster, click Test Cluster Connection. A scripted connectivity check that uses the same values is sketched after this list.
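If the connection test fails, it can help to verify the Host URL, HTTP Path, and API token outside of Tonic. The following is a minimal sketch that uses the databricks-sql-connector Python package; the hostname, path, and token shown are placeholder values that you would replace with your own cluster's details.

```python
# pip install databricks-sql-connector
from databricks import sql

# Placeholder values -- substitute the same Host URL (without the https:// prefix),
# HTTP Path, and API token that you enter in the Tonic workspace settings.
SERVER_HOSTNAME = "adb-1234567890123456.7.azuredatabricks.net"
HTTP_PATH = "sql/protocolv1/o/1234567890123456/0123-456789-abcdef12"
ACCESS_TOKEN = "dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

# Open a connection and run a trivial query to confirm that the cluster is reachable.
with sql.connect(
    server_hostname=SERVER_HOSTNAME,
    http_path=HTTP_PATH,
    access_token=ACCESS_TOKEN,
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchone())  # prints (1,) if the connection succeeds
```

If the query returns a row, the token and HTTP path are valid, which narrows any remaining issue to the Tonic-side configuration or to networking between Tonic and the cluster.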

Connecting to the destination server

In the Destination Server section, you specify where Tonic writes the destination database:
  1. Under Output Storage Type, indicate whether to store the destination data in Amazon S3 or Azure Data Lake Storage Gen2.
  2. In the Output Location field, provide the location in either S3 or Azure for the destination data. Example locations are shown after this list.
  3. By default, Tonic writes the results of each data generation to a different folder. To create the folder, it appends a GUID to the end of the output location. To instead always write the results to the specified output location, and overwrite the results of the previous job, toggle Create job specific destination folder to the off position.
  4. To write the destination data to Databricks Delta, toggle Write all tables to Databricks Delta to the on position.
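For reference, the output location is typically written as a full URI into the S3 bucket or the ADLS Gen2 container. The bucket, container, and storage account names below are made up, and the exact URI scheme that Tonic accepts may differ, so check the guidance shown in the Output Location field.

```
s3://my-tonic-output-bucket/databricks/destination
abfss://tonic-output@mystorageaccount.dfs.core.windows.net/databricks/destination
```

With Create job specific destination folder in its default on position, each data generation run writes to a GUID-named subfolder of this location; with the toggle off, each run writes to, and overwrites, this location directly.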