# Configuring Databricks workspace data connections

During workspace creation, under **Connection Type**, select **Databricks**.

## Identifying the source database <a href="#databricks-connection-config-source-server" id="databricks-connection-config-source-server"></a>

In the **Source Settings** section:

1. By default, the workspace includes a single schema.\
   \
   To enable the option to provide multiple schemas, toggle **Use multiple Unity Catalog source schemas** to the on position.
2. In the **Catalog Name** field, provide the name of the catalog where the source database is located.\
   \
   If you do not provide a catalog name, then the default catalog is used.\
   \
   For Unity Catalog, this is the catalog that you configured as the default.\
   \
   For earlier versions that do not support Unity Catalog, the default is `hive_metastore`.
3. If you did not enable multiple schemas, then in the **Database Name** field, provide the name of the source database schema.\
   \
   If you did enable multiple schemas, then in the **Database Names** field, for each database schema to include, type the database name, then press Enter. You must provide at least two database schemas.
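
If you want to confirm catalog and schema names before you enter them, you can list them directly against the cluster. The following is a minimal sketch, assuming the `databricks-sql-connector` Python package; the host, HTTP path, token, and catalog values are placeholders.

```python
# Minimal sketch: list the schemas in a catalog to confirm the names to enter
# above. Assumes the databricks-sql-connector package; all values are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder host
    http_path="sql/protocolv1/o/0/0123-456789-abcdefgh",           # placeholder HTTP path
    access_token="dapi-XXXXXXXX",                                  # placeholder API token
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SHOW SCHEMAS IN my_catalog")  # my_catalog is a placeholder
        for row in cursor.fetchall():
            print(row[0])
```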

## Enabling validation of table filters <a href="#databricks-data-connection-table-filter-validation" id="databricks-data-connection-table-filter-validation"></a>

For Databricks workspaces, you can provide WHERE clauses to filter tables. For details, go to [the table filter documentation](https://docs.tonic.ai/app/generation/table-modes#table-mode-filter-tables).

The **Enable partition filter validation** toggle indicates whether Tonic Structural validates those filters when you create them.

By default, the setting is in the on position, and Structural validates the filters. To disable the validation, toggle **Enable partition filter validation** to the off position.
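
If you disable validation, you can still sanity-check a filter yourself by running it as a WHERE clause against the source table. The following is a minimal sketch, assuming the `databricks-sql-connector` package; the connection values, table name, and filter text are placeholders.

```python
# Minimal sketch: run a table filter as a WHERE clause to confirm that it is
# valid SQL for the source table. All values are placeholders.
from databricks import sql

table_filter = "transaction_date >= '2023-01-01'"  # hypothetical filter expression

with sql.connect(
    server_hostname="<host-url>",   # placeholder
    http_path="<http-path>",        # placeholder
    access_token="<api-token>",     # placeholder
) as connection:
    with connection.cursor() as cursor:
        cursor.execute(
            f"SELECT COUNT(*) FROM my_catalog.my_schema.transactions WHERE {table_filter}"
        )
        print(cursor.fetchone()[0])  # number of rows that the filter keeps
```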

## Connecting to the Databricks cluster <a href="#databricks-connection-config-databricks-cluster" id="databricks-connection-config-databricks-cluster"></a>

In the **Databricks Cluster** section, you provide the connection information for the cluster.

1. Under **Databricks Type**, select whether to use Databricks on AWS or Azure Databricks.
2. In the **API Token** field, provide the API token for Databricks. For information on how to generate an API token, go to the [Databricks documentation](https://docs.databricks.com/dev-tools/api/latest/authentication.html#generate-a-personal-access-token).
3. In the **Host URL** field, provide the URL for the cluster host.
4. In the **HTTP Path** field, provide the path to the cluster.
5. In the **Port** field, provide the port to use to access the cluster.
6. By default, data generation jobs run on the specified cluster. To instead run data generation jobs on an ephemeral Databricks job cluster:
   1. Toggle **Use Databricks Job Cluster** to the on position.
   2. In the **Cluster Information** text area, provide the details for the job cluster (see the sketch after this list).
7. For clusters that use Databricks Runtime 10.4 and below, Structural installs a cluster initialization script, which is stored as a Databricks workspace file.\
   \
   By default, this script is uploaded to the `/Shared` workspace directory.\
   \
   To upload the script to a different directory, set **Workspace Path** to an absolute path in the workspace tree. Structural must have access to the directory.
8. To test the connection to the cluster, click **Test Cluster Connection**.
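
The exact format that Structural expects in the **Cluster Information** text area is not shown here. As a point of reference only, the sketch below illustrates the kind of cluster definition that the Databricks Jobs API uses for job clusters (its `new_cluster` object), with placeholder values throughout.

```python
# Hypothetical sketch of a Databricks Jobs API "new_cluster" definition, shown
# only as a point of reference for the Cluster Information field. Confirm the
# exact shape that Structural expects. All values are placeholders.
import json

job_cluster = {
    "spark_version": "13.3.x-scala2.12",  # Databricks Runtime version
    "node_type_id": "i3.xlarge",          # worker instance type (AWS example)
    "num_workers": 2,
    "spark_conf": {
        "spark.sql.shuffle.partitions": "200"
    },
}

print(json.dumps(job_cluster, indent=2))
```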

## Connecting to the destination server <a href="#databricks-connection-config-destination-server" id="databricks-connection-config-destination-server"></a>

In the **Destination Settings** section, you specify where Structural writes the destination database.

### Selecting the output type <a href="#databricks-config-destination-output-type" id="databricks-config-destination-output-type"></a>

Under **Output Storage Type**, select the type of storage to use for the destination data:

* To use Databricks Delta tables, click **Databricks**.
* To use Amazon S3, click **Amazon S3 Files**.
* To use Azure Data Lake Storage Gen2, click **Azure Data Lake Storage Gen2 Files**.

### Configuring the output settings for Databricks Delta tables <a href="#databricks-config-destination-databricks-delta-output" id="databricks-config-destination-databricks-delta-output"></a>

If you selected **Databricks** as the output type:

1. In the **Catalog Name** field, provide the name of the catalog that contains the database.\
   \
   If the Databricks cluster connection supports multiple catalogs (Unity Catalog) and you do not specify a catalog, then Structural uses the default catalog.\
   \
   For connections that use the legacy metastore, you can leave the field blank, or set it to `hive_metastore`.\
   \
   Note that if you specify a catalog that does not already exist, then the user that is associated with the API token must have permission to create the catalog.
2. If you did not provide multiple source database schemas, then in the **Database Name** field, provide the name of the database. If you do not specify a database, Structural uses the database named `default` in the active catalog.\
   \
   If you did provide multiple source database schemas, then the **Database Name** field does not display. Structural automatically creates destination schemas that match the source schemas.
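
If the user associated with the API token does not have permission to create catalogs, you can pre-create the destination catalog and database before running a generation. The following is a minimal sketch, assuming Unity Catalog and the `databricks-sql-connector` package; all names and credentials are placeholders.

```python
# Minimal sketch: pre-create the destination catalog and database so that
# Structural does not need to create them. Assumes Unity Catalog; all values
# are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="<host-url>",   # placeholder
    http_path="<http-path>",        # placeholder
    access_token="<api-token>",     # placeholder
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("CREATE CATALOG IF NOT EXISTS tonic_destination")
        cursor.execute("CREATE SCHEMA IF NOT EXISTS tonic_destination.tonic_output")
```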

### Configuring the output settings for Amazon S3 or Azure <a href="#databricks-config-destination-s3-azure-output" id="databricks-config-destination-s3-azure-output"></a>

If you selected either **Amazon S3 Files** or **Azure Data Lake Storage Gen2 Files** as the output type:

1. In the **Output Location** field, provide the location in either Amazon S3 or Azure for the destination data.
2. By default, Structural writes the results of each data generation to a different folder. To create the folder, it appends a GUID to the end of the output location.\
   \
   To instead always write the results to the specified output location, and overwrite the results of the previous job, toggle **Create job specific destination folder** to the off position.

   If you use non-job-specific folders for destination data, then the following [environment settings](https://docs.tonic.ai/app/admin/environment-variables-setting) determine how Structural handles overwrites. You can configure these settings from the **Environment Settings** tab on **Structural Settings**. You can also [override these settings](https://docs.tonic.ai/app/workspace/managing-workspaces/workspace-configuration-settings/advanced-overrides) in individual workspaces. Note that any defined table-level Error on Override setting takes precedence over these settings.

   * `TONIC_WORKSPACE_ERROR_ON_OVERRIDE`. Whether to prevent overwrites of previous writes. By default, this setting is `true`, and attempts to overwrite return an error. To allow overwrites, set this to `false`.
   * `TONIC_WORKSPACE_DEFAULT_SAVE_MODE`. The mode to use to save tables to a non-job-specific folder. The default value is null. When this setting has any other value, it takes precedence over `TONIC_WORKSPACE_ERROR_ON_OVERRIDE`. The available values are `Append`, `ErrorIfExists`, `Ignore`, and `Overwrite`.
3. By default, each output table is written in the format used by the corresponding input table. To instead write all output tables to a single format:
   1. Toggle **Write all output to a specific type** to the on position.
   2. From the **Select output type** dropdown list, select the output format to use. The options are:
      * Avro
      * JSON
      * Parquet
      * Delta
      * CSV
      * ORC
   3. If you select CSV, you also configure the file format (the sketch at the end of this section shows how these options map to typical Spark CSV writer settings).
      1. To treat the first row as a header, check **Treat first row as a column header**.\
         \
         The box is checked by default.
      2. In the **Column Delimiter** field, type the character to use to separate the columns.\
         \
         The default is a comma (`,`).
      3. In the **Escape Character** field, type the character to use to escape special characters.\
         \
         The default is a backslash (`\`).
      4. In the **Quoting Character** field, type the character to use to quote text values.\
         \
         The default is a double quote (`"`).
      5. In the **NULL Value Replacement String** field, type the string to use to represent null values.\
         \
         The default is an empty string.
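
The save modes and CSV options above use the same names as Spark's DataFrameWriter settings. The following is a hedged illustration, assuming a PySpark session on the Databricks cluster, of what those settings typically mean when writing CSV output; the data and output path are placeholders, and Structural's actual write path may differ.

```python
# Minimal sketch of how the save modes and CSV options map onto Spark's
# DataFrameWriter. Assumes a PySpark session; all values are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, None)], ["id", "value"])

(df.write
    .mode("overwrite")               # Append | ErrorIfExists | Ignore | Overwrite
    .option("header", True)          # Treat first row as a column header
    .option("sep", ",")              # Column Delimiter
    .option("escape", "\\")          # Escape Character
    .option("quote", '"')            # Quoting Character
    .option("nullValue", "")         # NULL Value Replacement String
    .csv("s3://my-bucket/tonic-output/"))  # Output Location (placeholder)
```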

