In the workspace configuration, select Spark as the connection type, then select Self-managed as the cluster type.
Under Catalog Database, to connect to a Hive catalog using Livy:
Under Catalog Type, click Hive.
Under Launch Method, click Livy.
In the Hive Catalog Database field, enter the name of the database.
In the Server field, provide the server where the database is located.
In the Port field, provide the port to use to connect to the database.
In the Username field, provide the username for the account to use to connect to the database.
In the Password field, provide the password for the specified user.
To test the connection to the Hive catalog database, click Test Hive Connection.
For Spark workspaces, you can provide where clauses to filter tables. See #table-mode-filter-tables.
The Enable partition filter validation setting indicates whether Tonic Structural should validate those filters when you create them.
By default, the setting is in the on position, and Structural validates the filters. To disable the validation, toggle Enable partition filter validation to the off position.
By default, data generation is not blocked as long as schema changes do not conflict with your workspace configuration.
To block data generation when there are any schema changes, regardless of whether they conflict with your workspace configuration, toggle Block data generation on schema changes to the on position.
Under Livy Connection Details, you configure the connection to Livy, which launches the data generation:
In the Server field, provide the name of the Livy server.
In the Port field, provide the port to use to connect to the Livy server.
In the Proxy User field, provide the name of the proxy user to use to connect to the Livy server.
By default, SSL is enabled, and Enable SSL/TLS is in the on position. We strongly recommend that you do not turn off SSL.
To indicate that Structural should trust the server certificate, toggle Trust Server Certificate to the on position.
To test the connection to Livy, click Test Livy Connection.
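To verify the Livy values independently before you enter them, the following is a minimal sketch and not how Structural itself calls Livy. It assumes the standard Livy REST API, where GET /sessions lists sessions and a session request body accepts a proxyUser field. The function names, the example host, and the reading of Trust Server Certificate as "skip certificate verification" are illustrative assumptions.

```python
import json
import ssl
import urllib.request


def livy_sessions_url(server: str, port: int, use_ssl: bool = True) -> str:
    """Build the URL of the Livy /sessions endpoint from the Server and Port values."""
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{server}:{port}/sessions"


def livy_session_payload(proxy_user: str) -> str:
    """Build a JSON body for POST /sessions that impersonates the proxy user."""
    return json.dumps({"kind": "spark", "proxyUser": proxy_user})


def check_livy(server: str, port: int, use_ssl: bool = True,
               trust_server_certificate: bool = False,
               timeout: float = 10.0) -> int:
    """Send GET /sessions and return the HTTP status code.

    Assumption: trusting the server certificate is modeled here as
    disabling certificate verification, which may differ from what
    Structural does internally.
    """
    context = None
    if use_ssl:
        context = ssl.create_default_context()
        if trust_server_certificate:
            # check_hostname must be turned off before verify_mode is relaxed.
            context.check_hostname = False
            context.verify_mode = ssl.CERT_NONE
    url = livy_sessions_url(server, port, use_ssl)
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return response.status
```

For example, check_livy("livy.example.com", 8998) would return 200 if a Livy server is reachable at that hypothetical host. 8998 is the Livy default port; your deployment might use a different one.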
In the Output Location section, you configure the location in HDFS where Structural writes the destination data.
In the Server field, provide the server where the destination data is located.
In the IPC Port field, provide the IPC port.
In the Web HDFS Port field, provide the web HDFS port.
By default, WebHDFS Authentication Method is set to None.
To use Pseudo authentication:
Click Pseudo.
In the Web HDFS Username field, provide the name of the Web HDFS user.
To use Kerberos authentication:
Click Kerberos.
In the Kerberos Username field, type the name of the Kerberos user.
In the Kerberos Password field, type the password for the specified Kerberos user.
In the Kerberos Realm field, type the Kerberos realm.
In the Path on HDFS field, provide the path to the destination database.
By default, SSL is enabled, and Enable SSL/TLS is in the on position. We strongly recommend that you do not turn off SSL.
To indicate that Structural should trust the server certificate, toggle Trust Server Certificate to the on position.
To test the connection to the destination database, click Test HDFS Connection.
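To show how the output location values fit together, here is a rough sketch that builds a WebHDFS REST URL for listing the destination path. It assumes the standard WebHDFS URL layout (/webhdfs/v1/<path>?op=LISTSTATUS) and that Pseudo authentication passes the user in the user.name query parameter. The function name and the example host, port, and path are hypothetical, and Kerberos (SPNEGO) authentication is not covered.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def webhdfs_liststatus_url(server: str, web_hdfs_port: int, hdfs_path: str,
                           username: Optional[str] = None,
                           use_ssl: bool = True) -> str:
    """Build a WebHDFS LISTSTATUS URL for the given path.

    username is only needed for Pseudo authentication; leave it as
    None when the authentication method is None.
    """
    scheme = "https" if use_ssl else "http"
    # Normalize the path so that it has exactly one leading slash.
    path = "/" + hdfs_path.strip("/")
    params = {"op": "LISTSTATUS"}
    if username:
        params["user.name"] = username
    return (f"{scheme}://{server}:{web_hdfs_port}"
            f"/webhdfs/v1{quote(path)}?{urlencode(params)}")
```

Opening the resulting URL in a browser or with an HTTP client is one way to confirm that the Web HDFS port and the path are correct before you click Test HDFS Connection.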
Note that if you use Kerberos for authentication, then you must also provide Structural with the path to the Kerberos configuration file (krb5.conf). This allows Structural to communicate with the Kerberos clusters. The path to the file is stored in the Kerberos environment variable KRB5_CONFIG.
To provide Structural with access to the configuration file:
Create a volume mount.
Put the Kerberos configuration file (krb5.conf) on the volume mount.
Add the KRB5_CONFIG environment variable, and set the value to the path to krb5.conf.
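After you complete these steps, a quick sanity check can confirm that the variable points to a readable file. The following is a minimal sketch that you could run in the same environment as Structural. The function name resolve_krb5_config is hypothetical and is not part of Structural.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_krb5_config(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the krb5.conf path from KRB5_CONFIG, or raise if it is unusable."""
    # Default to the real process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    value = env.get("KRB5_CONFIG")
    if not value:
        raise RuntimeError("KRB5_CONFIG is not set")
    path = Path(value)
    if not path.is_file():
        raise FileNotFoundError(f"KRB5_CONFIG points to a missing file: {path}")
    return path
```

If the call raises an error, the volume mount or the environment variable is likely misconfigured.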
The Spark Configuration section lists the Spark configuration variables that Structural requires to be set.