# Running data generation manually

## Selecting the data generation option <a href="#data-generation-run-job" id="data-generation-run-job"></a>

{% hint style="info" %}
**Required workspace permission:** Run data generation
{% endhint %}

To start the data generation, at the top right of the workspace management view, click **Generate Data**.

<figure><img src="https://3378426797-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LSQCLFQ4bslJ-HYc8c3%2Fuploads%2FN2yEHVMeVad344rZe4XW%2FGenerateDataButton.png?alt=media&#x26;token=6f2d1b63-cf29-4a64-94f8-cc325ce4bd8a" alt=""><figcaption><p>Generate Data button to start a data generation</p></figcaption></figure>

As you configure the data generation options, Structural runs checks to verify that you can use the current configuration to generate data.

If any of these checks do not pass, then when you click **Generate Data**, Structural displays information about why you cannot run the data generation job.

If all of the checks pass and there are no warnings, then when you click **Generate Data**, the **Confirm Generation** panel displays.

## Warning for notification schema changes <a href="#data-gen-schema-change-warning" id="data-gen-schema-change-warning"></a>

Data generation is always blocked by sensitive schema changes. Sensitive schema changes include new tables and columns that do not have an assigned generator.

The workspace configuration includes whether to block data generation for all schema changes, including notification changes. Notification schema changes include removed tables and columns.

If this setting is turned off, then if there are notification schema changes, when you click **Generate Data**, a warning displays.

To continue to the **Confirm Generation** panel, click **Continue to Data Generation**.
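The blocking rules above can be summarized as a small decision function. This is only an illustrative sketch of the documented behavior, not Structural code: the `SchemaChanges` class, the `block_on_all_changes` flag, and the returned labels are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class SchemaChanges:
    """Hypothetical summary of the schema changes detected for a workspace."""
    sensitive: int = 0     # for example, new tables or columns without an assigned generator
    notification: int = 0  # for example, removed tables or columns


def generate_data_outcome(changes: SchemaChanges, block_on_all_changes: bool) -> str:
    """Return what happens when you click Generate Data (labels are illustrative)."""
    # Sensitive schema changes always block data generation.
    if changes.sensitive > 0:
        return "blocked"
    # Notification changes block only when the workspace blocks on all schema changes.
    if changes.notification > 0:
        return "blocked" if block_on_all_changes else "warning"
    # No schema changes: go directly to the Confirm Generation panel.
    return "confirm"
```

For example, a workspace that has only removed columns and does not block on all schema changes produces the warning, and you can then continue to the **Confirm Generation** panel.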

## Confirming the generation details <a href="#data-gen-confirm-generation-details" id="data-gen-confirm-generation-details"></a>

The **Confirm Generation** panel allows you to confirm the details for the data generation.

<figure><img src="https://3378426797-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LSQCLFQ4bslJ-HYc8c3%2Fuploads%2FkwRPp5Aufg9gjNh4eCTZ%2FConfirmGeneration.png?alt=media&#x26;token=3e2c7ca9-4bfa-4f7b-be5a-84299c07d04a" alt=""><figcaption><p>Confirm Generation panel</p></figcaption></figure>

### Indicating whether to use upsert <a href="#data-gen-confirm-upsert" id="data-gen-confirm-upsert"></a>

If upsert is available for the workspace, then you can also choose whether to use upsert for data generation.

If upsert is enabled for the workspace, then by default **Use Upsert** is in the on position.

To not use upsert, toggle **Use Upsert** to the off position. When upsert is turned off, the data generation is a simple data generation that directly populates and replaces the destination database.

### Indicating whether to generate a subset <a href="#confirm-gen-details-subsetting" id="confirm-gen-details-subsetting"></a>

If you configured subsetting, then you can indicate whether to only generate the subset.

To create a subset based on the current subsetting configuration, toggle **Use Subsetting** to the on position.

The initial setting matches the current setting in the subsetting configuration. If **Use Subsetting** is enabled in **Subsetting** view, then it is enabled by default on the **Confirm Generation** panel.

When you change the setting on the **Confirm Generation** panel, it also updates the setting in **Subsetting** view.

### Determining the data generation process to use (Oracle only) <a href="#confirm-gen-details-data-pipeline-v2" id="confirm-gen-details-data-pipeline-v2"></a>

Tonic.ai has released an improved version of the data generation process. It is used automatically for almost all data connectors. It is optional for Oracle.

For the new process, the job type is Data Pipeline Generation instead of Data Generation.

On the **Confirm Generation** panel, the **Data Pipeline V2** toggle indicates whether to use the new process:

* When the toggle is in the off position, Structural uses the previous process.
* When the toggle is in the on position, Structural uses the new process.

<figure><img src="https://3378426797-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LSQCLFQ4bslJ-HYc8c3%2Fuploads%2F6YMtHP6o8xQ173ZQIvim%2FConfirmGenerationDataPipelineV2.png?alt=media&#x26;token=389c0840-5c19-42ea-84e6-5a90297a6f69" alt=""><figcaption><p>Confirm Generation panel with the Data Pipeline V2 option</p></figcaption></figure>

## Configuring job logging and metrics <a href="#data-gen-logging-metrics" id="data-gen-logging-metrics"></a>

### Enabling diagnostic logging for a data generation job <a href="#data-gen-diagnostic-enable" id="data-gen-diagnostic-enable"></a>

{% hint style="info" %}
**Required global permission:** Enable diagnostic logging and uploading logs directly to Tonic.ai
{% endhint %}

By default, Structural redacts sensitive values from the logs. To help support troubleshooting, you can configure some Structural data connectors to use diagnostic logging, which generates unredacted versions of the log files. For details, go to [#diagnostic-log-environment-settings](https://docs.tonic.ai/app/admin/tonic-monitoring-logging/logs-redacted-diagnostic#diagnostic-log-environment-settings "mention").

If the data connector is not configured to use diagnostic logging, then you can choose whether to enable diagnostic logging for an individual data generation job. The option is also available for data connectors that do not have a diagnostic logging setting.

On the **Confirm Generation** panel, to enable diagnostic logging for the job, toggle **Enable Diagnostic Logging** to the on position.

Access to diagnostic logs is also controlled by the **Enable diagnostic logging** global permission. If you do not have this permission, then you cannot download diagnostic logs.

### Sending the log package to Tonic.ai

{% hint style="info" %}
**Required global permission:** Enable diagnostic logging and uploading logs directly to Tonic.ai
{% endhint %}

You can choose to send a log package to Tonic.ai.

In most cases, you send the log package at the request of Tonic.ai support, who then uses the logs to troubleshoot an issue.

On the **Confirm Generation** panel, to send the log package, toggle **Send Logs to Tonic.ai** to the on position.

Structural creates the package, then uploads it to an S3 bucket.

Packages are removed from the S3 bucket automatically after 30 days.

### Indicating whether to generate performance metrics <a href="#data-gen-manual-performance-metrics" id="data-gen-manual-performance-metrics"></a>

{% hint style="info" %}
**Required workspace permission:** Download job logs
{% endhint %}

To help troubleshoot issues, for workspaces that use the newer data generation processing, you can configure the data generation job to also generate performance metrics.

The performance metrics start when a specified table is processed, and continue for a specified length of time.

To enable performance metrics for the data generation job:

<figure><img src="https://3378426797-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LSQCLFQ4bslJ-HYc8c3%2Fuploads%2FVhkdHoymeCBKNMVkN8XL%2FDataGenerationPerformanceMetrics.png?alt=media&#x26;token=e4079502-83af-4168-933f-3b5f56a8d393" alt=""><figcaption><p>Configuration options for generating performance traces</p></figcaption></figure>

1. Toggle **Collect Performance Metrics** to the on position.
2. From the **Table Trigger** dropdown list, select the table that triggers the performance metrics.
3. From the **Trace Duration** dropdown list, select the length of time to run the performance metrics.

## Verifying connection information <a href="#data-gen-connection-info" id="data-gen-connection-info"></a>

### Viewing the destination location <a href="#confirm-gen-details-destination-database" id="confirm-gen-details-destination-database"></a>

The **Confirm Generation** panel provides the destination information for the workspace. To display the destination database connection details, click **Destination Settings**.

Depending on the workspace configuration and data connector type, the destination information is either:

* Connection information for a database server.
* A storage location such as an S3 bucket.
* Information to create container artifacts.

If the destination information is incorrect, to navigate to the workspace configuration view to make updates, click **Edit Destination Settings**.

For a [file connector](https://docs.tonic.ai/app/setting-up-your-database/file-connector) workspace, if the source files came from a local file system, then the destination files are written to the large file store in the Structural application database. You can [download the most recently generated files](https://docs.tonic.ai/app/setting-up-your-database/file-connector/file-connector-download-generated-files).

If the destination data is written to a container repository, then from the **Confirm Generation** panel, you can configure custom tag values to use for the artifacts that the data generation job generates. For information about how to configure the tag values, go to [#workspace-settings-containerization-tags](https://docs.tonic.ai/app/workspace/managing-workspaces/workspace-configuration-settings/workspace-config-write-to-container-artifacts#workspace-settings-containerization-tags "mention").

### Verifying the intermediate database connection information (for upsert) <a href="#data-gen-confirm-intermediate-db" id="data-gen-confirm-intermediate-db"></a>

When upsert is enabled, the **Confirm Generation** panel provides access to the connection information for the intermediate database. To display the intermediate database connection details, click **Intermediate Upsert Database**.

If the intermediate database information is incorrect, to navigate to the workspace configuration view to make updates, click **Edit Intermediate**.

## Viewing generation performance tips <a href="#confirm-gen-details-performance-tips" id="confirm-gen-details-performance-tips"></a>

For data generation, assigning Truncate table mode to tables that you don't need data for can improve generation performance.

For subsetting, a very large upstream table whose foreign key columns are not indexed can slow down the subsetting process.

The **Want faster generations?** message displays at the bottom of the **Confirm Generation** panel. It displays for all non-subsetting jobs. For subsetting jobs, the message displays only if Structural identified columns that you should consider indexing.

To display information about tips for faster generation, click **Generation Tips**.

### Viewing suggested columns to index <a href="#generation-tips-columns-to-index" id="generation-tips-columns-to-index"></a>

On the **Generation Tips** panel for subsetting jobs, the **Add Indexes** panel displays the first few columns that you might consider indexing.

<figure><img src="https://3378426797-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LSQCLFQ4bslJ-HYc8c3%2Fuploads%2FvtbGlTvFgNMhFQXKmit3%2FGenerationTipsIndexing.png?alt=media&#x26;token=43d0b355-d344-4149-b32a-43ee40eb6680" alt=""><figcaption><p>Generation Tips panel with indexing suggestions</p></figcaption></figure>

To display a panel with a suggested SQL command to add the index, click the information icon next to the column.

<figure><img src="https://3378426797-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LSQCLFQ4bslJ-HYc8c3%2Fuploads%2FJeOBRSIX9Q9phXXW0T7J%2FGenerationTipsIndexingSQLExample.png?alt=media&#x26;token=a9b93e7b-23d5-429c-a748-56b540c8b532" alt=""><figcaption><p>Example SQL command for indexing</p></figcaption></figure>

On the panel, to copy the command to the clipboard, click **Copy SQL to Clipboard**.

If there are additional columns that are not listed, then to display the full list of columns to index, click **Show all columns**.

<figure><img src="https://3378426797-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LSQCLFQ4bslJ-HYc8c3%2Fuploads%2FSWidoVy02Kq2SnAOoasP%2FGenerationTipsIndexingAllIndexes.png?alt=media&#x26;token=5d790264-9285-441e-a66d-1c8496941baa" alt=""><figcaption><p>Full list of indexing suggestions</p></figcaption></figure>

On the full list, to download the list to a CSV file, click **Download list of columns (.csv)**.
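To illustrate why an index on a foreign key column helps, the following sketch uses an in-memory SQLite database. It is not the command that Structural suggests. The exact SQL depends on your database type, and the `customers` and `orders` tables and the `idx_orders_customer_id` index name are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical parent table and child table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id)
    );
""")


def lookup_plan(connection: sqlite3.Connection) -> str:
    """Return SQLite's query plan text for a lookup on the foreign key column."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM orders WHERE customer_id = ?", (1,)
    ).fetchall()
    # The last field of each plan row is the human-readable detail.
    return " ".join(row[3] for row in rows)


# Without an index, the lookup has to scan the whole orders table.
plan_before = lookup_plan(conn)

# Add an index on the foreign key column, then check the plan again.
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
plan_after = lookup_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

After the index exists, the query plan refers to `idx_orders_customer_id` instead of a full table scan. Lookups by foreign key are the kind of operation that indexing is meant to speed up on a large table.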

### Hint to truncate tables <a href="#generation-tips-truncate-tables" id="generation-tips-truncate-tables"></a>

On the **Generation Tips** panel for non-subsetting jobs, the **Truncate Tables** panel displays the hint to truncate tables that contain data that you do not need in the destination database.

To navigate to **Database View** to change the current configuration, click **Go to Database View**.

## Starting the generation job

On the **Confirm Generation** panel, after you confirm the generation details, to start the data generation, click **Run Generation**.

When upsert is enabled, to start the data generation and upsert jobs:

1. Click the **Run Generation + Upsert** button.
2. In the menu, click **Run Generation + Upsert**.

<figure><img src="https://3378426797-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LSQCLFQ4bslJ-HYc8c3%2Fuploads%2Fuu6IwUxytGzsE60QnmcJ%2FConfirmGenerationUpsertRunOptions.png?alt=media&#x26;token=1f312b6a-7560-4883-8ecd-ea8732c27e49" alt=""><figcaption><p>Upsert generation options</p></figcaption></figure>

Structural displays a notification that the job has started. To track the progress of the data generation job and view the results, click the **View Job** button on the notification, or go to [**Jobs** view](https://docs.tonic.ai/app/workspace/jobs).

## Starting an upsert job based on the most recent data generation <a href="#data-gen-run-upsert-only" id="data-gen-run-upsert-only"></a>

If upsert is enabled for a workspace, then on the **Confirm Generation** panel, the more common option is to run both data generation and upsert.

After you run at least one successful data generation to the intermediate database, you can also choose to run only the upsert process.

For example, if the data generation succeeds but the upsert process fails, then after you address the issues that caused the upsert to fail, you can run the upsert process again.

You must also start the upsert job manually if you turn off **Automatically Start Upsert After Successful Data Generation** in the workspace settings.

From the **Confirm Generation** panel, to run upsert only:

1. Click the **Run Generation + Upsert** button.
2. In the menu, click **Run Upsert Only**.

When you run upsert only, the process uses the results of the most recent data generation.
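As a rough summary of the run options described above, the following sketch lists which options apply in each situation. The function and parameter names are hypothetical, and the sketch assumes that **Run Upsert Only** is not offered before the first successful data generation. The actual panel might display the option differently.

```python
from typing import List


def available_run_options(upsert_enabled: bool, has_successful_generation: bool) -> List[str]:
    """Return the run options that apply, using the labels from the panel."""
    # Without upsert, the job is a simple data generation.
    if not upsert_enabled:
        return ["Run Generation"]
    # With upsert, the more common option runs both data generation and upsert.
    options = ["Run Generation + Upsert"]
    # Upsert only reuses the results of the most recent data generation,
    # so it needs at least one successful generation to the intermediate database.
    if has_successful_generation:
        options.append("Run Upsert Only")
    return options
```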
