Running the Structural sensitivity scan

When sensitivity scans run

Structural runs sensitivity scans automatically based on specific events. You can also run manual sensitivity scans on demand.

On a self-hosted instance, sensitivity scans can also run automatically at the same time each day.

Event-based sensitivity scans

Structural automatically runs a sensitivity scan when you:

  • Create a completely new workspace and connect a data source

  • Change the data connection details for the source database

  • Add a file group to a file connector workspace

A child workspace always inherits the sensitivity designations from its parent workspace.

When you copy a workspace, Structural runs a new sensitivity scan on the copy to identify sensitive columns. However, it keeps the sensitivity designation for columns that you specifically marked as sensitive or not sensitive.
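The precedence rule for a copied workspace can be pictured as a simple merge. In this minimal sketch, a fresh scan supplies a designation for every column, and any column that you explicitly marked keeps your marking. The function name `merge_designations` and the dictionary shape (column key mapped to a sensitive/not-sensitive flag) are hypothetical and only illustrate the behavior described above. They are not Structural's API.

```python
from typing import Dict

# Hypothetical shape: "table.column" -> True (sensitive) / False (not sensitive).
def merge_designations(
    new_scan: Dict[str, bool],
    manual: Dict[str, bool],
) -> Dict[str, bool]:
    """Combine a new scan of a copied workspace with manual designations.

    Columns that the user explicitly marked keep that marking. Every other
    column takes the result of the new scan.
    """
    merged = dict(new_scan)
    merged.update(manual)  # manual markings always take precedence
    return merged


new_scan = {"users.email": True, "users.note": False, "orders.id": False}
manual = {"users.email": False, "users.note": True}
print(merge_designations(new_scan, manual))
```

Here `users.email` and `users.note` keep the manual markings, and `orders.id` keeps the result of the new scan.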

Manual sensitivity scans

In addition to the automatic scans, you can start a sensitivity scan manually from Privacy Hub.

Scheduling daily sensitivity scans

On self-hosted instances, Structural can also run scheduled daily sensitivity scans in the background.

The daily scans only run on the 10 workspaces that had the most recent activity. Activity includes:

By default, Structural runs the sensitivity scans each day at midnight.
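The selection of workspaces for the daily scan amounts to a "top 10 by most recent activity" query. This is a minimal sketch under stated assumptions: the `Workspace` class and its `last_activity` timestamp are hypothetical stand-ins for whatever activity tracking Structural uses internally.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Workspace:
    name: str
    # Hypothetical: timestamp of the most recent qualifying activity.
    last_activity: datetime


def workspaces_for_daily_scan(workspaces: List[Workspace], limit: int = 10) -> List[Workspace]:
    """Return the `limit` workspaces with the most recent activity, newest first."""
    return heapq.nlargest(limit, workspaces, key=lambda w: w.last_activity)


all_workspaces = [Workspace(f"ws{day}", datetime(2024, 1, day)) for day in range(1, 13)]
print([w.name for w in workspaces_for_daily_scan(all_workspaces)])
```

With 12 workspaces, the two with the oldest activity (`ws1` and `ws2`) are not scanned that day.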

To enable and configure the daily sensitivity scans, use the following environment settings. You can add these settings to the Environment Settings list on Structural Settings.

  • TONIC_ENABLE_SCHEDULED_SENSITIVITY_SCAN - Boolean to indicate whether to enable the scheduled daily sensitivity scans. The default value is true. To disable the scheduled daily scan, set this to false.

  • TONIC_SENSITIVITY_SCAN_HOUR - When scheduled scans are enabled, the hour at which to run the scans. The setting uses the local time zone. The value is an integer between 0 and 23, where 0 is midnight and 23 is 11:00 PM. For example, a value of 14 runs the scans at 2:00 PM. The default value is 0.
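The interaction between the two settings and their defaults can be sketched as follows. This helper is hypothetical and is not part of Structural. In particular, the case-insensitive comparison with "true" and the error on an out-of-range hour are assumptions made for illustration, not documented Structural validation behavior.

```python
import os
from typing import Mapping, Optional, Tuple


def read_scan_schedule(env: Mapping[str, str] = os.environ) -> Tuple[bool, Optional[int]]:
    """Return (enabled, hour) for the scheduled daily sensitivity scan.

    Applies the documented defaults: enabled = true, hour = 0 (midnight).
    """
    enabled_raw = env.get("TONIC_ENABLE_SCHEDULED_SENSITIVITY_SCAN", "true")
    # Assumption: any value other than "true" (case-insensitive) disables the scan.
    if enabled_raw.strip().lower() != "true":
        return False, None
    hour = int(env.get("TONIC_SENSITIVITY_SCAN_HOUR", "0"))
    # Assumption: reject values outside the documented 0-23 range.
    if not 0 <= hour <= 23:
        raise ValueError(f"TONIC_SENSITIVITY_SCAN_HOUR must be 0-23, got {hour}")
    return True, hour


print(read_scan_schedule({"TONIC_SENSITIVITY_SCAN_HOUR": "14"}))
```

With only the hour set to 14, scheduled scans stay enabled (the default) and run at 2:00 PM local time.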

Configuring parallel processing for sensitivity scans

For improved performance, sensitivity scans can use parallel processing.

For relational databases such as PostgreSQL and SQL Server, use the environment setting TONIC_PII_SCAN_PARALLELISM_RDBMS to configure parallel processing. The default value is 4.

For document-based databases such as MongoDB, use the environment setting TONIC_PII_SCAN_PARALLELISM_DOCUMENTDB. The default value is 1.
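Conceptually, a parallelism setting caps how many units of scan work run at the same time. This minimal sketch reads the two settings with their documented defaults and scans items through a bounded thread pool. Treating each table or collection as one unit of work, and the function names `scan_parallelism` and `scan_all`, are assumptions for illustration only. Structural's internal scheduling may differ.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Mapping, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Documented setting names and defaults, keyed by a hypothetical database kind.
SETTINGS = {
    "rdbms": ("TONIC_PII_SCAN_PARALLELISM_RDBMS", 4),
    "documentdb": ("TONIC_PII_SCAN_PARALLELISM_DOCUMENTDB", 1),
}


def scan_parallelism(kind: str, env: Mapping[str, str] = os.environ) -> int:
    """Return the configured degree of parallelism (at least 1)."""
    name, default = SETTINGS[kind]
    return max(1, int(env.get(name, str(default))))


def scan_all(items: Iterable[T], scan_one: Callable[[T], R], workers: int) -> List[R]:
    """Scan items with at most `workers` scans running at once; keeps input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scan_one, items))


workers = scan_parallelism("rdbms", {})
print(workers, scan_all(["users", "orders"], len, workers))
```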

How Structural identifies sensitive values

To identify that a column contains sensitive information, Structural looks at the data type, column name, and column values. To help identify sensitive column values, the scan uses regex matching and dictionary lookups.
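The following toy detector combines the three signals named above (data type, column name, and column values) with a regex match and a dictionary lookup. It is a deliberately simplified, hypothetical sketch. The regex, the word lists, the 80% threshold, and the type labels "Email" and "First Name" are all illustrative assumptions and do not reflect Structural's actual rules.

```python
import re
from typing import Iterable, Optional

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")
STRING_TYPES = {"text", "varchar", "char", "string"}  # hypothetical type list
FIRST_NAMES = {"alice", "bob", "carol"}  # hypothetical value dictionary


def guess_type(data_type: str, column_name: str, sample: Iterable[str],
               threshold: float = 0.8) -> Optional[str]:
    """Guess a sensitivity type for a column from its type, name, and sample values."""
    # Data type: only string-like columns are checked in this sketch.
    if data_type.lower() not in STRING_TYPES:
        return None
    values = [v for v in sample if v]
    if not values:
        return None
    name = column_name.lower()

    # Regex matching on values, with the column name as a supporting signal.
    email_share = sum(bool(EMAIL_RE.match(v)) for v in values) / len(values)
    if email_share >= threshold or ("email" in name and email_share > 0):
        return "Email"

    # Dictionary lookup on values, with the column name as a supporting signal.
    name_share = sum(v.strip().lower() in FIRST_NAMES for v in values) / len(values)
    if name_share >= threshold or ("first_name" in name and name_share > 0):
        return "First Name"
    return None


print(guess_type("varchar", "contact", ["a@example.com", "b@example.org"]))
```

The column name alone is not enough in this sketch. It only lowers the bar when at least some values also match.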

The sensitivity scan always looks for the Structural built-in sensitivity types. It also looks for any custom sensitivity types that you define in your custom sensitivity rules. Those rules are based on the column data type and column name. For more information about custom sensitivity rules, go to Creating and managing custom sensitivity rules.
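Because custom rules are based on the column data type and column name, a rule can be modeled as a predicate over those two fields. This is a hypothetical sketch. The `CustomRule` fields, the use of a case-insensitive regex for the column name, and the first-match-wins order are assumptions. The real rule options are described in Creating and managing custom sensitivity rules.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CustomRule:
    sensitivity_type: str
    data_types: List[str]  # hypothetical: column data types the rule applies to
    name_pattern: str      # hypothetical: regex matched against the column name

    def matches(self, data_type: str, column_name: str) -> bool:
        type_ok = data_type.lower() in (t.lower() for t in self.data_types)
        name_ok = re.search(self.name_pattern, column_name, re.IGNORECASE) is not None
        return type_ok and name_ok


def match_custom(rules: List[CustomRule], data_type: str, column_name: str) -> Optional[str]:
    """Return the sensitivity type of the first matching rule, if any."""
    for rule in rules:
        if rule.matches(data_type, column_name):
            return rule.sensitivity_type
    return None


rules = [CustomRule("Employee ID", ["varchar", "int"], r"^emp(loyee)?_?id$")]
print(match_custom(rules, "INT", "employee_id"))
```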

This process cannot guarantee perfect precision and recall. We strongly recommend that a human review the sensitivity scan results and the broader dataset to ensure that nothing sensitive was missed.
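Applied to columns, precision and recall can be written as follows, where F is the set of columns that the scan flags as sensitive and S is the set of columns that are actually sensitive. The numbers in the worked example are hypothetical.

```latex
\text{precision} = \frac{|F \cap S|}{|F|}, \qquad
\text{recall} = \frac{|F \cap S|}{|S|}

% Hypothetical example: the scan flags 10 columns, 8 of which are actually
% sensitive, and the dataset contains 12 sensitive columns in total.
\text{precision} = \frac{8}{10} = 0.8, \qquad
\text{recall} = \frac{8}{12} \approx 0.67
```

In this example, the 4 sensitive columns that the scan missed are what a human review needs to catch.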

Results of the sensitivity scan

For the columns that it identifies as containing sensitive data, Structural:

Note that if the recommended generator is not compatible with the column, then Structural discards the recommendation.

Downloading the sensitivity scan log

To download the log of the most recent sensitivity scan:

  • On the workspace management view, from the download menu, select Download Sensitivity Scan Log.

  • On Privacy Hub, click Reports and Logs, then select Scan Log.

The log tracks the progress of the scan.
