Running the Structural sensitivity scan
When sensitivity scans run
Structural runs sensitivity scans automatically based on specific events. You can also run manual sensitivity scans on demand.
On a self-hosted instance, sensitivity scans can also run automatically at the same time each day.
Event-based sensitivity scans
Structural automatically runs a sensitivity scan when you:
Create a completely new workspace and connect a data source
Change the data connection details for the source database
Add a file group to a file connector workspace
A child workspace always inherits the sensitivity designations from its parent workspace.
When you copy a workspace, Structural runs a new sensitivity scan on the copy to identify sensitive columns. However, it keeps the sensitivity designation for columns that you specifically marked as sensitive or not sensitive.
Manual sensitivity scans
In addition to the automatic scans, from Privacy Hub, you can start a sensitivity scan manually.
Scheduling daily sensitivity scans
On self-hosted instances, Structural can also run scheduled daily sensitivity scans in the background.
The daily scans only run on the 10 workspaces that had the most recent activity. Activity includes:
User-initiated updates that are included in the Protection Audit Trail
Data generation jobs
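The selection of workspaces for the daily scan can be pictured with a minimal sketch. It assumes that each workspace has a single "last activity" timestamp, taken from the latest audit trail update or data generation job. The function name and the input dictionary are hypothetical and are not part of Structural.

```python
import heapq
from datetime import datetime


def most_recently_active(last_activity, limit=10):
    """Return up to `limit` workspace names, most recent activity first.

    `last_activity` maps a workspace name to the datetime of its latest
    activity. This structure is an assumption made for illustration.
    """
    return heapq.nlargest(limit, last_activity, key=last_activity.get)


# Hypothetical data: 12 workspaces, each last active on a different day.
example = {f"workspace-{d:02d}": datetime(2024, 1, d) for d in range(1, 13)}
selected = most_recently_active(example)
```

With this example data, the two workspaces that were active least recently are left out of the daily scan.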
By default, Structural runs the sensitivity scans each day at midnight.
To enable and configure the daily sensitivity scans, use the following environment settings. You can add these settings to the Environment Settings list on Structural Settings.
TONIC_ENABLE_SCHEDULED_SENSITIVITY_SCAN
- Boolean to indicate whether to enable the scheduled daily sensitivity scans. The default value is true. To disable the scheduled daily scan, set this to false.
TONIC_SENSITIVITY_SCAN_HOUR
- When scheduled scans are enabled, the hour at which to run the scans. The setting uses the local time zone. The value is an integer between 0 and 23, where 0 is midnight and 23 is 11:00 PM. For example, a value of 14 runs the scans at 2:00 PM. The default value is 0.
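To make the meaning of the two settings concrete, here is a minimal sketch of how their values could be interpreted. The setting names and defaults come from the descriptions above. The parsing rules, the helper names, and the next-run calculation are assumptions made for illustration, not Structural's implementation.

```python
import os
from datetime import datetime, timedelta


def read_scan_schedule(env=os.environ):
    """Return (enabled, hour) from the two documented settings."""
    # Documented defaults: scheduled scans enabled, run at hour 0 (midnight).
    # Assumption: the Boolean is written as the text "true" or "false".
    raw_enabled = env.get("TONIC_ENABLE_SCHEDULED_SENSITIVITY_SCAN", "true")
    enabled = raw_enabled.strip().lower() == "true"
    hour = int(env.get("TONIC_SENSITIVITY_SCAN_HOUR", "0"))
    if not 0 <= hour <= 23:
        raise ValueError(f"TONIC_SENSITIVITY_SCAN_HOUR must be 0-23, got {hour}")
    return enabled, hour


def next_scan_time(now, hour):
    """Hypothetical helper: the next local time at which the clock reads hour:00."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

For example, with TONIC_SENSITIVITY_SCAN_HOUR set to 14, a check made at 3:00 PM local time gives a next run at 2:00 PM on the following day.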
Configuring parallel processing for sensitivity scans
For improved performance, sensitivity scans can use parallel processing.
For relational databases such as PostgreSQL and SQL Server, to configure parallel processing, you use the environment setting TONIC_PII_SCAN_PARALLELISM_RDBMS. The default value is 4.
For document-based databases such as MongoDB, you use the environment setting TONIC_PII_SCAN_PARALLELISM_DOCUMENTDB. The default value is 1.
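As a rough illustration of what a parallelism setting controls, the following sketch reads the value and uses it as the size of a worker pool. The setting names and defaults are taken from the text above. The assumption that the unit of parallel work is a table, and the functions scan_parallelism and scan_tables, are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Documented defaults for the two settings.
DEFAULT_PARALLELISM = {
    "TONIC_PII_SCAN_PARALLELISM_RDBMS": 4,
    "TONIC_PII_SCAN_PARALLELISM_DOCUMENTDB": 1,
}


def scan_parallelism(setting, env=os.environ):
    """Read a parallelism setting, falling back to its documented default."""
    value = int(env.get(setting, DEFAULT_PARALLELISM[setting]))
    if value < 1:
        raise ValueError(f"{setting} must be at least 1, got {value}")
    return value


def scan_tables(tables, scan_one, workers):
    """Assumption: each table is scanned independently by one worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with `tables`.
        return dict(zip(tables, pool.map(scan_one, tables)))
```

A higher value allows more concurrent work against the source database, so it is reasonable to weigh scan speed against the load that the database can tolerate.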
How Structural identifies sensitive values
The Structural sensitivity scan uses the following rules and processes to:
Identify sensitive columns
Recommend generators for those columns. For information about applying recommended generators to columns, go to Reviewing and applying recommended generators.
Indicate its confidence that an identified column is sensitive and is of the detected sensitivity type
Note that this process cannot guarantee perfect precision and recall. We strongly recommend that a human reviews the sensitivity scan results and the broader dataset to ensure that nothing sensitive was missed.
Rule-based data type, column name, and value analysis - High, medium, or low confidence
To identify that a column contains sensitive information for a built-in sensitivity type, Structural looks at the data type, column name, and column values.
This part of the sensitivity scan uses regular expression matching and dictionary lookups. It produces high, medium, or low confidence detections.
When this part of the sensitivity scan determines that a column contains sensitive data, it:
Marks the column as sensitive
Assigns the sensitivity type to the column
Recommends the generator configuration for the identified sensitivity type. Note that if the recommended generator is not compatible with the column, then Structural discards the recommendation.
Marks the sensitivity detection as high, medium, or low confidence. The confidence level is based on a calculation of how well the column matched the applicable rules.
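The general shape of a rule-based detection can be sketched as follows. This is an illustration only: the regular expressions, the accepted data types, the weights, and the thresholds are all assumptions, and Structural's actual rules and confidence calculation are not documented here.

```python
import re

# Hypothetical patterns for a single sensitivity type (email addresses).
EMAIL_VALUE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
EMAIL_NAME = re.compile(r"e[-_]?mail", re.IGNORECASE)
TEXT_TYPES = {"text", "varchar"}


def detect_email(column_name, data_type, values):
    """Return a detection dict, or None when the column does not match."""
    if data_type not in TEXT_TYPES:
        return None  # The data type rules out this sensitivity type.
    non_null = [v for v in values if v is not None]
    matches = sum(1 for v in non_null if EMAIL_VALUE.match(v))
    value_ratio = matches / len(non_null) if non_null else 0.0
    name_hit = bool(EMAIL_NAME.search(column_name))
    # Assumed scoring: values weigh more than the column name.
    score = 0.7 * value_ratio + 0.3 * name_hit
    if score >= 0.8:
        confidence = "high"
    elif score >= 0.5:
        confidence = "medium"
    elif score >= 0.2:
        confidence = "low"
    else:
        return None
    return {"sensitive": True, "type": "Email", "confidence": confidence}
```

In this sketch, a text column named email whose values all look like addresses is a high confidence detection, while a column with an unrelated name and only some matching values is a low confidence detection.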
Custom sensitivity rules - Full confidence
The sensitivity scan also looks for any columns that match custom sensitivity types that you define in your custom sensitivity rules.
Custom sensitivity rules are based on the column data type and column name. For more information about custom sensitivity rules, go to Creating and managing custom sensitivity rules.
Custom sensitivity rules always produce full confidence detections.
When a column matches a custom sensitivity rule, Structural:
Marks the column as sensitive
Assigns the sensitivity rule name as the sensitivity type
Recommends the generator preset from the sensitivity rule
Marks the sensitivity detection as full confidence
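A custom sensitivity rule match can be pictured with the sketch below. The CustomRule fields and the use of a regular expression for the column name are assumptions made for illustration. The actual rule options are described in Creating and managing custom sensitivity rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomRule:
    """Hypothetical representation of a custom sensitivity rule."""
    name: str
    data_types: frozenset
    name_pattern: str
    generator_preset: str


def apply_custom_rules(column_name, data_type, rules):
    """Return the detection for the first matching rule, or None."""
    for rule in rules:
        type_ok = data_type in rule.data_types
        name_ok = re.search(rule.name_pattern, column_name, re.IGNORECASE)
        if type_ok and name_ok:
            return {
                "sensitive": True,
                "type": rule.name,  # The rule name becomes the sensitivity type.
                "recommended_generator": rule.generator_preset,
                "confidence": "full",  # Custom rules are always full confidence.
            }
    return None


# Hypothetical rule and preset name.
badge_rule = CustomRule(
    name="Employee Badge",
    data_types=frozenset({"text", "varchar"}),
    name_pattern=r"badge",
    generator_preset="Badge preset",
)
```

Because the match depends only on the data type and column name, column values play no part in this sketch.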
Model-based analysis - Medium and low confidence
To identify additional sensitive columns that might not be captured by the other parts of the scan, the sensitivity scan uses an artificial intelligence (AI) model. Note that the model is pre-trained. Structural does not use customer data to train the model or send any customer data externally.
This part of the scan produces medium or low confidence detections for built-in entity types.
The model considers the table and column name. If the combination of table and column name is similar in meaning to a sensitivity type that Structural has a recommended generator for, then Structural:
Marks the column as sensitive
Assigns the sensitivity type to the column
Recommends the generator configuration for that sensitivity type
Uses AI to compare the table name and column name combination to the sensitivity type, and produces a semantic similarity score.
Based on the semantic similarity score, marks the sensitivity detection as either medium or low confidence.
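The last two steps can be sketched as a similarity score that is mapped to a confidence level. The sketch assumes that the model yields embedding vectors for the table and column name combination and for the sensitivity type, that cosine similarity is the score, and that the thresholds are 0.8 and 0.5. None of these details are confirmed for Structural's model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def confidence_from_similarity(score, medium_at=0.8, low_at=0.5):
    """Map a similarity score to a confidence level (thresholds are hypothetical)."""
    if score >= medium_at:
        return "medium"
    if score >= low_at:
        return "low"
    return None  # Too dissimilar: no model-based detection.
```

Note that this part of the scan never returns high or full confidence, which matches the medium and low levels described above.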
Downloading the sensitivity scan log
To download the log of the most recent sensitivity scan:
On the workspace management view, from the download menu, select Download Sensitivity Scan Log.
On Privacy Hub, click Reports and Logs, then select Scan Log.
The log tracks the progress of the scan.