Identifying sensitive data

Tonic Structural uses its sensitivity scan to identify source data columns that contain sensitive information. The sensitivity scan identifies Structural's built-in sensitivity types. It also looks for custom types that you define.

You can also manually mark a column as sensitive or not sensitive.

Running the Structural sensitivity scan

When sensitivity scans run

Structural runs sensitivity scans automatically based on specific events. You can also run manual sensitivity scans on demand.

On a self-hosted instance, sensitivity scans can also run automatically at the same time each day.

Event-based sensitivity scans

Structural automatically runs a sensitivity scan when you:

Create a completely new workspace and connect a data source.
Change the data connection details for the source database.
Add a file group to a file connector workspace.

A child workspace always inherits the sensitivity designations from its parent workspace.

When you copy a workspace, Structural runs a new sensitivity scan on the copy to identify sensitive columns. However, it keeps the sensitivity designation for columns that you specifically marked as sensitive or not sensitive.
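
As a rough illustration of the copy behavior, the following Python sketch combines the results of a new scan with manual designations that are carried over from the original workspace. The function name and the dictionary structures are hypothetical. They are not part of Structural.

```python
def designations_for_copy(scan_results, manual_overrides):
    """Combine a fresh scan with manual designations carried over from the original.

    scan_results: column name -> bool, as detected by the new scan on the copy.
    manual_overrides: column name -> bool, set by a user on the original workspace.
    Both structures are hypothetical stand-ins, not Structural data models.
    """
    merged = dict(scan_results)
    # Manual designations take precedence over whatever the new scan detects.
    merged.update(manual_overrides)
    return merged


copy_designations = designations_for_copy(
    scan_results={"users.email": True, "users.notes": False, "users.id": True},
    manual_overrides={"users.notes": True, "users.id": False},
)
```

In this sketch, users.notes stays sensitive and users.id stays not sensitive, because a user set those values manually.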

Manual sensitivity scans

In addition to the automatic scans, you can run a sensitivity scan manually from Privacy Hub.

Scheduling daily sensitivity scans

On self-hosted instances, Structural can also run scheduled daily sensitivity scans in the background.

The daily scans only run on the 10 workspaces that had the most recent activity. Activity includes:

User-initiated updates.
Data generation jobs.
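
The selection of workspaces for the daily scan can be pictured with a short Python sketch. It assumes a hypothetical mapping from workspace name to the time of the most recent activity. How Structural actually tracks activity is not described here.

```python
from datetime import datetime

# From the text: only the 10 most recently active workspaces are scanned.
DAILY_SCAN_WORKSPACE_LIMIT = 10


def workspaces_for_daily_scan(last_activity, limit=DAILY_SCAN_WORKSPACE_LIMIT):
    """Return workspace names ordered by most recent activity, capped at `limit`.

    last_activity: hypothetical mapping of workspace name -> datetime of latest activity.
    """
    ranked = sorted(last_activity, key=last_activity.get, reverse=True)
    return ranked[:limit]


# Twelve example workspaces, each active on a different day in January.
activity = {f"workspace-{n:02d}": datetime(2024, 1, n) for n in range(1, 13)}
selected = workspaces_for_daily_scan(activity)
```

With twelve workspaces, the two that have the oldest activity are left out of the daily scan.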

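By default, Structural runs the sensitivity scans each day at midnight.

To enable and configure the daily sensitivity scans, use the following environment settings. You can add these settings to the Environment Settings list on Structural Settings.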

TONIC_ENABLE_SCHEDULED_SENSITIVITY_SCAN - Boolean to indicate whether to enable the scheduled daily sensitivity scans. The default value is true. To disable the scheduled daily scan, set this to false.
TONIC_SENSITIVITY_SCAN_HOUR - When scheduled scans are enabled, the hour at which to run the scans. The setting uses the local time zone. The value is an integer between 0 and 23, where 0 is midnight and 23 is 11:00 PM. For example, a value of 14 indicates to run the job at 2:00 PM. The default value is 0.
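
To check a planned configuration before you apply it, a small Python helper such as the following can validate the two values. This is only a sketch. The helper names are hypothetical, and the way that Structural itself parses the Boolean value is an assumption.

```python
def parse_scheduled_scan_settings(env):
    """Read the two scheduled-scan settings from a mapping of setting name -> value.

    Returns (enabled, hour). Defaults follow the documented values: true and 0.
    Treating only the text "true" (any case) as enabled is an assumption of this sketch.
    """
    raw_enabled = env.get("TONIC_ENABLE_SCHEDULED_SENSITIVITY_SCAN", "true")
    enabled = str(raw_enabled).strip().lower() == "true"

    hour = int(env.get("TONIC_SENSITIVITY_SCAN_HOUR", 0))
    if not 0 <= hour <= 23:
        raise ValueError("TONIC_SENSITIVITY_SCAN_HOUR must be an integer from 0 to 23")
    return enabled, hour


def describe_hour(hour):
    """Render a 24-hour value as a 12-hour clock label, for example 14 -> '2:00 PM'."""
    suffix = "AM" if hour < 12 else "PM"
    return f"{hour % 12 or 12}:00 {suffix}"
```

For example, describe_hour(14) returns 2:00 PM, and a value of 24 is rejected because it is outside the allowed range.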

Configuring parallel processing for sensitivity scans

For improved performance, sensitivity scans can use parallel processing.

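For relational databases such as PostgreSQL and SQL Server, to configure parallel processing, you use the environment setting TONIC_PII_SCAN_PARALLELISM_RDBMS. The default value is 4.

For document-based databases such as MongoDB, you use the environment setting TONIC_PII_SCAN_PARALLELISM_DOCUMENTDB. The default value is 1.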

How Structural identifies sensitive values

The Structural sensitivity scan uses the following rules and processes to:

Identify sensitive columns.
Recommend generators for those columns.
Indicate its confidence that an identified column is sensitive and is of the detected sensitivity type.

Note that this process cannot guarantee perfect precision and recall. We strongly recommend that a human review the sensitivity scan results and the broader dataset to ensure that nothing sensitive was missed.

Rule-based data type, column name, and value analysis - High, medium, or low confidence

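To identify that a column contains sensitive information for a built-in sensitivity type, Structural looks at the data type, column name, and column values.

This part of the sensitivity scan uses regular expression matching and dictionary lookups. It produces high, medium, or low confidence detections.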

When this part of the sensitivity scan determines that a column contains sensitive data, it:

Marks the column as sensitive.
Assigns the sensitivity type to the column.
Recommends the generator configuration for the identified sensitivity type. Note that if the recommended generator is not compatible with the column, then Structural discards the recommendation.
Marks the sensitivity detection as high, medium, or low confidence. The confidence level is based on a calculation of how well the column matched the applicable rules.
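
The general idea of combining regular expression matching on values with a dictionary lookup on the column name can be sketched as follows. This is a toy email detector, not Structural's implementation. The pattern, the name hints, the weights, and the thresholds are all invented for illustration.

```python
import re

# Simplified, illustrative pattern. Real email validation is more involved.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
# Illustrative dictionary of column names that hint at email data.
EMAIL_NAME_HINTS = {"email", "e_mail", "mail"}


def email_confidence(column_name, data_type, sample_values):
    """Return 'high', 'medium', 'low', or None for a toy email detector.

    The weights (0.7 and 0.3) and the thresholds (0.8 and 0.5) are hypothetical.
    """
    if data_type != "text" or not sample_values:
        return None
    # Value analysis: share of sampled values that match the regular expression.
    matched = sum(1 for value in sample_values if EMAIL_PATTERN.match(value))
    value_score = matched / len(sample_values)
    # Column name analysis: dictionary lookup on the lowercased name.
    name_score = 1.0 if column_name.lower() in EMAIL_NAME_HINTS else 0.0
    score = 0.7 * value_score + 0.3 * name_score
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    if score > 0:
        return "low"
    return None
```

In this sketch, a text column named email whose sampled values all look like email addresses is graded high, while a column that only has a suggestive name is graded low.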

Custom sensitivity rules - Full confidence

The sensitivity scan also looks for any columns that match custom sensitivity types that you define in your custom sensitivity rules.

Custom sensitivity rules are based on the column data type and column name.

Custom sensitivity rules always produce full confidence detections.

When a column matches a custom sensitivity rule, Structural:

Marks the column as sensitive.
Assigns the sensitivity rule name as the sensitivity type.
Recommends the generator preset from the sensitivity rule.
Marks the sensitivity detection as full confidence.

Model-based analysis - Medium and low confidence

To identify additional sensitive columns that might not be captured by the other parts of the scan, the sensitivity scan uses an artificial intelligence (AI) model. Note that the model is pre-trained. Structural does not use customer data to train the model, and it does not send any customer data externally.

This part of the scan produces medium or low confidence detections for built-in entity types.

The model considers the table and column name. If the combination of table and column name is similar in meaning to a sensitivity type that Structural has a recommended generator for, then Structural:

Marks the column as sensitive.
Assigns the sensitivity type to the column.
Recommends the generator configuration for that sensitivity type.
Uses AI to compare the table name and column name combination to the sensitivity type, and produces a semantic similarity score.
Based on the semantic similarity score, marks the sensitivity detection as either medium or low confidence.
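
The last two steps, scoring the similarity and then grading the detection, can be pictured with the following Python sketch. The similarity function is passed in as a stand-in for the model, and the threshold values are hypothetical. Structural's actual model and cutoffs are not described here.

```python
def confidence_from_similarity(score, medium_threshold=0.75, minimum=0.5):
    """Map a similarity score in [0, 1] to 'medium', 'low', or None (no detection).

    Both thresholds are invented for illustration. This part of the scan never
    produces a high confidence detection, so there is no 'high' branch.
    """
    if score >= medium_threshold:
        return "medium"
    if score >= minimum:
        return "low"
    return None


def detect_with_model(table, column, sensitivity_types, similarity):
    """Pick the sensitivity type most similar to 'table column' and grade the match.

    similarity: any callable (text, sensitivity_type) -> float. It stands in for the model.
    """
    text = f"{table} {column}"
    best_type = max(sensitivity_types, key=lambda t: similarity(text, t))
    level = confidence_from_similarity(similarity(text, best_type))
    return (best_type, level) if level else None


# Fixed example scores take the place of a real model in this sketch.
fake_scores = {
    ("customers surname", "Last name"): 0.82,
    ("customers surname", "Email address"): 0.10,
}
result = detect_with_model(
    "customers",
    "surname",
    ["Last name", "Email address"],
    lambda text, sensitivity_type: fake_scores[(text, sensitivity_type)],
)
```

With these example scores, the column customers.surname is assigned the Last name type with medium confidence.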

Downloading the sensitivity scan log

To download the log of the most recent sensitivity scan, either:

On the workspace management view, from the download menu, select Download Sensitivity Scan Log.
On Privacy Hub, click Reports and Logs, then select Scan Log.

The log tracks the progress of the scan.

Manually indicating whether a column is sensitive

You can also manually indicate that a column is sensitive or not sensitive.

For example, the sensitivity scan might incorrectly identify a column as sensitive. Or a column might contain data that you consider sensitive but that does not match a detected sensitivity type.

When you manually change a column from not sensitive to sensitive, Structural marks the sensitivity detection as full confidence.

You can change whether a column is sensitive from any of the following:

Privacy Hub
Database View, for either a single column or multiple selected columns
Table View

The Structural API also provides endpoints to designate columns as sensitive or not sensitive.

Built-in sensitivity types that Structural detects

Structural identifies the following types of sensitive values. These include some information types that are considered by many privacy standards and frameworks such as HIPAA, GDPR, CCPA, and PCI.

For more information about the HIPAA and Safe Harbor information types that Structural detects, go to the Tonic.ai guide Using Tonic Structural and the Safe Harbor method to de-identify PHI.

Names

First
Last
Full

Organization

Location

Street address
ZIP
PO Box
City
State and two-letter abbreviation
Country
Postal code
GPS coordinates

Contact information

Email address
Telephone number

User credentials

Username
Password

Financial information

Credit card number
International bank account number (IBAN)
SWIFT code for bank transfers
Money amount
BTC (Bitcoin) address

Identification

Social Security Number
Passport number
Driver's license number
Birth date
Gender
Biometric identifier, such as a fingerprint or voiceprint
Full face photographic images and similar images

Medical information

ICD-9 and ICD-10 codes (Used to identify diseases)
Medical record number
Health plan beneficiary number
Admission date
Discharge date
Date of death

Other personal information

Marital status

Accounts and licenses

Account number
Certificate or license number

Network and web location

IP address
IPv6 address
MAC address
Web URL

International Mobile Equipment Identity (IMEI)

Vehicle information

Vehicle identification number (VIN)
License plate number

Creating and managing custom sensitivity rules

Required license: Enterprise

Required global permission: Create and manage sensitivity rules

By default, when a Structural sensitivity scan runs on a workspace, it looks for the built-in sensitivity types.

Your data might also include values that are specific to your organization. To identify those values, you can define custom sensitivity rules, each with a corresponding recommended generator.

Each custom sensitivity rule specifies:

The data type for matching columns.
Text matching criteria for the names of matching columns.
The recommended generator preset.

Displaying the list of custom sensitivity rules

To display the current list of sensitivity rules, in the Structural navigation menu, click Sensitivity Rules.

The list contains the sensitivity rules for a self-hosted Structural instance or a Structural Cloud organization.

For each rule, the list includes:

The rule name and description
The recommended generator preset
When the rule was most recently modified

Filtering the rules

You can filter the rule list by the following:

Rule name
Rule description
Generator preset name
Name of the user who most recently updated the rule

In the filter field, start to type text from any of those values. As you type, the list is filtered to only include matching rules.

Note that when the list is filtered, you cannot change the display sequence of the rules.

Setting the rule sequence

Structural applies the rules based on their display order in the list.

If a column matches more than one rule, Structural applies the first matching rule.

To change the display order of a rule, drag and drop it to the new location in the list.

Note that you cannot change the rule sequence when the list is filtered.
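
The effect of the rule sequence can be sketched in Python as a first-match lookup over an ordered list. The rule names and the predicate representation are hypothetical and are used only to illustrate the ordering behavior.

```python
def first_matching_rule(column_name, rules):
    """Return the name of the first rule, in display order, whose predicate matches.

    rules: ordered list of (rule_name, predicate) pairs. This is a hypothetical
    representation, not how Structural stores sensitivity rules.
    """
    for rule_name, matches in rules:
        if matches(column_name):
            return rule_name
    return None


rules_in_display_order = [
    ("Employee ID", lambda name: "emp" in name.lower() and "id" in name.lower()),
    ("Generic ID", lambda name: name.lower().endswith("id")),
]
```

Here, the column emp_id matches both rules, but Employee ID is applied because it is first in the list. If you reverse the order, Generic ID is applied instead.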

Creating and editing a sensitivity rule

Creating a sensitivity rule

To create a sensitivity rule:

On the Sensitivity Rules view, click New Custom Rule.
On the Create Custom Rule view, configure the rule.
Click Save.

Editing a sensitivity rule

To change the configuration of a sensitivity rule:

On the Sensitivity Rules view, click the edit icon for the rule.
On the Edit Custom Rule view, update the rule configuration.
Click Save.

Note that any changes to a sensitivity rule do not take effect until the next sensitivity scan.

Sensitivity rule configuration

Rule name and description

In the Name field, type the name of the sensitivity rule. The rule name becomes the sensitivity type for matching columns. The rule name must be unique, and also cannot match the name of a built-in sensitivity type.

Optionally, in the Description field, type a longer description of the sensitivity rule.

Data type

From the Data Type dropdown list, select the data type for matching columns. For example, a rule might only be used for columns that contain text.

The available data types are general types that map to specific data types in a given database. The available types are:

Array
Binary
Boolean
Continuous Numerical
Date Range
Datetime
Integer
JSON
MAC Address
Network Address
Text
UUID
XML

Column name criteria

Under Column Name Match, provide the criteria to identify matching columns based on the column name.

Note that a matching column must match both the data type and the column name criteria.

Configuring text matching conditions

When you provide a list of text matching conditions, a matching column must match all of the conditions. In other words, the conditions are joined by AND.

To apply the same generator preset to columns that have completely different names, you must create separate sensitivity rules.

To create a list of text matching conditions:

Click Text Match.
To add a column name condition, click Add String Match.
For each condition:
1. From the comparison type dropdown list, select the type of comparison. For example, Contains, Starts with, Ends with.
2. In the comparison text field, provide the text to check for. The comparison text is case insensitive. For example, if you set a condition to match column names that contain the text term, it also matches column names that contain TERM or Term or tErM.
To remove a column name condition, click its delete icon.
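
The matching logic described above, where the data type must match and all of the case-insensitive text conditions must be true, can be sketched as follows. The class and its field names are hypothetical. Only the three comparison types that are named as examples above are included.

```python
from dataclasses import dataclass, field

# Comparison types named in the text. Each function receives lowercased strings.
COMPARISONS = {
    "Contains": lambda name, text: text in name,
    "Starts with": lambda name, text: name.startswith(text),
    "Ends with": lambda name, text: name.endswith(text),
}


@dataclass
class TextMatchRule:
    """Hypothetical model of a rule that uses text matching conditions."""

    data_type: str
    conditions: list = field(default_factory=list)  # (comparison type, comparison text) pairs

    def matches(self, column_name, column_data_type):
        # The column must match both the data type and the column name criteria.
        if column_data_type != self.data_type:
            return False
        name = column_name.lower()
        # Conditions are joined by AND, and the comparison is case insensitive.
        return all(COMPARISONS[kind](name, text.lower()) for kind, text in self.conditions)


rule = TextMatchRule("Text", [("Contains", "term"), ("Ends with", "_code")])
```

In this sketch, rule.matches("Payment_TERM_code", "Text") is true, while a column named term_name fails the second condition and a column of a different data type never matches.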

Providing a regular expression

To use a regular expression to identify matching columns based on the column name:

Click Regular Expression.
In the field, provide the regular expression.
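
Before you save a rule, it can help to try the expression against a few representative column names. The following sketch uses Python's re module for that purpose. The regular expression dialect that Structural uses, and whether it matches the whole name or any part of it, are not stated here, so treat the result as an approximation.

```python
import re


def columns_matching(pattern, column_names):
    """Return the column names in which the pattern is found, using Python's re.search."""
    regex = re.compile(pattern)
    return [name for name in column_names if regex.search(name)]


# Example column names and pattern, chosen only for illustration.
sample_columns = ["acct_no", "account_number", "acct_notes", "customer_acct_no"]
hits = columns_matching(r"^(acct|account)_(no|number)$", sample_columns)
```

The anchors ^ and $ make the intent explicit: in this example, only acct_no and account_number match.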

Generator preset to apply

From the Recommended Generator Preset dropdown list, select the generator preset that is the recommended generator for matching columns.

To search for a specific preset, begin to type the generator preset name.

Managing generator preset configuration

Required global permission: Create and manage generator presets

When you configure a sensitivity rule, you can also create a new generator preset or update the configuration of the selected generator preset.

To create a new generator preset, click Create Preset. On the generator preset details panel, provide the generator preset configuration, then click Create.

To edit the selected generator preset, click Edit Current Preset. On the generator preset details panel, update the generator preset configuration, then click Save and Apply.

Previewing the rule results

If you have access to a workspace, then you can use the workspace to preview the sensitivity rule results.

Under Test Results, from the workspace dropdown list, select the workspace to use.

Structural searches the workspace schema for matching columns based on the sensitivity rule configuration.

It displays any matching columns. You can filter the matching columns based on the table or column name.

For each matching column, the list includes:

The column name and table
A sample value from the source data. To see the sample source value, you must have the Preview source data permission for the workspace.
A sample replacement value, based on the selected generator preset for the sensitivity rule. To see the sample replacement value, you must have the Preview destination data permission for the workspace.

Deleting a sensitivity rule

To delete a sensitivity rule, on the Sensitivity Rules view, click the delete icon for the rule.

Note that existing generator recommendations for the rule remain in place until the next sensitivity scan.
