Identifying sensitive data

Tonic Structural uses sensitivity scans to identify source data columns that contain sensitive information. You can also manually mark a column as sensitive.

Structural sensitivity scans

Structural runs sensitivity scans automatically. You can also run a manual sensitivity scan.

When does Structural run automatic sensitivity scans?

Structural automatically runs a sensitivity scan when you create a completely new workspace and connect a data source.

Structural also runs a new sensitivity scan when you change the data connection details for the source database.

For a file connector workspace, Structural runs a sensitivity scan when you add a file group.

A child workspace always inherits the sensitivity designations from its parent workspace.

When you copy a workspace, Structural runs a new sensitivity scan on the copy to identify sensitive columns. However, it keeps the sensitivity designation for columns that you specifically marked as sensitive or not sensitive.

Running a sensitivity scan manually

In addition to the automatic scans, from Privacy Hub, you can start a sensitivity scan manually.

How Structural identifies sensitive values

To identify that a column contains sensitive information, Structural looks at the data type, column name, and column values. To help identify sensitive column values, the scan uses regex matching and dictionary lookups.

This process cannot guarantee perfect precision and recall. We strongly recommend that a human review the sensitivity scan results and the broader dataset to ensure that nothing sensitive was missed.
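As a rough illustration of how column names, regex matching, and dictionary lookups can combine to flag a column, here is a minimal Python sketch. It is not Structural's implementation. The patterns, word list, name hints, `threshold` parameter, and the `guess_sensitivity` function are all hypothetical, and the data type check is omitted for brevity.

```python
import re

# Hypothetical value patterns, for illustration only.
VALUE_PATTERNS = {
    "Email address": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "Social Security Number": re.compile(r"\d{3}-\d{2}-\d{4}"),
}

# Hypothetical dictionary of known first names.
FIRST_NAMES = {"alice", "bob", "carol", "david"}

# Hypothetical substrings that hint at a type when found in a column name.
NAME_HINTS = {
    "Email address": ("email",),
    "Social Security Number": ("ssn", "social"),
    "First name": ("first_name", "fname"),
}


def guess_sensitivity(column_name, values, threshold=0.8):
    """Return a guessed sensitivity type for a column, or None."""
    # 1. Column name check.
    name = column_name.lower()
    for label, hints in NAME_HINTS.items():
        if any(hint in name for hint in hints):
            return label

    samples = [str(v).strip() for v in values if v is not None]
    if not samples:
        return None

    # 2. Regex matching on the sampled values.
    for label, pattern in VALUE_PATTERNS.items():
        hits = sum(1 for s in samples if pattern.fullmatch(s))
        if hits / len(samples) >= threshold:
            return label

    # 3. Dictionary lookup on the sampled values.
    hits = sum(1 for s in samples if s.lower() in FIRST_NAMES)
    if hits / len(samples) >= threshold:
        return "First name"

    return None
```

For example, `guess_sensitivity("contact", ["a@b.com", "c@d.org"])` returns `"Email address"` because every sampled value matches the email pattern. A heuristic like this can miss values or flag harmless ones, which is why the human review matters.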

Types of sensitive data that the sensitivity scan identifies

Structural identifies the following types of sensitive values. These include some information types that many privacy standards and frameworks, such as HIPAA, GDPR, CCPA, and PCI, treat as sensitive.

Names

  • First

  • Last

  • Full

Organization

Location

  • Street address

  • ZIP

  • PO Box

  • City

  • State and two-letter abbreviation

  • Country

  • Postal code

Contact information

  • Email address

  • Phone number

Password

Financial information

  • Credit card number

  • International bank account number (IBAN)

  • SWIFT code for bank transfers

BTC (Bitcoin) address

Identification

  • Social Security Number

  • Birth dates

  • Gender

Network location

  • IP address

  • IPv6 address

  • MAC address

International Mobile Equipment Identity (IMEI)

Vehicle identification number (VIN)

ICD-9 and ICD-10 codes (used to identify diseases)

Downloading the sensitivity scan log

To download the log of the most recent sensitivity scan:

  • On the workspace management view, from the download menu, select Download Sensitivity Scan Log.

  • On Privacy Hub, click Download, then select Scan Log.

The log tracks the progress of the scan.

Configuring parallel processing for sensitivity scans

For improved performance, sensitivity scans can use parallel processing.

For relational databases such as PostgreSQL and SQL Server, to configure parallel processing, you use the environment setting TONIC_PII_SCAN_PARALLELISM_RDBMS. The default value is 4.

For document-based databases such as MongoDB, you use the environment setting TONIC_PII_SCAN_PARALLELISM_DOCUMENTDB. The default value is 1.
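To show what a parallelism setting of this kind typically controls, here is a minimal Python sketch that reads the two settings, falls back to the documented defaults, and uses the value as a worker count. It is an illustration only. The source does not say how Structural divides the scan work, so the per-table split, the thread pool, and the `scan_parallelism` and `scan_tables` helpers are all assumptions.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Documented defaults for the two settings.
DEFAULTS = {"RDBMS": 4, "DOCUMENTDB": 1}


def scan_parallelism(kind, env=os.environ):
    """Read TONIC_PII_SCAN_PARALLELISM_<kind>, or use the documented default."""
    raw = env.get(f"TONIC_PII_SCAN_PARALLELISM_{kind}")
    return int(raw) if raw else DEFAULTS[kind]


def scan_tables(tables, scan_one, kind="RDBMS", env=os.environ):
    """Hypothetical scan loop: at most `workers` tables are scanned at once."""
    workers = scan_parallelism(kind, env)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with tables.
        return dict(zip(tables, pool.map(scan_one, tables)))
```

With neither setting present, `scan_parallelism("RDBMS", {})` yields 4 and `scan_parallelism("DOCUMENTDB", {})` yields 1. A higher value allows more concurrent work, and usually also places more load on the source database.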

For information about how to configure environment settings, go to Configuring environment settings.

For each type of detected sensitive data, Structural suggests a recommended generator. For example, for a Social Security Number, Structural recommends the SSN generator. For a first name, Structural recommends the Name generator configured with First as the value type.

From Privacy Hub, you can review and apply the recommended generators to columns that the sensitivity scan detected.

For more information, go to Reviewing and applying recommended generators.

Manually indicating whether a column is sensitive

The sensitivity scan provides an initial assessment of which column values are sensitive.

You can also indicate manually that a column is sensitive or not sensitive.

Privacy Hub, Database View, and Table View all provide options to indicate whether a column is sensitive or not sensitive.

The Structural API also provides endpoints to designate columns as sensitive or not sensitive.
