Identifying sensitive data

Tonic uses sensitivity scans to identify source data columns that contain sensitive information. You can also manually mark a column as sensitive.

Tonic sensitivity scans

Tonic runs sensitivity scans automatically. You can also run a manual sensitivity scan.

When does Tonic run automatic sensitivity scans?

Tonic automatically runs a sensitivity scan when you create a completely new workspace and connect a data source.
A child workspace always inherits the sensitivity designations from its parent workspace.
When you copy a workspace, Tonic runs a new sensitivity scan on the copy to identify sensitive columns. However, it keeps the sensitivity designation for columns that you specifically marked as sensitive or not sensitive.
Tonic also runs a new sensitivity scan when you change the data connection details for the source database.
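The copy behavior described above, where a fresh scan runs but manual designations are kept, can be pictured as a simple merge in which manual entries win. The following is only an illustrative sketch, not Tonic's implementation. The function name, the dictionary shapes, and the column names are all hypothetical.

```python
def designations_for_copy(manual_designations, new_scan_results):
    """Combine a fresh scan with manual designations from the original workspace.

    Both arguments map a column name to True (sensitive) or False (not sensitive).
    manual_designations holds only the columns that a user marked explicitly.
    """
    merged = dict(new_scan_results)      # start from the new scan on the copy
    merged.update(manual_designations)   # explicit user choices take precedence
    return merged


# Hypothetical example data
manual = {"users.nickname": True, "users.zip": False}
scan = {"users.email": True, "users.zip": True, "users.id": False}
result = designations_for_copy(manual, scan)
```

In this sketch, `users.zip` stays marked as not sensitive in the copy even though the new scan flagged it, because the user set that designation explicitly.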

Running a sensitivity scan manually

In addition to the automatic scans, you can start a sensitivity scan manually from Privacy Hub.

How Tonic identifies sensitive values

To identify that a column contains sensitive information, Tonic looks at the data type, column name, and column values. To help identify sensitive column values, the scan uses regex matching and dictionary lookups.
This process cannot guarantee perfect precision and recall. We strongly recommend that a human review the sensitivity scan results and the broader dataset to confirm that nothing sensitive was missed.
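To make the idea of combining column names, regex matching, and dictionary lookups more concrete, here is a minimal sketch. It is not Tonic's detection logic. The pattern, the tiny name dictionary, the threshold, and the function name are assumptions chosen for illustration, and the sketch looks only at string values rather than at the column data type.

```python
import re

# Assumed, deliberately loose email pattern (illustration only)
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical dictionary of known first names
FIRST_NAMES = {"alice", "bob", "carol", "david"}


def classify_column(column_name, values, threshold=0.8):
    """Guess a sensitivity type for a column, or return None."""
    texts = [v.strip() for v in values if isinstance(v, str) and v.strip()]
    if not texts:
        return None

    # Share of values that match the regex or appear in the dictionary
    email_share = sum(bool(EMAIL_RE.match(t)) for t in texts) / len(texts)
    name_share = sum(t.lower() in FIRST_NAMES for t in texts) / len(texts)

    # A suggestive column name lowers the evidence needed from the values
    lowered = column_name.lower()
    if email_share >= threshold or ("email" in lowered and email_share > 0):
        return "email"
    if name_share >= threshold or ("first" in lowered and name_share > 0):
        return "first_name"
    return None
```

A heuristic like this can produce both false positives and false negatives, which is why a manual review of the results remains important.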

Types of sensitive data that the sensitivity scan identifies

Tonic identifies the following types of sensitive values. These include some information types that are covered by privacy standards and frameworks such as HIPAA, GDPR, CCPA, and PCI.
Names
  • First
  • Last
  • Full
Location
  • Street address
  • ZIP
  • PO Box
  • City
  • State, including the two-letter abbreviation
  • Country
  • Postal code
Contact information
  • Email address
  • Phone number
Password
Financial information
  • Credit card number
  • International bank account number (IBAN)
  • SWIFT code for bank transfers
BTC (Bitcoin) address
Identification
  • Social Security Number
  • Birth dates
  • Gender
Network location
  • IP address
  • IPv6 address
  • MAC address
International Mobile Equipment Identity (IMEI)
Vehicle identification number (VIN)
ICD-9 and ICD-10 codes (used to identify diseases)

Downloading the sensitivity scan log

To download the log of the most recent sensitivity scan:
  • On the workspace management view, from the download menu, select Download Sensitivity Scan Log.
  • On Privacy Hub, click Download Scan Log.
The log tracks the progress of the scan.

Configuring parallel processing for sensitivity scans

For improved performance, sensitivity scans can use parallel processing.
To configure parallel processing for relational databases such as PostgreSQL and SQL Server, use the environment variable TONIC_PII_SCAN_PARALLELISM_RDBMS. The default value is 4.
For document-based databases such as MongoDB, use the environment variable TONIC_PII_SCAN_PARALLELISM_DOCUMENTDB. The default value is 1.
For information about how to configure environment variables, see Setting environment variables.
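As a rough illustration of what a parallelism setting like this controls, the sketch below reads a worker count from an environment variable and scans several tables concurrently. Only the variable name and its default of 4 come from the text above. How Tonic applies the value internally is not described here, so the helper names, the fallback behavior for invalid values, and the use of a thread pool are assumptions.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count(env_var, default):
    """Read a positive integer from the environment, else use the default."""
    raw = os.environ.get(env_var, "")
    try:
        value = int(raw)
    except ValueError:
        return default  # unset or not a number (assumed fallback)
    return value if value >= 1 else default


def scan_tables(tables, scan_one,
                env_var="TONIC_PII_SCAN_PARALLELISM_RDBMS", default=4):
    """Run scan_one on each table, using up to the configured number of workers."""
    workers = worker_count(env_var, default)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with tables
        return dict(zip(tables, pool.map(scan_one, tables)))
```

In this sketch, a higher value lets more tables be scanned at the same time, at the cost of more concurrent load on the source database.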

Manually indicating whether a column is sensitive

The sensitivity scan provides an initial assessment of which column values are sensitive.
You can also indicate manually that a column is sensitive or not sensitive.
Privacy Hub, Database View, and Table View all provide options to indicate whether a column is sensitive or not sensitive.