Identifying sensitive data
Tonic uses sensitivity scans to identify source data columns that contain sensitive information. You can also manually mark a column as sensitive.
Tonic runs sensitivity scans automatically. You can also run a manual sensitivity scan.
Tonic automatically runs a sensitivity scan when you create a completely new workspace and connect a data source.
A child workspace always inherits the sensitivity designations from its parent workspace.
When you copy a workspace, Tonic runs a new sensitivity scan on the copy to identify sensitive columns. However, it keeps the sensitivity designation for columns that you specifically marked as sensitive or not sensitive.
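The copy behavior can be pictured as a merge in which manual designations take precedence over fresh scan results. The following is a minimal sketch of that idea only, not Tonic's implementation. The function name `merge_sensitivity`, the column names, and the dictionary representation are all hypothetical.

```python
from typing import Dict


def merge_sensitivity(scan: Dict[str, bool], manual: Dict[str, bool]) -> Dict[str, bool]:
    """Combine fresh scan results with manual designations.

    Hypothetical model: each dict maps a column name to True (sensitive)
    or False (not sensitive). Manual designations override the scan.
    """
    merged = dict(scan)    # start from what the new scan found
    merged.update(manual)  # manual marks win for the columns they cover
    return merged


# Results of the new scan on the copied workspace (illustrative data).
fresh_scan = {"users.email": True, "users.nickname": False, "users.notes": False}

# Columns that a user explicitly marked before the copy (illustrative data).
manual_marks = {"users.nickname": True, "users.email": False}

result = merge_sensitivity(fresh_scan, manual_marks)
```

In this sketch, `users.nickname` stays sensitive and `users.email` stays not sensitive because a user set them explicitly, while `users.notes` takes whatever the new scan reports.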
Tonic also runs a new sensitivity scan when you change the data connection details for the source database.
To determine whether a column contains sensitive information, Tonic looks at the data type, column name, and column values. To help identify sensitive column values, the scan uses regex matching and dictionary lookups.
This process cannot guarantee perfect precision and recall. We strongly recommend that a human review the sensitivity scan results and the broader dataset to ensure that nothing sensitive was missed.
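To make the combination of column names, regex matching, and dictionary lookups more concrete, here is a small illustrative sketch. It is not Tonic's detection logic. The patterns, the name hints, the tiny first-name dictionary, the 0.8 match threshold, and the function `classify_column` are all assumptions made up for this example, and the sketch ignores data types entirely.

```python
import re
from typing import Iterable, Optional

# Hypothetical patterns and lookups; a real scanner would use far richer ones.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")
FIRST_NAMES = {"alice", "bob", "carol", "david"}
NAME_HINTS = {
    "email": "Email address",
    "ssn": "Social Security Number",
    "first_name": "First name",
}


def classify_column(name: str, values: Iterable[str], threshold: float = 0.8) -> Optional[str]:
    """Return a guessed sensitivity type for a column, or None."""
    # 1. Column name: a suggestive name is treated as sufficient evidence.
    lowered = name.lower()
    for hint, label in NAME_HINTS.items():
        if hint in lowered:
            return label

    # 2. Column values: regex matching and dictionary lookup on a sample.
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return None
    checks = [
        ("Email address", lambda v: bool(EMAIL_RE.match(v))),
        ("Social Security Number", lambda v: bool(SSN_RE.match(v))),
        ("First name", lambda v: v.lower() in FIRST_NAMES),
    ]
    for label, matches in checks:
        hits = sum(1 for v in sample if matches(v))
        if hits / len(sample) >= threshold:
            return label
    return None


by_values = classify_column("contact", ["a@example.com", "b@example.org"])
by_name = classify_column("user_email", [])
missed = classify_column("col_7", ["Alice", "Bob", "zzz"])
```

The last call shows why human review matters: only two of the three values appear in the small dictionary, which falls below the assumed threshold, so a column of names goes undetected.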
Tonic identifies the following types of sensitive values. These include some information types that are covered by many privacy standards and frameworks, such as HIPAA, GDPR, CCPA, and PCI.
Names
- First
- Last
- Full
Location
- Street address
- ZIP
- PO Box
- City
- State, including the two-letter abbreviation
- Country
- Postal code
Contact information
- Email address
- Phone number
Password
Financial information
- Credit card number
- International bank account number (IBAN)
- SWIFT code for bank transfers
BTC (Bitcoin) address
Identification
- Social Security Number
- Birth dates
- Gender
Network location
- IP address
- IPv6 address
- MAC address
International Mobile Equipment Identity (IMEI)
Vehicle identification number (VIN)
ICD-9 and ICD-10 codes (used to identify diseases)
To download the log of the most recent sensitivity scan:
- On the workspace management view, from the download menu, select Download Sensitivity Scan Log.
- On Privacy Hub, click Download Scan Log.
The log tracks the progress of the scan.
For improved performance, sensitivity scans can use parallel processing.
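As a rough picture of what a parallelism setting controls, the sketch below scans several tables concurrently with a bounded pool of workers. It is not how Tonic implements its scan. The helpers `read_parallelism`, `scan_tables`, and `fake_scan` and the table names are hypothetical. The variable name and default of 4 are the relational database setting from this section, and falling back to the default on a missing or invalid value is an assumption.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Mapping


def read_parallelism(env: Mapping[str, str], name: str, default: int) -> int:
    """Read a positive integer setting, falling back to a default (assumed behavior)."""
    try:
        value = int(env.get(name, ""))
    except ValueError:
        return default
    return value if value >= 1 else default


def scan_tables(
    tables: List[str],
    scan_one: Callable[[str], List[str]],
    workers: int,
) -> Dict[str, List[str]]:
    """Scan tables concurrently, with at most `workers` scans running at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with tables.
        results = list(pool.map(scan_one, tables))
    return dict(zip(tables, results))


def fake_scan(table: str) -> List[str]:
    """Stand-in for real scanning work: returns the sensitive columns found."""
    return [f"{table}.email"] if table in {"users", "customers"} else []


workers = read_parallelism(os.environ, "TONIC_PII_SCAN_PARALLELISM_RDBMS", 4)
found = scan_tables(["users", "orders", "customers"], fake_scan, workers)
```

A larger worker count lets more tables be scanned at the same time, at the cost of more concurrent load on the source database.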
For relational databases such as PostgreSQL and SQL Server, to configure parallel processing, you use the environment variable TONIC_PII_SCAN_PARALLELISM_RDBMS. The default value is 4.
For document-based databases such as MongoDB, you use the environment variable TONIC_PII_SCAN_PARALLELISM_DOCUMENTDB. The default value is 1.
The sensitivity scan provides an initial assessment of which column values are sensitive.
You can also indicate manually that a column is sensitive or not sensitive.
Privacy Hub, Database View, and Table View all provide options to indicate whether a column is sensitive or not sensitive.