Using Privacy Hub to identify and protect sensitive data

About Privacy Hub

Before you use Tonic to generate data, you must protect any sensitive information in the original data.
Privacy Hub shows the results of the most recent sensitivity scan, and tracks the current protection status of sensitive columns.
Privacy Hub
To display Privacy Hub, either:
  • On the workspace management view, in the workspace navigation bar, click Privacy Hub.
  • On Workspaces view, click the workspace name.
From Privacy Hub, you can:
  • View and download the results of the most recent sensitivity scan
  • View the current protection status for sensitive columns
  • Configure protection for sensitive columns
  • Run a new sensitivity scan
You can also track the history of changes to column sensitivity and the assigned column generators. See Tracking changes to the data generation configuration.

About Tonic sensitivity scans

When does Tonic run automatic sensitivity scans?

When you create a new workspace and connect a data source, Tonic automatically runs a sensitivity scan to identify columns that contain sensitive information. You can also manually mark a column as sensitive.
A child workspace always inherits the sensitivity designations from its parent workspace.
When you copy a workspace, Tonic runs a new sensitivity scan on the copy to identify sensitive columns. However, it keeps the sensitivity designation for columns that you specifically marked as sensitive or not sensitive.
Tonic also runs a new sensitivity scan when you change the data connection details for the source database.
In addition to the automatic scans, you can start a sensitivity scan manually.

Configuring parallel processing for sensitivity scans

For improved performance, sensitivity scans can use parallel processing.
For relational databases such as PostgreSQL and SQL Server, to configure parallel processing, you use the environment variable TONIC_PII_SCAN_PARALLELISM_RDBMS. The default value is 4.
For document-based databases such as MongoDB, you use the environment variable TONIC_PII_SCAN_PARALLELISM_DOCUMENTDB. The default value is 1.
For information about how to configure environment variables, see Setting environment variables.
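As an illustration of the two settings and their defaults, the following minimal Python sketch resolves the effective parallelism from a set of environment variables. The variable names and default values come from the text above. The helper `scan_parallelism` and its positive-integer check are hypothetical and are not part of Tonic.

```python
import os

# Variable names and defaults as documented above.
DEFAULTS = {
    "TONIC_PII_SCAN_PARALLELISM_RDBMS": 4,
    "TONIC_PII_SCAN_PARALLELISM_DOCUMENTDB": 1,
}


def scan_parallelism(name, environ=None):
    """Return the value set for `name`, or the documented default if unset.

    Hypothetical helper for illustration only. The positive-integer check
    is an assumption, not documented Tonic behavior.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return DEFAULTS[name]
    value = int(raw)  # raises ValueError if the value is not numeric
    if value < 1:
        raise ValueError(f"{name} must be a positive integer, got {value}")
    return value


# Example: only the relational setting is overridden.
example_env = {"TONIC_PII_SCAN_PARALLELISM_RDBMS": "8"}
rdbms = scan_parallelism("TONIC_PII_SCAN_PARALLELISM_RDBMS", example_env)
documentdb = scan_parallelism("TONIC_PII_SCAN_PARALLELISM_DOCUMENTDB", example_env)
```

In this example, `rdbms` takes the overridden value and `documentdb` falls back to the documented default.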

Types of sensitive data that the sensitivity scan identifies

The sensitivity scan looks for personally identifiable information, or PII. To identify PII, Tonic uses a variety of signals.
Tonic analyzes column metadata such as data type, column name, and the uniqueness of the column values.
Tonic also scans the actual data. To help identify PII in the data, it uses a combination of regex matching, dictionary lookups, and NER (named entity recognition) algorithms.
Note that this process cannot guarantee perfect precision and recall. We strongly recommend that a human review the sensitivity scan results and the broader dataset to ensure that nothing sensitive was missed.
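To make the idea of combining signals more concrete, here is a toy Python sketch that checks a column name hint, then regex matches, then a small dictionary lookup. It is not Tonic's implementation. The patterns, the name dictionary, the 0.8 threshold, and the function `guess_pii_type` are all assumptions chosen for illustration, and NER is omitted entirely.

```python
import re

# Toy patterns and lookups. All of these are illustrative assumptions.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")
FIRST_NAMES = {"alice", "bob", "carol", "dave"}
NAME_HINTS = {
    "email": "Email address",
    "ssn": "Social Security Number",
    "first_name": "First name",
}


def guess_pii_type(column_name, sample_values, threshold=0.8):
    """Return a guessed PII type for a column, or None if nothing matches."""
    # Signal 1: column metadata (here, only the column name).
    lowered = column_name.lower()
    for hint, pii_type in NAME_HINTS.items():
        if hint in lowered:
            return pii_type

    values = [v for v in sample_values if v]
    if not values:
        return None

    def share(predicate):
        # Fraction of sampled values that satisfy the predicate.
        return sum(1 for v in values if predicate(v)) / len(values)

    # Signal 2: regex matching on the actual data.
    if share(lambda v: EMAIL_RE.match(v) is not None) >= threshold:
        return "Email address"
    if share(lambda v: SSN_RE.match(v) is not None) >= threshold:
        return "Social Security Number"

    # Signal 3: dictionary lookup.
    if share(lambda v: v.lower() in FIRST_NAMES) >= threshold:
        return "First name"
    return None
```

A sketch like this misses values that fall outside its patterns and dictionary, which is one reason a human review of the results remains important.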
Tonic identifies the following types of sensitive data. This list changes frequently. To get the current list, and to ask about new detectors, contact [email protected].
Names
  • First
  • Last
  • Full
Location
  • Street address
  • ZIP
  • PO Box
  • City
  • State and two-letter abbreviation
  • Country
Contact information
  • Email address
  • Phone number
Credit card
  • Number
  • Expiration date
  • CVC (card verification code)
  • Type
BTC (Bitcoin) address
Identification
  • Social Security Number
  • National ID number
  • Birth dates
  • Gender
Network location
  • IP address
  • IPv6 address

Downloading the sensitivity scan log

To download the log of the most recent sensitivity scan:
  • On the workspace management view, from the download menu, select Download Sensitivity Scan Log.
  • On Privacy Hub, click Download Scan Log.
The log tracks the progress of the scan, and provides a list of columns that it identified as sensitive. The list includes the type of personally identifiable information (PII) that the column contains.

Viewing the current protection status

The protection status panels at the top of Privacy Hub provide an overview of the current protection status of the columns in the source data.
Protection status panels
Each panel displays:
  • The number of columns that are in that category
  • The estimated percentage of columns that are in that category
The column counts do not include columns that do not have data in the destination database. For example, if a table is truncated, then Privacy Hub ignores the columns in that table.
The information on these panels updates automatically as you change whether columns are sensitive and assign generators to columns.

At-Risk Columns

The At-Risk Columns panel reflects columns that:
  • Are populated in the destination database.
  • Are marked as sensitive.
  • Have the generator set to Passthrough, which indicates that Tonic does not perform any transformation on the data.
The goal is to have 0 at-risk columns.
Click Open in Database View to navigate to Database View. The column list is filtered to show columns that are at risk.

Protected Columns

The Protected Columns panel reflects columns that:
  • Are populated in the destination database.
  • Are assigned a generator other than Passthrough.
It includes both sensitive and non-sensitive columns.
Note that a column is considered protected based solely on the assigned generator. Some more complex generators, such as JSON Mask or Conditional, allow you to apply different generators to specific portions of a value or based on a specific condition. However, the protection status does not reflect these sub-generators. An applied sub-generator could be Passthrough.
Click Open in Database View to navigate to Database View. The column list is filtered to show all included columns that are protected.

Not Sensitive Columns

The Not Sensitive Columns panel reflects columns that:
  • Are populated in the destination database.
  • Are marked as not sensitive.
  • Have the generator set to Passthrough.
Click Open in Database View to navigate to Database View. The column list is filtered to show included columns that are not sensitive and are not protected.
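The rules for the three panels can be summarized in a short Python sketch. The function `protection_status` and the `Status` enum are hypothetical names used only to restate the criteria listed above. They are not a Tonic API.

```python
from enum import Enum
from typing import Optional

PASSTHROUGH = "Passthrough"


class Status(Enum):
    AT_RISK = "At-Risk"
    PROTECTED = "Protected"
    NOT_SENSITIVE = "Not Sensitive"


def protection_status(populated: bool, sensitive: bool, generator: str) -> Optional[Status]:
    """Classify a column using the panel criteria described above."""
    if not populated:
        # Columns without data in the destination database are not counted.
        return None
    if generator != PASSTHROUGH:
        # Protected is based solely on the assigned generator,
        # whether or not the column is sensitive.
        return Status.PROTECTED
    return Status.AT_RISK if sensitive else Status.NOT_SENSITIVE
```

Because the check looks only at the top-level generator, a column that uses a generator such as JSON Mask or Conditional is classified as protected even if one of its sub-generators is Passthrough.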

Viewing the protection status for each table

The Database Tables list shows the protection status for each table in the source database. You can see the number of columns that have each protection status, and update the column configuration.
The list does not include tables where the table mode is Truncated or Preserve Destination. Truncated tables are not populated in the destination database. For Preserve Destination tables, the existing data in the destination database does not change.

Information in the list

For each table, Database Tables provides the following information:
  • Name - The table name.
  • Not Sensitive - The number of not sensitive columns in the table. Not sensitive columns are not marked as sensitive and have Passthrough as the generator. Click the value to navigate to Database View, filtered to display the not sensitive columns for the table.
  • Protected - The number of protected columns in the table. Protected columns are assigned a generator other than Passthrough. A protected column can be either sensitive or not sensitive. Click the value to navigate to Database View, filtered to display the protected columns for the table.
  • At-Risk - The number of at-risk columns in the table. These columns are marked as sensitive, but have Passthrough as the generator. The goal is to have 0 unprotected sensitive columns. Click the value to navigate to Database View, filtered to display the at-risk columns for the table.
  • Privacy Status - Indicates the current protection status of the columns in the table. It provides the same view and configuration options as the protection status panels at the top of Privacy Hub.
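As a rough illustration of how the per-table counts relate to the column configuration, the following Python sketch builds one row per table and skips tables in Truncated or Preserve Destination mode. The input shape, the function `table_counts`, and the placeholder mode and generator names in the example are assumptions made for this sketch.

```python
from collections import Counter

# Table modes that the Database Tables list leaves out, per the text above.
EXCLUDED_TABLE_MODES = {"Truncated", "Preserve Destination"}


def table_counts(tables):
    """Return one summary row per listed table.

    `tables` is a list of dicts with "name", "mode", and "columns", where
    "columns" is a list of (sensitive, generator) pairs. This input shape
    is hypothetical.
    """
    rows = []
    for table in tables:
        if table["mode"] in EXCLUDED_TABLE_MODES:
            continue
        counts = Counter()
        for sensitive, generator in table["columns"]:
            if generator != "Passthrough":
                counts["Protected"] += 1
            elif sensitive:
                counts["At-Risk"] += 1
            else:
                counts["Not Sensitive"] += 1
        rows.append({
            "Name": table["name"],
            "Not Sensitive": counts["Not Sensitive"],
            "Protected": counts["Protected"],
            "At-Risk": counts["At-Risk"],
        })
    return rows


# "Other" and "SomeGenerator" are placeholders, not real Tonic names.
example_tables = [
    {"name": "users", "mode": "Other",
     "columns": [(True, "Passthrough"), (True, "SomeGenerator"), (False, "Passthrough")]},
    {"name": "logs", "mode": "Truncated", "columns": [(True, "Passthrough")]},
]
example_rows = table_counts(example_tables)
```

In this example, only the `users` table produces a row, with one column in each category.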

Filtering the list

You can filter the Database Tables list either by the table name or by the schema.
To filter the list by table name, in the filter field, begin typing text in the table name. As you type, Tonic updates the list to only display matching tables.
To filter the list to only include tables that belong to a specific schema:
  1. Click Filter by Schema.
  2. From the schema dropdown list, select the schema.
When you select a schema, Tonic adds it to the filter field.
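The two filters can be thought of as conditions that a table must satisfy together. The Python sketch below assumes case-insensitive substring matching on the table name and an exact match on the schema. The source does not state the exact matching rules, so treat both as assumptions. The function `filter_tables` and the input shape are hypothetical.

```python
def filter_tables(tables, name_text="", schema=None):
    """Return the tables that match the name text and, if given, the schema.

    `tables` is a list of dicts with "schema" and "name" keys.
    """
    needle = name_text.strip().lower()
    return [
        table
        for table in tables
        if needle in table["name"].lower()
        and (schema is None or table["schema"] == schema)
    ]


example_tables = [
    {"schema": "public", "name": "customers"},
    {"schema": "public", "name": "orders"},
    {"schema": "audit", "name": "customer_events"},
]
matches = filter_tables(example_tables, name_text="cust", schema="public")
```

Here `matches` contains only the `customers` table in the `public` schema.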

Sorting the list

You can sort the Database Tables list by any column except for the Privacy Status column.
To sort by a column, click the column heading. To reverse the sort order, click the heading again.

Managing columns from the table list

The Privacy Status column in the Database Tables list indicates the protection status of the columns in the table.
This column provides the same options to view and configure columns as the protection status panels at the top of Privacy Hub, but is limited to the columns in a specific table.

Viewing and configuring columns

Navigating through columns and viewing column details

Each protection status panel displays a series of boxes to represent the columns that apply to that status. For example, if the source data contains four columns that are at-risk, then the At-Risk Columns panel displays four boxes, one for each column.
The Privacy Status column in the Database Tables list displays the same set of boxes for the columns in an individual table.
If the number of columns is too large to fit, then the last box shows the number of additional columns that apply. For example, if there are 15 columns that don't fit, then the last box is labeled +15.
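The overflow label can be sketched as follows in Python. The function `status_boxes` and the `max_boxes` parameter are hypothetical. The sketch assumes that the overflow box occupies the last available slot.

```python
def status_boxes(column_names, max_boxes):
    """Return the box labels for a panel that can show at most `max_boxes` boxes."""
    if max_boxes < 1:
        raise ValueError("max_boxes must be at least 1")
    names = list(column_names)
    if len(names) <= max_boxes:
        return names
    # Reserve the last slot for the "+N" overflow box.
    visible = names[: max_boxes - 1]
    hidden = len(names) - len(visible)
    return visible + [f"+{hidden}"]


# 20 columns in a panel with room for 6 boxes: 5 column boxes, then "+15".
labels = status_boxes([f"col_{i}" for i in range(20)], max_boxes=6)
```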
When you hover over a box, the column name displays in a tooltip.
When you click a box, the details panel for that column displays.
Settings view of column details panel
When you click the box for remaining columns, the details panel for the first column in the remaining columns displays.
You can use the next and previous icons at the bottom right of the details panel to display the details for the next or previous column.
The column details panel opens to the settings view. The settings view contains the following information:
  • The table and column name.
  • Whether the column is flagged as sensitive.
  • The type of PII that the column contains.
  • The data type for the column data.
  • The generator that is assigned to the column.
  • For a child workspace, whether the column configuration is inherited from the parent workspace. For columns that have overrides, you can reset to the parent configuration.

Indicating whether a column is sensitive

From the settings view of the column details, you can configure the column sensitivity.
You cannot change the sensitivity of columns in a child workspace. A child workspace always inherits the sensitivity from its parent workspace. See About workspace inheritance.
As you change the column sensitivity, Tonic updates the protection status panels.
To change whether the column is sensitive, toggle the Sensitive option. If needed, the column moves to the panel that matches its new status. However, you remain on the current panel.
For example, from the At-Risk Columns panel, you change a column to be not sensitive. The column is moved to the Not Sensitive Columns panel. When you click the next or previous icons, you see the details for the next or previous column on the At-Risk Columns panel.

Selecting and configuring a generator for the column

From the column details, you can assign and configure the column generator.
When you change the column generator, Tonic updates the protection status panels.
If the column generator was previously Passthrough, then the column is moved to the Protected Columns panel. However, you remain on the current panel. For example, you assign a generator to a column that is on the At-Risk Columns panel. The column is moved to the Protected Columns panel, but when you click the next or previous icons, you see the details for the next or previous column on the At-Risk Columns panel.

Selecting the generator

For sensitive columns that are not protected, Tonic displays the recommended generator as a button.
For self-hosted instances that have an Enterprise license, the recommended generator is the built-in generator preset.
To assign the recommended generator to the column, click the button.
Otherwise, select the generator from the Generator Type dropdown list.
For more information about selecting a generator, see Assigning and configuring generators.

Configuring the generator

If the selected generator requires additional configuration, then below the Generator Type dropdown list is an Edit Generator Options link.
Column details panel with generator selected
To display the configuration fields for the generator, click Edit Generator Options.
Configuration options for a selected generator
For information about configuring a selected generator or generator preset, see Assigning and configuring generators.
After you configure the generator, to return to the settings view, click Back.

Displaying sample data for a column

From the column details, you can display sample data for the column. The sample data allows you to compare the source and destination versions of the column values.
To display the sample data, click the view sample (magnifying glass) icon.
On the sample data view of the column details:
  • The Original Data tab shows the values in the source data.
  • The Protected Output tab shows the values that the generator produced.
Sample data view on the column details panel

Commenting on a column

The commenting feature requires a Professional or Enterprise license.
From the column details, you can view and add comments on the column. You might use a comment to explain why you selected a particular generator or marked a column as sensitive or not sensitive.
From the column details, to display the comments for the column, click the comment icon.
The comments view displays any existing comments on the column. The most recent comment is at the bottom of the list. Each comment includes the name of the user who made the comment.
To add the first comment to a column, type the comment into the comment text area, then click Comment.
To add an additional comment, type the comment into the comment text area, then click Reply.
Comment view of the column details panel

Downloading a preview Privacy Report

The Privacy Report requires an Enterprise license.
The Privacy Report that you download from Privacy Hub or the workspace download menu provides an overview of the current protection status based on the current configuration.
This is different from the Privacy Report that you download from the data generation job details, which shows the protection status after the data generation.
The Privacy Report is a .csv file that provides details about the table columns, the column content, and the current protection configuration.
To download the Privacy Report:
  • On the workspace management view, from the download menu, select Download Privacy Report.
  • From Privacy Hub, click Download Privacy Report.
For more information about the Privacy Report and its contents, see Using the Privacy Report to verify data protection.

Running a new sensitivity scan on the data

Privacy Hub provides an option to manually start a new sensitivity scan. You might want to run a new sensitivity scan in the following cases:
  • You added new columns to the source database. The new scan identifies whether the new columns contain sensitive data.
  • The data in a column has changed significantly, and a column that Tonic originally marked as not sensitive might now contain sensitive data.
You cannot run a sensitivity scan on a child workspace. Child workspaces always inherit the sensitivity results from their parent workspace.
To run a new sensitivity scan, click Run Sensitivity Scan.
Buttons at the top of Privacy Hub
When Tonic runs a new sensitivity scan:
  • Tonic analyzes and determines the sensitivity of any new columns.
  • It does not change the sensitivity of existing columns that you marked as sensitive or not sensitive.
  • For existing columns whose sensitivity you did not change:
    • Tonic does not change the sensitivity of existing columns that the original scan marked as sensitive.
    • It can change the sensitivity of existing columns that the original scan marked as not sensitive.
The protection status panels are updated to reflect the results of the new scan.
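The rescan rules above can be restated as a small decision function. This Python sketch only illustrates the described behavior. The function `sensitivity_after_rescan` and its parameters are hypothetical names.

```python
from typing import Optional


def sensitivity_after_rescan(
    previous: Optional[bool],
    user_marked: bool,
    scan_result: bool,
) -> bool:
    """Return whether a column is sensitive after a new scan.

    previous:    sensitivity before the scan, or None for a new column
    user_marked: True if a user explicitly set the sensitivity
    scan_result: what the new scan detects for this column
    """
    if previous is None:
        # New column: the scan determines its sensitivity.
        return scan_result
    if user_marked:
        # A manual designation is never changed by a scan.
        return previous
    if previous:
        # A column that the original scan marked as sensitive stays sensitive.
        return True
    # A column that the original scan marked as not sensitive can change.
    return scan_result
```

Under these rules, a new scan can add sensitive columns but cannot remove the sensitive designation from an existing column.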