Using Privacy Hub to identify and protect sensitive data
Before you use Tonic to generate data, you must protect any sensitive information in the original data.
When you create a new workspace and connect a data source, Tonic automatically runs a sensitivity scan to identify columns that contain sensitive information. You can also manually mark a column as sensitive.
Privacy Hub shows the results of that initial scan, and tracks the current protection status of sensitive columns.
From Privacy Hub, you can:
- View the results of the initial sensitivity scan
- View the current protection status for sensitive columns
- Configure protection for sensitive columns
- Run a new sensitivity scan
The sensitivity scan looks for personally identifiable information, or PII. To identify PII, Tonic uses a variety of signals.
Tonic analyzes column metadata such as data type, column name, and the uniqueness of the column values.
Tonic also scans the actual data. To help identify PII in the data, it uses a combination of regex matching, dictionary lookups, and NER (named entity recognition) algorithms.
Note that this process cannot guarantee perfect precision and recall. We strongly recommend that a person review the sensitivity scan results and the broader dataset to make sure that nothing sensitive was missed.
The types of information that the sensitivity scan looks for include:
- Street address
- PO Box
- State and two-letter abbreviation
- Email address
- Phone number
- Expiration date
- CVC (card verification code)
- BTC (Bitcoin) address
- Social Security Number
- National ID number
- Birth dates
- IP address
- IPv6 address
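To give a rough sense of how pattern-based detection can work for a few of these types, here is a minimal Python sketch. It is not Tonic's implementation. The patterns, the `guess_pii_type` and `guess_column_pii_type` names, and the 80% agreement threshold are all assumptions made for illustration, and a real scanner would also use column metadata, dictionary lookups, and NER.

```python
import ipaddress
import re
from collections import Counter
from typing import Optional

# Deliberately loose, illustrative patterns (assumptions, not Tonic's rules).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def guess_pii_type(value: str) -> Optional[str]:
    """Return a hypothetical PII label for one value, or None if nothing matches."""
    value = value.strip()
    if EMAIL_RE.match(value):
        return "Email address"
    if SSN_RE.match(value):
        return "Social Security Number"
    try:
        ip = ipaddress.ip_address(value)
    except ValueError:
        return None
    return "IPv6 address" if ip.version == 6 else "IP address"


def guess_column_pii_type(values: list[str], threshold: float = 0.8) -> Optional[str]:
    """Label a column when at least `threshold` of its non-empty values share a label."""
    labels = [guess_pii_type(v) for v in values if v.strip()]
    matched = Counter(label for label in labels if label is not None)
    if not matched:
        return None
    label, count = matched.most_common(1)[0]
    return label if count / len(labels) >= threshold else None
```

Because a sketch like this only matches on value shape, it produces both false positives and false negatives, which is one reason that a manual review of the scan results matters.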
The protection status panels at the top of Privacy Hub provide an overview of the current protection status of the columns in the source data.
Protection status panels
Each panel displays:
- The number of columns that are in that category
- The estimated percentage of columns that are in that category
The column counts do not include columns that do not have data in the destination database. For example, if a table is truncated, then Privacy Hub ignores the columns in that table.
The information on these panels updates automatically as you change whether columns are sensitive and assign generators to columns.
The At-Risk Columns panel reflects columns that:
- Are populated in the destination database.
- Are marked as sensitive.
- Have the generator set to Passthrough, which indicates that Tonic does not perform any transformation on the data.
The goal is to have 0 at-risk columns.
The Protected Columns panel reflects columns that:
- Are populated in the destination database.
- Are assigned a generator other than Passthrough.
It includes both sensitive and non-sensitive columns.
Note that a column is considered protected based solely on the assigned generator. Some more complex generators, such as JSON Mask or Conditional, allow you to apply different generators to specific portions of a value or based on a specific condition. However, the protection status does not reflect these sub-generators. An applied sub-generator could be Passthrough.
The Not Sensitive Columns panel reflects columns that:
- Are populated in the destination database.
- Are marked as not sensitive.
- Have the generator set to Passthrough.
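The rules for the three panels can be summarized as a small decision function. The following Python sketch only restates the rules above. The `Status` enum and the `protection_status` function are hypothetical names and are not part of any Tonic API.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    AT_RISK = "At-Risk"
    PROTECTED = "Protected"
    NOT_SENSITIVE = "Not Sensitive"


def protection_status(populated: bool, sensitive: bool, generator: str) -> Optional[Status]:
    """Classify a column according to the panel rules described above.

    Returns None for columns that are not populated in the destination
    database, because those columns are not counted on any panel.
    """
    if not populated:
        return None
    # Any generator other than Passthrough counts as protected,
    # whether or not the column is marked as sensitive.
    if generator != "Passthrough":
        return Status.PROTECTED
    return Status.AT_RISK if sensitive else Status.NOT_SENSITIVE
```

For example, `protection_status(True, True, "Passthrough")` returns `Status.AT_RISK`, and assigning any other generator to the same column moves it to `Status.PROTECTED`.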
The Database Tables list shows the protection status for each table in the source database. You can see the number of columns that have each protection status, and update the column configuration.
The list does not include tables where the table mode is Truncated or Preserve Destination. Truncated tables are not populated in the destination database. For Preserve Destination tables, the existing data in the destination database does not change.
For each table, Database Tables provides the following information:
- Name - The table name.
- Not Sensitive - The number of not sensitive columns in the table. Not sensitive columns are not marked as sensitive and have Passthrough as the generator. Click the value to navigate to Database View, filtered to display the not sensitive columns for the table.
- Protected - The number of protected columns in the table. Protected columns are assigned a generator other than Passthrough. A protected column can be either sensitive or not sensitive. Click the value to navigate to Database View, filtered to display the protected columns for the table.
- At-Risk - The number of at-risk columns in the table. These columns are marked as sensitive, but have Passthrough as the generator. The goal is to have 0 unprotected sensitive columns. Click the value to navigate to Database View, filtered to display the at-risk columns for the table.
- Privacy Status - Indicates the current protection status of the columns in the table. It provides the same view and configuration options as the protection status panels at the top of Privacy Hub.
You can filter the Database Tables list either by the table name or by the schema.
To filter the list by table name, in the filter field, begin typing text in the table name. As you type, Tonic updates the list to only display matching tables.
To filter the list to only include tables that belong to a specific schema:
1. Click Filter by Schema.
2. From the schema dropdown list, select the schema.
When you select a schema, Tonic adds it to the filter field.
You can sort the Database Tables list by any column except for the Privacy Status column.
To sort by a column, click the column heading. To reverse the sort order, click the heading again.
The Privacy Status column in the Database Tables list indicates the protection status of the columns in the table.
Each protection status panel displays a series of boxes to represent the columns that have that status. For example, if the source data contains four columns that are at-risk, then the At-Risk Columns panel displays four boxes, one for each column.
The Privacy Status column in the Database Tables list displays the same set of boxes for the columns in an individual table.
If the number of columns is too large to fit, then the last box shows the number of additional columns that apply. For example, if there are 15 columns that don't fit, then the last box is labeled +15.
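The overflow behavior can be sketched as follows. This is an illustration only. The `status_boxes` function and the `max_boxes` parameter are hypothetical, and the sketch assumes that the +N box occupies one of the available slots.

```python
def status_boxes(column_names: list[str], max_boxes: int) -> list[str]:
    """Return one label per box, collapsing columns that don't fit into a final "+N" box."""
    if max_boxes < 1:
        raise ValueError("max_boxes must be at least 1")
    if len(column_names) <= max_boxes:
        return list(column_names)
    # Assumption: the last slot is reserved for the overflow box.
    shown = column_names[: max_boxes - 1]
    hidden = len(column_names) - len(shown)
    return shown + [f"+{hidden}"]
```

With 20 columns and room for 6 boxes, the sketch shows 5 column boxes followed by a box labeled +15.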
When you hover over a box, the column name displays in a tooltip.
When you click a box, the details panel for that column displays.
Settings view of column details panel
When you click the box for remaining columns, the details panel for the first column in the remaining columns displays.
You can use the next and previous icons at the bottom right of the details panel to display the details for the next or previous column.
The column details panel opens to the settings view. The settings view contains the following information:
- The table and column name.
- Whether the column is flagged as sensitive.
- The type of PII that the column contains.
- The data type for the column data.
- The generator that is assigned to the column.
- For a child workspace, whether the column configuration is inherited from the parent workspace. For columns that have overrides, you can reset to the parent configuration.
From the settings view of the column details, you can configure the column sensitivity.
As you change the column sensitivity, Tonic updates the protection status panels.
To change whether the column is sensitive, toggle the Sensitive option. If needed, Tonic moves the column to the panel that reflects its new status. However, you remain on the current panel.
For example, from the At-Risk Columns panel, you change a column to be not sensitive. The column is moved to the Not Sensitive Columns panel. When you click the next or previous icons, you see the details for the next or previous column on the At-Risk Columns panel.
By default, for a child workspace, the column configuration is inherited from the parent workspace. When you choose a different generator or change the configuration of the selected generator, the inheritance stops. See About workspace inheritance.
When you change the column generator, Tonic updates the protection status panels.
If the column generator was previously Passthrough, then the column is moved to the Protected Columns panel. However, you remain on the current panel. For example, you assign a generator to a column that is on the At-Risk Columns panel. The column is moved to the Protected Columns panel, but when you click the next or previous icons, you see the details for the next or previous column on the At-Risk Columns panel.
For sensitive columns that are not protected, Tonic displays the recommended generator as a button. To assign that generator to the column, click the button.
Otherwise, select the generator from the Generator Type dropdown list.
If the selected generator requires additional configuration, then below the Generator Type dropdown list is an Edit Generator Options link.
Column details panel with generator selected
To display the configuration fields for the generator, click Edit Generator Options.
Configuration options for a selected generator
After you configure the generator, to return to the settings view, click Back.
The column details panel indicates whether the column currently inherits the parent configuration.
For columns that override the parent, to reset the configuration and restore the inheritance, click Reset.
From the column details, you can display sample data for the column. The sample data allows you to compare the source and destination versions of the column values.
To display the sample data, click the view sample (magnifying glass) icon.
On the sample data view of the column details:
- The Original Data tab shows the values in the source data.
- The Protected Output tab shows the values that the generator produced.
Sample data view on the column details panel
The commenting feature requires an Enterprise tier license.
From the column details, you can view and add comments on the column. You might use a comment to explain why you selected a particular generator or marked a column as sensitive or not sensitive.
From the column details, to display the comments for the column, click the comment icon.
The comments view displays any existing comments on the column. The most recent comment is at the bottom of the list. Each comment includes the name of the user who made the comment.
To add the first comment to a column, type the comment into the comment text area, then click Comment.
To add an additional comment, type the comment into the comment text area, then click Reply.
Comment view of the column details panel
The Privacy Report that you download from Privacy Hub provides an overview of the current protection status based on the current configuration.
This is different from the Privacy Report that you download from the data generation job details, which shows the protection status after the data generation.
The Privacy Report is a .csv file that provides details about the table columns, the column content, and the current protection configuration.
To download the Privacy Report, click Download Privacy Report.
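Because the report is a .csv file, you can summarize it with a short script after you download it. The following Python sketch is an illustration under stated assumptions: this page does not list the report's column headers, so the `count_by_column` helper (a hypothetical name) takes the header to group by as a parameter and fails clearly if that header is not present.

```python
import csv
from collections import Counter
from typing import Iterable


def count_by_column(lines: Iterable[str], column: str) -> Counter:
    """Count report rows by the value in `column`.

    `lines` can be an open text file or any iterable of CSV lines.
    The header names are not documented here, so check them before use.
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found; available: {reader.fieldnames}")
    return Counter(row[column] for row in reader)
```

To use it on a downloaded file, open the file with `open(path, newline="")` and pass the file object as `lines`, together with a header name that you confirmed in the file.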
Privacy Hub provides an option to manually start a new sensitivity scan. You might want to run a new sensitivity scan in the following cases:
- You added new columns to the source database. The new scan identifies whether the new columns contain sensitive data.
- The data in a column has changed significantly, and a column that Tonic originally marked as not sensitive might now contain sensitive data.
To run a new sensitivity scan, click Run Sensitivity Scan.
Buttons at the top of Privacy Hub
When Tonic runs a new sensitivity scan:
- Tonic analyzes and determines the sensitivity of any new columns.
- It does not change the sensitivity of existing columns that you marked as sensitive or not sensitive.
- For existing columns whose sensitivity you did not change:
  - Tonic does not change the sensitivity of columns that the original scan marked as sensitive.
  - Tonic can change the sensitivity of columns that the original scan marked as not sensitive.
The protection status panels are updated to reflect the results of the new scan.
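The rescan rules can be expressed as a small function. The following Python sketch only models the rules listed above. The `ColumnState` class and the `apply_rescan` function are hypothetical names, and the sketch assumes that each column records whether its sensitivity was set by a user or by a scan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnState:
    sensitive: bool
    set_by_user: bool  # True if a user marked the column, False if a scan did


def apply_rescan(current: Optional[ColumnState], scan_says_sensitive: bool) -> ColumnState:
    """Return the column state after a new sensitivity scan.

    `current` is None for a column that did not exist during earlier scans.
    """
    if current is None:
        # New column: the scan result decides.
        return ColumnState(scan_says_sensitive, set_by_user=False)
    if current.set_by_user or current.sensitive:
        # User choices are kept, and scan-detected sensitive columns stay sensitive.
        return current
    # Scan-marked not sensitive: the new scan can change the result.
    return ColumnState(scan_says_sensitive, set_by_user=False)
```

The effect is that a new scan can flag more columns as sensitive, but it does not clear a sensitive flag and does not override a manual choice.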