Privacy Hub

Privacy Hub automatically finds your most sensitive information and recommends how to mask it, all in a few clicks. To guarantee process integrity, Privacy Hub tracks all actions in an immutable log.

Automatic Privacy Scan

When you first connect Tonic to a new datasource, Privacy Hub scans the datasource and displays any potentially risky data. You can then accept the automatically suggested replacements or customize your transformations. Once you’re satisfied with your transformations, you can generate a new dataset with confidence.

While the privacy scan is running, you can continue to use the rest of Tonic.

Types of Sensitive Data Identified

Tonic uses a variety of signals to identify PII (personally identifiable information). For example, Tonic analyzes column metadata such as the data type and column name, as well as the uniqueness of the column values. Tonic also scans the actual data and uses a combination of regex matching, dictionary lookups, and NER (named entity recognition) algorithms to help identify PII. Like all model-based approaches, this process is not flawless, and we cannot guarantee perfect precision or recall. We strongly recommend a human review of both the privacy scan results and the broader dataset to ensure that nothing sensitive has been missed.

The privacy scan currently looks for the following types of sensitive data:

  • Names

    • First

    • Last

    • Full

  • Location

    • Street address

    • ZIP

    • PO Box

    • City

    • State name and two-letter abbreviation

    • Country

  • Email address

  • Phone number

  • Credit card

    • Number

    • Expiration date

    • CVC

    • Type

  • BTC address

  • Social Security Number

  • National ID number

  • Birth dates

  • Gender

  • IP address

  • IPv6 address

The above list is constantly being updated. For a current list of sensitive data detected or to inquire about adding new detectors, please reach out to us at [email protected]
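To make the idea of combining signals concrete, here is a minimal sketch of how column-name hints, regex matching, and a dictionary lookup might be combined to flag a column. It is illustrative only and is not Tonic's implementation: the function `classify_column`, the pattern tables, the toy name dictionary, and the `min_match_ratio` threshold are all assumptions made up for this example, and NER is left out entirely.

```python
import re

# Hypothetical value patterns; these are NOT Tonic's actual detectors.
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "ipv4": re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$"),
}

# Hypothetical column-name hints (crude substring checks).
NAME_HINTS = {
    "email": ("email", "e_mail"),
    "ssn": ("ssn", "social_security"),
    "first_name": ("first_name", "fname"),
}

# Toy dictionary standing in for a real list of first names.
FIRST_NAMES = {"alice", "bob", "carol", "david"}


def classify_column(column_name, values, min_match_ratio=0.8):
    """Return the set of suspected sensitive-data types for one column."""
    suspected = set()

    # Signal 1: column metadata (here, only the column name).
    lowered = column_name.lower()
    for label, hints in NAME_HINTS.items():
        if any(hint in lowered for hint in hints):
            suspected.add(label)

    # Ignore NULLs and blank strings when sampling the data.
    non_empty = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not non_empty:
        return suspected

    # Signal 2: regex matching against the actual values.
    for label, pattern in VALUE_PATTERNS.items():
        matches = sum(1 for v in non_empty if pattern.match(v))
        if matches / len(non_empty) >= min_match_ratio:
            suspected.add(label)

    # Signal 3: dictionary lookup against the actual values.
    hits = sum(1 for v in non_empty if v.lower() in FIRST_NAMES)
    if hits / len(non_empty) >= min_match_ratio:
        suspected.add("first_name")

    return suspected
```

For example, `classify_column("contact", ["a@example.com", "b@example.org"])` flags the column as `email` from its values alone, even though the column name gives no hint. Heuristics like these produce both false positives and false negatives, which is why a human review of the results remains important.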

Privacy Scan Results

When the privacy scan finishes, you will see the results page with the following components:

  1. A table listing the columns flagged as potentially containing sensitive data that have not yet been protected or marked as not sensitive

  2. The name (including schema and table path) of the column that has been flagged

  3. The suggested generator to replace data in this column

  4. An option to override the suggested generator with another generator

  5. An X button that marks the column as not containing sensitive data and removes it from the list. Once a column is marked as not sensitive, re-running the privacy scan does not flag it again.

  6. The immutable audit log of all modifications to columns marked as sensitive. This includes applying generators to these columns, whether in Privacy Hub or in other locations, and marking a column as not sensitive.

  7. The time the privacy scan was last run, and an option to trigger another scan manually
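The rule for which columns appear in the results table (flagged, not yet protected, and not marked as not sensitive) can be sketched as a simple filter. This is a hypothetical model for illustration only: the `ColumnStatus` class, its fields, and the `unresolved_columns` function are assumptions and do not reflect Tonic's internal data model or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnStatus:
    """Hypothetical record of what is known about one column after a scan."""
    path: str                          # schema, table, and column name
    flagged: bool                      # the scan considers it potentially sensitive
    generator: Optional[str] = None    # generator applied to the column, if any
    marked_not_sensitive: bool = False # a user dismissed the column with the X


def unresolved_columns(columns: List[ColumnStatus]) -> List[str]:
    """Return paths of flagged columns that still need a decision."""
    return [
        c.path
        for c in columns
        if c.flagged and c.generator is None and not c.marked_not_sensitive
    ]


columns = [
    ColumnStatus("public.users.email", flagged=True),
    ColumnStatus("public.users.first_name", flagged=True, generator="Name"),
    ColumnStatus("public.users.nickname", flagged=True, marked_not_sensitive=True),
    ColumnStatus("public.orders.total", flagged=False),
]
remaining = unresolved_columns(columns)
```

In this sketch only `public.users.email` remains in the table: one column already has a generator applied, one was dismissed as not sensitive (so a later scan would not list it again), and one was never flagged.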