Using the Privacy Report to verify data protection
Data privacy in Tonic is measured by the sensitivity of the data and the level of protection applied.
Another consideration is the use case, or the purpose and audience for the data. This is external to Tonic, but influences the protective actions that you take in Tonic.
The Privacy Report captures details about the level of data protection that is in place with Tonic. It is used at the following points in the de-identification process.
- You can use a preview as a checkpoint while you configure the data protection, to review the generators that you applied or to look for at-risk data. You also can export the preview from Tonic before you run a generation, to increase your confidence or to confirm that the de-identification configuration is complete.
- Every time you run a data generation job to populate the destination database with de-identified data, Tonic creates a new Privacy Report. The Privacy Report records the protection status of the data that is associated with that run.
The Privacy Report helps to answer the following questions:
- What is the value of Tonic?
- How do I know the data is safe for use?
- How was the data protected?
The Privacy Report consists of summary statistics and field level details in a downloadable .csv file. Here is a stylized version of the report that shows the column groupings:
Example Privacy Report with the column groups labeled
You can display a preview version of the Privacy Report from Privacy Hub.
When you run a data generation job, Tonic generates a version of the Privacy Report that reflects the generation results.
The preview of the Privacy Report is a snapshot of the current generator configuration in the workspace.
When you are ready to generate, or at any point during this process, you can export a preview from Tonic.
Option to download the Privacy Report preview on Privacy Hub
You can use the exported .csv file to review the configuration. You can also share it with others to obtain approval before you run a generation. When you are comfortable with the generators that are in place, you can run the data generation.
Note that the preview is not tied to any version of output data in the destination database. It only reflects Tonic's state at a point in time.
The Privacy Report captures the privacy associated with a particular generation job. Tonic creates a snapshot for each generation. The Privacy Report for a data generation job reflects the output data in the destination database at a point in time.
The job details for each generation job provide access to the Privacy Report for that job.
The Privacy Report tab of the Generation Job Details page displays the following summary statistics for the data protection:
- At-Risk - The number of columns that are sensitive, but that have Passthrough as the assigned generator.
- Protected - The number of columns that have a generator other than Passthrough assigned. This includes both sensitive and non-sensitive columns.
- Non-Sensitive - The number of columns that are not sensitive, and that have Passthrough as the assigned generator. Also includes columns that are in truncated tables, where no data is copied to the destination database.
Job details page for a data generation job
To export the full details of the Privacy Report, click Download Privacy Report CSV.
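After you download the .csv file, you can reproduce the summary counts yourself. The following is a minimal sketch, not part of Tonic. It assumes that the file has a header row with the ColumnPrivacyStatus column described later in this topic. The function name and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def summarize_privacy_status(csv_text: str) -> Counter:
    """Count the report rows for each ColumnPrivacyStatus value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["ColumnPrivacyStatus"].strip() for row in reader)


# Hypothetical sample rows. A real report contains more columns and rows.
SAMPLE_REPORT = """Schema,Table,Column,ColumnPrivacyStatus
public,users,email,Protected
public,users,ssn,At-Risk
public,users,id,Not Sensitive
public,orders,total,Not Sensitive
"""

counts = summarize_privacy_status(SAMPLE_REPORT)
for status, count in sorted(counts.items()):
    print(f"{status}: {count}")
```

To use a downloaded report instead of the sample text, read the file contents with open(path, newline="") and pass the text to the function.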
The fields for each row in the Privacy Report fall into the following categories.
The Privacy Report includes all of the schema details that are viewable in the Tonic application, in views such as Database View and Table View. The schema in the source matches the schema in the destination.
The schema information is contained in the following columns:
- Schema - Schema name from the source database.
- Table - Table name from the source database.
- Column - Column name from the source database.
- DataType - Data type that is detected in the source database.
Data sensitivity reflects attributes such as:
- Whether the data includes personally identifiable information (PII)
- Whether the data is regulated by law
- Whether the data is business confidential
It affects decisions on how to protect the data.
During the sensitivity scan, Tonic identifies suspected sensitive fields. You can also manually indicate that a column is sensitive or not sensitive.
The data sensitivity information is contained in the following columns:
- IsSensitive - Indicates whether the column is currently flagged as sensitive.
TRUE indicates that the column is currently flagged as sensitive. This includes columns that Tonic detected automatically, and columns that you flagged as sensitive.
FALSE indicates that the column is currently flagged as not sensitive. This includes columns that Tonic flagged as sensitive, but that you changed to not sensitive.
- SensitiveType - For fields that Tonic identifies as sensitive, the detected data type. For example, Tonic detects a field of type Address that might be sensitive. For manually flagged fields, SensitiveType is Manual.
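To review which columns are flagged as sensitive and why, you can group the report rows by SensitiveType. The following is a minimal sketch. It assumes that IsSensitive is stored as the text TRUE or FALSE and that the column names match the ones listed in this topic. The function name and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def sensitive_columns_by_type(csv_text: str) -> dict:
    """Group the sensitive columns by their SensitiveType value."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: the flag is written as TRUE or FALSE text.
        if row["IsSensitive"].strip().upper() == "TRUE":
            name = f'{row["Schema"]}.{row["Table"]}.{row["Column"]}'
            grouped[row["SensitiveType"].strip()].append(name)
    return dict(grouped)


# Hypothetical sample rows that use the schema and sensitivity columns.
SAMPLE_REPORT = """Schema,Table,Column,DataType,IsSensitive,SensitiveType
public,users,street,varchar,TRUE,Address
public,users,notes,text,TRUE,Manual
public,users,id,integer,FALSE,
"""

by_type = sensitive_columns_by_type(SAMPLE_REPORT)
for sensitive_type, columns in by_type.items():
    print(sensitive_type, columns)
```

In this sketch, columns that were flagged manually are listed under Manual, which makes them easy to separate from the columns that the sensitivity scan detected.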
The protection section of the Privacy Report provides key details about how the masking transformations protect data.
The protection information is contained in the following columns:
- ProtectionType - Indicates the level of protection provided by the assigned generator and generator configuration. The possible protection type values are:
- Masked - Applied to columns that have a generator other than Passthrough assigned. The selected generator provides some protection against seeing source data. If both IsDifferentiallyPrivate and IsDataFree are FALSE, then ProtectionType is Masked. Consistency decreases the protection level. If consistency is enabled, then ProtectionType is also Masked.
- Anonymized - Applied to columns for which the assigned generators and the generator configuration are guaranteed against reverse engineering. The assigned generator either uses differential privacy, or is considered data-free, where the output data is completely unlinked from the source data. The assigned generator does not have consistency enabled.
- IsDifferentiallyPrivate - Indicates whether the assigned generator supports differential privacy and that differential privacy is enabled.
TRUE indicates that both of these are true.
FALSE indicates that either the assigned generator does not support differential privacy, or that differential privacy is not enabled. Differential privacy guarantees the highest level of privacy, and eliminates the ability to re-identify the data.
- IsDataFree - Indicates whether the assigned generator uses the underlying data. If the output data is completely unlinked from the source data, the generator is considered data-free and provides a high degree of protection.
- IsConsistent - Indicates whether consistency is enabled for a given field. Consistency ensures that a given input always results in the same output. It retains data utility, but lowers the level of protection. When consistency is on, ProtectionType is Masked and cannot be Anonymized. For more information, see Privacy Status.
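The relationship between the three flags and the protection type can be sketched as a small function. This is one reading of the Masked and Anonymized definitions above, not Tonic's implementation: a column is Anonymized only when its generator is differentially private or data-free and consistency is not enabled. It applies only to columns that have a generator other than Passthrough. The function name is hypothetical.

```python
def expected_protection_type(
    is_differentially_private: bool,
    is_data_free: bool,
    is_consistent: bool,
) -> str:
    """Return the protection type implied by the flag values.

    Assumes that the column has a generator other than Passthrough.
    """
    # Consistency lowers the protection level, so a consistent
    # column is never treated as Anonymized in this sketch.
    if is_consistent:
        return "Masked"
    if is_differentially_private or is_data_free:
        return "Anonymized"
    return "Masked"


for flags in [(True, False, False), (False, False, False), (True, False, True)]:
    print(flags, expected_protection_type(*flags))
```

You could compare the result to the ProtectionType value in each report row to spot rows that need a closer look.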
The ColumnPrivacyStatus column provides the overall privacy status of the column. The possible privacy status values are:
- Not Included - Applied to columns that are not populated in the destination database. For example, if a table is truncated, then the table columns are Not Included.
- At-Risk - Applied to columns that are flagged as sensitive, but that have Passthrough as the assigned generator.
- Protected - Applied to included columns that have a generator other than Passthrough assigned.
- Not Sensitive - Applied to columns that are flagged as non-sensitive and that have Passthrough as the assigned generator. Also applied to columns in tables that are truncated.
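A common check before you share the destination data is to confirm that no column is At-Risk. The following is a minimal sketch that lists the At-Risk columns from the report text. The exact spelling of the status values in the .csv file is an assumption, so the sketch ignores case, spaces, and hyphens when it compares them. The function names and the sample rows are hypothetical.

```python
import csv
import io


def normalize_status(value: str) -> str:
    """Lowercase the status and drop spaces, hyphens, and punctuation."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def find_at_risk_columns(csv_text: str) -> list:
    """Return schema.table.column names for the rows that are At-Risk."""
    at_risk = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if normalize_status(row["ColumnPrivacyStatus"]) == "atrisk":
            at_risk.append(f'{row["Schema"]}.{row["Table"]}.{row["Column"]}')
    return at_risk


# Hypothetical sample rows. A real report contains more columns and rows.
SAMPLE_REPORT = """Schema,Table,Column,IsSensitive,ColumnPrivacyStatus
public,users,email,TRUE,Protected
public,users,ssn,TRUE,At-Risk
public,users,id,FALSE,Not Sensitive
public,audit,payload,FALSE,Not Included
"""

at_risk_columns = find_at_risk_columns(SAMPLE_REPORT)
if at_risk_columns:
    print("At-Risk columns:", ", ".join(at_risk_columns))
else:
    print("No At-Risk columns found.")
```

An empty result means that, according to the report, every column that is flagged as sensitive has a generator other than Passthrough or is not populated in the destination database.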