# Differential privacy

Differential privacy is one technique that Tonic Structural uses to ensure the privacy of your data.

Differential privacy limits the effect of a single source record or user on the destination data. Someone who views the output of a process that has differential privacy cannot determine whether a particular individual's information was used to generate that output.

Data that is protected by a process with differential privacy cannot be reverse engineered, re-identified, or otherwise compromised.

## Generators that automatically have differential privacy

Any generator that does not use the underlying data at all is considered "data-free". A data-free generator always has differential privacy.

Several Structural generators are either always data-free, or are data-free if consistency is not enabled.

## Generators for which differential privacy is configurable

The configuration options for the Categorical and Continuous generators include a **Differential Privacy** toggle to enable or disable differential privacy.

### Categorical generator

The Categorical generator shuffles the values of a column while preserving the overall frequency of the values. Note that NULL is considered its own category of value.

Differential privacy (disabled by default) further protects the privacy of your data by:

1. Adding noise to the frequencies of categories.
2. If needed, removing rare categories from the possible samples.

Differential privacy is not appropriate when the data in each row is unique or nearly unique. As a general rule of thumb, categories that are represented by fewer than 15 rows are at risk of being suppressed.

Structural warns you when a column isn’t suitable for differential privacy. A column is not suitable for differential privacy if most or all categories have fewer than 15 rows.
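The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not Structural's implementation: the function names, the `epsilon` and `threshold` parameters, and the choice of Laplace noise with scale `1 / epsilon` are all assumptions made for the example.

```python
import random
from collections import Counter


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_categorical_sample(values, epsilon, threshold, n_samples, rng):
    """Hypothetical sketch: noisy frequencies, rare-category suppression, sampling.

    `epsilon` and `threshold` are illustrative parameters, not Structural settings.
    """
    if epsilon <= 0 or threshold <= 0:
        raise ValueError("epsilon and threshold must be positive")

    # None stands in for NULL and is counted as its own category.
    counts = Counter(values)

    # Step 1: add noise to the frequency of each category.
    noisy = {cat: n + laplace_noise(1.0 / epsilon, rng) for cat, n in counts.items()}

    # Step 2: remove categories whose noisy count falls below the threshold.
    kept = {cat: n for cat, n in noisy.items() if n >= threshold}
    if not kept:
        raise ValueError("every category was suppressed; column is not suitable")

    # Draw replacement values in proportion to the noisy frequencies.
    categories = list(kept)
    weights = [kept[cat] for cat in categories]
    return rng.choices(categories, weights=weights, k=n_samples)


values = ["a"] * 500 + ["b"] * 300 + [None] * 200 + ["rare"]
sampled = dp_categorical_sample(
    values, epsilon=1.0, threshold=15.0, n_samples=10, rng=random.Random(1)
)
```

In this sketch, a category with a single row is almost certain to be dropped, which mirrors why columns with mostly unique values are a poor fit for differential privacy.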

### Continuous generator

The Continuous generator produces samples that preserve the individual column distributions and correlations between columns.

When differential privacy is enabled, noise is added to the individual distributions and the correlation matrix, using the mechanism described in [4].
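How Structural applies the mechanism internally is not described here. As a rough sketch of the core idea in [4], the following example adds symmetric Gaussian noise to the second-moment matrix of a dataset whose rows have L2 norm at most 1. The function name and the `epsilon` and `delta` parameters are assumptions for illustration, and the noise scale uses the standard Gaussian mechanism calibration, which is usually stated for `epsilon` below 1.

```python
import math
import random


def analyze_gauss_sketch(rows, epsilon, delta, rng):
    """Hypothetical sketch: return A^T A + E, where E is symmetric Gaussian noise.

    Assumes every row of A has L2 norm at most 1.
    """
    if epsilon <= 0 or not 0 < delta < 1:
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    for row in rows:
        if math.sqrt(sum(x * x for x in row)) > 1.0 + 1e-9:
            raise ValueError("each row must have L2 norm at most 1")

    d = len(rows[0])
    # Standard Gaussian mechanism noise scale for sensitivity 1.
    sigma = math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

    # Exact second-moment (Gram) matrix A^T A.
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]

    # Draw noise for the upper triangle and mirror it so the result stays symmetric.
    for i in range(d):
        for j in range(i, d):
            noise = rng.gauss(0.0, sigma)
            gram[i][j] += noise
            if i != j:
                gram[j][i] += noise
    return gram


rows = [[0.6, 0.8], [1.0, 0.0], [0.0, 1.0]]
noisy_gram = analyze_gauss_sketch(rows, epsilon=0.5, delta=1e-5, rng=random.Random(0))
```

With only three rows the noise dominates the signal. In practice the noise has a fixed scale, so its relative effect shrinks as the number of rows grows.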

## More details: Mathematical formulation

### Privacy budget
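
In the standard formulation (see, for example, Dwork and Roth in the references), a randomized mechanism $$\mathcal{M}$$ is $$\varepsilon$$-differentially private if, for every pair of datasets $$D$$ and $$D'$$ that differ in a single record, and for every set $$S$$ of possible outputs:

$$
\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]
$$

The parameter $$\varepsilon$$ is commonly called the privacy budget. A smaller $$\varepsilon$$ means that the output distributions on $$D$$ and $$D'$$ are closer together, which gives a stronger guarantee. Under sequential composition, the budgets of separate releases add up. The symbols $$\mathcal{M}$$, $$D$$, $$D'$$, and $$S$$ are notation assumed for this page.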

### A simple example: counting

Suppose we want to count the number of users in a database that have some sensitive property. For example, the number of users with a particular medical diagnosis.

In [2], Dwork, McSherry, Nissim, and Smith introduced the Laplace mechanism, which publishes such counts privately by adding noise sampled from the Laplace distribution.
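As a minimal sketch of this idea, the following Python example releases a noisy count. Adding or removing one user changes a count by at most 1, so Laplace noise with scale `1 / epsilon` is enough for a privacy parameter `epsilon`. The function names, the sample data, and the value of `epsilon` are assumptions for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return the number of matching records plus Laplace(1 / epsilon) noise.

    A counting query has sensitivity 1: one record changes the count by at most 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: 1000 users, 100 of whom have the sensitive diagnosis.
users = [{"id": i, "diagnosis": i % 10 == 0} for i in range(1000)]
noisy = private_count(users, lambda u: u["diagnosis"], epsilon=0.5, rng=random.Random(0))
```

The released value is close to the true count of 100 but not exactly equal to it. A smaller `epsilon` widens the noise and strengthens the guarantee.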

### Approximate differential privacy

A common relaxation, called *approximate differential privacy*, allows for more flexible privacy analysis, with noise that is drawn from a wider array of distributions than the Laplace distribution.

For example, the Analyze Gauss mechanism of [4] and the differentially private gradient descent of [1] both use Gaussian noise as a fundamental ingredient, which requires this relaxation.
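
In its standard form, the relaxation reads as follows. A randomized mechanism $$\mathcal{M}$$ is $$(\varepsilon, \delta)$$-differentially private if, for every pair of datasets $$D$$ and $$D'$$ that differ in a single record, and for every set $$S$$ of possible outputs:

$$
\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta
$$

Setting $$\delta = 0$$ recovers the stricter guarantee that has no additive term. Informally, $$\delta$$ bounds the probability with which the $$e^{\varepsilon}$$ bound is allowed to fail, so it is typically chosen to be very small. The notation here is assumed for this page.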

## References

[1] Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep Learning with Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS '16). Association for Computing Machinery, New York, NY, USA, 308–318. DOI:https://doi.org/10.1145/2976749.2978318

[2] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In: Halevi S., Rabin T. (eds) Theory of Cryptography (TCC '06). Lecture Notes in Computer Science, vol 3876. Springer, Berlin, Heidelberg. DOI:https://doi.org/10.1007/11681878_14

[3] Cynthia Dwork and Aaron Roth. 2014. The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci. 9, 3–4 (August 2014), 211–407. DOI:https://doi.org/10.1561/0400000042

[4] Cynthia Dwork, Kunal Talwar, Abhradeep Thakurta, and Li Zhang. 2014. Analyze Gauss: Optimal Bounds for Privacy-Preserving Principal Component Analysis. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing (STOC '14). Association for Computing Machinery, New York, NY, USA, 11–20. DOI:https://doi.org/10.1145/2591796.2591883
