HIPAA Address

This generator produces cities, states, and zip codes that follow the HIPAA Safe Harbor guidelines.

Handling of address parts

Zip codes

How the HIPAA Address generator handles zip codes depends on whether the Replace zeros in truncated Zip Code toggle in the generator configuration is off or on.

By default, the setting is off. In this case, the last two digits of the zip code in the column are replaced with zeros, unless the zip code is in a low population area as designated by the current census. For a low population area, all of the digits in the zip code are replaced with zeros.

If the setting is on, then the generator selects a real zip code that starts with the same three digits as the original zip code. For a low population area, if a state is linked, then the generator selects a random zip code from within that state. Otherwise the generator selects a random zip code from the United States.
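The two modes can be sketched as follows. This is a minimal illustration of the rules described above, not the generator's actual implementation. The function name `mask_zip`, its parameters, and the sample `LOW_POPULATION_PREFIXES` values are all hypothetical, and the lists of real zip codes are assumed to be supplied by the caller.

```python
import random

# Hypothetical sample values; not the list that the generator actually uses.
LOW_POPULATION_PREFIXES = frozenset({"036", "059", "102"})


def mask_zip(zip_code, replace_zeros=False, real_zips=(), state_zips=None, rng=random):
    """Mask a 5-digit zip code according to the rules described above."""
    prefix = zip_code[:3]
    low_population = prefix in LOW_POPULATION_PREFIXES

    if not replace_zeros:
        # Setting off: zero out every digit for a low population area,
        # otherwise keep the 3-digit prefix and zero the last two digits.
        return "00000" if low_population else prefix + "00"

    if low_population:
        # Setting on, low population area: draw from the linked state's zip
        # codes when a state is linked, otherwise from the whole lookup.
        pool = list(state_zips) if state_zips else list(real_zips)
    else:
        # Setting on: draw a real zip code that shares the 3-digit prefix.
        pool = [z for z in real_zips if z.startswith(prefix)]

    # Assumes the pool is not empty; rng.choice raises IndexError otherwise.
    return rng.choice(pool)
```

With the setting off, `mask_zip("30305")` returns `"30300"`. With the setting on, the result depends on the contents of the supplied zip code lists.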

Cities

When no zip code column is linked, the generator chooses a random city in the United States. When a zip code column is linked, the generator chooses a random city that has at least some overlap with the zip code.

If the original zip code is designated as a low population area and a State column is linked, then a random city is chosen within the state. If no State column is linked, a random city within the United States is chosen.

For example, if the original city and zip code were (Atlanta, 30305), the zip code would be replaced with 30300. Many cities contain zip codes that begin with 303, such as Atlanta, Decatur, Chamblee, Hapeville, Dunwoody, and College Park. One of these cities is chosen at random, so that the final value is, for example, (Chamblee, 30300).
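The city selection can be sketched in the same way. This is an illustration under stated assumptions, not the generator's implementation. The `pick_city` function and its lookup tables are hypothetical. Overlap between a city and a zip code is approximated here by a shared 3-digit prefix, as in the Atlanta example. "Seattle" is added only to give the nationwide list a second region.

```python
import random

# Hypothetical lookup tables, built from the example above.
CITIES_BY_PREFIX = {
    "303": ["Atlanta", "Decatur", "Chamblee", "Hapeville", "Dunwoody", "College Park"],
}
ALL_CITIES = CITIES_BY_PREFIX["303"] + ["Seattle"]


def pick_city(zip_code=None, state_cities=None,
              low_population_prefixes=frozenset(), rng=random):
    """Choose a replacement city for a linked (or unlinked) zip code."""
    if zip_code is None:
        # No zip code linked: any city in the United States.
        return rng.choice(ALL_CITIES)

    prefix = zip_code[:3]
    if prefix in low_population_prefixes:
        # Low population area: stay within the linked state if there is one,
        # otherwise fall back to any city in the United States.
        return rng.choice(state_cities if state_cities else ALL_CITIES)

    # Otherwise: a city that shares the 3-digit zip code prefix.
    return rng.choice(CITIES_BY_PREFIX[prefix])


print(pick_city("30305"))  # one of the cities listed under "303"
```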

States

HIPAA guidelines allow for information at the state level to be kept. Therefore, these values are passed through.

Latitude and longitude (GPS) coordinates

GPS coordinates are randomly generated based on the linked HIPAA address components, starting with the most specific component that is available:

  1. If a zip code is linked, and its 3-digit zip code prefix is not designated a low population area, then a random point is generated within the same 3-digit zip code prefix. If the prefix is designated a low population area, then the linked state is used instead.

  2. If a state is available and a zip code and city are not, or the zip code or city is in a 3-digit zip code prefix that is designated a low population area, then a random GPS coordinate is generated somewhere within the state.

  3. If no zip code, city, or state is linked, or if one or more of them is linked but there was a problem generating a random GPS coordinate within the linked areas, then a GPS coordinate is generated at a random location within the United States.

Note: If the city component of the HIPAA address is linked with latitude and/or longitude, the GPS coordinate components are randomly generated independently of the city.
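The fallback order above can be sketched as follows. This is a simplified illustration, not the generator's implementation. Regions are approximated by hypothetical bounding boxes supplied by the caller, the `US_BOX` values are rough approximations of the contiguous United States, and a missing box stands in for "a problem generating a random GPS coordinate". The function and parameter names are assumptions. In line with the note above, the city does not influence the result.

```python
import random

# Rough, illustrative box: (min_lat, max_lat, min_lon, max_lon).
US_BOX = (24.5, 49.4, -124.8, -66.9)


def random_point(box, rng=random):
    """Return a uniformly random (lat, lon) pair inside a bounding box."""
    min_lat, max_lat, min_lon, max_lon = box
    return rng.uniform(min_lat, max_lat), rng.uniform(min_lon, max_lon)


def generate_gps(zip_code=None, state=None, prefix_boxes=None, state_boxes=None,
                 low_population_prefixes=frozenset(), rng=random):
    """Return (scope, (lat, lon)), trying zip prefix, then state, then country."""
    prefix_boxes = prefix_boxes or {}
    state_boxes = state_boxes or {}

    # 1. Linked zip code whose prefix is not a low population area.
    if zip_code is not None:
        prefix = zip_code[:3]
        if prefix not in low_population_prefixes and prefix in prefix_boxes:
            return "zip_prefix", random_point(prefix_boxes[prefix], rng)

    # 2. Linked state (also reached for low population prefixes).
    if state is not None and state in state_boxes:
        return "state", random_point(state_boxes[state], rng)

    # 3. Nothing usable is linked: anywhere in the United States.
    return "country", random_point(US_BOX, rng)
```

For example, calling `generate_gps(zip_code="03601", state="NH", state_boxes={"NH": (42.7, 45.3, -72.6, -70.6)}, low_population_prefixes={"036"})` returns a point with the `"state"` scope, because the prefix is treated as a low population area in this hypothetical input.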

Other address parts

All other address parts are generated randomly. The output value is not influenced at all by the underlying value in the column.

Characteristics

Consistency

Yes, can be made self-consistent.

Linking

Yes, can be linked.

Differential privacy

No

Data-free

No

Allowed for primary keys

No

Allowed for unique columns

No

Uses format-preserving encryption (FPE)

No

Privacy ranking

  • 3 if not consistent

  • 4 if consistent

Generator ID (for the API)

How to configure

To configure the generator:

  1. From the Link To dropdown list, select the other columns to link to. You can only select columns that are also assigned the HIPAA Address generator.

  2. From the address part dropdown list, select the type of address value that is in the column.

  3. Toggle the Replace zeros in truncated Zip Code setting to indicate how to generate zip codes. If the setting is off, then the last two digits are replaced with zeros. For low population areas, the entire zip code is populated with zeros. If the setting is on, then a real zip code is selected that starts with the first three digits of the original zip code. For low population areas, if a state is linked, a random zip code from the state is used. Otherwise, a random zip code from the United States is used.

  4. Toggle the Consistency setting to indicate whether to make the column self-consistent. By default, consistency is disabled.

  5. If Structural data encryption is enabled, then to use it for this column, toggle Use data encryption process to the on position.

Spark supported address parts

For the HIPAA Address generator, Spark workspaces (Amazon EMR, Databricks, and self-managed Spark clusters) support only the following address parts:

  • City

  • City with State

  • City with State Abbr

  • State

  • State Abbr

  • US Address

  • US Address with Country

  • Zip Code

The Address generator provides support for additional address parts in Spark workspaces.
