Regex Mask (RegexMaskGenerator)

The Regex Mask generator uses a regular expression to parse a string, then replaces selected substrings with the output of other generators. The substrings to replace are identified by unnamed top-level capture groups in the expression.

The Regex Mask generator does not itself support linking or consistency, and it does not allow you to configure differential privacy.

Each link object identifies a regular expression and the generators to apply to the resulting capture groups.

The metadata object for each link object is populated from the RegexMaskMetadata object, which includes:

  • Whether to replace all matches or only the first match (replaceAllMatches).

  • The regular expression used to identify the capture groups to replace (pattern).

  • The list of generator types to apply to each capture group. The first sub-generator is applied to the first capture group, the second sub-generator to the second group, and so on.

  • In the captureGroupMetadata array, the configuration for each generator.

{
  "schema": "string",
  "table": "string",
  "column": "string",
  "metadata": {
    "generatorId": "RegexMaskGenerator",
    "presetId": "RegexMaskGenerator",
    "replaceAllMatches": boolean,
    "pattern": "string",
    "captureGroupMetadata": [
      {
        // Metadata for a capture group generator
      }
    ]
  }
}
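
The way capture groups map to sub-generators can be pictured with a short Python sketch. This is an illustration only, not the product's implementation: the function name regex_mask and the lambda stand-ins for sub-generators are hypothetical, and the sketch assumes that the capture groups are top-level and not nested.

```python
import re
from typing import Callable, Sequence


def regex_mask(
    value: str,
    pattern: str,
    generators: Sequence[Callable[[str], str]],
    replace_all_matches: bool,
) -> str:
    """Replace the text of each capture group with the output of the
    generator at the same position (hypothetical helper, for illustration)."""
    pieces = []
    pos = 0  # end of the last span that was copied or replaced
    for match in re.finditer(pattern, value):
        # Pair group 1 with the first generator, group 2 with the second, ...
        for index, generate in zip(range(1, match.re.groups + 1), generators):
            start, end = match.span(index)
            if start == -1:
                continue  # this group did not take part in the match
            pieces.append(value[pos:start])             # untouched text
            pieces.append(generate(match.group(index)))  # replaced text
            pos = end
        if not replace_all_matches:
            break  # only the first match is processed
    pieces.append(value[pos:])  # remainder of the string
    return "".join(pieces)


# Stand-ins for real sub-generators (assumptions, not real generator logic).
mask_digits = lambda text: "X" * len(text)
passthrough = lambda text: text
upper = lambda text: text.upper()

print(regex_mask("123-abc-def", r"^(\d{3})(.*)$", [mask_digits, passthrough], False))
# XXX-abc-def
print(regex_mask("123-abc-def", r"-([a-z]*)", [upper], True))
# 123-ABC-DEF
print(regex_mask("123-abc-def", r"-([a-z]*)", [upper], False))
# 123-ABC-def
```

Text outside the capture groups, such as the hyphens in the second pattern, is copied through unchanged. Only the captured text is passed to a generator.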

Example replacement

The following example replacement for the Regex Mask generator provides two expressions.

For the first expression, the generator is configured to only replace the first match. There are two capture groups. For the first capture group, the Address generator applies a country value, and consistency is disabled. For the second capture group, the Passthrough generator is applied.

For the second expression, the generator is configured to replace all of the matching values. The Business Name generator is applied with consistency disabled.

{
  "name": "summary",
  "schema": "public",
  "table": "projects",
  "links": [
    {
      "schema": "public",
      "table": "projects",
      "column": "summary",
      "metadata": {
        "generatorId": "RegexMaskGenerator",
        "presetId": "RegexMaskGenerator",
        "replaceAllMatches": false,
        "pattern": "^(\\d{3})(.*)$",
        "captureGroupMetadata": [
          {
            "generatorId": "AddressGenerator",
            "addressType": "Country",
            "isConsistent": false
          },
          { 
            "generatorId": "PassthroughGenerator"
          }
        ]
      }
    },
    {
      "schema": "public",
      "table": "projects",
      "column": "summary",
      "metadata": {
        "generatorId": "RegexMaskGenerator",
        "presetId": "RegexMaskGenerator",
        "replaceAllMatches": true,
        "pattern": "-([a-z]*)",
        "captureGroupMetadata": [
          {
            "generatorId": "BusinessNameGenerator",
            "isConsistent": false
          }
        ]
      }
    }
  ]
}
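
To check what each pattern in this example captures, the patterns can be tried against a sample value in Python. The sample string below is hypothetical and is not taken from any real dataset. Note that the backslash in the first pattern is doubled only because of JSON string escaping. The regular expression itself is ^(\d{3})(.*)$.

```python
import json
import re

# Patterns copied from the example, decoded from their JSON string form.
first_pattern = json.loads(r'"^(\\d{3})(.*)$"')
second_pattern = json.loads(r'"-([a-z]*)"')

sample = "415-acme-corp"  # hypothetical column value

# First pattern: two capture groups, one generator per group.
match = re.search(first_pattern, sample)
print(first_pattern)   # ^(\d{3})(.*)$
print(match.groups())  # ('415', '-acme-corp')

# Second pattern: one capture group, matched repeatedly when
# replaceAllMatches is true.
print(re.findall(second_pattern, sample))  # ['acme', 'corp']
```

In the first link, the three leading digits go to the Address generator and the rest of the string goes to the Passthrough generator. In the second link, each lowercase run that follows a hyphen goes to the Business Name generator.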
