Regex Mask

This is a composite generator.

Uses regular expressions to parse strings and replace specified substrings with the output of specified generators. The parts of the string to replace are specified inside unnamed top-level capture groups.
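The core idea can be sketched in Python. This is an illustration only, not Tonic's implementation: Python's re module stands in for the C# and Java regex engines, a sub-generator is modeled as a plain function from the captured text to its replacement, and the names mask_first_match and digits_to_x are hypothetical. The sketch handles only the first occurrence of the pattern and assumes capture groups are not nested.

```python
import re
from typing import Callable, Optional, Sequence

# Hypothetical model: a sub-generator maps captured text to replacement text.
SubGenerator = Callable[[str], str]


def mask_first_match(pattern: str, value: str,
                     generators: Sequence[Optional[SubGenerator]]) -> str:
    """Replace each capture group of the first match with its generator's output.

    generators[i] applies to capture group i + 1; None (or a missing entry)
    leaves that group unchanged. Assumes capture groups are not nested.
    """
    match = re.search(pattern, value)
    if match is None:
        return value  # no match: the value passes through unchanged

    replacements = []
    for index in range(1, match.re.groups + 1):
        start, end = match.span(index)
        generator = generators[index - 1] if index <= len(generators) else None
        if start == -1 or generator is None:
            continue  # group did not participate, or no sub-generator assigned
        replacements.append((start, end, generator(match.group(index))))

    # Splice from right to left so earlier offsets stay valid.
    result = value
    for start, end, text in sorted(replacements, reverse=True):
        result = result[:start] + text + result[end:]
    return result


def digits_to_x(captured: str) -> str:
    """Hypothetical stand-in sub-generator: replace each digit with X."""
    return re.sub(r"\d", "X", captured)


print(mask_first_match(r"^ProductId:([0-9]{3})-BuyerId:([0-9]{3})$",
                       "ProductId:123-BuyerId:234", [digits_to_x, digits_to_x]))
# ProductId:XXX-BuyerId:XXX
```

Passing None for a group corresponds to leaving that captured substring unmasked, while text outside the capture groups is always kept as-is.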

Defining multiple expressions allows you to attach completely different sets of sub-generators to a given cell, depending on the cell's value.

How regular expressions are applied

If multiple regular expressions match a given string, the expressions and their associated generators are evaluated in the order that they are specified. Only the first expression that matches has its selected sub-generators applied.
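The first-match-wins rule can be sketched in Python as follows. The function first_matching_index is a hypothetical helper, Python's re module stands in for the C# and Java engines, and the two patterns are the example expressions described on this page.

```python
import re
from typing import Optional, Sequence


def first_matching_index(patterns: Sequence[str], value: str) -> Optional[int]:
    """Return the index of the first pattern, in the order specified, that matches.

    Only the sub-generators attached to that pattern are applied; later
    patterns are not consulted, even if they would also match.
    """
    for index, pattern in enumerate(patterns):
        if re.search(pattern, value) is not None:
            return index
    return None  # no expression matches, so no sub-generators apply


patterns = [
    r"^ProductId:([0-9]{3})-BuyerId:([0-9]{3})$",  # specific expression
    r"^(\w+).(\d+).(\w+).(\d+)$",                   # broader expression
]
print(first_matching_index(patterns, "ProductId:123-BuyerId:234"))  # 0
print(first_matching_index(patterns, "OrderId:9-UserId:42"))        # 1
print(first_matching_index(patterns, "hello"))                      # None
```

Because the specific expression is listed first, it takes precedence for values that both patterns match. Listing the broader expression first would make it win for every such value.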

With the Replace all matches option enabled, the Regex Mask generator behaves like a traditional regex parser: it matches all occurrences of a pattern before the next pattern is encountered. For example, the pattern (a) applied to the string aaab matches all three occurrences of the letter a, instead of just the first.
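The difference between replacing all matches and only the first can be sketched with Python's re.sub, whose count argument plays the role of the Replace all matches toggle here. The helper mask_group_one is hypothetical and only handles capture group 1; it is not how Tonic implements the option.

```python
import re
from typing import Callable


def mask_group_one(pattern: str, value: str, generator: Callable[[str], str],
                   replace_all: bool = True) -> str:
    """Apply generator to capture group 1 of every match, or only the first match."""

    def substitute(match: re.Match) -> str:
        start, end = match.span(1)
        if start == -1:
            return match.group(0)  # group 1 did not participate in this match
        offset = match.start()
        whole = match.group(0)
        # Keep the matched text around group 1 and swap in the generated value.
        return whole[:start - offset] + generator(match.group(1)) + whole[end - offset:]

    # count=0 replaces every match; count=1 stops after the first.
    return re.sub(pattern, substitute, value, count=0 if replace_all else 1)


print(mask_group_one(r"(a)", "aaab", lambda s: "x"))                     # xxxb
print(mask_group_one(r"(a)", "aaab", lambda s: "x", replace_all=False))  # xaab
```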

Regular expression compatibility

Note that for Spark-based data connectors, depending on your environment, there might be slight differences in the regular expression support.

To ensure consistent results across all data connectors, use regular expression patterns that are compatible with both Java and C#.

For more information about regular expression syntax, see the regular expression reference documentation for C# (.NET) and for Java.

Example expressions

In a cell that contains the string ProductId:123-BuyerId:234, to mask the substrings 123 and 234, specify the regular expression:

^ProductId:([0-9]{3})-BuyerId:([0-9]{3})$

This captures the two occurrences of three-digit numbers in the pattern ProductId:xxx-BuyerId:xxx. This makes it possible to define a sub-generator on neither, either, or both of these captured substrings.
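Before you configure sub-generators, it can help to check locally which substrings a pattern captures. A minimal Python sketch follows; captured_ids is a hypothetical helper, and for a pattern this simple (anchors, literals, and [0-9]{3}) Python's re module behaves the same as the Java and C# engines.

```python
import re
from typing import Optional, Tuple

# The first example expression from this page.
PRODUCT_BUYER = re.compile(r"^ProductId:([0-9]{3})-BuyerId:([0-9]{3})$")


def captured_ids(value: str) -> Optional[Tuple[str, ...]]:
    """Return the captured substrings, or None if the value does not match."""
    match = PRODUCT_BUYER.search(value)
    return match.groups() if match else None


print(PRODUCT_BUYER.groups)                        # 2 capture groups
print(captured_ids("ProductId:123-BuyerId:234"))   # ('123', '234')
print(captured_ids("ProductId:1234-BuyerId:234"))  # None: four digits
```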

The following regular expression defines a broader capture that matches more cell values:

^(\w+).(\d+).(\w+).(\d+)$

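This captures two pairs of a word ((\w+)) followed by a number ((\d+)), where each part is separated by any single character (.). It matches more cell values than the first expression, which requires the literal labels ProductId: and BuyerId: and exactly three digits in each number. Note that \w matches letters, digits, and underscores.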
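The following Python sketch compares the two example expressions on a few sample values. The helper groups_or_none and the sample values are illustrative assumptions; Python's re module stands in for the Java and C# engines.

```python
import re
from typing import Optional, Tuple

SPECIFIC = re.compile(r"^ProductId:([0-9]{3})-BuyerId:([0-9]{3})$")
BROAD = re.compile(r"^(\w+).(\d+).(\w+).(\d+)$")


def groups_or_none(pattern: re.Pattern, value: str) -> Optional[Tuple[str, ...]]:
    """Return the captured substrings, or None if the value does not match."""
    match = pattern.search(value)
    return match.groups() if match else None


for value in ["ProductId:123-BuyerId:234",
              "OrderId=9/UserId#42",
              "ProductId:1234-BuyerId:5"]:
    print(value, groups_or_none(SPECIFIC, value), groups_or_none(BROAD, value))
```

Both patterns match the first value. Only the broader pattern matches the other two, because the separators differ or the numbers are not exactly three digits long.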

Characteristics

Consistency

Determined by the selected sub-generators.

Linking

Determined by the selected sub-generators.

Differential privacy

Determined by the selected sub-generators.

Data-free

Determined by the selected sub-generators.

Allowed for primary keys

No

Allowed for unique columns

Yes

Uses format-preserving encryption (FPE)

No

Privacy ranking

5

Generator ID (for the API)

How to configure

Adding a regular expression

To add a regular expression:

  1. Click Add Regex. On the configuration panel, Cell Value shows a sample value from the source database. You can use the previous and next options to navigate through the values.

  2. By default, Replace all matches is enabled. To match only the first occurrence of a pattern, toggle Replace all matches to the off position.

  3. In the Pattern field, enter a regular expression. If the expression is valid, then Tonic displays the capture groups for the expression.

  4. For each capture group, click the currently selected generator, then select and configure the generator to apply. You cannot select another composite generator.

  5. To save the configuration and immediately add another regular expression, click Save and Add Another. To save the configuration and close the configuration panel, click Save.
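The validity check in step 3 can be approximated locally before you enter a pattern. In this Python sketch, capture_group_labels and its "Group N" labels are hypothetical and do not reflect Tonic's UI text. Because Python's syntax differs slightly from Java and C#, a pattern that compiles here might still be rejected by Tonic (or the reverse), so treat the Pattern field as the final check.

```python
import re
from typing import List, Optional


def capture_group_labels(pattern: str) -> Optional[List[str]]:
    """Return one label per capture group, or None if the pattern is invalid."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        return None  # invalid expression: there are no groups to configure
    # compiled.groups counts capturing groups only; (?:...) groups are excluded.
    return [f"Group {number}" for number in range(1, compiled.groups + 1)]


print(capture_group_labels(r"^ProductId:([0-9]{3})-BuyerId:([0-9]{3})$"))
print(capture_group_labels(r"(?:Id:)(\d+)"))
print(capture_group_labels(r"(\d+"))  # unbalanced parenthesis
```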

Managing the regular expressions

From the Regexes list:

  • To edit a regex, click the edit icon.

  • To remove a regex, click the delete icon.
