# Regex Mask

This is a [composite generator](https://docs.tonic.ai/app/generation/generators/generator-types/generators-composite).

Uses regular expressions to parse strings and replace specified substrings with the output of specified generators. The parts of the string to replace are specified inside unnamed top-level capture groups.

Defining multiple expressions allows you to attach completely different sets of sub-generators to a given cell, depending on the cell's value.

## How regular expressions are applied <a href="#regex-mask-how-applied" id="regex-mask-how-applied"></a>

Regular expressions are evaluated in the order that they are specified. If multiple regular expressions match a given string, Structural applies only the sub-generators that are selected for the first matching expression.
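The first-match rule can be pictured with a short Python sketch. It is an illustration only, not Structural's implementation: the `RULES` list, the `mask` function, and the `fixed`, `hashes`, and `keep` helpers are hypothetical stand-ins for configured expressions and sub-generators, and returning unmatched values unchanged is an assumption.

```python
import re

# Hypothetical stand-ins for sub-generators: each takes the captured
# substring and returns its replacement.
def fixed(replacement):
    return lambda captured: replacement

def hashes(captured):
    return "#" * len(captured)

def keep(captured):
    return captured

# Ordered list of (pattern, one sub-generator per capture group).
RULES = [
    (r"^ProductId:([0-9]{3})-BuyerId:([0-9]{3})$", [fixed("000"), fixed("999")]),
    (r"^(\w+).(\d+).(\w+).(\d+)$", [keep, hashes, keep, hashes]),
]

def mask(value, rules=RULES):
    """Apply the sub-generators of the first pattern that matches."""
    for pattern, generators in rules:
        match = re.search(pattern, value)
        if match is None:
            continue  # try the next expression in the list
        pieces, pos = [], 0
        for index, generator in enumerate(generators, start=1):
            start, end = match.span(index)
            pieces.append(value[pos:start])               # text outside the group
            pieces.append(generator(match.group(index)))  # masked group
            pos = end
        pieces.append(value[pos:])
        return "".join(pieces)  # later expressions are never consulted
    return value  # assumption: a value that matches nothing is left as-is
```

In this sketch, `ProductId:123-BuyerId:234` matches both patterns, but only the sub-generators of the first pattern are applied.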

With the **Replace all matches** option enabled, the Regex Mask generator behaves similarly to a traditional regular expression parser. It matches every occurrence of a pattern in the string, not only the first one. For example, the pattern `(a)` applied to the string `aaab` matches all three occurrences of the letter `a`, instead of only the first one. Note that an anchored pattern such as `^(a)$` would match only a string that consists of exactly `a`.
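As a rough analogy in Python, the difference between replacing all matches and replacing only the first resembles the `count` argument of `re.sub`. The `mask_group` function and its `replace_all` parameter are hypothetical names for this sketch, which uses the unanchored pattern `(a)` and assumes a single capture group.

```python
import re

def mask_group(value, pattern, generator, replace_all=True):
    """Replace capture group 1 of each match with the generator's output."""
    def replace(match):
        whole = match.group(0)
        # Offsets of group 1 relative to the start of the whole match.
        start = match.start(1) - match.start(0)
        end = match.end(1) - match.start(0)
        return whole[:start] + generator(match.group(1)) + whole[end:]

    # In re.sub, count=0 means "replace every match"; count=1 stops after one.
    count = 0 if replace_all else 1
    return re.sub(pattern, replace, value, count=count)

all_matches = mask_group("aaab", r"(a)", lambda s: "X")
first_only = mask_group("aaab", r"(a)", lambda s: "X", replace_all=False)
```

Here `all_matches` is `XXXb` and `first_only` is `Xaab`.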

## Regular expression compatibility <a href="#regex-mask-expression-support" id="regex-mask-expression-support"></a>

Note that for Spark-based data connectors, depending on your environment, there might be slight differences in the regular expression support.

To ensure consistent results across all data connectors, use regular expression patterns that are compatible with both Java and C#.

For more information about regular expressions in C#, go to [this reference](http://regexstorm.net/reference). For more information about regular expressions in Java, go to [this reference](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html).

## Example expressions <a href="#regex-mask-example-expressions" id="regex-mask-example-expressions"></a>

In a cell that contains the string `ProductId:123-BuyerId:234`, to mask the substrings `123` and `234`, specify the regular expression:

`^ProductId:([0-9]{3})-BuyerId:([0-9]{3})$`

This captures the two occurrences of three-digit numbers in the pattern `ProductId:xxx-BuyerId:xxx`. This makes it possible to define a sub-generator on neither, either, or both of these captured substrings.
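To make "neither, either, or both" concrete, here is a minimal Python sketch. The `mask_groups` function and the dictionary that maps a group number to a replacement function are hypothetical and only imitate the idea of assigning a sub-generator to each capture group.

```python
import re

PATTERN = re.compile(r"^ProductId:([0-9]{3})-BuyerId:([0-9]{3})$")

def mask_groups(value, generators):
    """Replace only the capture groups that have a generator assigned.

    `generators` maps a 1-based group number to a function that takes the
    captured substring and returns its replacement.
    """
    match = PATTERN.match(value)
    if match is None:
        return value  # assumption: non-matching values are left unchanged
    pieces, pos = [], 0
    for index in range(1, PATTERN.groups + 1):
        start, end = match.span(index)
        generator = generators.get(index, lambda captured: captured)
        pieces.append(value[pos:start])
        pieces.append(generator(match.group(index)))
        pos = end
    pieces.append(value[pos:])
    return "".join(pieces)

cell = "ProductId:123-BuyerId:234"
neither = mask_groups(cell, {})
buyer_only = mask_groups(cell, {2: lambda s: "000"})
both = mask_groups(cell, {1: lambda s: "999", 2: lambda s: "000"})
```

With no generators assigned, the value comes back unchanged. Assigning one to group 2 masks only the buyer ID, and assigning one to each group masks both numbers.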

The following regular expression defines a broader capture that matches more cell values:

`^(\w+).(\d+).(\w+).(\d+)$`

This captures alternating words (`(\w+)`) and numbers (`(\d+)`) that are separated by any single character (`.`), instead of requiring the literal `ProductId:` and `BuyerId:` text and three-digit numbers of the first expression.
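The difference in breadth between the two expressions can be checked with Python's `re` module, which handles these particular patterns the same way as the Java and C# engines for ASCII input such as the samples here. The `captures` helper and the `Item#12|Qty#3` sample value are made up for this sketch.

```python
import re

SPECIFIC = re.compile(r"^ProductId:([0-9]{3})-BuyerId:([0-9]{3})$")
BROAD = re.compile(r"^(\w+).(\d+).(\w+).(\d+)$")

def captures(pattern, value):
    """Return the captured substrings, or None if the value does not match."""
    match = pattern.match(value)
    return match.groups() if match else None

product_cell = "ProductId:123-BuyerId:234"
other_cell = "Item#12|Qty#3"

specific_on_product = captures(SPECIFIC, product_cell)
broad_on_product = captures(BROAD, product_cell)
specific_on_other = captures(SPECIFIC, other_cell)
broad_on_other = captures(BROAD, other_cell)
```

The specific expression captures only the two numbers and rejects `Item#12|Qty#3`. The broad expression matches both cells and also captures the words, so four capture groups are available for sub-generators instead of two.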

## Characteristics <a href="#regex-mask-characteristics" id="regex-mask-characteristics"></a>

<table data-header-hidden><thead><tr><th valign="top"></th><th valign="top"></th></tr></thead><tbody><tr><td valign="top"><strong>Consistency</strong></td><td valign="top">Determined by the selected sub-generators.</td></tr><tr><td valign="top"><strong>Linking</strong></td><td valign="top">Determined by the selected sub-generators.</td></tr><tr><td valign="top"><strong>Differential privacy</strong></td><td valign="top">Determined by the selected sub-generators.</td></tr><tr><td valign="top"><strong>Data-free</strong></td><td valign="top">Determined by the selected sub-generators.</td></tr><tr><td valign="top"><strong>Allowed for primary keys</strong></td><td valign="top"><p>Yes, but:</p><ul><li>Make sure that the configuration preserves uniqueness.</li><li>Do not use on primary key columns that are used for subsetting.</li></ul></td></tr><tr><td valign="top"><strong>Allowed for unique columns</strong></td><td valign="top">Yes</td></tr><tr><td valign="top"><strong>Uses format-preserving encryption (FPE)</strong></td><td valign="top">No</td></tr><tr><td valign="top"><strong>Privacy ranking</strong></td><td valign="top">5</td></tr><tr><td valign="top"><strong>Generator ID (for the API)</strong></td><td valign="top"><a href="../../../api/quick-start-guide/tonic-api-generator-assignment/generator-api-reference/generator-api-ref-regex-mask"><code>RegexMaskGenerator</code></a></td></tr></tbody></table>

## How to configure <a href="#regex-mask-configure" id="regex-mask-configure"></a>

### Adding a regular expression <a href="#regex-mask-add-regex" id="regex-mask-add-regex"></a>

To add a regular expression:

1. Click **Add Regex**.\
   \
   On the configuration panel, **Cell Value** shows a sample value from the source database. You can use the previous and next options to navigate through the values.
2. By default, **Replace all matches** is enabled.\
   \
   To match only the first occurrence of a pattern, toggle **Replace all matches** to the off position.
3. In the **Pattern** field, enter a regular expression.\
   \
   If the expression is valid, then Structural displays the capture groups for the expression.
4. For each capture group, to select and configure the generator to apply, click the selected generator.\
   \
   You cannot select another composite generator.\
   \
   For a primary key column, the selected generator should [support primary keys](https://docs.tonic.ai/app/generation/generators/generator-types/primary-key-generators).
5. To save the configuration and immediately add another regular expression, click **Save and Add Another**.\
   \
   To save the configuration and close the add generator panel, click **Save**.

### Managing the regular expressions <a href="#regex-mask-manage-regex" id="regex-mask-manage-regex"></a>

From the **Regexes** list:

* To edit a regular expression, click the edit icon.
* To remove a regular expression, click the delete icon.
