Tonic Textual workflow

At a high level, to use Tonic Textual to create synthesized or redacted data:

  1. By default, Textual identifies sensitive values using its built-in models. If the built-in models do not cover a type of sensitive value, you can create custom models to identify it. Custom models require an OpenAI key.

  2. Create a Textual dataset. A dataset is a set of files to redact.

  3. If you have custom models, enable the ones to use on the dataset.

  4. Add files to the dataset. Textual supports almost any free-text file, PDF files, and .docx files. For images, Textual supports PNG, JPG (both .jpg and .jpeg), and TIF (both .tif and .tiff) files. As you add the files, Textual uses its built-in models and your enabled custom models to identify sensitive values in the files.

  5. Configure how to handle each type of value. By default, Textual redacts the values, which means that it replaces each value with a placeholder that identifies the type of sensitive value, such as PERSON or LOCATION. For PDF files and image files, redaction means that Textual covers the value with a black box. For a given type, you can instead choose to synthesize the values, which replaces each original value with a realistic substitute, or to ignore the values, which leaves them unchanged. If some of the detected values for a type are incorrect, you can optionally create a list of values to exclude from that type.

  6. Optionally, add manual overrides to a PDF file. When you add a manual override, you draw a box to identify the affected portion of the file.

    You can use manual overrides either to ignore the automatically detected redactions in the selected area, or to redact the selected area. To make it easier to process multiple files that have a similar format, such as a form, you can create templates that you can apply to PDF files in the dataset.
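
The handling options in step 5 can be sketched in plain Python. This is an illustrative model only, not the Textual API: the `Detection` class, the `find` and `apply_handling` functions, the bracketed placeholder format, and the mode names `redact`, `synthesize`, and `ignore` are all assumptions made for the example, and the detected spans are supplied by hand rather than by a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """A hypothetical detected span: [start, end) offsets plus a type label."""
    start: int
    end: int
    label: str


def find(text, value, label):
    """Build a Detection for the first occurrence of value (stands in for a model)."""
    start = text.index(value)
    return Detection(start, start + len(value), label)


def apply_handling(text, detections, handling, synthetic_values=None, exclusions=None):
    """Apply a per-type handling mode to each detected span.

    handling maps a label to "redact", "synthesize", or "ignore".
    Labels that are not listed fall back to "redact", mirroring the default.
    exclusions maps a label to a set of values to leave untouched for that type.
    Detections are assumed not to overlap.
    """
    synthetic_values = synthetic_values or {}
    exclusions = exclusions or {}
    parts = []
    cursor = 0
    for det in sorted(detections, key=lambda d: d.start):
        value = text[det.start:det.end]
        parts.append(text[cursor:det.start])
        mode = handling.get(det.label, "redact")
        if mode == "ignore" or value in exclusions.get(det.label, set()):
            replacement = value  # keep the original value
        elif mode == "synthesize" and det.label in synthetic_values:
            replacement = synthetic_values[det.label]  # realistic substitute
        else:
            replacement = f"[{det.label}]"  # placeholder that names the type
        parts.append(replacement)
        cursor = det.end
    parts.append(text[cursor:])
    return "".join(parts)


text = "Maria Lopez flew to Denver."
detections = [find(text, "Maria Lopez", "PERSON"), find(text, "Denver", "LOCATION")]

# Default: every detected value is redacted.
print(apply_handling(text, detections, {}))
# Synthesize locations, keep redacting people.
print(apply_handling(text, detections, {"LOCATION": "synthesize"},
                     synthetic_values={"LOCATION": "Portland"}))
# Exclude one incorrectly detected value from the LOCATION type.
print(apply_handling(text, detections, {}, exclusions={"LOCATION": {"Denver"}}))
```

In this sketch an excluded value behaves like an ignored one, but only for that specific value; the rest of the type keeps its configured handling.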
