The Dataset Settings panel includes options for how Textual handles the following file components:
For .docx files, images and comments
For PDF files, scanned-in signatures
To display the Dataset Settings panel, on the dataset details page, click Settings.
These options are not available for pipelines that also redact files.
For .docx images, you can configure the dataset to do one of the following:
Redact the image content. When you select this option, Textual looks for and blocks out sensitive values in the image.
Ignore the image.
Replace the images with black boxes.
On the Dataset Settings panel, under Image settings for DOCX files:
To redact the image content, click Redact contents of images using OCR. This is the default selection.
To ignore the images entirely, click Ignore images during scan.
To replace the images with black boxes, click Replace images from the output file with black boxes.
For comments in a .docx file, you can configure the dataset to either:
Remove the comments from the file.
Ignore the comments and leave them in the file.
On the Dataset Settings panel, to remove the comments, toggle Remove comments from the output file to the on position. This is the default configuration.
To ignore the comments, toggle Remove comments from the output file to the off position.
By default, Textual redacts scanned-in signatures in PDF files. You can configure the dataset to instead ignore the signatures.
On the Dataset Settings panel:
To redact PDF signatures, toggle Detect and redact signatures in PDFs to the on position. This is the default configuration.
To ignore PDF signatures, toggle Detect and redact signatures in PDFs to the off position.
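As a quick reference, the options and defaults described above can be modeled as a small data structure. The following is only an illustrative sketch: the names `DocxImagePolicy`, `DatasetFileSettings`, and `describe` are hypothetical and are not part of Textual or its SDK. The sketch only mirrors the panel's choices and their default selections.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class DocxImagePolicy(Enum):
    """Hypothetical names for the three .docx image options on the panel."""
    REDACT_WITH_OCR = "Redact contents of images using OCR"  # default
    IGNORE = "Ignore images during scan"
    BLACK_BOX = "Replace images from the output file with black boxes"


@dataclass(frozen=True)
class DatasetFileSettings:
    """Illustrative model only; the defaults mirror the panel defaults."""
    docx_image_policy: DocxImagePolicy = DocxImagePolicy.REDACT_WITH_OCR
    remove_docx_comments: bool = True    # toggle is on by default
    redact_pdf_signatures: bool = True   # toggle is on by default


def describe(settings: DatasetFileSettings) -> List[str]:
    """Return one human-readable line per setting, in panel order."""
    comments = "removed from" if settings.remove_docx_comments else "left in"
    signatures = "redacted" if settings.redact_pdf_signatures else "ignored"
    return [
        f".docx images: {settings.docx_image_policy.value}",
        f".docx comments: {comments} the output file",
        f"PDF signatures: {signatures}",
    ]


if __name__ == "__main__":
    # Print the default configuration, then a customized one.
    for line in describe(DatasetFileSettings()):
        print(line)
    custom = DatasetFileSettings(
        docx_image_policy=DocxImagePolicy.IGNORE,
        remove_docx_comments=False,
    )
    for line in describe(custom):
        print(line)
```

Constructing `DatasetFileSettings()` with no arguments corresponds to a dataset whose settings were never changed: image contents are redacted using OCR, comments are removed, and PDF signatures are redacted.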