Configuring handling of file components
The Dataset Settings panel includes options for how Textual handles the following file components:
For .docx files, images and comments
For PDF files, scanned-in signatures
To display the Dataset Settings panel, on the dataset details page, click Settings.
These options are not available for pipelines that also redact files.
For .docx images, you can configure the dataset to either:
Redact the image content. When you select this option, Textual looks for and blocks out sensitive values in the image.
Remove the image entirely.
On the Dataset Settings panel, under Image settings for DOCX files:
To redact the image content, click Redact contents of images. This is the default selection.
To remove the images, click Remove images from the output file.
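The two image choices and their default can be modeled as a small enumeration. This is a hypothetical Python sketch for illustration only. It is not part of Textual or its SDK. The names DocxImageHandling, DEFAULT_DOCX_IMAGE_HANDLING, and describe are invented. The enum values reuse the option labels from the Dataset Settings panel.

```python
from enum import Enum


class DocxImageHandling(Enum):
    """Hypothetical model of the choices under Image settings for DOCX files."""
    REDACT_CONTENTS = "Redact contents of images"
    REMOVE_IMAGES = "Remove images from the output file"


# Per the panel, "Redact contents of images" is the default selection.
DEFAULT_DOCX_IMAGE_HANDLING = DocxImageHandling.REDACT_CONTENTS


def describe(choice: DocxImageHandling) -> str:
    """Summarize what happens to a .docx image under the given choice."""
    if choice is DocxImageHandling.REDACT_CONTENTS:
        # The image stays in the file; sensitive values in it are blocked out.
        return "image kept; sensitive values in it are blocked out"
    # The image is dropped from the output file entirely.
    return "image removed from the output file"


print(DEFAULT_DOCX_IMAGE_HANDLING.value)
print(describe(DocxImageHandling.REMOVE_IMAGES))
```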
For comments in a .docx file, you can configure the dataset to either:
Remove the comments from the file.
Ignore the comments and leave them in the file.
On the Dataset Settings panel, under Comment settings for DOCX files:
To remove the comments, click Remove comments from the output file.
To ignore the comments, click Ignore comments during the scan.
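The effect of the two comment choices on the output file can be sketched as follows. This is a hypothetical Python illustration and not Textual's implementation or API. DocxCommentHandling and comments_in_output are invented names, and comments are represented as plain strings for simplicity. The source does not state which comment option is the default, so the sketch does not define one.

```python
from enum import Enum


class DocxCommentHandling(Enum):
    """Hypothetical model of the choices under Comment settings for DOCX files."""
    REMOVE = "Remove comments from the output file"
    IGNORE = "Ignore comments during the scan"


def comments_in_output(comments: list[str], choice: DocxCommentHandling) -> list[str]:
    """Return the comments that would remain in the output file."""
    if choice is DocxCommentHandling.REMOVE:
        # Every comment is stripped from the output file.
        return []
    # IGNORE: comments are skipped by the scan and left in the file unchanged.
    return list(comments)


notes = ["Call Jane at 555-0100", "Check totals"]
print(comments_in_output(notes, DocxCommentHandling.REMOVE))
print(comments_in_output(notes, DocxCommentHandling.IGNORE))
```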
By default, Textual ignores scanned-in signatures in PDF files. You can configure the dataset to instead redact the signatures.
On the Dataset Settings panel:
To redact PDF signatures, toggle Detect and redact signatures in PDFs to the on position.
To ignore PDF signatures, toggle Detect and redact signatures in PDFs to the off position.
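The signature option is a single on/off toggle that defaults to off, which a boolean field can model. This is a hypothetical Python sketch, not part of Textual or its SDK. PdfSignatureSettings, detect_and_redact_signatures, and signature_action are names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PdfSignatureSettings:
    """Hypothetical model of the Detect and redact signatures in PDFs toggle."""
    # Off by default: Textual ignores scanned-in signatures unless toggled on.
    detect_and_redact_signatures: bool = False

    def signature_action(self) -> str:
        """Return what happens to scanned-in signatures under this setting."""
        return "redact" if self.detect_and_redact_signatures else "ignore"


print(PdfSignatureSettings().signature_action())
print(PdfSignatureSettings(detect_and_redact_signatures=True).signature_action())
```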