Configuring handling of file components
The Dataset Settings panel includes options for how Textual handles the following file components:
For .docx files, images, tables, and comments
For PDF files, scanned-in signatures
To display the Dataset Settings panel, on the dataset details page, click Settings.
These options are not available for pipelines that also redact files.
For images in .docx files, including .svg files, you can configure the dataset to do one of the following:
Redact the image content. When you select this option, Textual looks for and blocks out sensitive values in the images.
Ignore the images.
Replace the images with black boxes.
On the Dataset Settings panel, under Image settings for DOCX files:
To redact the image content, click Redact contents of images using OCR. This is the default selection.
To ignore the images entirely, click Ignore images during scan.
To replace the images with black boxes, click Replace images from the output file with black boxes.
For .docx tables, you can configure the dataset to either:
Redact the table content. When you select this option, Textual detects sensitive values and replaces them based on the entity type configuration.
Block out all of the table cells. When you select this option, Textual places a black box over each table cell.
On the Dataset Settings panel, under Table settings for DOCX files:
To redact the table content, click Redact content using the entity type configuration. This is the default selection.
To block out the table content, click Block out all table cell content.
For comments in a .docx file, you can configure the dataset to either:
Remove the comments from the file.
Ignore the comments and leave them in the file.
On the Dataset Settings panel, to remove the comments, toggle Remove comments from the output file to the on position. This is the default configuration.
To ignore the comments, toggle Remove comments from the output file to the off position.
By default, Textual redacts scanned-in signatures in PDF files. You can configure the dataset to instead ignore the signatures.
On the Dataset Settings panel:
To redact PDF signatures, toggle Detect and redact signatures in PDFs to the on position. This is the default configuration.
To ignore PDF signatures, toggle Detect and redact signatures in PDFs to the off position.
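As a quick reference, the options and defaults described above can be summarized in a small data model. This is only an illustrative sketch: the class and field names (`DocxImageMode`, `DocxTableMode`, `FileComponentSettings`) are hypothetical and are not part of Textual or its SDK. Only the option labels and default selections come from the Dataset Settings panel as described on this page.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical names for illustration only; the values are the UI labels.
class DocxImageMode(Enum):
    REDACT_WITH_OCR = "Redact contents of images using OCR"
    IGNORE = "Ignore images during scan"
    BLACK_BOX = "Replace images from the output file with black boxes"


class DocxTableMode(Enum):
    REDACT_BY_ENTITY_TYPE = "Redact content using the entity type configuration"
    BLOCK_OUT_CELLS = "Block out all table cell content"


@dataclass(frozen=True)
class FileComponentSettings:
    # Each default mirrors the default selection stated on this page.
    docx_images: DocxImageMode = DocxImageMode.REDACT_WITH_OCR
    docx_tables: DocxTableMode = DocxTableMode.REDACT_BY_ENTITY_TYPE
    remove_docx_comments: bool = True   # "Remove comments from the output file" toggle
    redact_pdf_signatures: bool = True  # "Detect and redact signatures in PDFs" toggle


# A dataset that has not been changed from the defaults.
defaults = FileComponentSettings()

# A dataset that blacks out images and keeps comments in the output file.
custom = FileComponentSettings(
    docx_images=DocxImageMode.BLACK_BOX,
    remove_docx_comments=False,
)
```

A model like this can help when you record which settings each dataset uses, for example in a review checklist. The settings themselves are still changed on the Dataset Settings panel.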