After you install the app in the Snowflake UI, only the ACCOUNTADMIN role has access to it.
You can grant access to other roles as needed.
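As a sketch of what granting access can look like, the statements below use Snowflake's GRANT APPLICATION ROLE syntax. The names <ApplicationName>, <ApplicationRole>, and <RoleName> are placeholders, not values from this page. Which application roles the app defines is an assumption to check with SHOW APPLICATION ROLES.

```sql
-- Placeholders (assumptions): <ApplicationName>, <ApplicationRole>, <RoleName>.
USE ROLE ACCOUNTADMIN;

-- List the application roles that the installed app defines.
SHOW APPLICATION ROLES IN APPLICATION <ApplicationName>;

-- Grant one of those application roles to an account role.
GRANT APPLICATION ROLE <ApplicationName>.<ApplicationRole> TO ROLE <RoleName>;
```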
To start the app, run the following command:
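A minimal sketch of the start command, assuming that the app is installed under the name TONIC_TEXTUAL and exposes a START_APP procedure in an APP_PUBLIC schema. All three names are assumptions. Confirm them against your installation before you run the command.

```sql
-- Assumed names: application TONIC_TEXTUAL, schema APP_PUBLIC, procedure START_APP.
CALL TONIC_TEXTUAL.APP_PUBLIC.START_APP();
```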
This initializes the application. You can then use the app to redact or parse text data.
You use the TEXTUAL_REDACT function to detect and replace sensitive values in text.
The TEXTUAL_REDACT function takes the following arguments:
The text to redact, which is required.
Optionally, a JSON object, passed through PARSE_JSON, that represents the generator configuration for each entity type. The generator configuration indicates what to do with the detected value.
For each entry in the PARSE_JSON object:
<EntityType> is the type of entity for which to specify the handling. For the list of entity types, go to Entity types that Textual detects. For example, for a first name, the entity type is NAME_GIVEN.
<HandlingType> indicates what to do with the detected value. The options are:
Redact, which replaces the value with a redacted value in the format [<EntityType>_<RandomIdentifier>].
Synthesis, which replaces the value with a realistic replacement.
Off, which leaves the value as is.
If you do not include PARSE_JSON, then all of the detected values are redacted.
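The argument description above can be sketched as the following call shape. It assumes that TEXTUAL_REDACT can be called unqualified from the current context. In practice, you might need to prefix it with the application and schema name. NAME_FAMILY is used here as an assumed entity type for last names.

```sql
-- General shape: required text, optional generator configuration.
SELECT TEXTUAL_REDACT(
    '<TextToRedact>',
    PARSE_JSON('{"<EntityType>": "<HandlingType>"}')
);

-- Hypothetical configuration with more than one entry:
-- synthesize first names, leave last names (assumed type NAME_FAMILY) as is.
SELECT TEXTUAL_REDACT(
    '<TextToRedact>',
    PARSE_JSON('{"NAME_GIVEN": "Synthesis", "NAME_FAMILY": "Off"}')
);
```

Entity types that are not listed in the configuration fall back to the default handling, which is redaction.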
The following example sends a text string to the app:
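A sketch of such a request, assuming an input string that contains the name Jane Doe and an unqualified function name. The result shape in the comments is illustrative only. The identifiers are random, and NAME_FAMILY is an assumed entity type for the last name.

```sql
-- No generator configuration, so every detected value is redacted.
SELECT TEXTUAL_REDACT('My name is Jane Doe') AS redacted_text;

-- Illustrative result shape, following the [<EntityType>_<RandomIdentifier>] format:
--   My name is [NAME_GIVEN_<RandomIdentifier>] [NAME_FAMILY_<RandomIdentifier>]
```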
This returns the redacted text. Because we did not specify the handling for any of the entity types, both the first name Jane and the last name Doe are redacted.
In this example, when a first name (NAME_GIVEN) is detected, it is synthesized instead of redacted.
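A sketch of that request, again assuming the input string and an unqualified function name. Only NAME_GIVEN is configured, so every other detected entity type keeps the default redaction.

```sql
-- Synthesize first names; all other detected values are redacted by default.
SELECT TEXTUAL_REDACT(
    'My name is Jane Doe',
    PARSE_JSON('{"NAME_GIVEN": "Synthesis"}')
) AS redacted_text;
```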
This returns output in which the first name Jane is replaced with a realistic value (synthesized), and the last name Doe is redacted.
You use the TEXTUAL_PARSE function to transform files in an external or internal stage into markdown-based content that you can use to populate LLM systems.
The output includes metadata about the file, including sensitive values that were detected.
To be able to parse the files, Textual must have access to the stage where the files are located.
Your role must be able to grant the USAGE and READ permissions.
To grant Textual access to the stage, run the following commands:
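A sketch of the grants, using Snowflake's GRANT ... TO APPLICATION syntax. <ApplicationName> is a placeholder for the name under which the app is installed, which this page does not state. In Snowflake, USAGE is the stage privilege for external stages and READ is the stage privilege for internal stages, so you would run whichever stage grant matches your stage type.

```sql
-- <ApplicationName> is a placeholder for the installed app name (assumption).
GRANT USAGE ON DATABASE <DatabaseName> TO APPLICATION <ApplicationName>;
GRANT USAGE ON SCHEMA <DatabaseName>.<SchemaName> TO APPLICATION <ApplicationName>;

-- External stage: USAGE is the applicable stage privilege.
GRANT USAGE ON STAGE <DatabaseName>.<SchemaName>.<StageName> TO APPLICATION <ApplicationName>;

-- Internal stage: READ is the applicable stage privilege.
GRANT READ ON STAGE <DatabaseName>.<SchemaName>.<StageName> TO APPLICATION <ApplicationName>;
```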
To send a parse request for a single file, run the following:
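A sketch of the request, assuming that TEXTUAL_PARSE takes the stage name, file name, and MD5 sum in that order and can be called unqualified. The argument order is an assumption.

```sql
-- Assumed argument order: stage, file name, MD5 sum of the file content.
SELECT TEXTUAL_PARSE('<FullyQualifiedStageName>', '<FileName>', '<FileMD5Sum>');
```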
Where:
<FullyQualifiedStageName> is the fully qualified name of the stage, in the format <DatabaseName>.<SchemaName>.<StageName>. For example, database1.schema1.stage1.
<FileName> is the name of the file.
<FileMD5Sum> is the MD5 checksum of the file content.
To parse a large number of files:
List the stage files to parse. For example, you might use PATTERN to limit the files based on file type.
Run the parse request command on the list.
For example:
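The two steps can be sketched as follows. The sketch relies on the "name" and "md5" columns that the Snowflake LIST command returns, and on RESULT_SCAN to read the LIST output. The PDF pattern, the unqualified function name, and the TEXTUAL_PARSE argument order are assumptions.

```sql
-- Step 1: list the files to parse, limited here to PDF files.
LIST @<FullyQualifiedStageName> PATTERN = '.*[.]pdf';

-- Step 2: send one parse request for each listed file.
-- Run this immediately after LIST, in the same session, so that
-- LAST_QUERY_ID() refers to the LIST statement.
-- Depending on the stage type, "name" can include a stage or URL prefix.
SELECT TEXTUAL_PARSE('<FullyQualifiedStageName>', "name", "md5")
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
```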
The app writes the results to the TEXTUAL_RESULTS table.
For each request, the entry in TEXTUAL_RESULTS includes the request status and the request results.
The status is one of the following values:
QUEUED - The parse request was received and is waiting to be processed.
RUNNING - The parse request is currently being processed.
SKIPPED - The parse request was skipped because the file did not change since the previous time it was parsed. A change in the file's MD5 checksum indicates that the file changed.
FAILURE_<FailureReason> - The parse request failed for the provided reason.
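To monitor requests, you could query the status values directly. This sketch assumes that the status is stored in a column named status and that TEXTUAL_RESULTS is reachable from the current context. Both are assumptions.

```sql
-- Count the requests in each status.
SELECT status, COUNT(*) AS request_count
FROM TEXTUAL_RESULTS
GROUP BY status
ORDER BY status;

-- Show only failed requests. STARTSWITH avoids treating "_" as a LIKE wildcard.
SELECT *
FROM TEXTUAL_RESULTS
WHERE STARTSWITH(status, 'FAILURE_');
```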
The result column is a VARIANT type that contains the parsed data. For more information about the format of the results for each document, go to #pipeline-processed-file-json.
You can query the parse results in the same way as you would any other Snowflake VARIANT column.
For example, the following command retrieves the parsed documents, which are in a converted markdown representation.
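A sketch of such a query, using Snowflake's colon path notation for VARIANT columns. The key path content:text is hypothetical. Replace it with the key that holds the markdown content in the documented result format.

```sql
-- Hypothetical key path: content -> text.
SELECT result:content:text::STRING AS markdown_content
FROM TEXTUAL_RESULTS;
```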
To retrieve the entities that were identified in the document:
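A sketch of the entity query, under the same caveat. The key path content:entities is hypothetical and must be checked against the documented result format.

```sql
-- Hypothetical key path: content -> entities (an array of detected entities).
SELECT result:content:entities AS detected_entities
FROM TEXTUAL_RESULTS;
```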
Because the result column is a simple variant, you can use flattening operations to perform more complex analysis. For example, you can extract all entities of a certain type or value across the documents, or find all documents that contain a specific type of entity.
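As a sketch of that kind of analysis, the query below uses Snowflake's LATERAL FLATTEN to produce one row for each detected entity and keeps only first names. The key names content, entities, label, and text are hypothetical and must be matched to the actual result format.

```sql
-- One output row per detected entity of type NAME_GIVEN, across all documents.
-- Hypothetical keys: content, entities, label, text.
SELECT
    e.value:label::STRING AS entity_type,
    e.value:text::STRING  AS entity_value
FROM TEXTUAL_RESULTS r,
     LATERAL FLATTEN(input => r.result:content:entities) e
WHERE e.value:label::STRING = 'NAME_GIVEN';
```

Replacing the SELECT list with COUNT(*) gives the number of matching entities instead of the values themselves.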