Using the app
Granting access to the app
After you install the app in the Snowflake UI, only the ACCOUNTADMIN
role has access to it.
You can grant access to other roles as needed.
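Access to a Snowflake Native App is granted through its application roles. A minimal sketch, assuming the app is installed as TONIC_TEXTUAL and exposes an application role named APP_USER (both names are placeholders; use the names from your installation):

```sql
-- Run as ACCOUNTADMIN. TONIC_TEXTUAL and APP_USER are placeholder names.
GRANT APPLICATION ROLE TONIC_TEXTUAL.APP_USER TO ROLE analyst_role;
```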
Starting the app
To start the app, run the following command:
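The exact command depends on how the app was installed. A hypothetical sketch, assuming the app is installed as TONIC_TEXTUAL and exposes a start procedure named APP_PUBLIC.START_APP (both names are placeholders; use the command from the app's installation instructions):

```sql
-- Hypothetical: the application name and procedure name are placeholders.
CALL TONIC_TEXTUAL.APP_PUBLIC.START_APP();
```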
This initializes the application. You can then use the app to redact or parse text data.
Using the TEXTUAL_REDACT function
You use the TEXTUAL_REDACT function to detect and replace sensitive values in text.
TEXTUAL_REDACT syntax
The TEXTUAL_REDACT
function takes the following arguments:
The text to redact, which is required.
Optionally, a JSON object, passed with PARSE_JSON, that provides the generator configuration for each entity type. The generator configuration indicates what to do with the detected value.
For each entry in the PARSE_JSON object:
<EntityType> is the type of entity for which to specify the handling. For the list of entity types, go to Entity types that Textual detects. For example, for a first name, the entity type is NAME_GIVEN.
<HandlingType> indicates what to do with the detected value. The options are:
Redact, which replaces the value with a redacted value in the format [<EntityType>_<RandomIdentifier>]
Synthesis, which replaces the value with a realistic replacement
Off, which leaves the value as is
If you do not include PARSE_JSON, then all of the detected values are redacted.
Example: Redacting text
The following example sends a text string to the app:
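A sketch of such a call, assuming the app is installed as TONIC_TEXTUAL and using an illustrative input sentence:

```sql
-- The application name and the sample text are illustrative.
SELECT TONIC_TEXTUAL.TEXTUAL_REDACT('My name is Jane Doe.') AS redacted_text;
```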
This returns the redacted text, which looks similar to the following:
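The random identifiers vary from run to run; an illustrative result in the [<EntityType>_<RandomIdentifier>] format:

```
My name is [NAME_GIVEN_q7kTwi] [NAME_FAMILY_8eFnzp].
```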
Because we did not specify the handling for any of the entity types, both the first name Jane and last name Doe are redacted.
Example: Customizing the redaction
In this example, when a first name (NAME_GIVEN) is detected, it is synthesized instead of redacted.
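A sketch of the call, again assuming the TONIC_TEXTUAL application name; the second argument is the generator configuration described above:

```sql
-- Synthesize NAME_GIVEN values; all other detected entities keep the
-- default handling (redaction).
SELECT TONIC_TEXTUAL.TEXTUAL_REDACT(
    'My name is Jane Doe.',
    PARSE_JSON('{"NAME_GIVEN": "Synthesis"}')
) AS redacted_text;
```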
This returns output similar to the following. The first name Jane is replaced with a realistic value (synthesized), and the last name Doe is redacted.
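An illustrative result; the synthesized first name is whatever realistic value Textual generates:

```
My name is Erica [NAME_FAMILY_8eFnzp].
```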
Using the TEXTUAL_PARSE function
You use the TEXTUAL_PARSE
function to transform files in an external or internal stage into Markdown-based content that you can use to populate LLM systems.
The output also includes metadata about the file, such as the sensitive values that were detected.
Granting access to the stage
To be able to parse the files, Textual must have access to the stage where the files are located.
Your role must be able to grant the USAGE and READ permissions.
To grant Textual access to the stage, run the following commands:
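A sketch of the grants, assuming the app is installed as TONIC_TEXTUAL and the files are in the stage database1.schema1.stage1 (the stage used in the examples below):

```sql
-- TONIC_TEXTUAL is a placeholder for the installed application name.
GRANT USAGE ON DATABASE database1 TO APPLICATION TONIC_TEXTUAL;
GRANT USAGE ON SCHEMA database1.schema1 TO APPLICATION TONIC_TEXTUAL;
-- For an internal stage, grant READ; an external stage takes USAGE instead.
GRANT READ ON STAGE database1.schema1.stage1 TO APPLICATION TONIC_TEXTUAL;
```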
Sending a request for a single file
To send a parse request for a single file, run the following:
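A sketch of the request, using the placeholders described below and assuming the function is called through the application name (TONIC_TEXTUAL here); the argument order is an assumption:

```sql
SELECT TONIC_TEXTUAL.TEXTUAL_PARSE(
    '<FullyQualifiedStageName>',
    '<FileName>',
    '<FileMD5Sum>'
);
```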
Where:
<FullyQualifiedStageName> is the fully qualified name of the stage, in the format <DatabaseName>.<SchemaName>.<StageName>. For example, database1.schema1.stage1.
<FileName> is the name of the file.
<FileMD5Sum> is the MD5 checksum of the file content.
Sending a request for multiple files
To parse a large number of files:
1. List the stage files to parse. For example, you might use PATTERN to limit the files based on file type.
2. Run the parse request command on the list.
For example:
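A sketch, assuming a directory table is enabled on the stage so that file paths and MD5 checksums can be queried; the RLIKE filter stands in for the PATTERN clause of a LIST command:

```sql
-- Parse every PDF in the stage. TONIC_TEXTUAL and the TEXTUAL_PARSE
-- argument order are assumptions; the directory table supplies
-- RELATIVE_PATH and MD5 for each file.
SELECT TONIC_TEXTUAL.TEXTUAL_PARSE(
    'database1.schema1.stage1',
    RELATIVE_PATH,
    MD5
)
FROM DIRECTORY(@database1.schema1.stage1)
WHERE RELATIVE_PATH RLIKE '.*\\.pdf';
```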
About the results
The app writes the results to the TEXTUAL_RESULTS
table.
For each request, the entry in TEXTUAL_RESULTS
includes the request status and the request results.
The status is one of the following values:
QUEUED - The parse request was received and is waiting to be processed.
RUNNING - The parse request is currently being processed.
SKIPPED - The parse request was skipped because the file did not change since the previous time it was parsed. Whether a file has changed is determined by its MD5 checksum.
FAILURE_<FailureReason> - The parse request failed for the provided reason.
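For example, to check on requests that have not finished yet (a sketch; qualify TEXTUAL_RESULTS with the application name and schema as needed, and note that column names other than status and result are not documented here):

```sql
-- List requests that are still waiting or in progress.
SELECT *
FROM TEXTUAL_RESULTS
WHERE status IN ('QUEUED', 'RUNNING');
```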
The result
column is a VARIANT
type that contains the parsed data. For more information about the format of the results for each document, go to Structure of the pipeline output file JSON.
Querying the results
You can query the parse results in the same way as you would any other Snowflake VARIANT
column.
For example, the following command retrieves the parsed documents, which are in a converted Markdown representation.
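A sketch, assuming the converted Markdown lives under a content.text path in the result JSON (an assumed path; check the pipeline output file structure for the actual field names):

```sql
-- result:content.text is an assumed path into the VARIANT result.
SELECT result:content.text::STRING AS markdown_content
FROM TEXTUAL_RESULTS;
```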
To retrieve the entities that were identified in the document:
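A sketch under the same caveat, assuming the detected entities are stored in an entities array inside the result JSON:

```sql
-- result:entities is an assumed path into the VARIANT result.
SELECT result:entities AS detected_entities
FROM TEXTUAL_RESULTS;
```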
Because the result
column is a simple variant, you can use flattening operations to perform more complex analysis. For example, you can extract all entities of a certain type or value across the documents, or find all documents that contain a specific type of entity.
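For example, a sketch that counts detected entities by type across all parsed documents, again assuming an entities array whose elements carry a label field (both names are assumptions):

```sql
-- LATERAL FLATTEN turns each element of the entities array into a row.
SELECT e.value:label::STRING AS entity_type,
       COUNT(*) AS occurrences
FROM TEXTUAL_RESULTS,
     LATERAL FLATTEN(input => result:entities) AS e
GROUP BY entity_type
ORDER BY occurrences DESC;
```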