Using the app

Granting access to the app

After you install the app in the Snowflake UI, only the ACCOUNTADMIN role has access to it.

You can grant access to other roles as needed.

Starting the app

To start the app, run the following command:

CALL TONIC_TEXTUAL.APP_PUBLIC.START_APP('{YOUR_COMPUTE_POOL_NAME}', '{YOUR_TEXTUAL_TELEMETRY_EGRESS_INTEGRATION_NAME}');

This initializes the application. You can then use the app to redact or parse text data.

Using the TEXTUAL_REDACT function

You use the TEXTUAL_REDACT function to detect and replace sensitive files in text.

TEXTUAL_REDACT syntax

The TEXTUAL_REDACT function takes the following arguments:

The text to redact, which is required
Optionally, a PARSE_JSON JSON object that represents the generation configuration for each entity type. The generator configuration indicates what to do with the detected value.

SELECT TONIC_TEXTUAL.APP_PUBLIC.TEXTUAL_REDACT('Text to redact',
PARSE_JSON('{"<EntityType>": "<HandlingType>"}'));

For each entry in PARSE_JSON:

<EntityType> is the type of entity for which to specify the handling. For the list of entity types, go to Entity types that Textual detects. For example, for a first name, the entity type is NAME_GIVEN.
<HandlingType> indicates what to do with the detected value. The options are:
- Redact, which replaces the value with a redacted value in the format [<EntityType>_<RandomIdentifier>]
- Synthesis, which replaces the value with a realistic replacement
- Off, which leaves the value as is

If you do not include PARSE_JSON, then all of the detected values are redacted.

Example: Redacting text

The following example sends a text string to the app:

SELECT TONIC_TEXTUAL.APP_PUBLIC.TEXTUAL_REDACT('My name is Jane Doe');

This returns the redacted text, which looks similar to the following:

My name is [NAME_GIVEN_abc789] [NAME_FAMILY_xyz123].

Because we did not specify the handling for any of the entity types, both the first name Jane and last name Doe are redacted.

Example: Customizing the redaction

In this example, when a first name (NAME_GIVEN) is detected, it is synthesized instead of redacted.

SELECT TONIC_TEXTUAL.APP_PUBLIC.TEXTUAL_REDACT('My name is Jane Doe', PARSE_JSON('{"NAME_GIVEN": "Synthesis"}'));

This returns output similar to the following. The first name Jane is replaced with a realistic value (synthesized), and the last name Doe is redacted.

My name is Shirley [NAME_FAMILY_xyz123].

Using the TEXTUAL_PARSE function

You use the TEXTUAL_PARSE function to transform files in an external or internal stage into Markdown-based content that you can use to populate LLM systems.

The output includes metadata about the file, including sensitive values that were detected.

Granting access to the stage

To be able to parse the files, Textual must have access to the stage where the files are located.

Your role must be able to grant the USAGE and READ permissions.

To grant Textual access to the stage, run the following commands:

GRANT USAGE ON DATABASE <DatabaseName> TO APPLICATION TONIC_TEXTUAL;
GRANT USAGE ON SCHEMA <DatabaseName>.<SchemaName> TO APPLICATION TONIC_TEXTUAL;
GRANT READ, USAGE ON STAGE <DatabaseName>.<SchemaName>.<StageName> TO APPLICATION TONIC_TEXTUAL;

Sending a request for a single file

To send a parse request for a single file, run the following:

SELECT TONIC_TEXTUAL.APP_PUBLIC.TEXTUAL_PARSE('<FullyQualifiedStageName>', '<FileName>', '<FileMD5Sum>');

Where:

<FullyQualifiedStageName> is the fully qualified name of the stage, in the format <DatabaseName>.<SchemaName>.<StageName>. For example, database1.schema1.stage1.
<FileName> is the name of the file.
<FileMD5Sum> is the MD5 sum version of the file content.

Sending a request for multiple files

To parse a large number of files:

List the stage files to parse. For example, you might use PATTERN to limit the files based on file type.
Run the parse request command on the list.

For example:

LIST @<StageName> PATTERN='.*(txt|xlsx|docx)';
SELECT TONIC_TEXTUAL.APP_PUBLIC.TEXTUAL_PARSE('<StageName>', "name","md5") FROM table(result_scan(last_query_id()));

About the results

The app writes the results to the TEXTUAL_RESULTS table.

For each request, the entry in TEXTUAL_RESULTS includes the request status and the request results.

The status is one of the following values:

QUEUED - The parse request was received and is waiting to be processed.
RUNNING - The parse request is currently being processed.
SKIPPED - The parse request was skipped because the file did not change since the previous time it was parsed. Whether a file is changed is determined by its MD5 checksum.
FAILURE_<FailureReason> - The parse request failed for the provided reason.

The result column is a VARIANT type that contains the parsed data. For more information about the format of the results for each document, go to Structure of the pipeline output file JSON.

Querying the results

You can query the parse results in the same way as you would any other Snowflake VARIANT column.

For example, the following command retrieves the parsed documents, which are in a converted Markdown representation.

SELECT result["Content"]["ContentAsMarkdown"] FROM TEXTUAL_RESULTS;

To retrieve the entities that were identified in the document:

SELECT result["Content"]["nerResults"] FROM TEXTUAL_RESULTS;

Because the result column is a simple variant, you can use flattening operations to perform more complex analysis. For example, you can extract all entities of a certain type or value across the documents, or find all documents that contain a specific type of entity.

Last updated 8 months ago

Was this helpful?