Using the app

Granting access to the app

After you install the app in the Snowflake UI, only the ACCOUNTADMIN role has access to it.

You can grant access to other roles as needed.
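For example, Snowflake Native Apps typically expose access through an application role. The role names below (APP_USER for the application role, ANALYST for the consumer role) are assumptions; substitute the roles that apply in your account.

```sql
-- Run as ACCOUNTADMIN. APP_USER is an assumed application role name;
-- run SHOW APPLICATION ROLES IN APPLICATION TONIC_TEXTUAL to see the
-- roles that the app actually exposes.
GRANT APPLICATION ROLE TONIC_TEXTUAL.APP_USER TO ROLE ANALYST;
```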

Starting the app

To start the app, run the following command:

CALL TONIC_TEXTUAL.APP_PUBLIC.START_APP('{YOUR_COMPUTE_POOL_NAME}', '{YOUR_TEXTUAL_TELEMETRY_EGRESS_INTEGRATION_NAME}');

This initializes the application. You can then use the app to redact or parse text data.

Using the TEXTUAL_REDACT function

You use the TEXTUAL_REDACT function to detect and replace sensitive values in text.

TEXTUAL_REDACT syntax

The TEXTUAL_REDACT function takes the following arguments:

  • The text to redact, which is required

  • Optionally, a PARSE_JSON JSON object that specifies the generator configuration for each entity type. The generator configuration indicates what to do with each detected value.

SELECT TONIC_TEXTUAL.APP_PUBLIC.TEXTUAL_REDACT('Text to redact',
PARSE_JSON('{"<EntityType>": "<HandlingType>"}'));

For each entry in PARSE_JSON:

  • <EntityType> is the type of entity for which to specify the handling. For the list of entity types, go to Entity types that Textual detects. For example, for a first name, the entity type is NAME_GIVEN.

  • <HandlingType> indicates what to do with the detected value. The options are:

    • Redact, which replaces the value with a redacted value in the format [<EntityType>_<RandomIdentifier>]

    • Synthesis, which replaces the value with a realistic replacement

    • Off, which leaves the value as is

If you do not include PARSE_JSON, then all of the detected values are redacted.

Example: Redacting text

The following example sends a text string to the app:

SELECT TONIC_TEXTUAL.APP_PUBLIC.TEXTUAL_REDACT('My name is Jane Doe');

This returns the redacted text, which looks similar to the following:

My name is [NAME_GIVEN_abc789] [NAME_FAMILY_xyz123].

Because we did not specify the handling for any of the entity types, both the first name Jane and last name Doe are redacted.

Example: Customizing the redaction

In this example, when a first name (NAME_GIVEN) is detected, it is synthesized instead of redacted.

SELECT TONIC_TEXTUAL.APP_PUBLIC.TEXTUAL_REDACT('My name is Jane Doe', PARSE_JSON('{"NAME_GIVEN": "Synthesis"}'));

This returns output similar to the following. The first name Jane is replaced with a realistic value (synthesized), and the last name Doe is redacted.

My name is Shirley [NAME_FAMILY_xyz123].
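You can also configure several entity types in a single PARSE_JSON object. As a sketch (the ORGANIZATION entity type and the sample text are illustrative):

```sql
-- Synthesize first names, leave organization names as is,
-- and redact all other detected values (the default).
SELECT TONIC_TEXTUAL.APP_PUBLIC.TEXTUAL_REDACT(
    'Jane Doe works at Acme Corp',
    PARSE_JSON('{"NAME_GIVEN": "Synthesis", "ORGANIZATION": "Off"}'));
```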

Using the TEXTUAL_PARSE function

You use the TEXTUAL_PARSE function to transform files in an external or internal stage into Markdown-based content that you can use to populate LLM systems.

The output includes metadata about the file, including sensitive values that were detected.

Granting access to the stage

To be able to parse the files, Textual must have access to the stage where the files are located.

Your role must be able to grant the USAGE and READ privileges.

To grant Textual access to the stage, run the following commands:

GRANT USAGE ON DATABASE <DatabaseName> TO APPLICATION TONIC_TEXTUAL;
GRANT USAGE ON SCHEMA <DatabaseName>.<SchemaName> TO APPLICATION TONIC_TEXTUAL;
GRANT READ, USAGE ON STAGE <DatabaseName>.<SchemaName>.<StageName> TO APPLICATION TONIC_TEXTUAL;

Sending a request for a single file

To send a parse request for a single file, run the following:

SELECT TONIC_TEXTUAL.APP_PUBLIC.TEXTUAL_PARSE('<FullyQualifiedStageName>', '<FileName>', '<FileMD5Sum>');

Where:

  • <FullyQualifiedStageName> is the fully qualified name of the stage, in the format <DatabaseName>.<SchemaName>.<StageName>. For example, database1.schema1.stage1.

  • <FileName> is the name of the file.

  • <FileMD5Sum> is the MD5 checksum of the file content.

Sending a request for multiple files

To parse a large number of files:

  1. List the stage files to parse. For example, you might use PATTERN to limit the files based on file type.

  2. Run the parse request command on the list.

For example:

LIST @<StageName> PATTERN='.*(txt|xlsx|docx)';
SELECT TONIC_TEXTUAL.APP_PUBLIC.TEXTUAL_PARSE('<FullyQualifiedStageName>', "name", "md5") FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));

About the results

The app writes the results to the TEXTUAL_RESULTS table.

For each request, the entry in TEXTUAL_RESULTS includes the request status and the request results.

The status is one of the following values:

  • QUEUED - The parse request was received and is waiting to be processed.

  • RUNNING - The parse request is currently being processed.

  • SKIPPED - The parse request was skipped because the file did not change since the previous time it was parsed. Whether a file has changed is determined by its MD5 checksum.

  • FAILURE_<FailureReason> - The parse request failed for the provided reason.

The result column is a VARIANT type that contains the parsed data. For more information about the format of the results for each document, go to Structure of the pipeline output file JSON.
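To check on pending or failed requests, you can filter on the status. A minimal sketch, assuming the status is stored in a column named STATUS (verify the column names against your TEXTUAL_RESULTS table):

```sql
-- List requests that have not completed successfully.
-- STATUS is an assumed column name.
SELECT *
FROM TEXTUAL_RESULTS
WHERE STATUS IN ('QUEUED', 'RUNNING')
   OR STATUS LIKE 'FAILURE%';
```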

Querying the results

You can query the parse results in the same way as you would any other Snowflake VARIANT column.

For example, the following command retrieves the parsed documents, which are in a converted Markdown representation.

SELECT result["Content"]["ContentAsMarkdown"] FROM TEXTUAL_RESULTS;

To retrieve the entities that were identified in the document:

SELECT result["Content"]["nerResults"] FROM TEXTUAL_RESULTS;

Because the result column is a standard VARIANT, you can use flattening operations to perform more complex analysis. For example, you can extract all entities of a certain type or value across the documents, or find all documents that contain a specific type of entity.
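For example, a sketch of such an analysis using LATERAL FLATTEN, assuming that each element of the nerResults array exposes a "label" field with the entity type (check the structure of the pipeline output file JSON for the exact field names):

```sql
-- Expand each detected entity into its own row, then count
-- occurrences per entity type across all parsed documents.
-- "label" is an assumed field name within each nerResults element.
SELECT entity.value:"label"::STRING AS entity_type,
       COUNT(*) AS occurrences
FROM TEXTUAL_RESULTS,
     LATERAL FLATTEN(input => result:"Content":"nerResults") AS entity
GROUP BY entity_type
ORDER BY occurrences DESC;
```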
