Using Textual with Snowpark Container Services directly

Snowpark Container Services (SPCS) allow developers to run containerized workloads directly within Snowflake. Because Tonic Textual is distributed using a private Docker repository, you can use these images in SPCS to run Textual workloads.

It is quicker to use the Snowflake Native App, but SPCS allows for more customization.

Add images to the repository

To use the Textual images, you must add them to Snowflake. The Snowflake documentation and tutorial walks through the process in great detail, but the basic steps are as follows:

Set up an image repository in Snowflake.
To pull down the required images, you must have access to our private Docker image repository on Quay.io. You should have been provided credentials during onboarding. If you require new credentials, or you experience issues accessing the repository, contact [email protected]. Once you have access, pull down the following images:
- textual-snowflake
- Either textual-ml or textual-ml-gpu, depending on whether you plan to use a GPU compute pool
Use the Docker CLI to upload the images to the image repository.

The images are now available in Snowflake.

Create the API service

The API service exposes the functions that are used to redact sensitive values in Snowflake. The service must be attached to a compute pool. You can scale the instances as needed, but you likely only need one API.

DROP SERVICE IF EXISTS api_service;
CREATE SERVICE api_service
  IN COMPUTE POOL compute_pool
  FROM SPECIFICATION $$
    spec:
      containers:
      - name: api_container
        image: /your_db/your_schema/your_image_repository/textual-snowflake:latest
        env:
          ML_SERVICE_URL: https://ml-service:7701
      endpoints:
        - name: api_endpoint
          port: 9002
          protocol: HTTP
      $$
   MIN_INSTANCES=1
   MAX_INSTANCES=1;

Create the machine learning (ML) service

Next, you create the ML service, which recognizes personally identifiable information (PII) and other sensitive values in text. This is more likely to need scaling.

DROP SERVICE IF EXISTS ml_service;
CREATE SERVICE ml_service
  IN COMPUTE POOL compute_pool
  FROM SPECIFICATION $$
    spec:
      containers:
      - name: ml_container
        image: /your_db/your_schema/your_image_repository/textual-ml:latest
      endpoints:
        - name: ml_endpoint
          port: 7701
          protocol: TCP
      $$
   MIN_INSTANCES=1
   MAX_INSTANCES=1;

Create functions

You can create custom SQL functions that use your API and ML services. These functions are accessible from directly within Snowflake.

CREATE OR REPLACE FUNCTION textual_redact(input_text STRING, config STRING)
  RETURNS STRING
  SERVICE = your_db.your_schema.api_service
  ENDPOINT = 'api_endpoint'
  AS '/api/redact';

CREATE OR REPLACE FUNCTION textual_redact(input_text STRING)
  RETURNS STRING
  SERVICE = your_db.your_schema.api_service
  CONTEXT_HEADERS = (current_user)
  ENDPOINT = 'api_endpoint'
  AS '/api/redact';
  
CREATE OR REPLACE FUNCTION textual_parse(PATH VARCHAR, STAGE_NAME VARCHAR, md5sum VARCHAR)
  returns string
  SERVICE=core.textual_service
  CONTEXT_HEADERS = (current_user)
  endpoint='api_endpoint'
  MAX_BATCH_ROWS=10
  as '/api/parse/start';

Example usage

It can take a couple of minutes for the containers to start. After the containers are started, you can use the functions that you created in Snowflake.

To test the functions, use an existing table. You can also create this simple test table:

CREATE TABLE Messages (
    Message TEXT
);

INSERT INTO Messages (Message) VALUES ('Hi my name is John Smith');
INSERT INTO Messages (Message) VALUES ('Hi John, mine is Jane Doe');

You use the function in the same way as any other user-defined function. You can pass in additional configuration to determine how to process specific types of sensitive values.

For example:

SELECT Message, textual_redact(Message) as REDACTED, textual_redact(Message, PARSE_JSON('NAME_GIVEN':'Synthesis', 'NAME_FAMILY':'Off')) as SYNTHESIZED FROM MESSAGES;

By default, the function redacts the entity values. In other words, it replaces the values with a placeholder that includes the type. Synthesis indicates to replace the value with a realistic replacement value. Off indicates to leave the value as is.

The textual_redact function works identically to the textual_redact function in the Snowflake Native App.

The response from the above example should look something like this:

Message

Redacted

Synthesized

Hi my name is John Smith

Hi my name is [NAME_GIVEN_Kx0Y7] [NAME_FAMILY_s9TTP0]

Hi my name is Lamar Smith

Hi John, mine is Jane Doe

Hi [NAME_GIVEN_Kx0Y7], mine is [NAME_GIVEN_veAy9] [NAME_FAMILY_6eC2]

Hi Lamar, mine is Doris Doe

The textual_parse function works identically to the textual_parse function in the Snowflake Native App.

Last updated 6 months ago

Was this helpful?