Downloading and using pipeline output
From Tonic Textual, you can download the JSON output for each file.
You can also use the Textual API to further process the pipeline output - for example, you can chunk the output and determine whether to replace sensitive values before you use the output in a RAG system.
Textual provides next-step hints for using the pipeline output. The examples in this topic show in detail how to use the output.
Downloading output files
From a file details page, to download the JSON file, click Download Results.
Viewing next steps information for pipeline output
Available next steps
On the pipeline details page, the next steps panel at the left contains suggested steps to set up the API and use the pipeline output:
Create an API Key contains a link to create the key.
Install the Python SDK contains a link to copy the SDK installation command.
Fetch the pipeline results provides access to code snippets that you can use to retrieve and chunk the pipeline results.
Copying the pipeline identifier
At the top of the Fetch the pipeline results step is the pipeline identifier. To copy the identifier, click the copy icon.
Selecting the snippet to view
The Fetch the pipeline results step provides access to the following snippets:
Markdown - A code snippet to retrieve the Markdown results for the pipeline.
JSON - A code snippet to retrieve the JSON results for the pipeline.
Chunks - A code snippet to chunk the pipeline results.
To view a snippet, click the snippet tab.
Viewing the snippet panel
To display the snippet panel, on the snippet tab, click View. The snippet panel provides a larger view of the snippet.
Copying a snippet
To copy the code snippet, on the snippet tab or the snippet panel, click Copy.
Example - Working with sensitive RAG chunks
This example shows how to use your Tonic Textual pipeline output to create private chunks for RAG, where sensitive chunks are dropped, redacted, or synthesized.
This ensures that the chunks you use for RAG do not contain private information.
Get the latest output files
First, we connect to the API and get the files from the most recent pipeline.
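The retrieval itself goes through the Textual Python SDK; the selection logic can be sketched as follows. This is a minimal sketch only: the pipeline record shape and the created field name are assumptions for illustration, not the SDK's actual objects.

```python
from datetime import datetime

# Sketch only: real pipeline records come from the Textual Python SDK.
# The dict shape and the "created" field name are assumptions.
def latest_pipeline(pipelines):
    """Return the most recently created pipeline record."""
    return max(pipelines, key=lambda p: datetime.fromisoformat(p["created"]))

pipelines = [
    {"id": "pipe-1", "created": "2024-01-02T10:00:00"},
    {"id": "pipe-2", "created": "2024-03-15T09:30:00"},
]
print(latest_pipeline(pipelines)["id"])  # pipe-2
```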
Identify the entity types and handling
Next, specify the sensitive entity types, and indicate whether to redact or to synthesize those entities in the chunks.
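One simple way to express this configuration is a mapping from entity type to its handling. The entity type names below follow Textual's label style but are illustrative; the set that matters for your pipeline may differ.

```python
# Map each sensitive entity type to how it should be handled in the chunks.
# "Redaction" replaces the value with a type token; "Synthesis" replaces it
# with a realistic fake value. The type names here are illustrative.
entity_handling = {
    "NAME_GIVEN": "Synthesis",
    "NAME_FAMILY": "Synthesis",
    "EMAIL_ADDRESS": "Redaction",
    "PHONE_NUMBER": "Redaction",
}
sensitive_types = set(entity_handling)
```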
Generate the chunks
Next, generate the chunks.
In the following code snippet, the final list does not include chunks with sensitive entities. To instead include those chunks with the sensitive entities redacted, remove the if chunk['is_sensitive']: continue lines.
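A minimal, self-contained sketch of that filtering step (the chunk dict shape and the is_sensitive flag are assumptions about what the pipeline output looks like):

```python
# Sketch: each chunk is assumed to be a dict whose text has already been
# redacted or synthesized, plus an is_sensitive flag.
chunks = [
    {"text": "Quarterly revenue grew 4%.", "is_sensitive": False},
    {"text": "Contact [NAME_GIVEN] at [EMAIL_ADDRESS].", "is_sensitive": True},
]

final_chunks = []
for chunk in chunks:
    # Remove these two lines to keep sensitive chunks (in their
    # redacted or synthesized form) instead of dropping them.
    if chunk["is_sensitive"]:
        continue
    final_chunks.append(chunk["text"])

print(final_chunks)  # ['Quarterly revenue grew 4%.']
```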
The chunks are now ready to use for RAG or for other downstream tasks.
Example - Using Pinecone to add pipeline output to a vector retrieval system
This example shows how to use Pinecone to add your Tonic Textual pipeline output to a vector retrieval system, for example for RAG.
The Pinecone metadata filtering options allow you to incorporate Textual NER metadata into the retrieval system.
Get the latest output files
First, connect to the Textual pipeline API, and get the files from the most recently created pipeline.
Identify the entity types to include
Next, specify the entity types to incorporate into the retrieval system.
Chunk the files and add metadata
Chunk the files.
For each chunk, add the metadata that contains the instances of the entity types that occur in that chunk.
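The metadata step can be sketched as grouping each chunk's detected entities by type, keeping only the selected types. The entity record shape (label and text fields) is an assumption for illustration.

```python
# Entity types to incorporate into the retrieval system (illustrative names).
entity_types = ["NAME_GIVEN", "ORGANIZATION"]

def chunk_metadata(entities):
    """Group detected entity instances by type, keeping only selected types."""
    metadata = {}
    for ent in entities:
        if ent["label"] in entity_types:
            metadata.setdefault(ent["label"], []).append(ent["text"])
    return metadata

entities = [
    {"label": "NAME_GIVEN", "text": "John"},
    {"label": "ORGANIZATION", "text": "Google"},
    {"label": "EMAIL_ADDRESS", "text": "js@example.com"},  # not selected
]
print(chunk_metadata(entities))
# {'NAME_GIVEN': ['John'], 'ORGANIZATION': ['Google']}
```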
Add the chunks to the Pinecone database
Next, embed the text of the chunks.
For each chunk, store the following in a Pinecone vector database:
Text
Embedding
Metadata
You define the embedding function for your system.
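A sketch of building the records to store: Pinecone upserts take an id, the embedding values, and a metadata dict. The embedding function below is a deliberately trivial placeholder with the right shape; substitute your own model.

```python
# Placeholder embedding: not meaningful, just a fixed-length vector.
def embed(text):
    return [float(len(text)), float(text.count(" "))]

def to_records(chunks):
    """Build Pinecone-style records: id, embedding values, and metadata."""
    records = []
    for i, chunk in enumerate(chunks):
        records.append({
            "id": f"chunk-{i}",
            "values": embed(chunk["text"]),
            "metadata": {"text": chunk["text"], **chunk["metadata"]},
        })
    return records

chunks = [{"text": "John joined Google.", "metadata": {"NAME_GIVEN": ["John"]}}]
records = to_records(chunks)
# These records can then be passed to index.upsert(vectors=records)
# with the Pinecone client.
```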
Using metadata filters to query the Pinecone database
When you query the Pinecone database, you can then use metadata filters that specify entity type constraints.
For example, to only return chunks that contain the name John Smith:
As another example, to only return chunks that contain one of the following organizations - Google, Apple, or Microsoft:
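Both queries can be expressed with Pinecone's metadata filter operators. The metadata keys below (names, organizations) are assumptions; use whatever keys you chose when you stored the chunk metadata.

```python
# Pinecone metadata filters using the $in operator.
# Only return chunks whose stored metadata contains the name John Smith:
name_filter = {"names": {"$in": ["John Smith"]}}

# Only return chunks that mention at least one of these organizations:
org_filter = {"organizations": {"$in": ["Google", "Apple", "Microsoft"]}}

# A filter is passed alongside the query embedding, e.g.:
# index.query(vector=query_embedding, top_k=5, filter=org_filter,
#             include_metadata=True)
```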