Downloading and using pipeline output
From Tonic Textual, you can download the JSON output for each file. For pipelines that also generate synthesized files, you can download those files.
You can also use the Textual API to further process the pipeline output. For example, you can chunk the output and determine whether to replace sensitive values before you use the output in a retrieval-augmented generation (RAG) system.
Textual provides next-step hints for using the pipeline output. The examples in this topic show in more detail how to use the output.
From a file details page, to download the JSON file, click Download Results.
On the file details page for a pipeline file, to download the synthesized version of the file, click Download Synthesized File.
On the Original tab for files other than .txt files, the Redacted <file type> view contains a Download option.
For cloud storage pipelines, the synthesized files are also available in the configured output location.
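To process a downloaded JSON results file locally, you can load it with Python's standard json module. This is a minimal sketch: the file name is a placeholder, and the sketch makes no assumption about the structure of the JSON, so inspect the parsed object before you rely on specific fields.

```python
import json
from pathlib import Path
from typing import Any, Union


def load_results(path: Union[str, Path]) -> Any:
    """Load a JSON results file that was downloaded from Tonic Textual."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical file name; use the name of the file that you downloaded.
results_path = Path("my_document_results.json")
if results_path.exists():
    results = load_results(results_path)
    # No schema is assumed here; inspect the parsed object before processing it.
    print(type(results).__name__)
```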
On the pipeline details page, the next steps panel on the left contains suggested steps to set up the API and use the pipeline output:
Create an API Key contains a link to create the key
Install the Python SDK contains a link to copy the SDK installation command
Fetch the pipeline results provides access to code snippets that you can use to retrieve and chunk the pipeline results.
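After you create an API key, it is safer not to hard-code it in your scripts. The following sketch reads the key from an environment variable. The variable name TONIC_TEXTUAL_API_KEY is an assumption here; check the Python SDK documentation for the variable that the SDK reads.

```python
import os


def get_api_key(env_var: str = "TONIC_TEXTUAL_API_KEY") -> str:
    """Read the Textual API key from an environment variable (name assumed)."""
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of sending unauthenticated requests.
        raise RuntimeError(f"Set the {env_var} environment variable to your Textual API key.")
    return key
```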
At the top of the Fetch the pipeline results step is the pipeline identifier. To copy the identifier, click the copy icon.
The Fetch the pipeline results step provides access to the following snippets:
Markdown - A code snippet to retrieve the Markdown results for the pipeline.
JSON - A code snippet to retrieve the JSON results for the pipeline.
Chunks - A code snippet to chunk the pipeline results.
To view a snippet, click the snippet tab.
To display the snippet panel, on the snippet tab, click View. The snippet panel provides a larger view of the snippet.
To copy the code snippet, on the snippet tab or the snippet panel, click Copy.
This example shows how to use your Textual pipeline output to create private chunks for RAG, where sensitive chunks are dropped, redacted, or synthesized.
This lets you make sure that the chunks that you use for RAG do not contain any private information.
First, connect to the API and get the files from the most recent pipeline.
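The SDK calls that list pipelines and their files are in the Fetch the pipeline results snippet. As a sketch of only the selection logic, assuming that each pipeline record is a dictionary with a hypothetical created timestamp, picking the most recent pipeline could look like this:

```python
from datetime import datetime
from typing import Any


def most_recent_pipeline(pipelines: list[dict[str, Any]]) -> dict[str, Any]:
    """Return the pipeline record with the latest creation time."""
    if not pipelines:
        raise ValueError("No pipelines found")
    # "created" is a hypothetical field name used for illustration.
    return max(pipelines, key=lambda p: p["created"])


# Hypothetical pipeline records.
pipelines = [
    {"id": "pipeline-a", "created": datetime(2024, 1, 5)},
    {"id": "pipeline-b", "created": datetime(2024, 3, 2)},
]
latest = most_recent_pipeline(pipelines)
```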
Next, specify the sensitive entity types, and indicate whether to redact or to synthesize those entities in the chunks.
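A minimal sketch of this configuration, assuming that the SDK accepts a dictionary that maps each entity type to "Redaction" or "Synthesis" (check the SDK reference for the exact parameter name and accepted values). The entity types in the list are examples, and build_generator_config is a hypothetical helper.

```python
from collections.abc import Iterable

# Example Textual entity types to treat as sensitive; adjust for your data.
SENSITIVE_ENTITIES = ["NAME_GIVEN", "NAME_FAMILY", "EMAIL_ADDRESS", "PHONE_NUMBER"]


def build_generator_config(
    entity_types: Iterable[str], synthesize: Iterable[str] = ()
) -> dict[str, str]:
    """Map each sensitive entity type to "Synthesis" or "Redaction"."""
    to_synthesize = set(synthesize)
    return {
        label: "Synthesis" if label in to_synthesize else "Redaction"
        for label in entity_types
    }


# Synthesize replacement names; redact every other sensitive entity type.
generator_config = build_generator_config(
    SENSITIVE_ENTITIES, synthesize=["NAME_GIVEN", "NAME_FAMILY"]
)
```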
Next, generate the chunks.
In the following code snippet, the final list does not include chunks with sensitive entities.
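A minimal sketch of that filtering step, assuming that each chunk is a dictionary with a text value and the is_sensitive flag. The private_chunks helper and the sample chunks are hypothetical.

```python
from typing import Any


def private_chunks(chunks: list[dict[str, Any]]) -> list[str]:
    """Keep only the text of chunks that contain no sensitive entities."""
    final_chunks = []
    for chunk in chunks:
        # Drop chunks that are flagged as containing sensitive entities.
        if chunk['is_sensitive']:
            continue
        final_chunks.append(chunk['text'])
    return final_chunks


# Hypothetical chunks, shaped like the assumed output of the chunking step.
chunks = [
    {"text": "The quarterly report is due Friday.", "is_sensitive": False},
    {"text": "This chunk mentions a customer's email address.", "is_sensitive": True},
]
final_chunks = private_chunks(chunks)
```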
To keep the chunks that contain sensitive entities, with those entities redacted or synthesized, remove the if chunk['is_sensitive']: continue lines.
The chunks are now ready to use for RAG or for other downstream tasks.
This example shows how to use Pinecone to add your Tonic Textual pipeline output to a vector retrieval system, such as one that supports RAG.
The Pinecone metadata filtering options allow you to incorporate Textual NER metadata into the retrieval system.
First, connect to the Textual pipeline API, and get the files from the most recently created pipeline.
Next, specify the entity types to incorporate into the retrieval system.
Chunk the files.
For each chunk, add the metadata that contains the instances of the entity types that occur in that chunk.
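The following sketch shows one way to build that metadata. It assumes, hypothetically, that the entities detected in a chunk are available as dictionaries with label and text keys; adapt the field names to the structure that your chunks actually have. The entity type names are examples of Textual entity types, and entity_metadata is a hypothetical helper.

```python
from collections.abc import Iterable
from typing import Any

# Entity types to expose as Pinecone metadata (example selection).
METADATA_ENTITIES = {"NAME_GIVEN", "NAME_FAMILY", "ORGANIZATION"}


def entity_metadata(
    entities: Iterable[dict[str, Any]], entity_types: set[str]
) -> dict[str, list[str]]:
    """Group the entity values found in one chunk by entity type."""
    grouped: dict[str, set[str]] = {}
    for entity in entities:
        if entity["label"] in entity_types:
            grouped.setdefault(entity["label"], set()).add(entity["text"])
    # Pinecone metadata values can be lists of strings, so return sorted lists.
    return {label: sorted(values) for label, values in grouped.items()}


# Hypothetical entities detected in one chunk.
chunk_entities = [
    {"label": "NAME_GIVEN", "text": "John"},
    {"label": "NAME_FAMILY", "text": "Smith"},
    {"label": "ORGANIZATION", "text": "Apple"},
    {"label": "EMAIL_ADDRESS", "text": "john@example.com"},
    {"label": "NAME_GIVEN", "text": "John"},
]
metadata = entity_metadata(chunk_entities, METADATA_ENTITIES)
```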
Next, embed the text of the chunks.
For each chunk, store the following in a Pinecone vector database:
Text
Embedding
Metadata
You define the embedding function for your system.
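A sketch of the record-building step follows. Each record uses the id, values, and metadata fields that Pinecone accepts for an upsert, and the chunk text is stored inside the metadata. The toy_embed function is a hypothetical placeholder that hashes the text; it is not a real embedding model, so replace it with your own embedding function.

```python
import hashlib
from collections.abc import Callable
from typing import Any


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical placeholder embedding; replace with your own embedding model."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dim]]


def build_records(
    chunks: list[dict[str, Any]], embed: Callable[[str], list[float]]
) -> list[dict[str, Any]]:
    """Build Pinecone-style records: an ID, the embedding, and the metadata."""
    records = []
    for i, chunk in enumerate(chunks):
        records.append({
            "id": f"chunk-{i}",
            "values": embed(chunk["text"]),
            # Keep the chunk text next to the entity metadata so queries can return it.
            "metadata": {**chunk["metadata"], "text": chunk["text"]},
        })
    return records


# Hypothetical chunk with entity metadata.
chunks = [{
    "text": "John Smith joined Apple.",
    "metadata": {"NAME_GIVEN": ["John"], "ORGANIZATION": ["Apple"]},
}]
records = build_records(chunks, toy_embed)
# With the Pinecone Python client, you could then upload the records, for example:
# index.upsert(vectors=records)
```

The sequential chunk-N IDs are only for illustration. In practice, derive IDs from the file and the chunk position so that repeated uploads overwrite the same records instead of creating duplicates.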
When you query the Pinecone database, you can then use metadata filters that specify entity type constraints.
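To make the filter semantics concrete, the following sketch evaluates a filter locally against the metadata of one chunk. It is a simplified stand-in that handles only the $in operator, not Pinecone's implementation: conditions on different fields must all hold, and a condition on a list field holds when any stored value appears in the $in list. The matches helper is hypothetical.

```python
from typing import Any


def matches(metadata: dict[str, Any], metadata_filter: dict[str, dict[str, list[str]]]) -> bool:
    """Simplified local check of a Pinecone-style filter that uses only $in.

    Top-level conditions are combined with AND. A condition is met when any
    value stored for that field appears in the condition's $in list.
    """
    for field, condition in metadata_filter.items():
        stored = metadata.get(field, [])
        if isinstance(stored, str):
            stored = [stored]
        if not set(condition["$in"]).intersection(stored):
            return False
    return True


chunk_metadata = {"NAME_GIVEN": ["John"], "NAME_FAMILY": ["Smith"], "ORGANIZATION": ["Apple"]}
print(matches(chunk_metadata, {"ORGANIZATION": {"$in": ["Google", "Apple"]}}))
```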
For example, to return only chunks that contain the name John Smith:
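A filter for this case, assuming that NAME_GIVEN and NAME_FAMILY were among the entity types that you stored as metadata. It matches any chunk that contains the given name John and the family name Smith, even if the two names belong to different people in that chunk. The full_name_filter helper is hypothetical, and index and query_embedding in the commented query stand for your Pinecone index and the embedding of your query text.

```python
from typing import Any


def full_name_filter(given: str, family: str) -> dict[str, Any]:
    """Filter for chunks that contain both the given name and the family name."""
    # Pinecone combines top-level filter fields with AND.
    return {
        "NAME_GIVEN": {"$in": [given]},
        "NAME_FAMILY": {"$in": [family]},
    }


john_smith_filter = full_name_filter("John", "Smith")
# Example query with the Pinecone Python client:
# results = index.query(vector=query_embedding, filter=john_smith_filter,
#                       top_k=5, include_metadata=True)
```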
As another example, to return only chunks that contain at least one of the organizations Google, Apple, or Microsoft:
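A filter for this case, assuming that ORGANIZATION was one of the entity types that you stored as metadata. The $in operator matches a chunk when its ORGANIZATION values include at least one of the listed names. As in the previous example, index and query_embedding in the commented query stand for your own Pinecone index and query embedding.

```python
ORGANIZATIONS = ["Google", "Apple", "Microsoft"]

# $in matches chunks whose ORGANIZATION metadata contains at least one listed value.
organization_filter = {"ORGANIZATION": {"$in": ORGANIZATIONS}}
# Example query with the Pinecone Python client:
# results = index.query(vector=query_embedding, filter=organization_filter,
#                       top_k=5, include_metadata=True)
```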