Create and manage pipelines
Textual uses pipelines to transform file text into a format that can be used in an LLM system.
You can use the Textual SDK to create and manage pipelines and retrieve pipeline run results.
Before you perform these tasks, remember to instantiate the SDK client.
Creating and updating pipelines
Creating a pipeline
To create a pipeline, use textual.create_pipeline.
The response contains the pipeline object.
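A minimal sketch of the call, written as a helper that receives the already instantiated SDK client. The helper name and the assumption that create_pipeline takes the pipeline name as its only argument are illustrative and not confirmed by this page.

```python
def create_named_pipeline(textual, pipeline_name):
    """Create a pipeline and return the pipeline object from the response."""
    # Assumption: create_pipeline accepts the pipeline name as its argument.
    pipeline = textual.create_pipeline(pipeline_name)
    return pipeline
```

Here, textual is the SDK client that you instantiated before you started.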
Uploading a file to a pipeline
To upload a file to a pipeline, use pipeline.upload_file.
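The upload might look like the following sketch. It assumes that upload_file takes the raw file bytes followed by a file name; check the argument order against the SDK reference. The helper name is hypothetical.

```python
from pathlib import Path


def upload_local_file(pipeline, file_path):
    """Read a local file and upload it to the given pipeline."""
    path = Path(file_path)
    # Assumption: upload_file(file_bytes, file_name) is the expected signature.
    pipeline.upload_file(path.read_bytes(), path.name)
    return path.name
```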
Deleting a pipeline
To delete a pipeline, use textual.delete_pipeline.
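As a rough illustration, the following helper deletes several pipelines in one pass. It assumes that delete_pipeline identifies the pipeline by its identifier, which this page does not state explicitly.

```python
def delete_pipelines(textual, pipeline_ids):
    """Delete each pipeline in pipeline_ids and return the identifiers processed."""
    deleted = []
    for pipeline_id in pipeline_ids:
        # Assumption: delete_pipeline takes the pipeline identifier.
        textual.delete_pipeline(pipeline_id)
        deleted.append(pipeline_id)
    return deleted
```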
Getting a pipeline or pipelines
Getting the list of pipelines
To get the list of pipelines, use textual.get_pipelines.
The response contains a list of pipeline objects.
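One way to work with the returned list is to index the pipelines by identifier. The id and name attributes used here are assumptions about the pipeline object, so adjust them to match the SDK.

```python
def pipeline_names_by_id(textual):
    """Return a mapping of pipeline identifier to pipeline name."""
    pipelines = textual.get_pipelines()
    # Assumption: each pipeline object exposes `id` and `name` attributes.
    return {pipeline.id: pipeline.name for pipeline in pipelines}
```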
Getting a single pipeline
To use the pipeline identifier to get a single pipeline, use textual.get_pipeline_by_id.
The response contains a single pipeline object.
The pipeline identifier is displayed on the pipeline details page. To copy the identifier, click the copy icon.
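A sketch of the lookup, using the identifier copied from the pipeline details page. The None check is defensive: this page does not say how the SDK reports an unknown identifier, so that behavior is an assumption.

```python
def load_pipeline(textual, pipeline_id):
    """Fetch a single pipeline by its identifier."""
    pipeline = textual.get_pipeline_by_id(pipeline_id)
    # Assumption: a missing pipeline may come back as None; the SDK might
    # raise its own error instead.
    if pipeline is None:
        raise LookupError(f"No pipeline found for identifier {pipeline_id!r}")
    return pipeline
```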
Running a pipeline
To run a pipeline, use pipeline.run.
The response contains the job identifier.
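For illustration, a helper that starts a run and hands back the job identifier from the response. It assumes that run needs no arguments.

```python
def start_pipeline_run(pipeline):
    """Start a pipeline run and return the job identifier."""
    # Assumption: run() takes no arguments and returns the job identifier.
    job_id = pipeline.run()
    return job_id
```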
Getting pipeline runs, files, and results
Getting pipeline runs
To get the list of pipeline runs, use pipeline.get_runs.
The response contains a list of pipeline run objects.
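The sketch below copies the runs into a plain list so that they can be counted or inspected. It makes no assumption about the attributes of a run object or the order in which runs are returned.

```python
def list_runs(pipeline):
    """Return the pipeline runs as a plain Python list."""
    # Assumption: get_runs() takes no arguments and returns an iterable.
    return list(pipeline.get_runs())
```

For example, len(list_runs(pipeline)) gives the number of runs recorded for the pipeline.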
Getting pipeline files
After you get the pipeline, to get an enumerator of the files from the most recent pipeline run, use pipeline.enumerate_files.
The response is an enumerator of file parse result objects.
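Because the response is an enumerator, it is typically consumed in a loop. The following sketch takes at most limit files from it; the helper name and the limit parameter are illustrative additions.

```python
from itertools import islice


def first_files(pipeline, limit=None):
    """Return up to `limit` file parse results from the most recent run."""
    # islice stops early, so the rest of the enumerator is never consumed.
    # A limit of None returns every file.
    return list(islice(pipeline.enumerate_files(), limit))
```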
Getting the list of entities in a file
To get a list of entities that were detected in a file, use get_all_entities. To get the detected entities for all of the files in a pipeline, call it on each file that pipeline.enumerate_files returns.
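Collecting the detected entities for every file in a pipeline could be sketched as follows. It assumes that get_all_entities is a method of each file parse result and takes no arguments.

```python
def entities_for_all_files(pipeline):
    """Return one list of detected entities per file in the pipeline."""
    detected_entities = []
    for parsed_file in pipeline.enumerate_files():
        # Assumption: get_all_entities() is called on the file parse result.
        detected_entities.append(parsed_file.get_all_entities())
    return detected_entities
```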
To provide a list of entity types and how to process them, use get_entities.
generator_config is a dictionary that specifies whether to redact, synthesize, or do neither for each entity type in the dictionary.
For a list of the entity types that Textual detects, go to Entity types that Textual detects.
For each entity type, you provide the handling type:
Redaction indicates to replace the value with the value type.
Synthesis indicates to replace the value with a realistic value.
Off indicates to keep the value as is.
generator_default indicates how to process values for entity types that were not included in the generator_config list.
The response contains the list of entities. For each value, the list includes:
Entity type
Where the value starts in the source file
Where the value ends in the source file
The original text of the entity
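The configuration described above can be sketched as a dictionary plus a default. In this sketch, plain strings stand in for the three handling types; the SDK may expect its own enumeration type instead. The entity type names in the usage note and the positional argument order of get_entities are also assumptions.

```python
# Stand-ins for the three handling types described above.
ALLOWED_HANDLING = {"Redaction", "Synthesis", "Off"}


def entities_with_handling(pipeline, generator_config, generator_default="Off"):
    """Return the entities for each file, processed according to the config."""
    for entity_type, handling in generator_config.items():
        if handling not in ALLOWED_HANDLING:
            raise ValueError(f"Unknown handling {handling!r} for {entity_type}")
    if generator_default not in ALLOWED_HANDLING:
        raise ValueError(f"Unknown default handling {generator_default!r}")
    # Assumption: get_entities(generator_config, generator_default).
    return [
        parsed_file.get_entities(generator_config, generator_default)
        for parsed_file in pipeline.enumerate_files()
    ]
```

For example, a generator_config of {"NAME_GIVEN": "Redaction", "EMAIL_ADDRESS": "Synthesis"} with a generator_default of "Off" would redact given names, synthesize email addresses, and leave every other entity type unchanged.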
Getting the Markdown output for pipeline files
To get the Markdown output of a pipeline file, use get_markdown. In the request, you can provide generator_config and generator_default to configure how to present the detected entities in the output file.
The response contains the Markdown files, with the detected entities processed as specified in generator_config and generator_default.
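A sketch that gathers the Markdown output for every file in a pipeline. It assumes that get_markdown is called on each file parse result and accepts generator_config and generator_default as keyword arguments.

```python
def markdown_for_all_files(pipeline, generator_config, generator_default):
    """Return the Markdown output for each file in the pipeline."""
    outputs = []
    for parsed_file in pipeline.enumerate_files():
        # Assumption: both options are passed as keyword arguments.
        outputs.append(
            parsed_file.get_markdown(
                generator_config=generator_config,
                generator_default=generator_default,
            )
        )
    return outputs
```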
Generating chunks from pipeline files
To split a pipeline file into text chunks that can be imported into an LLM, use get_chunks.
In the request, you set the maximum number of characters in each chunk.
You can also provide generator_config and generator_default to configure how to present the detected entities in the text chunks.
The response contains the list of text chunks, with the detected entities processed as specified in generator_config and generator_default.
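Chunking every file in a pipeline might look like the sketch below. The keyword name max_chars for the maximum number of characters is an assumption, as is the choice to pass the generator options only when they are supplied.

```python
def chunks_for_all_files(pipeline, max_chars, generator_config=None,
                         generator_default=None):
    """Return a flat list of text chunks across all files in the pipeline."""
    # Only forward the generator options that the caller actually supplied.
    options = {}
    if generator_config is not None:
        options["generator_config"] = generator_config
    if generator_default is not None:
        options["generator_default"] = generator_default

    chunks = []
    for parsed_file in pipeline.enumerate_files():
        # Assumption: the chunk size limit is passed as `max_chars`.
        chunks.extend(parsed_file.get_chunks(max_chars=max_chars, **options))
    return chunks
```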