LogoLogo
Release notesPython SDK docsDocs homeTextual CloudTonic.ai
  • Tonic Textual guide
  • Getting started with Textual
  • Previewing Textual detection and redaction
  • Entity types that Textual detects
    • Built-in entity types
    • Managing custom entity types
  • Language support in Textual
  • Datasets - Create redacted files
    • Datasets workflow for text redaction
    • Creating and managing datasets
    • Assigning tags to datasets
    • Adding and removing dataset files
    • Reviewing the sensitivity detection results
    • Configuring the redaction
      • Configuring added and excluded values for built-in entity types
      • Working with custom entity types
      • Selecting the handling option for entity types
      • Configuring synthesis options
      • Configuring handling of file components
    • Adding manual overrides to PDF files
      • Editing an individual PDF file
      • Creating templates to apply to PDF files
    • Sharing dataset access
    • Previewing the original and redacted data in a file
    • Downloading redacted data
  • Pipelines - Prepare LLM content
    • Pipelines workflow for LLM preparation
    • Viewing pipeline lists and details
    • Assigning tags to pipelines
    • Setting up pipelines
      • Creating and editing pipelines
      • Supported file types for pipelines
      • Creating custom entity types from a pipeline
      • Configuring file synthesis for a pipeline
      • Configuring an Amazon S3 pipeline
      • Configuring a Databricks pipeline
      • Configuring an Azure pipeline
      • Configuring a Sharepoint pipeline
      • Selecting files for an uploaded file pipeline
    • Starting a pipeline run
    • Sharing pipeline access
    • Viewing pipeline results
      • Viewing pipeline files, runs, and statistics
      • Displaying details for a processed file
      • Structure of the pipeline output file JSON
    • Downloading and using pipeline output
  • Textual Python SDK
    • Installing the Textual SDK
    • Creating and revoking Textual API keys
    • Obtaining JWT tokens for authentication
    • Instantiating the SDK client
    • Datasets and redaction
      • Create and manage datasets
      • Redact individual strings
      • Redact individual files
      • Transcribe and redact an audio file
      • Configure entity type handling for redaction
      • Record and review redaction requests
    • Pipelines and parsing
      • Create and manage pipelines
      • Parse individual files
  • Textual REST API
    • About the Textual REST API
    • REST API authentication
    • Redaction
      • Redact text strings
  • Datasets
    • Manage datasets
    • Manage dataset files
  • Snowflake Native App and SPCS
    • About the Snowflake Native App
    • Setting up the app
    • Using the app
    • Using Textual with Snowpark Container Services directly
  • Install and administer Textual
    • Textual architecture
    • Setting up and managing a Textual Cloud pay-as-you-go subscription
    • Deploying a self-hosted instance
      • System requirements
      • Deploying with Docker Compose
      • Deploying on Kubernetes with Helm
    • Configuring Textual
      • How to configure Textual environment variables
      • Configuring the number of textual-ml workers
      • Configuring the number of jobs to run concurrently
      • Configuring the format of Textual logs
      • Setting a custom certificate
      • Configuring endpoint URLs for calls to AWS
      • Enabling PDF and image processing
      • Setting the S3 bucket for file uploads and redactions
      • Required IAM role permissions for Amazon S3
      • Configuring model preferences
    • Viewing model specifications
    • Managing user access to Textual
      • Textual organizations
      • Creating a new account in an existing organization
      • Single sign-on (SSO)
        • Viewing the list of SSO groups in Textual
        • Azure
        • GitHub
        • Google
        • Keycloak
        • Okta
      • Managing Textual users
      • Managing permissions
        • About permissions and permission sets
        • Built-in permission sets and available permissions
        • Viewing the lists of permission sets
        • Configuring custom permission sets
        • Configuring access to global permission sets
        • Setting initial access to all global permissions
    • Textual monitoring
      • Downloading a usage report
      • Tracking user access to Textual
Powered by GitBook
On this page

Was this helpful?

Export as PDF
  1. Datasets

Manage dataset files

Use the REST API to manage dataset files.

Last updated 2 months ago

Was this helpful?

Download all dataset files

get

Downloads all files from the specified dataset. The downloaded files are redacted based on the dataset configuration.

Path parameters
datasetIdstringRequired
Responses
200
OK
application/json
Responsestring · binary
400
Bad Request
application/json
404
Not Found
application/json
500
Internal Server Error
get
GET /api/Dataset/{datasetId}/files/download_all HTTP/1.1
Host: 
Accept: */*
binary

Download a dataset file

get

Downloads the specified file from the dataset. The downloaded file is redacted based on the dataset configuration.

Path parameters
datasetIdstringRequired
fileIdstringRequired
Responses
200
OK
application/octet-stream
Responsestring · binary
400
Bad Request
404
Not Found
409
Conflict
500
Internal Server Error
get
GET /api/Dataset/{datasetId}/files/{fileId}/download HTTP/1.1
Host: 
Accept: */*
binary

Upload dataset files

post

Upload a file to a dataset for processing.

Path parameters
datasetIdstringRequired
Body
filestring · binaryOptional

File to upload

Responses
200
OK
application/json
post
POST /api/Dataset/{datasetId}/files/upload HTTP/1.1
Host: 
Content-Type: multipart/form-data
Accept: */*
Content-Length: 288

{
  "document": {
    "fileName": "example.txt",
    "csvConfig": {
      "numColumns": 1,
      "hasHeader": true,
      "escapeChar": "text",
      "quoteChar": "text",
      "delimiter": "text",
      "nullChar": "text"
    },
    "datasetId": "6a01360f-78fc-9f2f-efae-c5e1461e9c1et",
    "customPiiEntityIds": [
      "CUSTOM_ENTITY_1",
      "CUSTOM_ENTITY_2"
    ]
  },
  "file": "binary"
}
200

OK

{
  "updatedDataset": {
    "id": "text",
    "name": "text",
    "generatorMetadata": "asdfqwer",
    "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
    "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
    "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
    "enabledModels": [
      "text"
    ],
    "tags": [
      "text"
    ],
    "files": [
      {
        "fileId": "text",
        "fileName": "text",
        "fileType": "text",
        "datasetId": "text",
        "numRows": 1,
        "numColumns": 1,
        "piiTypes": [
          "text"
        ],
        "wordCount": 1,
        "redactedWordCount": 1,
        "uploadedTimestamp": {},
        "fileSource": "Local",
        "processingStatus": "text",
        "processingError": "text",
        "mostRecentCompletedJobId": "text"
      }
    ],
    "lastUpdated": {},
    "docXImagePolicy": "Redact",
    "pdfSignaturePolicy": "Redact",
    "pdfSynthModePolicy": "V1",
    "docXCommentPolicy": "Remove",
    "docXTablePolicy": "Redact",
    "fileSource": "Local",
    "customPiiEntityIds": [
      "text"
    ],
    "operations": [
      "HasAccess"
    ],
    "rescanJobs": [
      {
        "id": "text",
        "status": "text",
        "errorMessages": "text",
        "startTime": {},
        "endTime": {},
        "publishedTime": {},
        "datasetFileId": "text",
        "jobType": "DeidentifyFile"
      }
    ],
    "fileSourceExternalCredential": {
      "fileSource": "Local",
      "credential": {}
    },
    "awsCredentialSource": "text",
    "outputPath": "text"
  },
  "uploadedFileId": "text"
}

Get all datasets

get

Returns all datasets to which the user has access

Path parameters
includeSynthesizePipelinebooleanRequiredDefault: false
Responses
200
OK
application/json
get
GET /api/Dataset HTTP/1.1
Host: 
Accept: */*
200

OK

[
  {
    "id": "text",
    "name": "text",
    "datasetGeneratorMetadata": "asdfqwer",
    "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
    "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
    "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
    "enabledModels": [
      "text"
    ],
    "files": [
      {
        "fileId": "text",
        "fileName": "text",
        "fileType": "text",
        "datasetId": "text",
        "numRows": 1,
        "numColumns": 1,
        "piiTypes": [
          "text"
        ],
        "wordCount": 1,
        "redactedWordCount": 1,
        "uploadedTimestamp": {},
        "fileSource": "Local",
        "processingStatus": "text",
        "processingError": "text",
        "mostRecentCompletedJobId": "text"
      }
    ],
    "lastUpdated": {},
    "docXImagePolicy": "Redact",
    "pdfSignaturePolicy": "Redact",
    "docXCommentPolicy": "Remove",
    "docXTablePolicy": "Redact",
    "fileSource": "Local",
    "customPiiEntityIds": [
      "text"
    ],
    "rescanJobs": [
      {
        "id": "text",
        "status": "text",
        "errorMessages": "text",
        "startTime": {},
        "endTime": {},
        "publishedTime": {},
        "datasetFileId": "text",
        "jobType": "DeidentifyFile"
      }
    ]
  }
]

Gets the dataset by its Id

get

Returns the dataset specified by the datasetId

Path parameters
datasetIdstringRequired
Responses
200
OK
application/json
404
The dataset cannot be found
get
GET /api/Dataset/{datasetId} HTTP/1.1
Host: 
Accept: */*
{
  "id": "text",
  "name": "text",
  "datasetGeneratorMetadata": "asdfqwer",
  "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
  "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
  "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
  "enabledModels": [
    "text"
  ],
  "files": [
    {
      "fileId": "text",
      "fileName": "text",
      "fileType": "text",
      "datasetId": "text",
      "numRows": 1,
      "numColumns": 1,
      "piiTypes": [
        "text"
      ],
      "wordCount": 1,
      "redactedWordCount": 1,
      "uploadedTimestamp": {},
      "fileSource": "Local",
      "processingStatus": "text",
      "processingError": "text",
      "mostRecentCompletedJobId": "text"
    }
  ],
  "lastUpdated": {},
  "docXImagePolicy": "Redact",
  "pdfSignaturePolicy": "Redact",
  "docXCommentPolicy": "Remove",
  "docXTablePolicy": "Redact",
  "fileSource": "Local",
  "customPiiEntityIds": [
    "text"
  ],
  "rescanJobs": [
    {
      "id": "text",
      "status": "text",
      "errorMessages": "text",
      "startTime": {},
      "endTime": {},
      "publishedTime": {},
      "datasetFileId": "text",
      "jobType": "DeidentifyFile"
    }
  ]
}
  • POSTUpload dataset files
  • GETDownload a dataset file
  • GETDownload all dataset files
  • PUTCreates a new dataset
  • GETGet all datasets
  • POSTCreates a new dataset
  • GETGets the dataset by its Id

Creates a new dataset

put

Edits a dataset with the specified configuration

Query parameters
shouldRescanbooleanOptional
Body
all ofOptional
Responses
200
OK
application/json
404
The dataset cannot be found
409
Dataset name is already in use
put
PUT /api/Dataset HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 507

{
  "id": "text",
  "name": "text",
  "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
  "datasetGeneratorMetadata": {
    "ANY_ADDITIONAL_PROPERTY": {}
  },
  "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
  "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
  "enabledModels": [
    "text"
  ],
  "docXImagePolicy": "Redact",
  "pdfSignaturePolicy": "Redact",
  "docXCommentPolicy": "Remove",
  "docXTablePolicy": "Redact"
}
{
  "id": "text",
  "name": "text",
  "datasetGeneratorMetadata": "asdfqwer",
  "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
  "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
  "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
  "enabledModels": [
    "text"
  ],
  "files": [
    {
      "fileId": "text",
      "fileName": "text",
      "fileType": "text",
      "datasetId": "text",
      "numRows": 1,
      "numColumns": 1,
      "piiTypes": [
        "text"
      ],
      "wordCount": 1,
      "redactedWordCount": 1,
      "uploadedTimestamp": {},
      "fileSource": "Local",
      "processingStatus": "text",
      "processingError": "text",
      "mostRecentCompletedJobId": "text"
    }
  ],
  "lastUpdated": {},
  "docXImagePolicy": "Redact",
  "pdfSignaturePolicy": "Redact",
  "docXCommentPolicy": "Remove",
  "docXTablePolicy": "Redact",
  "fileSource": "Local",
  "customPiiEntityIds": [
    "text"
  ],
  "rescanJobs": [
    {
      "id": "text",
      "status": "text",
      "errorMessages": "text",
      "startTime": {},
      "endTime": {},
      "publishedTime": {},
      "datasetFileId": "text",
      "jobType": "DeidentifyFile"
    }
  ]
}

Creates a new dataset

post

Creates a new dataset with the specified configuration. You must specify a unique, non-empty dataset name

Body
all ofOptional
Responses
200
OK
application/json
400
The dataset name must be specified
409
Dataset name is already in use
post
POST /api/Dataset HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 15

{
  "name": "text"
}
{
  "id": "text",
  "name": "text",
  "datasetGeneratorMetadata": "asdfqwer",
  "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
  "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
  "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
  "enabledModels": [
    "text"
  ],
  "files": [
    {
      "fileId": "text",
      "fileName": "text",
      "fileType": "text",
      "datasetId": "text",
      "numRows": 1,
      "numColumns": 1,
      "piiTypes": [
        "text"
      ],
      "wordCount": 1,
      "redactedWordCount": 1,
      "uploadedTimestamp": {},
      "fileSource": "Local",
      "processingStatus": "text",
      "processingError": "text",
      "mostRecentCompletedJobId": "text"
    }
  ],
  "lastUpdated": {},
  "docXImagePolicy": "Redact",
  "pdfSignaturePolicy": "Redact",
  "docXCommentPolicy": "Remove",
  "docXTablePolicy": "Redact",
  "fileSource": "Local",
  "customPiiEntityIds": [
    "text"
  ],
  "rescanJobs": [
    {
      "id": "text",
      "status": "text",
      "errorMessages": "text",
      "startTime": {},
      "endTime": {},
      "publishedTime": {},
      "datasetFileId": "text",
      "jobType": "DeidentifyFile"
    }
  ]
}