Manage datasets | Tonic Textual | Tonic.ai documentation

Get all datasets

get

Returns all of the datasets that the user has access to. Each dataset contains only minimal information. For more detail on a given dataset, use the /api/Dataset/{datasetId} endpoint.

Path parameters

filterSynthesized · boolean · Required · Default: true

Responses

200

OK

application/json

id · string · Optional

name · string · Optional

fileSource · all of · nullable · Optional

string · enum · Optional

tags · string[] · Optional

lastUpdated · all of · Optional

object · Optional

created · all of · Optional

object · Optional

get

/api/Dataset

GET /api/Dataset HTTP/1.1
Accept: */*

200

OK

[
  {
    "id": "text",
    "name": "text",
    "fileSource": "Local",
    "operations": [
      "HasAccess"
    ],
    "tags": [
      "text"
    ],
    "lastUpdated": {},
    "created": {},
    "creatorUser": {
      "id": "text",
      "userName": "text",
      "firstName": "text",
      "lastName": "text"
    }
  }
]
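
As a rough illustration of calling this endpoint from Python, the sketch below builds the request with the standard library and reduces the response array to a few fields. The host `textual.example.com`, the `API_KEY` value, and the use of an `Authorization` header are assumptions that this reference does not confirm. The reference lists `filterSynthesized` under path parameters, but the path has no placeholder for it, so the sketch assumes it is sent in the query string.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical values: the reference does not state a host or an auth scheme.
BASE_URL = "https://textual.example.com"
API_KEY = "YOUR_API_KEY"


def build_list_datasets_request(filter_synthesized=True):
    """Build (but do not send) a GET /api/Dataset request."""
    # Assumption: filterSynthesized travels in the query string.
    query = urllib.parse.urlencode(
        {"filterSynthesized": "true" if filter_synthesized else "false"}
    )
    return urllib.request.Request(
        f"{BASE_URL}/api/Dataset?{query}",
        headers={"Accept": "application/json", "Authorization": API_KEY},
        method="GET",
    )


def summarize_datasets(payload):
    """Map each dataset in the JSON array to an (id, name, fileSource) tuple."""
    return [
        (d.get("id"), d.get("name"), d.get("fileSource"))
        for d in json.loads(payload)
    ]
```

To send the request, pass it to `urllib.request.urlopen` and hand the response body to `summarize_datasets`.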

Gets the dataset by its Id

get

Returns the dataset specified by the datasetId.

Path parameters

datasetId · string · Required

Responses

200

OK

application/json

id · string · Optional

name · string · Optional

enabledModels · string[] · Optional

lastUpdated · all of · Optional

object · Optional

docXImagePolicy · all of · Optional

string · enum · Optional

pdfSignaturePolicy · all of · Optional

string · enum · Optional

docXCommentPolicy · all of · Optional

string · enum · Optional

docXTablePolicy · all of · Optional

string · enum · Optional

fileSource · all of · Optional

string · enum · Optional

customPiiEntityIds · string[] · Optional

404

The dataset cannot be found

get

/api/Dataset/{datasetId}

GET /api/Dataset/{datasetId} HTTP/1.1
Accept: */*

{
  "id": "text",
  "name": "text",
  "datasetGeneratorMetadata": "asdfqwer",
  "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
  "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
  "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
  "enabledModels": [
    "text"
  ],
  "files": [
    {
      "fileId": "text",
      "fileName": "text",
      "fileType": "text",
      "datasetId": "text",
      "numRows": 1,
      "numColumns": 1,
      "piiTypes": [
        "text"
      ],
      "wordCount": 1,
      "redactedWordCount": 1,
      "uploadedTimestamp": {},
      "fileSource": "Local",
      "processingStatus": "text",
      "processingError": "text",
      "mostRecentCompletedJobId": "text"
    }
  ],
  "lastUpdated": {},
  "docXImagePolicy": "Redact",
  "pdfSignaturePolicy": "Redact",
  "docXCommentPolicy": "Remove",
  "docXTablePolicy": "Redact",
  "fileSource": "Local",
  "customPiiEntityIds": [
    "text"
  ],
  "rescanJobs": [
    {
      "id": "text",
      "status": "text",
      "errorMessages": "text",
      "startTime": {},
      "endTime": {},
      "publishedTime": {},
      "datasetFileId": "text",
      "jobType": "DeidentifyFile"
    }
  ]
}
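
In the example response above, `generatorSetup`, `labelBlockLists`, and `labelAllowLists` are strings that themselves contain JSON, so they need a second decoding pass. The following sketch fetches one dataset, returns `None` for the documented 404, and parses those three fields. The host, the `API_KEY` value, and the `Authorization` header are assumptions, and the function names are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical values: the reference does not state a host or an auth scheme.
BASE_URL = "https://textual.example.com"
API_KEY = "YOUR_API_KEY"

# Fields that the example response shows as JSON-encoded strings.
EMBEDDED_JSON_FIELDS = ("generatorSetup", "labelBlockLists", "labelAllowLists")


def build_get_dataset_request(dataset_id):
    """Build (but do not send) a GET /api/Dataset/{datasetId} request."""
    # Percent-encode the ID so that it stays a single path segment.
    encoded_id = urllib.parse.quote(dataset_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/api/Dataset/{encoded_id}",
        headers={"Accept": "application/json", "Authorization": API_KEY},
        method="GET",
    )


def fetch_dataset(dataset_id, opener=urllib.request.urlopen):
    """Return the dataset as a dict, or None when the API answers 404."""
    try:
        with opener(build_get_dataset_request(dataset_id)) as response:
            return json.loads(response.read())
    except urllib.error.HTTPError as error:
        if error.code == 404:  # "The dataset cannot be found"
            return None
        raise


def decode_embedded_json(dataset):
    """Return a copy of the dataset with its JSON-encoded string fields parsed."""
    decoded = dict(dataset)
    for field in EMBEDDED_JSON_FIELDS:
        if isinstance(decoded.get(field), str):
            decoded[field] = json.loads(decoded[field])
    return decoded
```

After both decoding passes, the four backslashes in the sample `labelBlockLists` regex collapse to a single one, giving the pattern `.*\s(disease|syndrom|disorder)`.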

Creates a new dataset

post

Creates a new dataset with the specified configuration. You must specify a unique, non-empty dataset name.

Body

Responses

200

OK

application/json

id · string · Optional

name · string · Optional

enabledModels · string[] · Optional

lastUpdated · all of · Optional

object · Optional

docXImagePolicy · all of · Optional

string · enum · Optional

pdfSignaturePolicy · all of · Optional

string · enum · Optional

docXCommentPolicy · all of · Optional

string · enum · Optional

docXTablePolicy · all of · Optional

string · enum · Optional

fileSource · all of · Optional

string · enum · Optional

customPiiEntityIds · string[] · Optional

400

The dataset name must be specified

409

Dataset name is already in use

post

/api/Dataset

POST /api/Dataset HTTP/1.1
Content-Type: application/json
Accept: */*
Content-Length: 15

{
  "name": "text"
}

{
  "id": "text",
  "name": "text",
  "datasetGeneratorMetadata": "asdfqwer",
  "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
  "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
  "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
  "enabledModels": [
    "text"
  ],
  "files": [
    {
      "fileId": "text",
      "fileName": "text",
      "fileType": "text",
      "datasetId": "text",
      "numRows": 1,
      "numColumns": 1,
      "piiTypes": [
        "text"
      ],
      "wordCount": 1,
      "redactedWordCount": 1,
      "uploadedTimestamp": {},
      "fileSource": "Local",
      "processingStatus": "text",
      "processingError": "text",
      "mostRecentCompletedJobId": "text"
    }
  ],
  "lastUpdated": {},
  "docXImagePolicy": "Redact",
  "pdfSignaturePolicy": "Redact",
  "docXCommentPolicy": "Remove",
  "docXTablePolicy": "Redact",
  "fileSource": "Local",
  "customPiiEntityIds": [
    "text"
  ],
  "rescanJobs": [
    {
      "id": "text",
      "status": "text",
      "errorMessages": "text",
      "startTime": {},
      "endTime": {},
      "publishedTime": {},
      "datasetFileId": "text",
      "jobType": "DeidentifyFile"
    }
  ]
}
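
A minimal sketch of the create call, assuming the same hypothetical host and `Authorization` header as before, might look like this. It sends the one-field body from the example request and maps the documented 400 and 409 statuses to readable errors. Raising `RuntimeError` for them is a choice made for this sketch, not something the API prescribes.

```python
import json
import urllib.error
import urllib.request

# Hypothetical values: the reference does not state a host or an auth scheme.
BASE_URL = "https://textual.example.com"
API_KEY = "YOUR_API_KEY"

# Error statuses documented for POST /api/Dataset.
CREATE_ERRORS = {
    400: "The dataset name must be specified",
    409: "Dataset name is already in use",
}


def build_create_dataset_request(name):
    """Build (but do not send) a POST /api/Dataset request."""
    if not name:
        # Catch the documented 400 case locally before any network call.
        raise ValueError(CREATE_ERRORS[400])
    return urllib.request.Request(
        f"{BASE_URL}/api/Dataset",
        data=json.dumps({"name": name}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": API_KEY,
        },
        method="POST",
    )


def create_dataset(name, opener=urllib.request.urlopen):
    """Send the request and return the created dataset as a dict."""
    try:
        with opener(build_create_dataset_request(name)) as response:
            return json.loads(response.read())
    except urllib.error.HTTPError as error:
        reason = CREATE_ERRORS.get(error.code)
        if reason is None:
            raise  # Not one of the documented statuses; let it propagate.
        raise RuntimeError(f"{error.code}: {reason}") from error
```

Because a 409 signals a name collision, a caller could catch the `RuntimeError` and retry with a different name.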

Edits a dataset

put

Updates a dataset to use the specified configuration.

Query parameters

shouldRescan · boolean · Optional

Body

Responses

200

OK

application/json

id · string · Optional

name · string · Optional

outputFormat · all of · Optional

string · enum · Optional

tags · string[] · Optional

lastUpdated · all of · Optional

object · Optional

created · all of · Optional

object · Optional

docXImagePolicy · all of · nullable · Optional

string · enum · Optional

pdfSignaturePolicy · all of · nullable · Optional

string · enum · Optional

pdfSynthModePolicy · all of · nullable · Optional

string · enum · Optional

docXCommentPolicy · all of · nullable · Optional

string · enum · Optional

docXTablePolicy · all of · nullable · Optional

string · enum · Optional

fileSource · all of · nullable · Optional

string · enum · Optional

customPiiEntityIds · string[] · nullable · Optional

awsCredentialSource · string · nullable · Optional

outputPath · string · nullable · Optional

ocrServiceProvider · all of · nullable · Optional

string · enum · Optional

404

The dataset cannot be found

409

Dataset name is already in use

put

/api/Dataset

PUT /api/Dataset HTTP/1.1
Content-Type: application/json
Accept: */*
Content-Length: 735

{
  "id": "text",
  "name": "text",
  "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
  "generatorMetadata": {
    "ANY_ADDITIONAL_PROPERTY": {
      "version": "V1",
      "customGenerator": "Scramble",
      "swaps": {
        "ANY_ADDITIONAL_PROPERTY": "text"
      }
    }
  },
  "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
  "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
  "docXImagePolicy": "Redact",
  "pdfSignaturePolicy": "Redact",
  "pdfSynthModePolicy": "V1",
  "docXCommentPolicy": "Remove",
  "docXTablePolicy": "Redact",
  "fileSourceExternalCredential": {
    "fileSource": "Local",
    "credential": {}
  },
  "awsCredentialSource": "text",
  "outputPath": "text",
  "ocrServiceProvider": "Azure"
}

{
  "id": "text",
  "name": "text",
  "generatorMetadata": "asdfqwer",
  "outputFormat": "Original",
  "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
  "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
  "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
  "tags": [
    "text"
  ],
  "files": [
    {
      "fileId": "text",
      "fileName": "text",
      "fileType": "text",
      "datasetId": "text",
      "numRows": 1,
      "numColumns": 1,
      "piiTypes": [
        "text"
      ],
      "wordCount": 1,
      "redactedWordCount": 1,
      "uploadedTimestamp": {},
      "fileSource": "Local",
      "processingStatus": "text",
      "processingError": "text",
      "mostRecentCompletedJobId": "text",
      "fileParseResultId": "text",
      "filePath": "text",
      "generatedFileStatus": "text"
    }
  ],
  "lastUpdated": {},
  "created": {},
  "creatorUser": {
    "id": "text",
    "userName": "text",
    "firstName": "text",
    "lastName": "text"
  },
  "docXImagePolicy": "Redact",
  "pdfSignaturePolicy": "Redact",
  "pdfSynthModePolicy": "V1",
  "docXCommentPolicy": "Remove",
  "docXTablePolicy": "Redact",
  "fileSource": "Local",
  "customPiiEntityIds": [
    "text"
  ],
  "operations": [
    "HasAccess"
  ],
  "rescanJobs": [
    {
      "id": "text",
      "status": "text",
      "errorMessages": "text",
      "startTime": {},
      "endTime": {},
      "publishedTime": {},
      "datasetFileId": "text",
      "datasetId": "text",
      "jobType": "DeidentifyFile",
      "ocrServiceProvider": "Azure",
      "modelsInfo": {
        "dateSynthesis": {
          "runsOnGpu": true
        },
        "fasttext": {
          "libVersion": "text",
          "model": "text",
          "runsOnGpu": true
        },
        "image": {
          "version": "text"
        },
        "spacy": {
          "libVersion": "text",
          "auxModel": {
            "name": "text",
            "language": "text",
            "runsOnGpu": true,
            "version": "text"
          },
          "multilingualModels": [
            {
              "name": "text",
              "language": "text",
              "runsOnGpu": true,
              "version": "text"
            }
          ]
        },
        "torch": {
          "gpuAvailable": true,
          "libVersion": "text"
        },
        "tonicNer": {
          "enModel": "text",
          "xlmModel": "text"
        },
        "tesseract": {
          "model": {
            "id": "text",
            "version": "text"
          }
        }
      }
    }
  ],
  "mostRecentExternalFileGenerationJob": {
    "id": "text",
    "status": "text",
    "errorMessages": "text",
    "startTime": {},
    "endTime": {},
    "publishedTime": {},
    "datasetFileId": "text",
    "datasetId": "text",
    "jobType": "DeidentifyFile",
    "ocrServiceProvider": "Azure",
    "modelsInfo": {
      "dateSynthesis": {
        "runsOnGpu": true
      },
      "fasttext": {
        "libVersion": "text",
        "model": "text",
        "runsOnGpu": true
      },
      "image": {
        "version": "text"
      },
      "spacy": {
        "libVersion": "text",
        "auxModel": {
          "name": "text",
          "language": "text",
          "runsOnGpu": true,
          "version": "text"
        },
        "multilingualModels": [
          {
            "name": "text",
            "language": "text",
            "runsOnGpu": true,
            "version": "text"
          }
        ]
      },
      "torch": {
        "gpuAvailable": true,
        "libVersion": "text"
      },
      "tonicNer": {
        "enModel": "text",
        "xlmModel": "text"
      },
      "tesseract": {
        "model": {
          "id": "text",
          "version": "text"
        }
      }
    }
  },
  "fileSourceExternalCredential": {
    "fileSource": "Local",
    "credential": {}
  },
  "awsCredentialSource": "text",
  "outputPath": "text",
  "externalFilesInfo": {
    "selectedFiles": [
      "text"
    ],
    "pathPrefixes": [
      "text"
    ],
    "selectedFileExtensions": [
      "text"
    ]
  },
  "ocrServiceProvider": "Azure"
}
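
The edit call can be sketched in the same way. The example request carries `generatorSetup` as a JSON-encoded string, so the sketch serializes that mapping separately before it serializes the body. The host, the `Authorization` header, the lowercase `true`/`false` spelling of `shouldRescan`, and the function name are assumptions. The reference does not say whether fields omitted from the body are left unchanged or reset, so check that before you send a partial body like this one.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical values: the reference does not state a host or an auth scheme.
BASE_URL = "https://textual.example.com"
API_KEY = "YOUR_API_KEY"


def build_edit_dataset_request(dataset_id, name, generator_setup, should_rescan=None):
    """Build (but do not send) a PUT /api/Dataset request."""
    body = {
        "id": dataset_id,
        "name": name,
        # Inner JSON document, stored as a string inside the outer body.
        "generatorSetup": json.dumps(generator_setup),
    }
    url = f"{BASE_URL}/api/Dataset"
    if should_rescan is not None:
        # shouldRescan is documented as an optional query parameter.
        # Assumption: the server accepts lowercase "true" / "false".
        flag = "true" if should_rescan else "false"
        url += "?" + urllib.parse.urlencode({"shouldRescan": flag})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": API_KEY,
        },
        method="PUT",
    )
```

For example, `build_edit_dataset_request("d1", "renamed", {"NAME_GIVEN": "Redaction"}, should_rescan=True)` targets `/api/Dataset?shouldRescan=true`. Passing the result to `urllib.request.urlopen` sends it. The documented 404 and 409 responses surface as `urllib.error.HTTPError`.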
