Manage dataset files

Use the REST API to manage dataset files.

Upload dataset files

post

Upload a file to a dataset for processing.

Required Permissions

Dataset: Upload Files

Path parameters

datasetId · string · Required

Body

file · string · binary · Optional

File to upload

Responses

200

application/json


POST /api/Dataset/{datasetId}/files/upload HTTP/1.1
Host: 
Content-Type: multipart/form-data
Accept: */*
Content-Length: 288

{
  "document": {
    "fileName": "example.txt",
    "csvConfig": {
      "numColumns": 1,
      "hasHeader": true,
      "escapeChar": "text",
      "quoteChar": "text",
      "delimiter": "text",
      "nullChar": "text"
    },
    "datasetId": "6a01360f-78fc-9f2f-efae-c5e1461e9c1e",
    "customPiiEntityIds": [
      "CUSTOM_ENTITY_1",
      "CUSTOM_ENTITY_2"
    ]
  },
  "file": "binary"
}

200

{
  "updatedDataset": {
    "id": "text",
    "name": "text",
    "generatorMetadata": "asdfqwer",
    "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
    "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
    "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
    "enabledModels": [
      "text"
    ],
    "tags": [
      "text"
    ],
    "files": [
      {
        "fileId": "text",
        "fileName": "text",
        "fileType": "text",
        "datasetId": "text",
        "numRows": 1,
        "numColumns": 1,
        "piiTypes": [
          "text"
        ],
        "wordCount": 1,
        "redactedWordCount": 1,
        "uploadedTimestamp": {},
        "fileSource": "Local",
        "processingStatus": "text",
        "processingError": "text",
        "mostRecentCompletedJobId": "text"
      }
    ],
    "lastUpdated": {},
    "created": {},
    "creatorUser": {
      "id": "text",
      "userName": "text",
      "firstName": "text",
      "lastName": "text"
    },
    "docXImagePolicy": "Redact",
    "pdfSignaturePolicy": "Redact",
    "pdfSynthModePolicy": "V1",
    "docXCommentPolicy": "Remove",
    "docXTablePolicy": "Redact",
    "fileSource": "Local",
    "customPiiEntityIds": [
      "text"
    ],
    "operations": [
      "HasAccess"
    ],
    "rescanJobs": [
      {
        "id": "text",
        "status": "text",
        "errorMessages": "text",
        "startTime": {},
        "endTime": {},
        "publishedTime": {},
        "datasetFileId": "text",
        "jobType": "DeidentifyFile"
      }
    ],
    "fileSourceExternalCredential": {
      "fileSource": "Local",
      "credential": {}
    },
    "awsCredentialSource": "text",
    "outputPath": "text"
  },
  "uploadedFileId": "text"
}
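
The upload request above can be assembled with the Python standard library alone. The following is a minimal sketch under stated assumptions, not an official client: `build_upload_request`, `base_url`, and `extra_headers` are hypothetical names chosen for illustration. Authentication is not described in this reference, so any credentials header is left to the caller through `extra_headers`. The sketch also assumes that the `document` part is sent as a JSON form field; only the path, the `document` and `file` part names, and the `fileName` and `datasetId` fields come from the example above.

```python
import json
import uuid
import urllib.request
from urllib.parse import quote


def build_upload_request(base_url, dataset_id, file_name, file_bytes, extra_headers=None):
    """Build (but do not send) a multipart/form-data upload request."""
    boundary = uuid.uuid4().hex
    # Assumption: the "document" part carries JSON metadata about the file.
    document = json.dumps({"fileName": file_name, "datasetId": dataset_id})

    def part(part_headers, payload):
        # Each part: boundary line, headers, blank line, payload, CRLF.
        head = "".join(f"{h}\r\n" for h in part_headers)
        return f"--{boundary}\r\n{head}\r\n".encode() + payload + b"\r\n"

    body = (
        part(['Content-Disposition: form-data; name="document"',
              "Content-Type: application/json"], document.encode())
        + part([f'Content-Disposition: form-data; name="file"; filename="{file_name}"',
                "Content-Type: application/octet-stream"], file_bytes)
        + f"--{boundary}--\r\n".encode()  # closing boundary
    )
    url = f"{base_url.rstrip('/')}/api/Dataset/{quote(dataset_id, safe='')}/files/upload"
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    headers.update(extra_headers or {})  # hypothetical: caller supplies auth here
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` sends the request. In the response shown above, `uploadedFileId` identifies the new file.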

Download a dataset file

get

Downloads the specified file from the dataset. The downloaded file is redacted based on the dataset configuration.

Required Permissions

Dataset: Download Redacted Files

Path parameters

datasetId · string · Required

fileId · string · Required

Responses

200

application/octet-stream

Response · string · binary

400

Bad Request

404

Not Found

409

Conflict

500

Internal Server Error


GET /api/Dataset/{datasetId}/files/{fileId}/download HTTP/1.1
Host: 
Accept: */*

binary
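
Because the response is a binary stream, it is usually written straight to disk rather than read into memory. The sketch below shows one way to do that with the standard library and to translate the status codes listed above into readable errors. The names `download_url`, `download_file`, `base_url`, and `headers` are hypothetical. Authentication is not covered in this reference, so any credentials are assumed to arrive through `headers`.

```python
import shutil
import urllib.error
import urllib.request
from urllib.parse import quote

# Status codes listed for this endpoint in the reference above.
DOCUMENTED_ERRORS = {
    400: "Bad Request",
    404: "Not Found",
    409: "Conflict",
    500: "Internal Server Error",
}


def download_url(base_url, dataset_id, file_id):
    """Return the download URL, percent-encoding both path parameters."""
    return (f"{base_url.rstrip('/')}/api/Dataset/{quote(dataset_id, safe='')}"
            f"/files/{quote(file_id, safe='')}/download")


def download_file(base_url, dataset_id, file_id, dest_path, headers=None):
    """Stream the redacted file to dest_path and return that path."""
    request = urllib.request.Request(
        download_url(base_url, dataset_id, file_id),
        headers=headers or {},  # hypothetical: caller supplies auth here
        method="GET",
    )
    try:
        # The destination is opened only after the server has answered.
        with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
            shutil.copyfileobj(response, out)
    except urllib.error.HTTPError as err:
        reason = DOCUMENTED_ERRORS.get(err.code, "undocumented status")
        raise RuntimeError(f"Download failed: HTTP {err.code} ({reason})") from err
    return dest_path
```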

Download all dataset files

get

Downloads all files from the specified dataset. The downloaded files are redacted based on the dataset configuration.

Required Permissions

Dataset: Download Redacted Files

Path parameters

datasetId · string · Required

Responses

200

application/json

Response · string · binary

400

Bad Request

application/json

404

Not Found

application/json

500

Internal Server Error


GET /api/Dataset/{datasetId}/files/download_all HTTP/1.1
Host: 
Accept: */*

binary

Edit a dataset

put

Edits a dataset with the specified configuration.

Query parameters

shouldRescan · boolean · Optional

Body

all of · Optional

Responses

200

application/json

404

The dataset cannot be found

409

Dataset name is already in use


PUT /api/Dataset HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 507

{
  "id": "text",
  "name": "text",
  "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
  "datasetGeneratorMetadata": {
    "ANY_ADDITIONAL_PROPERTY": {}
  },
  "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
  "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
  "enabledModels": [
    "text"
  ],
  "docXImagePolicy": "Redact",
  "pdfSignaturePolicy": "Redact",
  "docXCommentPolicy": "Remove",
  "docXTablePolicy": "Redact"
}

{
  "id": "text",
  "name": "text",
  "datasetGeneratorMetadata": "asdfqwer",
  "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
  "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
  "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
  "enabledModels": [
    "text"
  ],
  "files": [
    {
      "fileId": "text",
      "fileName": "text",
      "fileType": "text",
      "datasetId": "text",
      "numRows": 1,
      "numColumns": 1,
      "piiTypes": [
        "text"
      ],
      "wordCount": 1,
      "redactedWordCount": 1,
      "uploadedTimestamp": {},
      "fileSource": "Local",
      "processingStatus": "text",
      "processingError": "text",
      "mostRecentCompletedJobId": "text"
    }
  ],
  "lastUpdated": {},
  "docXImagePolicy": "Redact",
  "pdfSignaturePolicy": "Redact",
  "docXCommentPolicy": "Remove",
  "docXTablePolicy": "Redact",
  "fileSource": "Local",
  "customPiiEntityIds": [
    "text"
  ],
  "rescanJobs": [
    {
      "id": "text",
      "status": "text",
      "errorMessages": "text",
      "startTime": {},
      "endTime": {},
      "publishedTime": {},
      "datasetFileId": "text",
      "jobType": "DeidentifyFile"
    }
  ]
}
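
In the request body above, `generatorSetup`, `labelBlockLists`, and `labelAllowLists` are strings that themselves contain JSON, so they are encoded once on their own and then again as part of the body. The sketch below illustrates that double encoding. `build_edit_request` and its parameter names are hypothetical. Sending `shouldRescan` as the lowercase text `true` or `false` is an assumption, because this reference does not state the accepted form.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_edit_request(base_url, dataset_id, name, generator_setup,
                       block_lists=None, should_rescan=False, headers=None):
    """Build (but do not send) a PUT /api/Dataset request."""
    payload = {
        "id": dataset_id,
        "name": name,
        # String-valued field that holds JSON: encode the dict to a string first.
        "generatorSetup": json.dumps(generator_setup),
    }
    if block_lists is not None:
        payload["labelBlockLists"] = json.dumps(block_lists)

    # Assumption: the boolean query parameter is sent as "true" / "false".
    query = urlencode({"shouldRescan": str(should_rescan).lower()})
    url = f"{base_url.rstrip('/')}/api/Dataset?{query}"

    all_headers = {"Content-Type": "application/json"}
    all_headers.update(headers or {})  # hypothetical: caller supplies auth here
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=all_headers,
        method="PUT",
    )
```

For example, `build_edit_request(base, "ds1", "Claims", {"NAME_GIVEN": "Redaction"}, should_rescan=True)` produces a body whose `generatorSetup` value is the string `{"NAME_GIVEN": "Redaction"}` rather than a nested object.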

Get all datasets

get

Returns all datasets to which the user has access.

Query parameters

includeSynthesizePipeline · boolean · Required · Default: false

Responses

200

application/json


GET /api/Dataset HTTP/1.1
Host: 
Accept: */*

200

[
  {
    "id": "text",
    "name": "text",
    "datasetGeneratorMetadata": "asdfqwer",
    "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
    "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
    "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
    "enabledModels": [
      "text"
    ],
    "files": [
      {
        "fileId": "text",
        "fileName": "text",
        "fileType": "text",
        "datasetId": "text",
        "numRows": 1,
        "numColumns": 1,
        "piiTypes": [
          "text"
        ],
        "wordCount": 1,
        "redactedWordCount": 1,
        "uploadedTimestamp": {},
        "fileSource": "Local",
        "processingStatus": "text",
        "processingError": "text",
        "mostRecentCompletedJobId": "text"
      }
    ],
    "lastUpdated": {},
    "docXImagePolicy": "Redact",
    "pdfSignaturePolicy": "Redact",
    "docXCommentPolicy": "Remove",
    "docXTablePolicy": "Redact",
    "fileSource": "Local",
    "customPiiEntityIds": [
      "text"
    ],
    "rescanJobs": [
      {
        "id": "text",
        "status": "text",
        "errorMessages": "text",
        "startTime": {},
        "endTime": {},
        "publishedTime": {},
        "datasetFileId": "text",
        "jobType": "DeidentifyFile"
      }
    ]
  }
]
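
The response is a JSON array with one object per dataset, each carrying its own `files` list. As an illustration of how that structure might be consumed, the sketch below reduces a response body to a per-dataset summary. `summarize_datasets` and the summary keys are hypothetical. The sketch assumes that `processingError` is empty or null for files that processed cleanly, which this reference does not confirm.

```python
import json


def summarize_datasets(response_text):
    """Return one summary dict per dataset in a GET /api/Dataset response body."""
    summaries = []
    for dataset in json.loads(response_text):
        files = dataset.get("files") or []
        summaries.append({
            "id": dataset["id"],
            "name": dataset["name"],
            "fileCount": len(files),
            # Treat missing or null counts as zero.
            "totalWords": sum(f.get("wordCount") or 0 for f in files),
            "redactedWords": sum(f.get("redactedWordCount") or 0 for f in files),
            # Assumption: a non-empty processingError marks a failed file.
            "filesWithErrors": [f["fileName"] for f in files if f.get("processingError")],
        })
    return summaries
```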

Create a dataset

post

Creates a new dataset with the specified configuration. You must specify a unique, non-empty dataset name.

Body

all of · Optional

Responses

200

application/json

400

The dataset name must be specified

409

Dataset name is already in use


POST /api/Dataset HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 15

{
  "name": "text"
}

{
  "id": "text",
  "name": "text",
  "datasetGeneratorMetadata": "asdfqwer",
  "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
  "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
  "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
  "enabledModels": [
    "text"
  ],
  "files": [
    {
      "fileId": "text",
      "fileName": "text",
      "fileType": "text",
      "datasetId": "text",
      "numRows": 1,
      "numColumns": 1,
      "piiTypes": [
        "text"
      ],
      "wordCount": 1,
      "redactedWordCount": 1,
      "uploadedTimestamp": {},
      "fileSource": "Local",
      "processingStatus": "text",
      "processingError": "text",
      "mostRecentCompletedJobId": "text"
    }
  ],
  "lastUpdated": {},
  "docXImagePolicy": "Redact",
  "pdfSignaturePolicy": "Redact",
  "docXCommentPolicy": "Remove",
  "docXTablePolicy": "Redact",
  "fileSource": "Local",
  "customPiiEntityIds": [
    "text"
  ],
  "rescanJobs": [
    {
      "id": "text",
      "status": "text",
      "errorMessages": "text",
      "startTime": {},
      "endTime": {},
      "publishedTime": {},
      "datasetFileId": "text",
      "jobType": "DeidentifyFile"
    }
  ]
}
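
Since an empty name is rejected with a 400 and a duplicate name with a 409, a client can check for the empty case locally and report the documented messages for the rest. The following is a minimal sketch along those lines. `build_create_request`, `explain_create_error`, `base_url`, and `headers` are hypothetical names, and credentials are assumed to be supplied by the caller because authentication is not covered in this reference.

```python
import json
import urllib.request

# Error descriptions taken from the response list above.
CREATE_ERRORS = {
    400: "The dataset name must be specified",
    409: "Dataset name is already in use",
}


def build_create_request(base_url, name, headers=None):
    """Build (but do not send) a POST /api/Dataset request."""
    if not name or not name.strip():
        # Fail early instead of waiting for the server's 400 response.
        raise ValueError(CREATE_ERRORS[400])
    all_headers = {"Content-Type": "application/json"}
    all_headers.update(headers or {})  # hypothetical: caller supplies auth here
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/Dataset",
        data=json.dumps({"name": name}).encode("utf-8"),
        headers=all_headers,
        method="POST",
    )


def explain_create_error(status_code):
    """Map a failed status code to the documented description, if any."""
    return CREATE_ERRORS.get(status_code, f"Unexpected status {status_code}")
```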

Get a dataset by ID

get

Returns the dataset specified by datasetId.

Path parameters

datasetId · string · Required

Responses

200

application/json

404

The dataset cannot be found


GET /api/Dataset/{datasetId} HTTP/1.1
Host: 
Accept: */*

{
  "id": "text",
  "name": "text",
  "datasetGeneratorMetadata": "asdfqwer",
  "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
  "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
  "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
  "enabledModels": [
    "text"
  ],
  "files": [
    {
      "fileId": "text",
      "fileName": "text",
      "fileType": "text",
      "datasetId": "text",
      "numRows": 1,
      "numColumns": 1,
      "piiTypes": [
        "text"
      ],
      "wordCount": 1,
      "redactedWordCount": 1,
      "uploadedTimestamp": {},
      "fileSource": "Local",
      "processingStatus": "text",
      "processingError": "text",
      "mostRecentCompletedJobId": "text"
    }
  ],
  "lastUpdated": {},
  "docXImagePolicy": "Redact",
  "pdfSignaturePolicy": "Redact",
  "docXCommentPolicy": "Remove",
  "docXTablePolicy": "Redact",
  "fileSource": "Local",
  "customPiiEntityIds": [
    "text"
  ],
  "rescanJobs": [
    {
      "id": "text",
      "status": "text",
      "errorMessages": "text",
      "startTime": {},
      "endTime": {},
      "publishedTime": {},
      "datasetFileId": "text",
      "jobType": "DeidentifyFile"
    }
  ]
}
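
The download endpoint needs a `fileId`, which can be looked up in the `files` array of this response. The helper below is a small sketch of that lookup on an already parsed response. `find_file_ids` is a hypothetical name. It returns a list because this reference does not say whether file names are unique within a dataset.

```python
def find_file_ids(dataset, file_name):
    """Return the fileId of every file in the dataset whose fileName matches."""
    return [
        f["fileId"]
        for f in dataset.get("files") or []
        if f.get("fileName") == file_name
    ]
```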

Last updated 3 months ago
