Manage dataset files

Use the REST API to manage dataset files.

Upload dataset files

post

Upload a file to a dataset for processing.

Required Permissions

  • Dataset: Upload Files
Path parameters
datasetIdstringRequired
Body
filestring · binaryOptional

File to upload

Responses
200
OK
application/json
post
POST /api/Dataset/{datasetId}/files/upload HTTP/1.1
Host: 
Content-Type: multipart/form-data
Accept: */*
Content-Length: 288

{
  "document": {
    "fileName": "example.txt",
    "csvConfig": {
      "numColumns": 1,
      "hasHeader": true,
      "escapeChar": "text",
      "quoteChar": "text",
      "delimiter": "text",
      "nullChar": "text"
    },
    "datasetId": "6a01360f-78fc-9f2f-efae-c5e1461e9c1et",
    "customPiiEntityIds": [
      "CUSTOM_ENTITY_1",
      "CUSTOM_ENTITY_2"
    ]
  },
  "file": "binary"
}
200

OK

{
  "updatedDataset": {
    "id": "text",
    "name": "text",
    "generatorMetadata": "asdfqwer",
    "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
    "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
    "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
    "enabledModels": [
      "text"
    ],
    "tags": [
      "text"
    ],
    "files": [
      {
        "fileId": "text",
        "fileName": "text",
        "fileType": "text",
        "datasetId": "text",
        "numRows": 1,
        "numColumns": 1,
        "piiTypes": [
          "text"
        ],
        "wordCount": 1,
        "redactedWordCount": 1,
        "uploadedTimestamp": {},
        "fileSource": "Local",
        "processingStatus": "text",
        "processingError": "text",
        "mostRecentCompletedJobId": "text"
      }
    ],
    "lastUpdated": {},
    "created": {},
    "creatorUser": {
      "id": "text",
      "userName": "text",
      "firstName": "text",
      "lastName": "text"
    },
    "docXImagePolicy": "Redact",
    "pdfSignaturePolicy": "Redact",
    "pdfSynthModePolicy": "V1",
    "docXCommentPolicy": "Remove",
    "docXTablePolicy": "Redact",
    "fileSource": "Local",
    "customPiiEntityIds": [
      "text"
    ],
    "operations": [
      "HasAccess"
    ],
    "rescanJobs": [
      {
        "id": "text",
        "status": "text",
        "errorMessages": "text",
        "startTime": {},
        "endTime": {},
        "publishedTime": {},
        "datasetFileId": "text",
        "jobType": "DeidentifyFile"
      }
    ],
    "fileSourceExternalCredential": {
      "fileSource": "Local",
      "credential": {}
    },
    "awsCredentialSource": "text",
    "outputPath": "text"
  },
  "uploadedFileId": "text"
}

Download a dataset file

get

Downloads the specified file from the dataset. The downloaded file is redacted based on the dataset configuration.

Required Permissions

  • Dataset: Download Redacted Files
Path parameters
datasetIdstringRequired
fileIdstringRequired
Responses
200
OK
application/octet-stream
Responsestring · binary
get
GET /api/Dataset/{datasetId}/files/{fileId}/download HTTP/1.1
Host: 
Accept: */*
binary

Download all dataset files

get

Downloads all files from the specified dataset. The downloaded files are redacted based on the dataset configuration.

Required Permissions

  • Dataset: Download Redacted Files
Path parameters
datasetIdstringRequired
Responses
200
OK
application/json
Responsestring · binary
get
GET /api/Dataset/{datasetId}/files/download_all HTTP/1.1
Host: 
Accept: */*
binary

Creates a new dataset

put

Edits a dataset with the specified configuration

Query parameters
shouldRescanbooleanOptional
Body
all ofOptional
Responses
200
OK
application/json
put
PUT /api/Dataset HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 507

{
  "id": "text",
  "name": "text",
  "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
  "datasetGeneratorMetadata": {
    "ANY_ADDITIONAL_PROPERTY": {}
  },
  "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
  "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
  "enabledModels": [
    "text"
  ],
  "docXImagePolicy": "Redact",
  "pdfSignaturePolicy": "Redact",
  "docXCommentPolicy": "Remove",
  "docXTablePolicy": "Redact"
}
{
  "id": "text",
  "name": "text",
  "datasetGeneratorMetadata": "asdfqwer",
  "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
  "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
  "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
  "enabledModels": [
    "text"
  ],
  "files": [
    {
      "fileId": "text",
      "fileName": "text",
      "fileType": "text",
      "datasetId": "text",
      "numRows": 1,
      "numColumns": 1,
      "piiTypes": [
        "text"
      ],
      "wordCount": 1,
      "redactedWordCount": 1,
      "uploadedTimestamp": {},
      "fileSource": "Local",
      "processingStatus": "text",
      "processingError": "text",
      "mostRecentCompletedJobId": "text"
    }
  ],
  "lastUpdated": {},
  "docXImagePolicy": "Redact",
  "pdfSignaturePolicy": "Redact",
  "docXCommentPolicy": "Remove",
  "docXTablePolicy": "Redact",
  "fileSource": "Local",
  "customPiiEntityIds": [
    "text"
  ],
  "rescanJobs": [
    {
      "id": "text",
      "status": "text",
      "errorMessages": "text",
      "startTime": {},
      "endTime": {},
      "publishedTime": {},
      "datasetFileId": "text",
      "jobType": "DeidentifyFile"
    }
  ]
}

Get all datasets

get

Returns all datasets to which the user has access

Path parameters
includeSynthesizePipelinebooleanRequiredDefault: false
Responses
200
OK
application/json
get
GET /api/Dataset HTTP/1.1
Host: 
Accept: */*
200

OK

[
  {
    "id": "text",
    "name": "text",
    "datasetGeneratorMetadata": "asdfqwer",
    "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
    "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
    "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
    "enabledModels": [
      "text"
    ],
    "files": [
      {
        "fileId": "text",
        "fileName": "text",
        "fileType": "text",
        "datasetId": "text",
        "numRows": 1,
        "numColumns": 1,
        "piiTypes": [
          "text"
        ],
        "wordCount": 1,
        "redactedWordCount": 1,
        "uploadedTimestamp": {},
        "fileSource": "Local",
        "processingStatus": "text",
        "processingError": "text",
        "mostRecentCompletedJobId": "text"
      }
    ],
    "lastUpdated": {},
    "docXImagePolicy": "Redact",
    "pdfSignaturePolicy": "Redact",
    "docXCommentPolicy": "Remove",
    "docXTablePolicy": "Redact",
    "fileSource": "Local",
    "customPiiEntityIds": [
      "text"
    ],
    "rescanJobs": [
      {
        "id": "text",
        "status": "text",
        "errorMessages": "text",
        "startTime": {},
        "endTime": {},
        "publishedTime": {},
        "datasetFileId": "text",
        "jobType": "DeidentifyFile"
      }
    ]
  }
]

Creates a new dataset

post

Creates a new dataset with the specified configuration. You must specify a unique, non-empty dataset name

Body
all ofOptional
Responses
200
OK
application/json
post
POST /api/Dataset HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 15

{
  "name": "text"
}
{
  "id": "text",
  "name": "text",
  "datasetGeneratorMetadata": "asdfqwer",
  "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
  "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
  "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
  "enabledModels": [
    "text"
  ],
  "files": [
    {
      "fileId": "text",
      "fileName": "text",
      "fileType": "text",
      "datasetId": "text",
      "numRows": 1,
      "numColumns": 1,
      "piiTypes": [
        "text"
      ],
      "wordCount": 1,
      "redactedWordCount": 1,
      "uploadedTimestamp": {},
      "fileSource": "Local",
      "processingStatus": "text",
      "processingError": "text",
      "mostRecentCompletedJobId": "text"
    }
  ],
  "lastUpdated": {},
  "docXImagePolicy": "Redact",
  "pdfSignaturePolicy": "Redact",
  "docXCommentPolicy": "Remove",
  "docXTablePolicy": "Redact",
  "fileSource": "Local",
  "customPiiEntityIds": [
    "text"
  ],
  "rescanJobs": [
    {
      "id": "text",
      "status": "text",
      "errorMessages": "text",
      "startTime": {},
      "endTime": {},
      "publishedTime": {},
      "datasetFileId": "text",
      "jobType": "DeidentifyFile"
    }
  ]
}

Gets the dataset by its Id

get

Returns the dataset specified by the datasetId

Path parameters
datasetIdstringRequired
Responses
200
OK
application/json
get
GET /api/Dataset/{datasetId} HTTP/1.1
Host: 
Accept: */*
{
  "id": "text",
  "name": "text",
  "datasetGeneratorMetadata": "asdfqwer",
  "generatorSetup": "{\"NAME_GIVEN\":\"Redaction\", \"NAME_FAMILY\":\"Redaction\"}",
  "labelBlockLists": "{\"NAME_FAMILY\": {\"strings\":[],\"regexes\":[\".*\\\\s(disease|syndrom|disorder)\"]}}",
  "labelAllowLists": "{ \"HEALTHCARE_ID\": {\"strings\":[],\"regexes\":[\"[a-z]{2}\\\\d{9}\"]} }",
  "enabledModels": [
    "text"
  ],
  "files": [
    {
      "fileId": "text",
      "fileName": "text",
      "fileType": "text",
      "datasetId": "text",
      "numRows": 1,
      "numColumns": 1,
      "piiTypes": [
        "text"
      ],
      "wordCount": 1,
      "redactedWordCount": 1,
      "uploadedTimestamp": {},
      "fileSource": "Local",
      "processingStatus": "text",
      "processingError": "text",
      "mostRecentCompletedJobId": "text"
    }
  ],
  "lastUpdated": {},
  "docXImagePolicy": "Redact",
  "pdfSignaturePolicy": "Redact",
  "docXCommentPolicy": "Remove",
  "docXTablePolicy": "Redact",
  "fileSource": "Local",
  "customPiiEntityIds": [
    "text"
  ],
  "rescanJobs": [
    {
      "id": "text",
      "status": "text",
      "errorMessages": "text",
      "startTime": {},
      "endTime": {},
      "publishedTime": {},
      "datasetFileId": "text",
      "jobType": "DeidentifyFile"
    }
  ]
}

Last updated

Was this helpful?