Create and manage datasets

Create and populate a dataset

To create a new dataset, use textual.create_dataset. To upload a file and add it to the dataset, use dataset.upload_then_add_file. In the examples on this page, textual is your initialized Textual client.

dataset = textual.create_dataset('<dataset name>')
dataset.upload_then_add_file('<file path>')

For example:

dataset = textual.create_dataset('mydataset')
dataset.upload_then_add_file("patient_notes.txt")

Textual creates the dataset, scans the uploaded file, and redacts the detected values.
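To add several files, you can repeat the same call in a loop. The following is a minimal sketch, not part of the SDK. upload_files is a hypothetical helper that assumes only that the dataset object exposes upload_then_add_file, as in the example above. It also assumes that a failed upload raises an exception, which this page does not confirm.

```python
from typing import Iterable, List, Tuple


def upload_files(dataset, paths: Iterable[str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Hypothetical helper: upload each path to the dataset and collect failures.

    Assumes `dataset` exposes upload_then_add_file(path), and that a failed
    upload raises an exception (an assumption, not confirmed by this page).
    """
    uploaded: List[str] = []
    failed: List[Tuple[str, str]] = []
    for path in paths:
        try:
            dataset.upload_then_add_file(path)
            uploaded.append(path)
        except Exception as exc:
            # Record the error and continue so one bad file does not stop the batch.
            failed.append((path, str(exc)))
    return uploaded, failed
```

You would call it as upload_files(dataset, ["patient_notes.txt"]) and then inspect the second returned list for any files that could not be added.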

Get the current status of dataset files

To get the current status of the files in a dataset, use dataset.describe:

dataset.describe()

The response includes:

  • The name and identifier of the dataset

  • The number of files in the dataset

  • The number of files that are waiting to be processed (scanned and redacted)

  • The number of files that had errors during processing

For example:

    Dataset: example [879d4c5d-792a-c009-a9a0-60d69be20206]
    Number of Files: 1
    Files that are waiting for processing: 
    Files that encountered errors while processing: 
    Number of Rows: 0
    Number of rows fetched: 0
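If a script needs these values, one option is to parse the text. The following sketch assumes that the description is available as a string in the layout shown above. This page does not state whether dataset.describe returns that string or prints it. parse_description is a hypothetical helper, not an SDK function.

```python
import re
from typing import Dict, Optional


def parse_description(text: str) -> Dict[str, Optional[str]]:
    """Hypothetical helper: split 'Label: value' lines into a dictionary.

    Assumes the layout shown in the example above. Empty values become None.
    """
    fields: Dict[str, Optional[str]] = {}
    for line in text.splitlines():
        label, sep, value = line.strip().partition(":")
        if not sep:
            continue  # skip lines that do not have a 'Label: value' shape
        fields[label] = value.strip() or None

    # The first line looks like 'Dataset: <name> [<identifier>]'; split it if it matches.
    match = re.fullmatch(r"(.*?)\s*\[([^\]]+)\]", fields.get("Dataset") or "")
    if match:
        fields["Dataset"] = match.group(1)
        fields["Identifier"] = match.group(2)
    return fields
```

Values are kept as strings, and a label with nothing after the colon maps to None, so the caller can decide how to interpret a blank count.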

Get redacted content for a dataset

To get the redacted content of a dataset in JSON format, first retrieve the dataset with textual.get_dataset, then call dataset.fetch_all_json:

dataset = textual.get_dataset('<dataset name>')
dataset.fetch_all_json()

For example:

dataset = textual.get_dataset('mydataset')
dataset.fetch_all_json()

The response looks similar to the following:

'[["PERSON_Rz8NtJTPONTKgcB95i Portrait by PERSON_blatU6mAWFCQoSa5E, DATE_TIME_Rcl58 ...]'
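The sample response above appears to be a JSON-encoded string that contains a list of rows. Assuming that shape holds, the standard library json.loads function converts it into nested Python lists. decode_rows is a hypothetical helper, not an SDK function.

```python
import json
from typing import List


def decode_rows(raw: str) -> List[List[str]]:
    """Hypothetical helper: decode a JSON string of rows into nested lists.

    Assumes the string is a JSON array of arrays, as the sample response suggests.
    """
    rows = json.loads(raw)
    # Fail early if the decoded value is not a list of lists.
    if not isinstance(rows, list) or not all(isinstance(row, list) for row in rows):
        raise ValueError("expected a JSON array of arrays")
    return rows
```

If fetch_all_json returns the string directly, which the sample suggests but this page does not state, the call would be decode_rows(dataset.fetch_all_json()).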
