Create and manage datasets

Textual uses datasets to produce files with sensitive values replaced.

Before you perform these tasks, remember to instantiate the SDK client.

Create and add files to a dataset

To create a new dataset, use textual.create_dataset. You can then upload files to the dataset.

dataset = textual.create_dataset('<dataset name>')

To add a file to the dataset, use dataset.add_file. To identify the file, provide the file path and name.

dataset.add_file('<path to file>','<file name>')

To provide the file as IO bytes, provide the file name and the file bytes. Do not provide a path.

dataset.add_file('<file name>',<file bytes>)

Textual creates the dataset, scans the uploaded file, and redacts the detected values.
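
As a minimal sketch of the bytes-based upload described above, the helper below reads a local file and passes its name and contents to dataset.add_file in the same argument order as the example. FakeDataset and upload_from_bytes are hypothetical names used only so that the sketch runs without the SDK. Whether add_file expects raw bytes or a binary stream is not stated here, so treat the raw bytes as an assumption.

```python
import os


class FakeDataset:
    """Hypothetical stand-in for a Textual dataset (not part of the SDK)."""

    def __init__(self):
        self.calls = []

    def add_file(self, file_name, file_bytes):
        # Records the call using the argument order shown in the example above.
        self.calls.append((file_name, file_bytes))


def upload_from_bytes(dataset, path):
    """Read a local file and hand its name and bytes to dataset.add_file."""
    file_name = os.path.basename(path)
    with open(path, "rb") as handle:
        # Assumption: the file contents are passed as raw bytes.
        file_bytes = handle.read()
    dataset.add_file(file_name, file_bytes)
    return file_name, len(file_bytes)
```

With a real dataset object, the same helper could be called as upload_from_bytes(dataset, '<path to file>').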

Configure a dataset

To change the configuration of a dataset, use dataset.edit.

You can use dataset.edit to change:

The name of the dataset
The handling option for each entity type
Added or excluded values for each entity type

dataset.edit(name='<dataset name>',
  generator_config={'<entity_type>':'<handling_type>'},
  label_allow_lists={'<entity_type>':LabelCustomList(regexes=['<regex>'])},
  label_block_lists={'<entity_type>':LabelCustomList(regexes=['<regex>'])}
)
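
Before you add a regex to an allow or block list, it can help to check locally which values it matches. The sketch below uses only Python's re module. The pattern and the sample values are made up for illustration. It also assumes full-string matching, which might differ from how Textual evaluates the regex.

```python
import re


def matching_values(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    compiled = re.compile(pattern)
    # Assumption: full-string matching; Textual might match differently.
    return [value for value in candidates if compiled.fullmatch(value)]


# Hypothetical pattern and sample values, for illustration only.
employee_id_pattern = r"EMP-\d{4}"
samples = ["EMP-0042", "EMP-42", "emp-0042", "EMP-12345"]
print(matching_values(employee_id_pattern, samples))  # prints ['EMP-0042']
```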

Get the current status of dataset files

To get the current status of the files in the current dataset, use dataset.describe:

dataset.describe()

The response includes:

The name and identifier of the dataset
The number of files in the dataset
The number of files that are waiting to be processed (scanned and redacted)
The number of files that had errors during processing

For example:

    Dataset: example [879d4c5d-792a-c009-a9a0-60d69be20206]
    Number of Files: 1
    Files that are waiting for processing: 
    Files that encountered errors while processing: 
    Number of Rows: 0
    Number of rows fetched: 0

Get lists of files by status

To get a list of files that have a specific status, use the following:

The file list includes:

File identifier and name
Number of rows and columns
Processing status
For failed files, the error
When the file was uploaded

Delete a file from a dataset

To delete a file from a dataset, use dataset.delete_file.

dataset.delete_file('<file identifier>')

Get redacted content for a dataset

To get the redacted content in JSON format for a dataset, use dataset.fetch_all_json():

dataset = textual.get_dataset('<dataset name>')
dataset.fetch_all_json()

For example:

dataset = textual.get_dataset('mydataset')
dataset.fetch_all_json()

The response looks something like:

'[["PERSON_Rz8NtJTPONTKgcB95i Portrait by PERSON_blatU6mAWFCQoSa5E, DATE_TIME_Rcl58 ...]'
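
Because the example response is shown as a quoted string, one way to work with it is to decode it with Python's json module. The sketch below assumes that the response is a JSON array of rows, where each row is an array of cell values. The sample string is made up to mirror the shape of the example above, and parse_redacted_rows is a hypothetical helper name.

```python
import json


def parse_redacted_rows(response):
    """Decode the JSON string into a list of rows, each a list of cell values."""
    rows = json.loads(response)
    # Assumption: the top-level JSON value is an array of rows.
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of rows")
    return rows


# Made-up response in the same shape as the example above.
sample_response = '[["PERSON_abc123 Portrait by PERSON_def456"], ["DATE_TIME_xyz789"]]'
rows = parse_redacted_rows(sample_response)
for index, row in enumerate(rows):
    print(index, row)
```

With a real dataset, the input would be the value returned by dataset.fetch_all_json().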
