# Create and manage datasets

Textual uses datasets to produce files with sensitive values replaced.

Before you perform these tasks, remember to [instantiate the SDK client](https://docs.tonic.ai/textual/tonic-textual-api/textual-api-instantiate-sdk).

## Get your list of datasets <a href="#get-all-datasets" id="get-all-datasets"></a>

To get the complete list of datasets that you own, use [`textual.get_all_datasets`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/redact/api.html#tonic_textual.redact_api.TextualNer.get_all_datasets).

```python
datasets = textual.get_all_datasets()
```
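
As a minimal sketch, the returned list can be searched for a dataset with a given name. This assumes that each returned dataset object exposes a `name` attribute. `find_dataset_by_name` is a hypothetical helper and is not part of the SDK.

```python
from typing import Any, Iterable, Optional


def find_dataset_by_name(datasets: Iterable[Any], name: str) -> Optional[Any]:
    """Return the first dataset whose `name` attribute equals `name`, else None."""
    for dataset in datasets:
        # Assumption: dataset objects expose the dataset name as `name`.
        if getattr(dataset, "name", None) == name:
            return dataset
    return None
```

For example, `find_dataset_by_name(textual.get_all_datasets(), '<dataset name>')` returns either the matching dataset or `None`.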

## Create and add files to a dataset <a href="#textual-api-create-populate-dataset" id="textual-api-create-populate-dataset"></a>

{% hint style="info" %}
**Required global permission:** Create datasets

**Required dataset permission:** Upload files to a dataset
{% endhint %}

To create a new dataset, use [`textual.create_dataset`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/redact/api.html#tonic_textual.redact_api.TextualNer.create_dataset).

```python
dataset = textual.create_dataset('<dataset name>')
```

To add a file to the dataset, use [`dataset.add_file`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.add_file). To identify the file, provide the file path and name.

```python
dataset.add_file('<path to file>','<file name>') 
```

To provide the file as IO bytes, provide the file name and the file bytes as keyword arguments. Do not provide a path.

```python
dataset.add_file(file_name='<file name>', file=<file bytes>)
```

Textual creates the dataset, scans the uploaded file, and redacts the detected values.
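
To upload several files at once, one option is to loop over a local folder and call `dataset.add_file` with the path and name of each file. The following is a sketch only. `add_directory` is a hypothetical helper, and it assumes that `dataset.add_file` accepts the file path followed by the file name, as in the path-based call shown earlier.

```python
from pathlib import Path
from typing import Any, List


def add_directory(dataset: Any, directory: str, pattern: str = "*") -> List[str]:
    """Upload every regular file in `directory` that matches `pattern`.

    Returns the names of the uploaded files, in sorted order.
    """
    uploaded: List[str] = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():  # skip subdirectories
            # Assumption: file path first, then file name.
            dataset.add_file(str(path), path.name)
            uploaded.append(path.name)
    return uploaded
```

For example, `add_directory(dataset, '<path to folder>', '*.txt')` uploads only the text files in that folder.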

## Configure a dataset <a href="#textual-sdk-dataset-configure" id="textual-sdk-dataset-configure"></a>

{% hint style="info" %}
**Required dataset permission:** Edit dataset settings
{% endhint %}

To change the configuration of a dataset, use [`dataset.edit`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.edit).

You can use `dataset.edit` to change:

* The name of the dataset
* The [handling option for each entity type](https://docs.tonic.ai/textual/tonic-textual-api/api-redaction-entity-type-handling#specifying-the-handling-option-for-entity-types)
* [Added or excluded values for each entity type](https://docs.tonic.ai/textual/tonic-textual-api/api-redaction-entity-type-handling#providing-added-and-excluded-values-for-entity-types)

```python
from tonic_textual.classes.common_api_responses.label_custom_list import LabelCustomList

dataset.edit(
  name='<dataset name>',
  generator_config={'<entity_type>': '<handling_type>'},
  label_allow_lists={'<entity_type>': LabelCustomList(regexes=['<regex>'])},
  label_block_lists={'<entity_type>': LabelCustomList(regexes=['<regex>'])}
)
```

Alternatively, instead of specifying the configuration, use the `copy_from_dataset` parameter to copy the configuration from another dataset.
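
As a rough sketch, the same configuration can be applied to several datasets by passing one source dataset to `copy_from_dataset` for each target. `copy_settings` is a hypothetical helper, and the sketch assumes that `copy_from_dataset` accepts a dataset object.

```python
from typing import Any, Iterable


def copy_settings(source: Any, targets: Iterable[Any]) -> int:
    """Copy the configuration of `source` to each dataset in `targets`.

    Returns the number of datasets that were updated.
    """
    updated = 0
    for target in targets:
        # Assumption: `copy_from_dataset` takes the source dataset object.
        target.edit(copy_from_dataset=source)
        updated += 1
    return updated
```

For example, `copy_settings(template_dataset, [dataset_a, dataset_b])`, where all three names are placeholders for dataset objects that you have already retrieved.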

## Get the current status of dataset files <a href="#textual-api-get-dataset-file-status" id="textual-api-get-dataset-file-status"></a>

{% hint style="info" %}
**Required dataset permission:** Preview redacted dataset files
{% endhint %}

To get the current status of the files in the current dataset, use [`dataset.describe`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.describe):

```python
dataset.describe()
```

The response includes:

* The name and identifier of the dataset
* The number of files in the dataset
* The number of files that are waiting to be processed (scanned and redacted)
* The number of files that had errors during processing

For example:

```
    Dataset: example [879d4c5d-792a-c009-a9a0-60d69be20206]
    Number of Files: 1
    Files that are waiting for processing: 
    Files that encountered errors while processing: 
    Number of Rows: 0
    Number of rows fetched: 0
```

## Get lists of files by status <a href="#textual-sdk-dataset-file-lists" id="textual-sdk-dataset-file-lists"></a>

{% hint style="info" %}
**Required dataset permission:** Preview redacted dataset files
{% endhint %}

To get a list of files that have a specific status, use the following:

* [`dataset.get_failed_files`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.get_failed_files)
* [`dataset.get_running_files`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.get_running_files)
* [`dataset.get_queued_files`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.get_queued_files)
* [`dataset.get_processed_files`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.get_processed_files)

The file list includes:

* File identifier and name
* Number of rows and columns
* Processing status
* For failed files, the error
* When the file was uploaded
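
One possible use of these methods is to wait until a dataset has no queued or running files before you download results. The sketch below is not part of the SDK. `wait_until_processed` is a hypothetical helper, and it assumes that each call returns a list that reflects the current state on the server.

```python
import time
from typing import Any


def wait_until_processed(
    dataset: Any, poll_seconds: float = 5.0, timeout: float = 600.0
) -> bool:
    """Poll until no files are queued or running.

    Returns True when processing has finished, or False if `timeout`
    seconds pass while files are still pending.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Assumption: both methods return list-like collections of files.
        pending = len(dataset.get_queued_files()) + len(dataset.get_running_files())
        if pending == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

After the function returns `True`, `dataset.get_failed_files()` shows whether any file ended in an error.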

## Delete a file from a dataset <a href="#textual-sdk-dataset-delete-file" id="textual-sdk-dataset-delete-file"></a>

{% hint style="info" %}
**Required dataset permission:** Delete files from a dataset
{% endhint %}

To delete a file from a dataset, use [`dataset.delete_file`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.delete_file).

```python
dataset.delete_file('<file identifier>')
```
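
For instance, files that failed processing could be removed in one pass by combining `dataset.get_failed_files` with `dataset.delete_file`. This is a sketch under one assumption: that each file object exposes its identifier as an `id` attribute. `delete_failed_files` is a hypothetical helper.

```python
from typing import Any, List


def delete_failed_files(dataset: Any) -> List[str]:
    """Delete every failed file from `dataset` and return the deleted identifiers."""
    deleted: List[str] = []
    # Copy the list first so that deletions cannot affect the iteration.
    for failed in list(dataset.get_failed_files()):
        # Assumption: the file identifier is available as `failed.id`.
        dataset.delete_file(failed.id)
        deleted.append(failed.id)
    return deleted
```

Because deletion cannot be undone from this helper, it may be worth printing the list of failed files before you call it.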

## Get redacted content for a dataset <a href="#textual-api-get-redacted-dataset-content" id="textual-api-get-redacted-dataset-content"></a>

{% hint style="info" %}
**Required dataset permission:** Download redacted dataset files
{% endhint %}

To get the redacted content in JSON format for a dataset, use [`dataset.fetch_all_json()`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.fetch_all_json):

```python
dataset = textual.get_dataset('<dataset name>')
dataset.fetch_all_json()
```

For example:

```python
dataset = textual.get_dataset('mydataset')
dataset.fetch_all_json()
```

The response looks similar to the following:

{% code overflow="wrap" %}

```json
'[["PERSON Portrait by PERSON, DATE_TIME ...]'
```

{% endcode %}
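
The example response suggests that `fetch_all_json` returns a JSON string that contains a list of lists. Under that assumption, a minimal sketch for turning the response into Python objects is shown below. `fetch_rows` is a hypothetical helper and is not part of the SDK.

```python
import json
from typing import Any, List


def fetch_rows(dataset: Any) -> List[Any]:
    """Parse the JSON string from `fetch_all_json` into a Python list."""
    # Assumption: the method returns a JSON-encoded string.
    raw = dataset.fetch_all_json()
    rows = json.loads(raw)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array at the top level")
    return rows
```

For example, `for row in fetch_rows(dataset): print(row)` prints each inner list of redacted text in turn.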
