# Create and manage datasets

Textual uses datasets to produce files with sensitive values replaced.

Before you perform these tasks, remember to [instantiate the SDK client](https://docs.tonic.ai/textual/tonic-textual-api/textual-api-instantiate-sdk).

## Get your list of datasets <a href="#get-all-datasets" id="get-all-datasets"></a>

To get the complete list of datasets that you own, use [`textual.get_all_datasets`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/redact/api.html#tonic_textual.redact_api.TextualNer.get_all_datasets).

```python
datasets = textual.get_all_datasets()
```
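The returned list can be filtered on the client side. The following is a minimal sketch that assumes each dataset object exposes a `name` attribute. `find_datasets_by_prefix` is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Iterable, List


def find_datasets_by_prefix(datasets: Iterable[Any], prefix: str) -> List[Any]:
    """Return the datasets whose name starts with the given prefix.

    Assumption: each dataset object has a `name` attribute.
    """
    return [d for d in datasets if d.name.startswith(prefix)]


# Usage with a live client (requires an instantiated `textual` client):
# matches = find_datasets_by_prefix(textual.get_all_datasets(), 'claims-')
```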

## Create and add files to a dataset <a href="#textual-api-create-populate-dataset" id="textual-api-create-populate-dataset"></a>

{% hint style="info" %}
**Required global permission:** Create datasets

**Required dataset permission:** Upload files to a dataset
{% endhint %}

To create a new dataset, use [`textual.create_dataset`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/redact/api.html#tonic_textual.redact_api.TextualNer.create_dataset).

```python
dataset = textual.create_dataset('<dataset name>')
```

To add a file to the dataset, use [`dataset.add_file`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.add_file). To identify the file, provide the file path and name.

```python
dataset.add_file('<path to file>','<file name>') 
```

To provide the file as IO bytes instead, provide the file name and the file bytes. Do not provide a path.

```python
dataset.add_file(file_name='<file name>', file=<file bytes>)
```

Textual creates the dataset, scans the uploaded file, and redacts the detected values.
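To upload several files in one pass, the bytes-based call can be wrapped in a loop. The following is a sketch only: `upload_directory` is a hypothetical helper, and it assumes that `dataset.add_file` accepts `file_name` and `file` keyword arguments.

```python
from pathlib import Path
from typing import Any, List


def upload_directory(dataset: Any, directory: str, pattern: str = '*.txt') -> List[str]:
    """Upload every file in `directory` that matches `pattern`.

    Assumption: `dataset.add_file` accepts `file_name` and `file` keywords.
    Returns the names of the uploaded files in sorted order.
    """
    uploaded = []
    for path in sorted(Path(directory).glob(pattern)):
        # Open in binary mode so that the raw file bytes are passed along.
        with path.open('rb') as handle:
            dataset.add_file(file_name=path.name, file=handle)
        uploaded.append(path.name)
    return uploaded


# Usage (requires an existing `dataset` object):
# upload_directory(dataset, '/data/to-redact', pattern='*.txt')
```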

## Configure a dataset <a href="#textual-sdk-dataset-configure" id="textual-sdk-dataset-configure"></a>

{% hint style="info" %}
**Required dataset permission:** Edit dataset settings
{% endhint %}

To change the configuration of a dataset, use [`dataset.edit`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.edit).

You can use `dataset.edit` to change:

* The name of the dataset
* The [handling option for each entity type](https://docs.tonic.ai/textual/tonic-textual-api/api-redaction-entity-type-handling#specifying-the-handling-option-for-entity-types)
* [Added or excluded values for each entity type](https://docs.tonic.ai/textual/tonic-textual-api/api-redaction-entity-type-handling#providing-added-and-excluded-values-for-entity-types)

```python
dataset.edit(name='<dataset name>',
  generator_config={'<entity_type>':'<handling_type>'},
  label_allow_lists={'<entity_type>':LabelCustomList(regexes=['<regex>'])},
  label_block_lists={'<entity_type>':LabelCustomList(regexes=['<regex>'])}
)
```

Alternatively, to copy the configuration from another dataset instead of specifying it, use the `copy_from_dataset` parameter.
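When many entity types share one handling option, the `generator_config` dictionary can be built programmatically. The following is a minimal sketch. `build_generator_config` is a hypothetical helper, and the entity type names and handling value are illustrative assumptions. Check the linked entity type handling page for the values that your instance supports.

```python
from typing import Dict, Iterable


def build_generator_config(entity_types: Iterable[str], handling: str) -> Dict[str, str]:
    """Map every listed entity type to the same handling option."""
    return {entity_type: handling for entity_type in entity_types}


# Illustrative values; not taken from this page.
config = build_generator_config(['NAME_GIVEN', 'NAME_FAMILY'], 'Synthesis')

# Usage (requires an existing `dataset` object):
# dataset.edit(generator_config=config)
```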

## Get the current status of dataset files <a href="#textual-api-get-dataset-file-status" id="textual-api-get-dataset-file-status"></a>

{% hint style="info" %}
**Required dataset permission:** Preview redacted dataset files
{% endhint %}

To get the current status of the files in the current dataset, use [`dataset.describe`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.describe):

```python
dataset.describe()
```

The response includes:

* The name and identifier of the dataset
* The number of files in the dataset
* The number of files that are waiting to be processed (scanned and redacted)
* The number of files that had errors during processing

For example:

```
    Dataset: example [879d4c5d-792a-c009-a9a0-60d69be20206]
    Number of Files: 1
    Files that are waiting for processing: 
    Files that encountered errors while processing: 
    Number of Rows: 0
    Number of rows fetched: 0
```

## Get lists of files by status <a href="#textual-sdk-dataset-file-lists" id="textual-sdk-dataset-file-lists"></a>

{% hint style="info" %}
**Required dataset permission:** Preview redacted dataset files
{% endhint %}

To get a list of files that have a specific status, use the following:

* [`dataset.get_failed_files`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.get_failed_files)
* [`dataset.get_running_files`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.get_running_files)
* [`dataset.get_queued_files`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.get_queued_files)
* [`dataset.get_processed_files`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.get_processed_files)

The file list includes:

* File identifier and name
* Number of rows and columns
* Processing status
* For failed files, the error
* When the file was uploaded
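The status methods above can be combined to wait until processing finishes. The following is a sketch under two assumptions: each method returns a list, and each call reflects the current server-side state. `wait_for_processing` is a hypothetical helper, not an SDK function.

```python
import time
from typing import Any


def wait_for_processing(dataset: Any, poll_seconds: float = 5.0,
                        timeout_seconds: float = 600.0) -> bool:
    """Poll until no files are queued or running.

    Returns True when processing is complete, or False if the timeout
    elapses first. Assumes the status methods return lists.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        pending = len(dataset.get_queued_files()) + len(dataset.get_running_files())
        if pending == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)


# Usage (requires an existing `dataset` object):
# if wait_for_processing(dataset):
#     print(len(dataset.get_failed_files()), 'files failed')
```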

## Delete a file from a dataset <a href="#textual-sdk-dataset-delete-file" id="textual-sdk-dataset-delete-file"></a>

{% hint style="info" %}
**Required dataset permission:** Delete files from a dataset
{% endhint %}

To delete a file from a dataset, use [`dataset.delete_file`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.delete_file).

```python
dataset.delete_file('<file identifier>')
```
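File deletion can be combined with the status lists, for example to clear out files that failed to process. The following sketch assumes that each file object returned by `dataset.get_failed_files` exposes its identifier as an `id` attribute. `delete_failed_files` is a hypothetical helper.

```python
from typing import Any, List


def delete_failed_files(dataset: Any) -> List[str]:
    """Delete every failed file and return the deleted identifiers.

    Assumption: each file object has an `id` attribute.
    """
    deleted = []
    # Copy the list first so that deletions cannot affect the iteration.
    for failed in list(dataset.get_failed_files()):
        dataset.delete_file(failed.id)
        deleted.append(failed.id)
    return deleted


# Usage (requires an existing `dataset` object):
# removed = delete_failed_files(dataset)
```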

## Get redacted content for a dataset <a href="#textual-api-get-redacted-dataset-content" id="textual-api-get-redacted-dataset-content"></a>

{% hint style="info" %}
**Required dataset permission:** Download redacted dataset files
{% endhint %}

To get the redacted content in JSON format for a dataset, use [`dataset.fetch_all_json()`](https://tonic-textual-sdk.readthedocs-hosted.com/en/latest/datasets/api.html#tonic_textual.classes.dataset.Dataset.fetch_all_json):

```python
dataset = textual.get_dataset('<dataset name>')
dataset.fetch_all_json()
```

For example:

```python
dataset = textual.get_dataset('mydataset')
dataset.fetch_all_json()
```

The response is similar to the following:

{% code overflow="wrap" %}

```json
'[["PERSON Portrait by PERSON, DATE_TIME ...]'
```

{% endcode %}
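Because the response is a JSON string, it can be decoded with the Python standard library. The following sketch assumes that the string decodes to a list of rows, where each row is a list of strings. `parse_redacted_rows` is a hypothetical helper, and the sample string is made-up data for illustration, not real output.

```python
import json
from typing import List


def parse_redacted_rows(raw: str) -> List[List[str]]:
    """Decode the JSON string into a list of rows.

    Assumption: the top-level JSON value is a list of lists.
    """
    rows = json.loads(raw)
    if not isinstance(rows, list):
        raise ValueError('Expected a JSON array of rows')
    return rows


# Made-up sample in the same shape as the response shown above.
sample = '[["PERSON Portrait by PERSON, DATE_TIME"], ["ORGANIZATION report"]]'
rows = parse_redacted_rows(sample)

# Usage (requires an existing `dataset` object):
# rows = parse_redacted_rows(dataset.fetch_all_json())
```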

