# Creating a dataset

{% hint style="info" %}
**Required global permission:** Create datasets
{% endhint %}

When you create a dataset, you specify:

* The name of the dataset.
* The type of output to produce.
* The source location for the files.
* If the files are in cloud storage, the connection credentials.

## Setting the name, source type, and output type

To create a dataset:

1. On the **Datasets** page, click **Create a Dataset**.

<figure><img src="https://3072847115-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvOPn7KQptPWmS5iKg5P0%2Fuploads%2FbIpfGjtuosdf0euRKG5p%2FDatasetCreate.png?alt=media&#x26;token=8fafa71c-cdeb-41e7-8ecf-b735ef777575" alt="Dataset creation panel"><figcaption><p>Dataset creation panel</p></figcaption></figure>

2. In the **Dataset Name** field, provide a name for the dataset.
3. Under **Output Format**, select the type of output to generate.
4. Under **File Source**, select the source type.\
   \
   If the source type is a cloud storage option, then provide the required credentials.
5. Click **Save**.
6. For cloud storage datasets:
   1. Textual prompts you to configure the initial file selection. For more information, go to [files-cloud-storage](https://docs.tonic.ai/textual/dataset-files/files-cloud-storage "mention").
   2. After you select the files, Textual prompts you to select an output location. For more information, go to [changing-cloud-storage-credentials-and-output-location](https://docs.tonic.ai/textual/datasets-create-manage/changing-cloud-storage-credentials-and-output-location "mention").
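
The choices in this procedure can be pictured as a small record with one rule: a cloud storage source needs credentials. The following Python sketch is illustrative only. `DatasetRequest`, the source identifiers, and the `problems` method are hypothetical names and are not part of Textual or its API.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the cloud storage source types on this page.
CLOUD_SOURCES = {"amazon_s3", "azure", "sharepoint"}


@dataclass
class DatasetRequest:
    """Illustrative model of the values entered on the dataset creation panel."""

    name: str
    output_format: str
    file_source: str
    credentials: dict = field(default_factory=dict)

    def problems(self) -> list[str]:
        """Return a list of human-readable issues; empty means ready to save."""
        issues = []
        if not self.name.strip():
            issues.append("dataset name is required")
        if not self.output_format:
            issues.append("output format is required")
        # Only cloud storage sources need connection credentials.
        if self.file_source in CLOUD_SOURCES and not self.credentials:
            issues.append(f"credentials are required for {self.file_source}")
        return issues
```

A request for a non-cloud source with a name and an output format yields no issues. A cloud source without credentials yields one.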

## Providing credentials for Amazon S3

{% hint style="info" %}
On self-hosted instances, we are deprecating the options to provide credentials on the dataset panel and to read credentials from environment variables.

Instead, the credentials must be included in the configuration of an IAM role that has the correct permissions.
{% endhint %}

If the source type is Amazon S3, provide the credentials to use to connect to Amazon S3.

<figure><img src="https://3072847115-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvOPn7KQptPWmS5iKg5P0%2Fuploads%2FG1YyxSLSF3l7xgjXtJFz%2FDatasetCreateS3Credentials.png?alt=media&#x26;token=13991069-c499-4296-8cb6-3b44e76f5878" alt=""><figcaption><p>Credentials fields for an Amazon S3 dataset</p></figcaption></figure>

1. For a self-hosted instance, select the location of the credentials. You can either provide credentials manually, or use credentials that are configured in environment variables.\
   \
   Note that after you save the dataset, you cannot change the selection.
2. If you are not using environment variables, then in the **Access Key** field, provide an AWS access key that is associated with an IAM user or role. For an example of a role that has the required permissions for an Amazon S3 dataset, go to [pipelines-example-iam-roles](https://docs.tonic.ai/textual/textual-install-administer/configuring-textual/enable-and-configure-textual-features/pipelines-example-iam-roles "mention").
3. In the **Access Secret** field, provide the secret key that is associated with the access key.
4. From the **Region** dropdown list, select the AWS Region to send the authentication request to.
5. In the **Session Token** field, provide the session token to use for the authentication request.
6. To test the credentials, click **Test AWS Connection**.
7. By default, connections to Amazon S3 use Amazon S3 encryption.\
   \
   To instead use AWS KMS encryption:

   1. Click **Show Advanced Options**.
   2. From the **Server-Side Encryption Type** dropdown list, select **AWS KMS**.
   3. In the **Server-side Encryption AWS KMS ID** field, provide the KMS key ID.\
      \
      Note that if the KMS key doesn't exist in the same account that issues the command, you must provide the full key ARN instead of the key ID.

   Note that after you save the new dataset, you cannot change the encryption type.
8. Click **Save**.\
   \
   Textual prompts you to [select the dataset files](https://docs.tonic.ai/textual/dataset-files/files-cloud-storage).
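
The rule about when to use a key ID and when to use a full key ARN can be sketched as a small helper. This is an illustration, not Textual code. `kms_key_reference` and its parameters are hypothetical names, and the ARN layout follows the standard AWS KMS key ARN format, `arn:<partition>:kms:<region>:<account-id>:key/<key-id>`.

```python
import re

ACCOUNT_ID = re.compile(r"\d{12}")  # AWS account IDs are 12 digits


def kms_key_reference(key_id: str, key_account: str, caller_account: str,
                      region: str, partition: str = "aws") -> str:
    """Return the value to enter for the KMS key (hypothetical helper).

    A bare key ID is enough when the key lives in the caller's account.
    Otherwise the full key ARN is needed.
    """
    for account in (key_account, caller_account):
        if not ACCOUNT_ID.fullmatch(account):
            raise ValueError(f"not a 12-digit AWS account ID: {account!r}")
    if key_account == caller_account:
        return key_id
    return f"arn:{partition}:kms:{region}:{key_account}:key/{key_id}"
```

For example, a key `1234abcd-12ab-34cd-56ef-1234567890ab` that belongs to account `444455556666` in `us-west-2`, used from account `111122223333`, resolves to `arn:aws:kms:us-west-2:444455556666:key/1234abcd-12ab-34cd-56ef-1234567890ab`.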

## Providing Azure credentials <a href="#dataset-new-azure-credentials" id="dataset-new-azure-credentials"></a>

<figure><img src="https://3072847115-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvOPn7KQptPWmS5iKg5P0%2Fuploads%2FkIqfFWSRDiSsk409fEVK%2FDatasetCreateAzureCredentials.png?alt=media&#x26;token=c9710e70-3c8c-4834-b98a-91e00f814a15" alt=""><figcaption><p>Credentials fields for an Azure dataset</p></figcaption></figure>

If the source type is Azure, provide the connection information:

1. In the **Account Name** field, provide the name of your Azure account.
2. In the **Account Key** field, provide the access key for your Azure account.
3. To test the connection, click **Test Azure Connection**.
4. Click **Save**.\
   \
   Textual prompts you to [select the dataset files](https://docs.tonic.ai/textual/dataset-files/files-cloud-storage).
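
Before you enter the values, it can help to check them outside of Textual. The sketch below assumes that the Azure account is an Azure Storage account, which this page does not state explicitly. `storage_connection_string` is a hypothetical helper. The naming rule (3 to 24 lowercase letters and digits) and the connection string layout are the standard Azure Storage conventions.

```python
import base64
import binascii
import re

# Azure Storage account names: 3 to 24 characters, lowercase letters and digits.
ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def storage_connection_string(account_name: str, account_key: str) -> str:
    """Validate the two values and build a storage connection string."""
    if not ACCOUNT_NAME.fullmatch(account_name):
        raise ValueError(f"invalid storage account name: {account_name!r}")
    if not account_key:
        raise ValueError("account key is empty")
    try:
        # Storage account keys are Base64-encoded.
        base64.b64decode(account_key, validate=True)
    except binascii.Error as exc:
        raise ValueError("account key is not valid Base64") from exc
    return (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};"
        f"AccountKey={account_key};"
        "EndpointSuffix=core.windows.net"
    )
```

A value that fails these checks is likely to fail **Test Azure Connection** as well, so the helper is a quick way to catch copy-and-paste errors first.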

## Providing SharePoint credentials <a href="#dataset-new-sharepoint-credentials" id="dataset-new-sharepoint-credentials"></a>

<figure><img src="https://3072847115-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvOPn7KQptPWmS5iKg5P0%2Fuploads%2FdeiWpe3EA9tPHAtf5Jig%2FDatasetCreateSharePointCredentials.png?alt=media&#x26;token=f716447b-f6a1-49fb-a0a1-4a91301eb52c" alt=""><figcaption><p>Credentials fields for a SharePoint dataset</p></figcaption></figure>

If the source type is SharePoint, provide the credentials for the Entra ID application.

The credentials must have the following application permissions (not delegated permissions):

* `Files.Read.All` - To see the SharePoint files
* `Files.ReadWrite.All` - To write redacted files and metadata back to SharePoint
* `Sites.ReadWrite.All` - To view and modify the SharePoint sites
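
One way to confirm that an application has these permissions is to inspect an app-only access token that it obtains. The sketch assumes that the permissions are Microsoft Graph application permissions, which the Microsoft identity platform lists in the token's `roles` claim. `missing_permissions` is a hypothetical helper. It decodes the token payload for inspection only and does not verify the signature.

```python
import base64
import json

REQUIRED_PERMISSIONS = {
    "Files.Read.All",
    "Files.ReadWrite.All",
    "Sites.ReadWrite.All",
}


def token_roles(access_token: str) -> set[str]:
    """Read the `roles` claim from a JWT payload (no signature check)."""
    payload_b64 = access_token.split(".")[1]
    # JWT segments omit Base64 padding, so restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return set(payload.get("roles", []))


def missing_permissions(access_token: str) -> set[str]:
    """Return the required application permissions absent from the token."""
    return REQUIRED_PERMISSIONS - token_roles(access_token)
```

An empty result means that the token carries all three application permissions.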

To provide the credentials:

1. In the **Tenant ID** field, provide the tenant identifier for the SharePoint site.
2. In the **Client ID** field, provide the client identifier for the SharePoint site.
3. In the **Client Secret** field, provide the secret to use to connect to the SharePoint site.
4. To test the connection, click **Test SharePoint Connection**.
5. Click **Save**.\
   \
   Textual prompts you to [select the dataset files](https://docs.tonic.ai/textual/dataset-files/files-cloud-storage).
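
The three values map onto the OAuth 2.0 client credentials flow of the Microsoft identity platform. The sketch below builds, but does not send, the token request. Whether Textual performs exactly this request is an assumption. `build_token_request` is a hypothetical name, and the Microsoft Graph scope is assumed from the permission names listed earlier in this section.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id: str, client_id: str,
                        client_secret: str) -> Request:
    """Build a client credentials token request for an Entra ID application."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # ".default" requests the application permissions already granted
        # to the app registration.
        "scope": "https://graph.microsoft.com/.default",
    }).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

To try the credentials, pass the returned request to `urllib.request.urlopen` and read the `access_token` field from the JSON response. An error response at this stage usually points to a wrong tenant ID, client ID, or client secret.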
