# Selecting and reviewing test data

For a model-based custom entity, you first select a set of test data. You annotate the test data to identify all of the entity values that are in those files.

The test data is a small set of files - up to around 5 files - that contain typical values for the entity type. Each file should also be relatively small - no more than 5,000 words.

For example, for an entity type that identifies health conditions, you might select 5 or 6 medical appointment reports that contain a variety of typical values.
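
Before you upload, it can help to check candidate files against this size guidance. The following is a minimal sketch, not part of Textual. The function names and the soft limit of 6 files (to allow for "around 5") are assumptions made for illustration, and a simple whitespace split might count words differently than Textual does.

```python
from pathlib import Path

MAX_FILES = 6       # assumption: soft limit that allows for "around 5" files
MAX_WORDS = 5_000   # per-file guidance from this page


def word_count(text: str) -> int:
    """Count whitespace-separated words in a string."""
    return len(text.split())


def check_test_set(texts: dict) -> list:
    """Return warnings for a candidate test set that maps file name to text."""
    warnings = []
    if len(texts) > MAX_FILES:
        warnings.append(f"{len(texts)} files selected; guidance is around 5")
    for name, text in texts.items():
        count = word_count(text)
        if count > MAX_WORDS:
            warnings.append(f"{name}: {count} words exceeds {MAX_WORDS}")
    return warnings


def load_texts(folder: str) -> dict:
    """Read every .txt file in a local folder (hypothetical helper)."""
    return {
        path.name: path.read_text(encoding="utf-8", errors="ignore")
        for path in sorted(Path(folder).glob("*.txt"))
    }
```

For example, `check_test_set(load_texts("reports"))` returns an empty list when every text file in a local `reports` folder fits the guidance.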

When you iterate over the model guidelines, Textual uses those guidelines to scan the files, and generates scores to indicate how well its detections matched the set of values that you established during your review.

When a model finishes training, Textual uses the model to scan the test files, and generates a score to indicate how well its detections matched your established values.
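
This page does not describe how Textual calculates the scores. As a rough illustration of what it means to compare detections against reviewed values, the following sketch computes precision, recall, and F1 over exact character spans. The span representation, the exact-match rule, and the function name are all assumptions, and the actual Textual scoring might differ.

```python
def score_detections(expected: set, detected: set) -> dict:
    """Compare detected (start, end) spans with the reviewed spans.

    Assumption: a detection counts only when it matches a reviewed
    span exactly. This is an illustration, not the Textual formula.
    """
    matched = len(expected & detected)
    precision = matched / len(detected) if detected else 0.0
    recall = matched / len(expected) if expected else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Spans that you confirmed during the review (hypothetical offsets)
reviewed = {(0, 8), (20, 28), (40, 45)}
# Spans that the guidelines or model detected (hypothetical offsets)
found = {(0, 8), (20, 28), (60, 66)}

scores = score_detections(reviewed, found)
```

In this sketch, a missed value lowers recall and a spurious detection lowers precision, which is why a careful review of the test files matters for meaningful scores.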

## **Selecting the initial set of test files**

On the **Test data setup** page, you can select the files using any combination of the following:

* Paste text into a text field.
* Upload files from a local system.
* Select files from only one of the following cloud storage options:
  * An S3 bucket
  * Azure Blob Storage
  * A SharePoint repository

<figure><img src="https://3072847115-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvOPn7KQptPWmS5iKg5P0%2Fuploads%2FAi1XfMTgT9f5Swe33yaD%2FCustomEntityTypeModelTestDataInitialDisplay.png?alt=media&#x26;token=f2c2e69f-1f4e-46be-90d1-91e9d24f92db" alt=""><figcaption><p>Test Data Setup page with no files selected</p></figcaption></figure>

After you select the initial set of test files, Textual uses the draft guidelines that you provided to identify entity values in the files.

### **Pasting text directly**

To paste text directly:

1. Click **Sample Text**.

<figure><img src="https://3072847115-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvOPn7KQptPWmS5iKg5P0%2Fuploads%2FR8JkvjScKdVLEkQOwIry%2FCustomEntityTypeModelSampleText.png?alt=media&#x26;token=481c4453-8cca-4375-aa1f-777582e2ffd5" alt=""><figcaption><p>Sample Text field to create a test file from pasted text</p></figcaption></figure>

2. In the field, paste the text.
3. Click **Next**.

### **Uploading local files**

To upload local files for the draft model to annotate:

1. Click **File Upload**.
2. Click **Upload Files**.
3. Search for and select the files.
4. Click **Next**.

### **Providing Amazon S3 credentials**

To provide credentials for Amazon S3:

1. Click **Amazon S3**.

<figure><img src="https://3072847115-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvOPn7KQptPWmS5iKg5P0%2Fuploads%2FsbHr2jo9sD1D5S8T02xl%2FCustomEntityTypeModelAWSCredentials.png?alt=media&#x26;token=669d38f6-4c4b-4407-8b06-86927f9e9d22" alt=""><figcaption><p>Credentials fields to connect to Amazon S3</p></figcaption></figure>

2. For a self-hosted instance, select the location of the credentials. You can either provide credentials manually, or use credentials that are configured in environment variables.\
   \
   Note that after you save the credentials, you cannot change the selection.
3. If you are not using environment variables, then in the **Access Key** field, provide an AWS access key that is associated with an IAM user or role.\
   \
   For an example of a role that has the required permissions for an Amazon S3 dataset, go to [pipelines-example-iam-roles](https://docs.tonic.ai/textual/textual-install-administer/configuring-textual/enable-and-configure-textual-features/pipelines-example-iam-roles "mention").
4. In the **Access Secret** field, provide the secret key that is associated with the access key.
5. From the **Region** dropdown list, select the AWS Region to send the authentication request to.
6. In the **Session Token** field, provide the session token to use for the authentication request.
7. To test the credentials, click **Test AWS Connection**.
8. Click **Next**.\
   \
   Textual prompts you to select the files.

### **Providing Azure credentials**

To provide credentials for Azure:

1. Click **Azure**.

<figure><img src="https://3072847115-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvOPn7KQptPWmS5iKg5P0%2Fuploads%2FLhLDtBI398SzxhW1YWxC%2FCustomEntityTypeModelAzureCredentials.png?alt=media&#x26;token=96efcc0c-3fe6-4c88-9afe-184a97e299da" alt=""><figcaption><p>Credentials fields to connect to Azure</p></figcaption></figure>

2. In the **Account Name** field, provide the name of your Azure account.
3. In the **Account Key** field, provide the access key for your Azure account.
4. To test the connection, click **Test Azure Connection**.
5. Click **Next**.\
   \
   Textual prompts you to select the files.

### **Providing SharePoint credentials**

For SharePoint, click **SharePoint**, then provide the credentials for the Entra ID application.

<figure><img src="https://3072847115-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvOPn7KQptPWmS5iKg5P0%2Fuploads%2FWnLRyLibci9rXY2xsbuz%2FCustomEntityTypeModelSharePointCredentials.png?alt=media&#x26;token=eea27c49-50bb-4ec9-a68d-5b966b5ced80" alt=""><figcaption><p>Credentials fields to connect to SharePoint</p></figcaption></figure>

The credentials must have the following application permissions (not delegated permissions):

* `Files.Read.All` - To see the SharePoint files
* `Files.ReadWrite.All` - To write redacted files and metadata back to SharePoint
* `Sites.ReadWrite.All` - To view and modify the SharePoint sites
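
If you have a list of the application permissions that are granted to the Entra ID application, a short check can confirm that none of the required permissions are missing. This is a minimal sketch. The function name is hypothetical, and how you obtain the list of granted permissions is outside its scope.

```python
# Application permissions listed on this page
REQUIRED_PERMISSIONS = {
    "Files.Read.All",
    "Files.ReadWrite.All",
    "Sites.ReadWrite.All",
}


def missing_permissions(granted) -> list:
    """Return the required permissions that are not in the granted list."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


# Hypothetical example: only read access was granted
missing = missing_permissions(["Files.Read.All"])
```

An empty result means that all three permissions are present. It does not verify that they were granted as application permissions instead of delegated permissions.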

To provide the credentials:

1. In the **Tenant ID** field, provide the tenant identifier for the SharePoint site.
2. In the **Client ID** field, provide the client identifier for the SharePoint site.
3. In the **Client Secret** field, provide the secret to use to connect to the SharePoint site.
4. To test the connection, click **Test SharePoint Connection**.
5. Click **Next**.\
   \
   Textual prompts you to select the files.

### **Selecting cloud storage files**

After you provide the credentials, you select the files to use.

For test data, you cannot select folders. You must select individual files.

## **Viewing the file list**

On the **Test data setup** page:

* The list of test files displays at the left.
* The content of the selected file displays at the right, with the entity values highlighted.

<figure><img src="https://3072847115-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvOPn7KQptPWmS5iKg5P0%2Fuploads%2Fu9Yjf3cHFDAGpYfW8jmJ%2FCustomEntityTypeModelTestDataPopulated.png?alt=media&#x26;token=e8140380-dd3b-4c5e-9eff-101ba9d7a3d9" alt=""><figcaption><p>Test Data Setup page with selected files</p></figcaption></figure>

## **Adding data to the list**

You can add to the test data at any time, including when you are iterating over the model guidelines.

To add data, on the **Test data setup** page:

1. Click **Add test sample**.

<figure><img src="https://3072847115-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvOPn7KQptPWmS5iKg5P0%2Fuploads%2F28qorCeTaz2itOhfCVgG%2FCustomEntityTypeModelAddTestData.png?alt=media&#x26;token=90eb420d-48e7-42c5-9120-2a0bace23b5a" alt=""><figcaption><p>Add test sample dropdown list with source type options to add test files</p></figcaption></figure>

2. From the sample type menu, select the source type for the new data.\
   \
   The **Write sample text** and **Upload Files** options are always available.\
   \
   If you previously selected data from a cloud storage solution, then that cloud storage solution is available. You cannot add files from a different cloud storage solution. For example, if you initially selected files from Amazon S3, then you cannot select files from Azure or SharePoint.\
   \
   If you did not previously select data from a cloud storage solution, then you can select from any of the cloud storage solutions.
3. For a cloud storage solution, if needed, provide the credentials for the cloud storage solution, then select the additional files.
4. For sample text, provide the content.
5. For upload, search for and select the files.

When you add to the test data, Textual uses the most recent version of the guidelines to identify entity values in the new data. You can then conduct the review.
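
The rule in step 2 about which source types are available can be summarized in a few lines. This is only a sketch of the rule as described on this page. The function name is hypothetical, and the cloud storage labels are assumed to match the buttons on the initial setup page.

```python
from typing import Optional

ALWAYS_AVAILABLE = ["Write sample text", "Upload Files"]
# Assumption: menu labels match the initial setup buttons
CLOUD_SOURCES = ["Amazon S3", "Azure", "SharePoint"]


def available_source_types(previous_cloud: Optional[str]) -> list:
    """List the source types offered when you add test data.

    previous_cloud is the cloud storage solution already used for
    this test data, or None if no cloud storage was used yet.
    """
    if previous_cloud is None:
        # No cloud storage chosen yet: every option is available
        return ALWAYS_AVAILABLE + CLOUD_SOURCES
    if previous_cloud not in CLOUD_SOURCES:
        raise ValueError(f"Unknown cloud storage: {previous_cloud}")
    # Once a cloud storage solution is used, the others are unavailable
    return ALWAYS_AVAILABLE + [previous_cloud]
```

For example, `available_source_types("Amazon S3")` includes Amazon S3 but not Azure or SharePoint.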

## **File review statuses**

Each file goes through the following statuses:

* **Queued for upload** - Textual is uploading the file to the set of test files.
* **Ready for Review** - The file is uploaded, but you have not yet reviewed the file to finalize the entity values that the file contains.
* **Reviewed** - You completed the review.
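
If you track review progress outside of Textual, the status sequence can be modeled as a small enumeration. This sketch covers only the forward sequence listed above. The class and function names are hypothetical, and it does not model whether a reviewed file can return to an earlier status.

```python
from enum import Enum


class ReviewStatus(Enum):
    QUEUED_FOR_UPLOAD = "Queued for upload"
    READY_FOR_REVIEW = "Ready for Review"
    REVIEWED = "Reviewed"


# Forward transitions, in the order that the statuses are listed
NEXT_STATUS = {
    ReviewStatus.QUEUED_FOR_UPLOAD: ReviewStatus.READY_FOR_REVIEW,
    ReviewStatus.READY_FOR_REVIEW: ReviewStatus.REVIEWED,
}


def advance(status: ReviewStatus) -> ReviewStatus:
    """Return the status that follows the given status."""
    if status not in NEXT_STATUS:
        raise ValueError(f"{status.value} is the final status")
    return NEXT_STATUS[status]
```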

## **Reviewing a file and changing the detected values**

To review a file, click the file name. The file content displays to the right. The values from the initial detection are highlighted.

* To add an instance of an entity value, select the value text.
* To remove an instance, click its delete icon. On the confirmation panel, click **Delete**.

To save the current annotation updates, but not mark the file as reviewed, click **Save**.

When you finish the review and complete the changes, click **Save and mark as reviewed**.

## **Deleting test files**

On the **Test data setup** page, to delete a test file:

1. Click its delete icon.
2. On the confirmation panel, you can choose to skip the confirmation when you delete test files.\
   \
   If you select this option, then the next time you delete a test file, the file is deleted immediately, and the panel does not display.
3. Click **Delete**.

When you delete a test file:

* For existing guidelines versions, the file name and scores remain in the list of test files for those guidelines. The file name is dimmed, and you can no longer display a preview of the file content.
* For existing models that annotated the deleted file during their training, the benchmark score does not change.
* For new guidelines versions, the file is not used and is not listed.
* For models that are trained after the file is deleted, the file is not annotated and is not included in the benchmark score.
