Overview of the process to create a model-based custom entity type
For a model-based custom entity type, the overall process is as follows:

Select and annotate test files
The first step is to identify the entity values that are in a small set of test files. The test files and established values are used both to iterate over the model guidelines and to assess how well your trained models perform.
When you create the model-based custom entity type, you provide an initial description of the entity type, such as "Scientific names of health conditions". The description is the first version of the model guidelines. The guidelines tell the model how to identify the entity type values.
You then select a small set of short test files that contain entity values. For example, if you typically use Textual to redact values in patient appointment reports, then you might upload a few of those reports to use as test files. Each file should contain no more than 5,000 words.
Textual uses your initial guidelines to identify values in the files.
You then review and correct the annotations to identify the definitive set of entity values that the test files contain.
Iterate over model guidelines
After you establish the entity values in your test files, you iterate over the guidelines for the model.
For each version of the guidelines, Textual uses the guidelines to detect entity values in the test files.
Textual then compares the values that the guidelines version detects against the values that you established when you annotated the test files.
Textual generates scores to identify how well that version of the guidelines performed. If you are not satisfied with the results, you can update the guidelines to create a new version.
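To make the comparison concrete, here is a minimal sketch of how detected values could be scored against the established values. It assumes precision, recall, and F1 over exact span matches. This page does not state which metrics Textual actually computes, so the function name `score_detections` and the `(file, start, end)` span format are hypothetical and for illustration only.

```python
def score_detections(established, detected):
    """Score detected spans against the established (annotated) spans.

    Each span is a hypothetical (file_name, start_offset, end_offset) tuple.
    Only exact matches count as correct in this sketch.
    """
    established = set(established)
    detected = set(detected)
    true_positives = len(established & detected)

    # Precision: what share of the detected values were correct.
    precision = true_positives / len(detected) if detected else 0.0
    # Recall: what share of the established values were found.
    recall = true_positives / len(established) if established else 0.0

    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Values established by reviewing and correcting the annotations.
established = [
    ("report1.txt", 10, 22),
    ("report1.txt", 40, 48),
    ("report2.txt", 5, 17),
]
# Values detected by one version of the guidelines.
detected = [
    ("report1.txt", 10, 22),
    ("report2.txt", 5, 17),
    ("report2.txt", 30, 36),  # not in the established set
]

scores = score_detections(established, detected)
```

In this sketch, a missed value lowers recall and a spurious detection lowers precision, which is one way to see whether a guidelines version is too narrow or too broad.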
Textual automatically generates suggestions to improve the guidelines, based on how well the current guidelines identified the values. For example, it might suggest more specific wording or additional text to describe exceptions.
Select training data
When you have guidelines that you are satisfied with, you select a larger set of data to use for model training.
The training data should contain at least 1,000 entity values. Each file should still be relatively small, with no more than 5,000 words.
For example, when setting up a custom entity type to identify health conditions, you might use 5 or 6 appointment reports for your test data, but several hundred reports for your training data.
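The size limits above can be checked locally before you upload files. The following is a rough sketch, not part of Textual: it assumes that words are counted by splitting on whitespace (Textual may count differently) and that you supply your own estimate of entity values per file. The names `oversized_files` and `has_enough_values` are hypothetical.

```python
MAX_WORDS_PER_FILE = 5000    # per-file limit stated for test and training files
MIN_TRAINING_VALUES = 1000   # minimum entity values stated for training data


def count_words(text):
    # Assumption: a "word" is any whitespace-separated token.
    return len(text.split())


def oversized_files(files):
    """Return the names of files that exceed the word limit.

    `files` maps a file name to its text content.
    """
    return [
        name for name, text in files.items()
        if count_words(text) > MAX_WORDS_PER_FILE
    ]


def has_enough_values(values_per_file):
    """Check whether the estimated entity values reach the training minimum.

    `values_per_file` maps a file name to an estimated count of entity values.
    """
    return sum(values_per_file.values()) >= MIN_TRAINING_VALUES


# Example: one file is one word over the limit, the other is exactly at it.
files = {
    "report_a.txt": "word " * 5001,
    "report_b.txt": "word " * 5000,
}
too_long = oversized_files(files)
```

Files that exceed the limit could be split into smaller documents before they are added to the training set.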
Train models
When you create a model, you select the guidelines version to use for it.
The model uses the guidelines to annotate the training data - in other words, to detect entity values in the training files. You review the annotation results to determine whether you are satisfied with the detections.
If you are not satisfied, you can:
Return to the guidelines refinement step to edit the guidelines.
Create a new guidelines version.
Create a model that uses the new version.
If you are satisfied, then you can start the model training. Depending on the data, model training can take hours or even days.
When the model finishes training, it scans the original test files and identifies the entity values in them. Each trained model receives a score that indicates how well its detections matched the definitive values that you established.
Select a model to use
To make the entity type available for use, you select which trained model it uses.
The custom entity type is then active and can be enabled or disabled within individual datasets.
