At a high level, to use Tonic Textual to create synthesized or redacted data:
By default, Textual identifies sensitive values based on its built-in models. If needed, you can create custom models to identify sensitive values that are not covered by the built-in models. The custom models option requires an OpenAI key.
Create a Textual dataset. A dataset is a set of files to redact.
If you have custom models, then you can enable the custom models to use on the dataset.
Add files to the dataset. Textual supports almost any free-text file, PDF files, .docx files, and .xlsx files. For images, Textual supports PNG, JPG (both .jpg and .jpeg), and TIF (both .tif and .tiff) files. As you add the files, Textual uses its built-in models and your enabled custom models to identify sensitive values in the files.
Configure how to handle each type of value. By default, Textual redacts the values, which means that it replaces each value with a placeholder that identifies the type of sensitive value, for example PERSON or LOCATION. For PDF files and image files, redaction means to cover the value with a black box. For a given data type, you can instead choose to synthesize the values, which means to replace the original value with a realistic replacement. You can also choose to ignore the values and not replace them. Optionally, if some of the detected values are incorrect, you can create a list of values to exclude from a specific type.
Also optionally, you can add manual overrides to a PDF file. When you add a manual override, you draw a box to identify the affected portion of the file.
You can use manual overrides either to ignore the automatically detected redactions in the selected area, or to redact the selected area. To make it easier to process multiple files that have a similar format, such as a form, you can create templates that you can apply to PDF files in the dataset.
You then download the redacted and synthesized versions of the files to use as needed.
The Tonic Textual API is a Python SDK that you can use to manage datasets and get access to redacted files.
To be able to use the Textual API, you must have an API key.
If you didn't create an API key when you signed up for Textual, then you can create API keys from the Textual application.
You can manage keys from the Textual home page or the User API Keys page.
On the Textual home page, the API Keys panel is at the top right.
To display the User API Keys page, in the Textual navigation menu, click User API Keys.
To create a Textual API key:
Either:
In the API keys panel on the Textual home page, click Create an API Key.
On the User API Keys page, click Create API Key.
In the Name field, type a name to use to identify the key.
Click Create API Key.
Textual displays the key value, and prompts you to copy the key. If you do not copy the key and save it to a file, you will not have access to the key. To copy the key, click the copy icon.
To revoke a Textual API key, either:
In the API Keys panel on the Textual home page, click the delete icon for the key to revoke.
On the User API Keys page, click the Revoke option for the key to revoke.
You cannot instantiate the SDK client without an API key.
Instead of providing the key every time you call the Textual API, you can configure the API key as the value of the environment variable TONIC_TEXTUAL_API_KEY.
To install the Tonic Textual Python API, run:
pip install tonic-textual
When you sign up for a Tonic Textual account, you can immediately set up access to the Textual API or get started with a new dataset.
Note that these instructions are for setting up a new account on Textual Cloud. For a self-hosted instance, depending on how it is set up, you might either create an account manually or use single sign-on (SSO).
To get started with a new Textual account:
Go to https://textual.tonic.ai/.
Click Sign up.
Enter your email address.
Create and confirm a password for your Textual account.
Click Sign Up.
Textual creates your account and displays the Textual home page.
It also displays a Getting Started panel that prompts you to either create a Textual API key or to create a Textual dataset.
From the Getting Started panel, to create a Textual API key:
Click Create an API Key.
In the Name field, type a name to use to identify the key.
Click Create API Key. Textual displays the key value, and prompts you to copy the key. If you do not copy the key and save it to a file, then you will not have access to the key. To copy the key, click the copy icon.
Click Close.
Textual adds the key to the API Keys panel at the top right of the Textual home page. From there, you can add and revoke API keys.
For details about managing API keys, go to Creating and revoking Textual API keys.
For details about using the Textual API to perform tasks, go to Using the Textual API.
For details about the available Tonic classes, go to the generated API documentation.
The Textual Playground page allows you to see how Textual detects values in plain text. You can either have Textual redact the values, or use a large language model (LLM) to replace detected values with realistic synthesized values.
To display the Playground page, in the navigation menu, click Playground.
On the Playground page, as you enter text in the left text area, the synthesized or redacted version displays at the right. It also displays the API redaction response.
On a self-hosted instance of Textual, to use the LLM synthesis option, you must set an OpenAI key as the value of the environment variable SOLAR_OPENAI_KEY.
By default, the Playground page uses an LLM to generate synthesized values. Synthesized values are realistic replacement values. Synthesis is enabled when Enable LLM Synthesis is in the on position.
You can also choose to display redacted values. Redaction replaces the detected values with placeholder values that represent the value types. To display redacted values, toggle Enable LLM Synthesis to the off position.
To clear the text, click Clear.
Textual also provides sample text options for some common use cases. To populate the text with a sample, click Try a Sample, then select the type of sample.
From the Getting Started panel, you can create your first Textual dataset. You can either create a dataset using your own files, or create a dataset that uses Textual sample files.
To create a dataset from your own files:
Click Upload Files.
On the Upload files to redact panel, to select the files to include in the dataset, either:
Drag and drop the files to the panel.
Click Select Files to Upload, then navigate to and select the files.
Textual displays the dataset details page for the new dataset.
To create a dataset using Textual sample data, on the home page, click Try Demo Dataset.
Whenever you call the Textual API, you first instantiate the SDK client.
If the API key is configured as the value of TONIC_TEXTUAL_API_KEY, then you do not need to provide the API key when you instantiate the SDK client.
If the API key is not configured as the value of TONIC_TEXTUAL_API_KEY, then you must include the API key in the request.
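As a minimal sketch of the two cases above, the helper below returns an explicitly provided key if there is one, and otherwise falls back to TONIC_TEXTUAL_API_KEY. The helper name `resolve_api_key` is hypothetical and is not part of the SDK. The commented client call uses an assumed import path and constructor, so check it against the generated API documentation before you rely on it.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "TONIC_TEXTUAL_API_KEY"  # environment variable named in the text


def resolve_api_key(explicit_key: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the explicit key if given, else the value of TONIC_TEXTUAL_API_KEY."""
    key = explicit_key or env.get(ENV_VAR)
    if not key:
        raise RuntimeError(f"No API key: pass one explicitly or set {ENV_VAR}")
    return key


# Assumed usage (the import path and constructor arguments are assumptions):
# from tonic_textual.api import TonicTextual
# textual = TonicTextual("https://textual.tonic.ai", api_key=resolve_api_key())
```

Failing early when no key is available mirrors the rule that the SDK client cannot be instantiated without an API key.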
The Tonic Textual application and API allow you to synthesize or redact sensitive information in free-text and image files. Textual identifies specific types of sensitive values. You decide whether and how to replace those values. You can then use the redacted versions of the files for testing or training.
One use case for Textual is to remove sensitive values from data before you train a text-based model.
For example, an insurance company needs a trained model to help with the development of a customer service chatbot. They want to use transcripts of previous customer service calls as a source for the trained data, but do not want the chatbot to reveal sensitive information about actual clients.
For details about all of the available Tonic Textual classes, go to the generated API documentation.
To redact a specific text string and view the results, use textual.redact:
The response provides the redacted version of the string, and the list of redacted values. For each redacted item, the response includes:
The location of the value
The type of sensitive value
The original value
A score to indicate confidence in the detection and redaction
For example:
By default, Textual redacts detected sensitive values. The redacted value is <value type>_<generated identifier>. For example, ORGANIZATION_EPfC7XZUZ.
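To make the response structure more concrete, the following sketch models a response as plain Python data and groups the detected values by type. The field names (`redacted_text`, `results`, `start`, `end`, `label`, `text`, `score`), the sample values, the generated identifiers, and the helpers `group_by_label` and `spans_match` are all illustrative assumptions. They are not the confirmed response schema of the SDK.

```python
from collections import defaultdict

original = "My name is John and I work at Acme."

# Hand-written stand-in for a redaction response (not real API output).
response = {
    "redacted_text": "My name is NAME_GIVEN_x1 and I work at ORGANIZATION_y2.",
    "results": [
        {"start": 11, "end": 15, "label": "NAME_GIVEN", "text": "John", "score": 0.9},
        {"start": 30, "end": 34, "label": "ORGANIZATION", "text": "Acme", "score": 0.9},
    ],
}


def group_by_label(results):
    """Map each value type to the original values detected for it."""
    grouped = defaultdict(list)
    for item in results:
        grouped[item["label"]].append(item["text"])
    return dict(grouped)


def spans_match(text, results):
    """Check that each reported location points at the reported original value."""
    return all(text[r["start"]:r["end"]] == r["text"] for r in results)


by_type = group_by_label(response["results"])
```

Here `by_type` maps `NAME_GIVEN` to `["John"]` and `ORGANIZATION` to `["Acme"]`, and `spans_match(original, response["results"])` confirms that each location lines up with its original value.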
For each value type, you can instead choose to either synthesize or ignore the value.
When you synthesize a value, Textual replaces the value with a different realistic value.
When you ignore a value, Textual passes through the original value.
To specify the handling type for a value type, you use the generator_config parameter.
Where:
<value_type> is the identifier of the type of value. For example, ORGANIZATION. For the list of built-in value types that Textual scans for, go to About entity types in Textual.
<handling_type> is the handling type to use for the specified value type. The possible values are Redact, Synthesis, and Off.
To specify a handling type to use for value types that are not specified in generator_config, you use the generator_default parameter. generator_default can be either Redact, Synthesis, or Off.
The following example redacts a string and indicates to synthesize organization values. In the response, the organization value is replaced with a realistic value instead of an ORGANIZATION placeholder value.
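As a hedged sketch of that configuration, the helper below validates the handling types and assembles the two parameters that the text names, generator_config and generator_default. The helper `build_generator_settings` and the `Redact` fallback default are assumptions made for illustration. The commented call assumes that textual.redact accepts these parameters as keyword arguments.

```python
HANDLING_TYPES = {"Redact", "Synthesis", "Off"}  # values listed in the text


def build_generator_settings(per_type, default="Redact"):
    """Validate handling types and return keyword arguments for a redact call."""
    for value_type, handling in per_type.items():
        if handling not in HANDLING_TYPES:
            raise ValueError(f"{value_type}: unknown handling type {handling!r}")
    if default not in HANDLING_TYPES:
        raise ValueError(f"unknown default handling type {default!r}")
    return {"generator_config": dict(per_type), "generator_default": default}


# Synthesize organization values; every other value type uses the default.
settings = build_generator_settings({"ORGANIZATION": "Synthesis"})

# Assumed usage, given a client instantiated as `textual`:
# response = textual.redact("I work at Acme.", **settings)
```

Validating the strings before the call catches a misspelled handling type locally instead of in the API response.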
You can also request synthesized values from a large language model (LLM).
When you use this process, Textual first identifies the sensitive values in the text. It then sends the value locations and redacted values to the LLM. For example, if Textual identifies a product name, it sends the location and the redacted value PRODUCT to the LLM. Textual does not send the original values to the LLM.
The LLM then generates realistic synthesized values of the appropriate value types.
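The masking step can be pictured with a short sketch. It replaces each detected span with its value-type placeholder, so that the text that leaves the system no longer contains the original values. The function `mask_for_llm` and the `(start, end, label)` tuple shape are hypothetical. The actual payload that Textual sends to the LLM is not described here.

```python
def mask_for_llm(text, detections):
    """Replace each detected span with its value-type placeholder.

    `detections` is a list of (start, end, label) tuples; this shape is an
    assumption made for the sketch.
    """
    masked = text
    # Replace from the end of the string so that earlier offsets stay valid.
    for start, end, label in sorted(detections, reverse=True):
        masked = masked[:start] + label + masked[end:]
    return masked


text = "I bought a Widget3000 from Acme."
detections = [(11, 21, "PRODUCT"), (27, 31, "ORGANIZATION")]
masked = mask_for_llm(text, detections)
```

In this sketch, `masked` contains the placeholders PRODUCT and ORGANIZATION in place of the product and company names.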
To send text to an LLM, use textual.llm_synthesis:
The response provides the text with the synthesized replacement values, followed by the list of synthesized values. For each value, the list includes:
Where the value is located in the string
The type of value
The original value
A score to indicate confidence in the detection and synthesis
Here is an example of a request to send a text string to an LLM, and the response with the updated string and value list:
The file manager displays thumbnails of the dataset files.
From the file manager, you can:
To display the file manager, on the dataset details page, click Preview and Manage Files.
A Tonic Textual dataset is a collection of text-based files. Textual uses the built-in and custom models to detect and redact the sensitive information in each file.
The Textual home page includes the list of datasets.
You can also view and manage datasets from the Datasets page. To display the Datasets page, in the navigation menu, click Datasets.
For each dataset, the list on the home page and the Datasets page includes:
The dataset name.
The number of files in the dataset.
When the dataset was most recently updated.
The user who most recently updated the dataset.
From the home page or Datasets page, you can create a new empty dataset. Textual prompts you for the dataset name, then displays the dataset details page.
To create a dataset:
On the home page or Datasets page, click Create a Dataset.
On the dataset creation panel, in the Dataset Name field, provide the name of the dataset.
Click Create Dataset. The dataset details page for the new dataset is displayed.
The Textual home page also allows you to create a dataset from uploaded files. Textual supports plain-text, .csv, PDF, .docx, and .xlsx files. For images, Textual supports PNG, JPG (both .jpg and .jpeg), and TIF (both .tif and .tiff) files.
To get started, either:
Drag and drop a file or files to the dataset creation option at the top left of the datasets list.
Click Upload Files to Create a Dataset, then search for and select the files.
Textual uses the files to create a dataset with a default name, then displays the dataset details page.
Textual provides access to an example dataset with a single sample file.
If you didn't use the sample dataset when you first signed up for Textual, then on the Textual home page, to create the sample dataset, click Try Demo Dataset.
Textual creates the dataset with a default name and displays the dataset details page.
To display the details page for a dataset, from the datasets list on the home page or Datasets page, click the dataset name.
The dataset details page includes:
The list of files in the dataset
The results of the scan for sensitive values
The configured handling for each type of value
The dataset name displays in the panel at the top left of the dataset details page.
To change the dataset name:
Click Settings.
In the Settings menu, click Edit Dataset Name.
Provide the new name for the dataset.
Click Save Dataset.
To delete a dataset, either:
From the datasets list, click the options menu for the dataset, then click Delete.
From the dataset details page, in the panel at the top left, click Settings, then click Delete Dataset.
To use the custom models feature, you must have an OpenAI key.
A model represents specific types of sensitive data for Tonic Textual to detect and redact.
Within a dataset, you can enable and disable the available custom models. By default, no custom models are enabled. When you enable a custom model, Textual identifies and redacts values for that model.
To enable and disable custom models for a dataset:
On the dataset management view, in the panel at the top left, click Custom Models.
The custom models panel lists the available custom models.
On the custom models panel, to enable a custom model, switch the toggle to the on position. To disable a custom model, switch the toggle to the off position.
Click Save.
Tonic Textual supports plain-text, .csv, PDF, and .docx files.
From the home page or Datasets page, to add files to a dataset:
Click the options menu for the dataset.
Click Upload Files.
On the file upload panel, either:
Drag and drop files to the panel.
Click Select files to upload, then search for and select the files.
Textual adds the files, displays the dataset details page, and begins to scan the new files.
From the dataset details page, to add files to the dataset:
In the panel at the top left, click Upload Files.
Search for and select the files.
Tonic Textual adds the files and begins to scan them.
To remove a file from the dataset, you can use the option in the dataset file list or on the file manager.
From the file list on the dataset details page, to remove a file from the dataset:
Click the options menu for the file.
In the options menu, click Delete.
From the file manager, to remove a file from the dataset:
Click the options menu for the file.
In the options menu, click Delete File.
Textual always uses its built-in models. You can create custom models to represent sensitive data types that are specific to your use case. For example, for health care data, you might need to redact disease names.
For each type of sensitive data, you can adjust how Tonic Textual identifies and updates the values.
As you add and remove files, Tonic Textual updates the sensitivity detection results. To identify the sensitive values, it uses the built-in Textual models, as well as any enabled custom models.
At the top of the dataset details view, the Sensitive words tile shows the total number of sensitive values in the dataset that Textual detected.
The dataset details view also displays a list of sensitive data types that Tonic Textual detected in the dataset data.
For each type, Textual displays:
The number of excluded terms. An excluded term is a term that you do not want Textual to identify as that data type.
The number of detected instances. The number excludes the blocked instances.
To view up to 10 of the detected values for a data type, click the view icon next to the value count.
Exclude specific values - Identify specific values to exclude from redaction or synthesis.
Select the handling option - Indicate whether to redact, synthesize, or ignore values.
Configure synthesis for datetime values - Add rules to indicate how to adjust datetime values.
You can exclude specific values for a data type. For example, a detected value might be labeled incorrectly, or you do not want specific values within that data type to be redacted.
In the detected data types list, the first icon and count show the number of excluded values. If you did not configure any excluded values, then the count is 0.
To configure a list of excluded terms:
Click the exclude icon next to the excluded values count.
On the Blocklist panel:
In the Keywords or Phrases list, provide specific values to exclude.
In the Regexes list, provide regular expressions to use to identify the values to exclude.
For each list, place each entry on a separate line.
Click Save.
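The effect of the two lists can be sketched as a filter over the detected values. The matching rules used here (exact, case-sensitive keyword matches, and regular expressions that must match the whole value) are assumptions, as are the function name `is_excluded` and the sample values. Textual's own matching behavior might differ.

```python
import re


def is_excluded(value, keywords, regexes):
    """Return True if a detected value matches either exclusion list."""
    if value in keywords:
        return True
    return any(re.fullmatch(pattern, value) for pattern in regexes)


keywords = ["Main Street"]   # one keyword or phrase per line in the panel
regexes = [r"TEST-\d+"]      # one regular expression per line in the panel
detected = ["Main Street", "TEST-042", "Elm Street"]

# Values that are still treated as sensitive after the exclusions apply.
kept = [v for v in detected if not is_excluded(v, keywords, regexes)]
```

With these lists, only "Elm Street" remains subject to redaction or synthesis.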
For a PDF file, you can add manual overrides to selected areas of a file. Manual overrides can either ignore redactions from Tonic Textual or add redactions.
Edit an individual file - Add manual overrides to a PDF file. You can also apply a template.
Create PDF templates - PDF templates allow you to add the same overrides to files that have the same structure.
For some value types, when you select the Synthesis option, you can configure additional options for how Tonic Textual generates the replacement values.
To display the available options, click Options.
Location values include the following types:
Location
Location Address
Location State
Location Zip
For each location type other than Location State, you can specify whether to use a realistic replacement value. For Location State, based on HIPAA guidelines, both the Synthesis option and the Off option pass through the value.
For location types that include zip codes, you can also specify how to generate the new zip code values.
By default, Textual replaces a location value with a realistic corresponding value. For example, "Main Street" might be replaced with "Fourth Avenue".
To instead scramble the values, uncheck Replace with realistic values.
By default, to generate a new zip code, Textual selects a real zip code that starts with the same three digits as the original zip code. For a low population area, Textual instead selects a random zip code from the United States.
To instead replace the last two digits of the zip code with zeros, uncheck Replace zeroes for zip codes. For a low population area, Textual instead replaces all of the digits in the zip code with zeros.
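The zero-replacement behavior is simple enough to sketch directly. The function `zero_out_zip` and its `low_population` flag are hypothetical. How Textual decides that a zip code belongs to a low population area is not covered here, so the sketch takes that decision as an input.

```python
def zero_out_zip(zip_code, low_population=False):
    """Sketch of the zero-replacement option for a five-digit zip code."""
    if low_population:
        # Low population area: replace every digit with a zero.
        return "0" * len(zip_code)
    # Otherwise keep the leading digits and zero out the last two.
    return zip_code[:-2] + "00"
```

For example, `zero_out_zip("94107")` returns `"94100"`, and `zero_out_zip("94107", low_population=True)` returns `"00000"`.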
By default, when you select the Synthesis option for Date/Time values, Textual shifts the datetime values to a value that occurs within 7 days before or after the original value.
To customize how Textual sets the new values, you can:
Set a different range within which Textual sets the new values
Indicate whether to scramble date values that Textual cannot parse
Add additional date formats for Textual to recognize
By default, Textual adjusts the dates to values that are within 7 days before or after the original date.
To change the range, in the # of Days To Shift +/- field, enter the number of days before and after the original date within which the replacement datetime value must occur. For example, if you enter 10, then the replacement datetime value must occur within 10 days before or after the original value.
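A rough sketch of a shift within a configurable range follows. It assumes that the shift is a whole number of days chosen uniformly from the range, which the text does not specify. The function `shift_datetime` and its parameters are hypothetical.

```python
import random
from datetime import datetime, timedelta


def shift_datetime(value, max_days=7, rng=None):
    """Shift a datetime by a random whole number of days within +/- max_days."""
    rng = rng or random.Random()
    offset = rng.randint(-max_days, max_days)  # inclusive on both ends
    return value + timedelta(days=offset)


original = datetime(2024, 1, 17, 15, 45)
# A seeded generator keeps the sketch repeatable.
shifted = shift_datetime(original, max_days=10, rng=random.Random(0))
```

Because the offset is a whole number of days in this sketch, the time of day is preserved and only the date moves.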
Textual can parse datetime values that use either one of the default datetime formats or a format that you add.
The Scramble Unrecognized Dates checkbox indicates how Textual should handle datetime values that it does not recognize.
By default, the checkbox is checked, and Textual scrambles those values.
To instead pass through the values without changing them, uncheck Scramble Unrecognized Dates.
By default, Textual is able to recognize datetime values that use one of the default datetime formats.
Under Additional Date Formats, you can add other datetime formats that you know are present in your data.
The formats must use a Noda Time LocalDateTime pattern.
To add a format, type the format in the field, then click +.
To remove a format, click its delete icon.
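The idea of a recognized versus an unrecognized datetime value can be illustrated with a Python analogy. Textual itself uses Noda Time patterns, not Python format codes. The `strptime` equivalents below are approximate translations chosen for this sketch, and `parse_known` is a hypothetical helper.

```python
from datetime import datetime

# Approximate strptime equivalents for a few Noda Time patterns.
# These translations are assumptions made for the sketch.
KNOWN_FORMATS = [
    "%Y/%m/%d",            # yyyy/M/d
    "%m/%d/%Y",            # M/d/yyyy
    "%Y-%m-%dT%H:%M:%S",   # yyyy-M-dTHH:mm:ss
]


def parse_known(value, formats=KNOWN_FORMATS):
    """Return the first successful parse, or None if no format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None


# Adding a format extends what the parser recognizes.
extended = KNOWN_FORMATS + ["%Y.%m.%d"]  # yyyy.M.d
```

A value for which `parse_known` returns `None` corresponds to an unrecognized datetime value, which Textual either scrambles or passes through based on the Scramble Unrecognized Dates setting.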
By default, Textual supports the following datetime formats.
Format | Example value |
---|---|
yyyy/M/d | 2024/1/17 |
yyyy-M-d | 2024-1-17 |
yyyyMMdd | 20240117 |
yyyy.M.d | 2024.1.17 |
yyyy, MMM d | 2024, Jan 17 |
yyyy-M | 2024-1 |
yyyy/M | 2024/1 |
d/M/yyyy | 17/1/2024 |
d-MMM-yyyy | 17-Jan-2024 |
dd-MMM-yy | 17-Jan-24 |
d-M-yyyy | 17-1-2024 |
d/MMM/yyyy | 17/Jan/2024 |
d MMMM yyyy | 17 January 2024 |
d MMM yyyy | 17 Jan 2024 |
d MMMM, yyyy | 17 January, 2024 |
ddd, d MMM yyyy | Wed, 17 Jan 2024 |
M/d/yyyy | 1/17/2024 |
M/d/yy | 1/17/24 |
M-d-yyyy | 1-17-2024 |
MMddyyyy | 01172024 |
MMMM d, yyyy | January 17, 2024 |
MMM d, ''yy | Jan 17, '24 |
MM-yyyy | 01-2024 |
MMMM, yyyy | January, 2024 |
yyyy-M-d HH:mm | 2024-1-17 15:45 |
d-M-yyyy HH:mm | 17-1-2024 15:45 |
MM-dd-yy HH:mm | 01-17-24 15:45 |
d/M/yy HH:mm:ss | 17/1/24 15:45:30 |
d/M/yyyy HH:mm:ss | 17/1/2024 15:45:30 |
yyyy/M/d HH:mm:ss | 2024/1/17 15:45:30 |
yyyy-M-dTHH:mm:ss | 2024-1-17T15:45:30 |
yyyy/M/dTHH:mm:ss | 2024/1/17T15:45:30 |
yyyy-M-d HH:mm:ss'Z' | 2024-1-17 15:45:30Z |
yyyy-M-d'T'HH:mm:ss'Z' | 2024-1-17T15:45:30Z |
yyyy-M-d HH:mm:ss.fffffff | 2024-1-17 15:45:30.1234567 |
yyyy-M-dd HH:mm:ss.FFFFFF | 2024-1-17 15:45:30.123456 |
yyyy-M-dTHH:mm:ss.fff | 2024-1-17T15:45:30.123 |
HH:mm | 15:45 |
HH:mm:ss | 15:45:30 |
HHmmss | 154530 |
hh:mm:ss tt | 03:45:30 PM |
HH:mm:ss'Z' | 15:45:30Z |
For PDF files, you can add manual overrides to the initial redactions based on the detected data types and handling configuration.
For each manual override, you select an area of the file.
For the selected area, you can either:
Ignore any automatically detected redactions. For example, a scanned form might show example or boilerplate content that doesn't actually contain sensitive values.
Redact that area. The file might contain sensitive content that Tonic Textual is unable to detect. For example, a scanned form might contain handwritten notes.
To manage the manual overrides for a PDF file:
In the file list, click the options menu for the file.
In the options menu, click Edit Redactions.
The File Redactions panel displays the file content. The values that Textual detected are highlighted. The page also shows any manual overrides that were added to the file.
If a dataset contains multiple files that have the same format, then you can create a template to apply to those files. For more information, go to Creating templates to apply to PDF files.
On the File Redactions panel, to apply a template to the file, select it from the template dropdown list.
When you apply a PDF template to a file, the manual overrides from that template are displayed on the file preview. The manual overrides are not included in the Redactions list.
On the File Redactions panel, to add a manual override to a file:
Select the type of override. To indicate to ignore any automatically detected values in the selected area, click Ignore Redactions. To indicate to redact the selected area, click Add Manual Redaction.
Use the mouse to draw a box around the area to select.
Textual adds the override to the Redactions list. The icon indicates the type of override.
In the file content:
Overrides that ignore detected values within the selected area are outlined in red.
Overrides that redact the selected area are outlined in green.
To select and highlight a manual override in the file content, in the Redactions list, click the navigate icon for the override.
To remove a manual override, in the Redactions list, click the delete icon for the override.
To save the current manual overrides, click Save.
For each data type, you choose how to handle the sensitive data values.
The available options are:
Synthesis - Indicates to replace the value with another realistic value. For example, the first name value Michael might be replaced with the value John. Tonic Textual does not synthesize any excluded values.
Redaction - This is the default option. For text files, Redaction indicates to replace the value with a placeholder that identifies the data type. For example, the first name value Michael is replaced with the value PERSON. For PDF files and image files, Redaction indicates to cover the value with a black box. Textual does not redact any excluded values.
Off - Indicates to not make any changes to the values. For example, the first name value Michael remains Michael.
To select the handling option for an individual data type, click the option for that type.
To select the same handling option for all of the data types, in the Bulk Edit panel above the data type list, click the option.
You cannot preview TIF image files. You can preview PNG and JPG files.
You can display the preview from the file list or from the file manager.
From the file list, to display the preview:
Click the options menu for the file.
In the options menu, click Preview.
From the file manager, to display the preview, click the file thumbnail.
On the left, the preview displays the original data. The sensitive data values are highlighted.
On the right, the preview displays the data with synthesized and redacted values based on the dataset configuration for the detected sensitive data types.
For a PDF file or an image file, redacted values are covered by a black box.
The preview for a PDF file also includes any manual overrides.
For each file in the dataset, you can download the version of the file that contains the redacted and synthesized values.
You can download an individual file from either the dataset file list or the File Manager.
From the file list, to download a single file:
Click the options menu for the file.
In the options menu, click Download File.
From the file manager, to download a single file:
Click the options menu for the file.
In the options menu, click Download File.
To download all of the files, click Download All Files.
A dataset might contain a number of files that have the same structure, such as a set of scanned-in forms.
Instead of having to add the same manual overrides for each file, you can use a PDF file in the dataset to create a template that you can apply to other PDF files in the dataset.
To add a PDF template to a dataset:
On the dataset details page, click PDF Templates.
On the template creation and selection panel, click Create a New Template.
On the template details page:
In the Name field, provide a name for the template.
From the file dropdown list, select the dataset file to use to create the template.
Add the manual overrides to the file.
When you finish adding the manual overrides, click Save New Template.
When you update a PDF template, it affects any files that use the template.
To update a PDF template:
On the dataset details page, click PDF Templates.
Under Edit an Existing Template, select the template, then click Edit Selected Template.
On the template details panel, you can change the template name, and add or remove manual overrides.
To save the changes, click Update Template.
On the template details panel, to add a manual override to a file:
Select the type of override. To indicate to ignore any automatically detected values in the selected area, click Ignore Redactions. To indicate to redact the selected area, click Add Manual Redaction.
Use the mouse to draw a box around the area to select.
Tonic Textual adds the override to the Redactions list. The icon indicates the type of override.
To select and highlight a manual override in the file content, in the Redactions list, click the navigate icon for the override.
To remove a manual override, in the Redactions list, click the delete icon for the override.
When you delete a PDF template, the template and its manual overrides are removed from any files that the template was assigned to.
To delete a PDF template:
On the dataset details page, click PDF Templates.
Under Edit an Existing Template, select the template, then click Edit Selected Template.
On the template details panel, click Delete.
Tonic Textual comes with built-in models to identify a range of sensitive values, such as:
Locations and addresses
Names of people and organizations
Identifiers and account numbers
The built-in entity types are:
Entity type name | Identifier (for API) | Description |
---|---|---|
CC Exp | CC_EXP | The expiration date of a credit card. |
Credit Card | CREDIT_CARD | A credit card number. |
CVV | CVV | The card verification value for a credit card. |
Date Time | DATE_TIME | A date or timestamp. |
Email Address | EMAIL_ADDRESS | An email address. |
Event | EVENT | The name of an event. |
Gender Identifier | GENDER_IDENTIFIER | An identifier of a person's gender. |
IBAN Code | IBAN_CODE | An international bank account number used to identify an overseas bank account. |
IP Address | IP_ADDRESS | An IP address. |
Language | LANGUAGE | The name of a spoken language. |
Law | LAW | A title of a law. |
Location | LOCATION | A value related to a location. Can include any part of a mailing address. |
Occupation | OCCUPATION | A job title or profession. |
Street Address | LOCATION_ADDRESS | A street address. |
City | LOCATION_CITY | The name of a city. |
State | LOCATION_STATE | A state name or abbreviation. |
Zip | LOCATION_ZIP | A postal code. |
Medical License | MEDICAL_LICENSE | The identifier of a medical license. |
Money | MONEY | A monetary value. |
Given Name | NAME_GIVEN | A given name or first name. |
Family Name | NAME_FAMILY | A family name or surname. |
NRP | NRP | A nationality, religion, or political group. |
Numeric Value | NUMERIC_VALUE | A numeric value. |
Organization | ORGANIZATION | The name of an organization. |
Password | PASSWORD | A password associated with an account. |
Person | PERSON | The name of a person. |
Phone Number | PHONE_NUMBER | A telephone number. |
Product | PRODUCT | The name of a product. |
Project Name | PROJECT_NAME | The name of a project. |
URL | URL | A URL to a web page. |
US Bank Number | US_BANK_NUMBER | The routing number of a bank in the United States. |
US Driver License | US_DRIVER_LICENSE | An identifier of a United States driver's license. |
US ITIN | US_ITIN | An Individual Taxpayer Identification Number in the United States. |
US Passport | US_PASSPORT | A United States passport identifier. |
US SSN | US_SSN | A United States Social Security number. |
Username | USERNAME | A username associated with an account. |
Work of Art | WORK_OF_ART | A name or title of a work of art such as a book or painting. |
Each entity type represents a type of value to detect and redact.
On the model details view, the Entities section contains the list of entity types in the model.
To add an entity type, click Add Another Entity.
To delete an entity type from the model, click the Delete Entity option for the entity type.
When you delete an entity type, Tonic Textual does not change the results for existing files. It does not look for the data type values in files that you add to datasets after you delete the entity type.
For each entity type:
In the Label field, provide the identifier for the entity type. You use the identifier in the model usage examples.
Optionally, in the Description field, provide a description of the entity type.
Under Examples, to provide example values for the entity type:
To add a field for another example value, click Add Another Example.
To remove an example value, click the delete icon for the example value.
In the Additional instructions text area, provide any additional information about the values.
To have Textual generate example values based on the example values that you provided, click Preview Generated Examples. You can use these examples to determine how well Textual understands the entity type, and whether you need to provide additional examples or additional instructions.
To use the custom models feature, you must have an OpenAI key.
You provide the key on the System Settings page. To display the System Settings page, click your user image, then click System Settings.
In addition to the built-in Tonic Textual models, you can create custom models to allow Textual to identify other types of sensitive values that are not included in the built-in models.
For example, you might need to redact terms or identifiers that are specific to your industry or use case, and that are not covered by the built-in models.
When you create a custom model, you provide Textual with:
The entity types in that model
Example values for each entity type
Examples of how the entity type values are used in the context of your data
The example values and usage help Textual to learn how to identify those values.
To use the custom models feature, you must have an OpenAI key.
You provide the key on the System Settings page. To display the System Settings page, click your user image, then click System Settings.
You manage custom models from the Models page. To display the Models page, in the Tonic Textual navigation menu, click Models.
For each model, the Models page includes:
The assigned icon
The model name and description
Whether the model is currently enabled
To create a model:
On the Models page, click Create New Model.
On the Create A New Model panel, in the Name field, provide a name for the custom model.
From the Icon dropdown list, select an icon to assign to the custom model. The icon displays on the Models page and also in the list of detected entity types for a dataset.
Optionally, in the Description text area, provide a description of the custom model.
Click Create Model.
On the model details page:
Click Save.
Tonic Textual trains the new model.
To update an existing model:
On the Models page, click the model name.
On the model details page, update the model configuration.
To save the updated model, but not retrain it, click Save. To both save and retrain the model, click Save & Retrain.
At any time, to retrain a model:
On the Models page, click the options menu for the model.
From the options menu, select Retrain.
When you delete a custom model, Textual does not change the results for existing files.
It does not use the custom model to detect values in files that you add to datasets after you delete the model.
To delete a model:
On the Models page, click the options menu for the model.
From the options menu, select Delete.
On the model details view, the Template Examples section provides examples of how the entity type values appear in the context of your data. The examples provide additional help to Tonic Textual to identify the values.
To add a template field to the list, click Add Another Template.
To remove a template from the list, click its delete icon.
Each template example is a short sentence or phrase that shows how the entity type value might appear.
To represent a data type, use {{entity_type_label}}. The label used in the template must exactly match the entity label.
For example, for an entity type with the identifier AGE, the following is a good example of a template:
It’s hard to believe I will be {{AGE}} years old next Tuesday!
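Because the label in a template must exactly match the entity label, it can help to check templates before you enter them. The following is a minimal sketch of such a check, not part of Textual: the function names and the placeholder pattern are assumptions made for illustration.

```python
import re

# Matches a {{LABEL}} placeholder and captures everything between the braces.
# Treating any text between the braces as the label is an assumption.
PLACEHOLDER = re.compile(r"\{\{([^{}]+)\}\}")

def unknown_labels(template, entity_labels):
    """Return placeholder labels that do not exactly match any entity label."""
    found = PLACEHOLDER.findall(template)
    return [label for label in found if label not in entity_labels]

def fill_template(template, values):
    """Replace each {{LABEL}} placeholder with an example value for that label."""
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)

labels = {"AGE"}
template = "It's hard to believe I will be {{AGE}} years old next Tuesday!"

print(unknown_labels(template, labels))                   # no mismatches
print(unknown_labels("I am {{Age}} years old.", labels))  # wrong case is flagged
print(fill_template(template, {"AGE": "42"}))
```

The check is case-sensitive and does not trim whitespace, so {{Age}} and {{ AGE }} are both reported as mismatches for the label AGE.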
After you create your template examples, to have Textual generate additional examples, click Preview Generated Templates.
You can use the generated values to see how well Tonic Textual understands the entity type and examples, and whether you need to add or update the examples that you provided.
We recommend that you sample the templates before you save the model.
If the resulting templates are not suitably accurate or diverse, you can use the Extra instructions text area to provide extra instructions in the prompt.
The Tonic Textual images are stored on Quay.io. During onboarding, Tonic.ai provides you with credentials to access the image repository. If you require new credentials, or you experience issues accessing the repository, contact support@tonic.ai.
You can deploy Textual using either Kubernetes or Docker.
System requirements
System requirements to deploy a self-hosted Textual instance
Deploy on Docker
How to use Docker Compose to deploy a self-hosted Textual instance on Docker
Deploy on Kubernetes
How to use Helm to deploy a self-hosted Textual instance on Kubernetes
You install a self-hosted instance of Tonic Textual on either:
A VM or server that runs Linux and on which you have superuser access.
A local machine that runs Mac, Windows, or Linux.
At minimum, we recommend that the server or cluster that you deploy Textual to has access to the following resources:
Nvidia GPU, 16GB GPU RAM. We recommend at least 6GB GPU RAM for each textual-ml worker.
If you only use a CPU and not a GPU, then we recommend an AWS m5.2xlarge instance. However, without a GPU, performance is significantly slower.
The number of words per second that Textual processes depends on many factors, including:
The hardware that runs the textual-ml container
The number of workers that are assigned to the textual-ml container
The auxiliary model, if any, that is used in the textual-ml container
To optimize the throughput of and the cost to use Textual, we recommend that the textual-ml container runs on modern hardware with GPU compute. If you use AWS, we recommend a g5 instance with 1 GPU.
To use GPU resources:
Ensure that the correct Nvidia drivers are installed for your instance.
If you use Kubernetes to deploy Textual, follow the instructions at Kubernetes on NVIDIA GPUs.
If you use Minikube, then use the instructions in Using NVIDIA GPUs with Minikube.
If you use Docker Compose to deploy Textual, follow these steps to install the nvidia-container-runtime.
The Docker Compose file is available in the GitHub repository https://github.com/TonicAI/textual_docker_compose/tree/main.
Fork the repository.
To deploy Textual:
Rename sample.env to .env.
In .env, provide values for the required settings. These are not commented out and have <FILL IN> as a placeholder value:
SOLAR_VERSION - Provided by Tonic.ai.
ENVIRONMENT_NAME - The name that you want to use for your Textual instance. For example, my-company-name.
SOLAR_SECRET - The string to use for Textual encryption.
SOLAR_DB_PASSWORD - The password that you want to use for the Textual application database, which stores the metadata for Textual, including the datasets and custom models. Textual deploys a PostgreSQL database container for the application database.
To deploy and start Textual, run docker-compose up -d.
On a self-hosted instance of Textual, much of the configuration takes the form of environment variables.
Add the variable to .env in the format:
SETTING_NAME=value
After you update .env, to restart Textual and complete the update, run:
$ docker-compose down
$ docker-compose pull && docker-compose up -d
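If you script changes to .env, the SETTING_NAME=value format can be applied programmatically. The following is a minimal sketch, not a Tonic-provided tool: the helper name set_env_var is hypothetical, and the handling of commented-out entries assumes that they look like "# SETTING_NAME=value".

```python
def set_env_var(env_text, name, value):
    """Return env_text with NAME=value set, replacing an existing entry if present."""
    new_line = f"{name}={value}"
    lines = env_text.splitlines()
    for i, line in enumerate(lines):
        # Strip a leading "#" so that a commented-out entry for the same
        # setting is replaced (uncommented) instead of duplicated.
        key = line.lstrip("# ").split("=", 1)[0].strip()
        if "=" in line and key == name:
            lines[i] = new_line
            break
    else:
        # No existing entry was found, so append a new one.
        lines.append(new_line)
    return "\n".join(lines) + "\n"

sample = "ENVIRONMENT_NAME=<FILL IN>\n# SETTING_NAME=old\n"
print(set_env_var(sample, "ENVIRONMENT_NAME", "my-company-name"))
print(set_env_var(sample, "SETTING_NAME", "value"))
```

The function only transforms text. You would still write the result back to .env and then run the docker-compose commands above to restart Textual.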
In values.yaml, add the environment variable to the appropriate env section of the Helm chart. For example:
After you update the yaml file, to restart the service and complete the update, run:
$ helm upgrade <name_of_release> -n <namespace_name> <path-to-helm-chart>
The above helm upgrade command is always safe to use when you provide specific version numbers. However, if you use the latest tag, it might result in Textual containers that have different versions.
The TEXTUAL_ML_WORKERS environment variable specifies the number of workers to use within the textual-ml container. The default value is 1.
Having multiple workers allows for parallelization of inferences with NER models.
When you deploy Textual with Kubernetes on GPUs, parallelization allows the textual-ml container to fully utilize the GPU.
We recommend 6GB of GPU RAM for each worker.
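As a rough illustration of the 6GB guideline, the following sketch estimates a TEXTUAL_ML_WORKERS value from the available GPU RAM. It is a rule-of-thumb calculation based only on the recommendation above, and the function name is hypothetical. Actual memory use can vary, so treat the result as a starting point.

```python
GPU_RAM_PER_WORKER_GB = 6  # the per-worker recommendation stated above

def suggested_ml_workers(gpu_ram_gb):
    """Estimate how many workers fit within the 6GB-per-worker guideline."""
    # Never go below 1, which is the default value of TEXTUAL_ML_WORKERS.
    return max(1, int(gpu_ram_gb // GPU_RAM_PER_WORKER_GB))

for ram in (16, 24):
    print(f"{ram}GB GPU RAM: TEXTUAL_ML_WORKERS={suggested_ml_workers(ram)}")
```

With 16GB of GPU RAM, the estimate is 2 workers. With 24GB, it is 4.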
On the Playground page, the LLM synthesis option uses a large language model (LLM) to generate synthesized replacement values for the detected entities in the text.
This option requires an OpenAI key.
Before you can use this option on your self-hosted Textual instance, you must provide an OpenAI key as the value of the environment variable SOLAR_OPENAI_KEY.
To scan PDF files, Tonic Textual uses the OCR capabilities of Azure Cognitive Services. To enable PDF scanning on your self-hosted instance of Textual, you must provide the Azure Document Intelligence key and endpoint for your Azure account.
In .env, uncomment and provide values for the following settings:
SOLAR_AZURE_DOC_INTELLIGENCE_KEY=#<FILL IN>
SOLAR_AZURE_DOC_INTELLIGENCE_ENDPOINT=#<FILL IN>
In values.yaml, uncomment and provide values for the following settings:
azureDocIntelligenceKey: <key>
azureDocIntelligenceEndpoint: <endpoint-url>
To improve overall inference, you can configure Textual to use an auxiliary NER model.
An auxiliary model detects the following types:
DATE_TIME
EVENT
LANGUAGE
LAW
LOCATION
MONEY
NRP
NUMERIC_VALUE
ORGANIZATION
PERSON
PRODUCT
WORK_OF_ART
By default, Textual uses spaCy's en_core_web_trf model. The en_core_web_lg and en_core_web_sm models allow for faster throughput, but with some drop in accuracy for the types listed above.
You can also disable the auxiliary model.
On a self-hosted Textual instance, you configure the auxiliary model as the value of the environment variable SOLAR_AUX_MODEL. The available values are:
en_core_web_trf - This is the default value.
en_core_web_lg
en_core_web_sm
none - Indicates to not use an auxiliary model.
Tonic Textual respects the access control policy of your single sign-on (SSO) provider. To access Textual, users must be granted access to the Textual application within your SSO provider.
To enable SSO, you first complete the required configuration in the SSO provider. You then configure Textual to connect to it.
After you enable SSO, users can use SSO to create an account in Textual.
To only allow SSO authentication, set the environment variable REQUIRE_SSO_AUTH to true. This disables standard email/password authentication. All account creation and login is handled through your SSO provider. If multi-factor authentication (MFA) is set up with your SSO, then all authentication must go through your provider's MFA.
Tonic Textual supports the following SSO providers:
Azure
Use Azure to enable SSO on Textual
GitHub
Use GitHub to enable SSO on Textual
Google
Use Google to enable SSO on Textual
Okta
Use Okta to enable SSO on Textual
The Tonic Textual Helm chart is available in the GitHub repository https://github.com/TonicAI/textual_helm_charts.
To use the Helm chart, you can either:
Use the OCI-based registry that Tonic hosts on quay.io.
Fork or clone the repository and then maintain it locally.
During the onboarding period, you are provided access credentials to our Docker image repository on Quay.io. If you require new credentials, or you experience issues accessing the repository, contact support@tonic.ai.
The GitHub repository contains a readme with the details on how to populate a values.yaml file and deploy Textual.
Use these instructions to set up GitHub as your SSO provider for Tonic Textual.
In GitHub, navigate to Settings -> Developer Settings -> OAuth Apps, then create a new application.
For Application Name, enter Textual.
For Homepage URL, enter https://textual.tonic.ai.
For Authorization callback URL, enter https://your-textual-url/sso/callback. Replace your-textual-url with the URL of your Textual instance.
After you create the application, to create a new secret, click Generate a new client secret.
You use the client ID and the client secret in the Textual configuration.
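The callback URL is the base URL of your Textual instance followed by the path /sso/callback. The following is a minimal sketch of how to build it consistently. The helper name sso_callback_url is hypothetical and is not part of Textual.

```python
def sso_callback_url(base_url):
    """Append the /sso/callback path to the base URL of a Textual instance."""
    # Remove any trailing slash so that the result never contains "//sso".
    return base_url.rstrip("/") + "/sso/callback"

print(sso_callback_url("https://your-textual-url/"))
print(sso_callback_url("http://localhost:3000"))
```

The value that you enter in the SSO provider must match the URL of your instance exactly, including the scheme and port.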
After you complete the configuration in GitHub, you uncomment and configure the required environment variables in Textual.
For Kubernetes, in values.yaml:
For Docker, in .env:
Use these instructions to set up Google as your SSO provider for Tonic Textual.
Click Create credentials, located near the top.
Select OAuth client ID.
Select Web application as the application type.
Choose a name.
Under Authorized redirect URIs, add the URL of the Textual server with the endpoint /sso/callback.
For example, a local Textual server at http://localhost:3000 would need http://localhost:3000/sso/callback to be set as the redirect URI.
Also note that internal URLs might not work.
On the confirmation page, note the client ID and client secret. You will need to provide them to Textual.
After you complete the configuration in Google, you uncomment and configure the required environment variables in Textual.
The client ID
The client secret
For Kubernetes, in values.yaml:
For Docker, in .env:
Go to
Use these instructions to set up Azure Active Directory as your SSO provider for Tonic Textual.
Register Textual as an application within the Azure Active Directory Portal:
In the portal, navigate to Azure Active Directory -> App registrations, then click New registration.
Register Textual and create a new web redirect URI that points to your Textual instance's address and the path /sso/callback.
Take note of the values for client ID and tenant ID. You will need them later.
Click Add a certificate or secret, and then create a new client secret. Take note of the secret value. You will need this later.
Navigate to the API permissions page. Add the following permissions for the Microsoft Graph API:
OpenId permissions: openid, profile
GroupMember: GroupMember.Read.All
User: User.Read
Click Grant admin consent for Tonic AI. This allows the application to read the user and group information from your organization. When permissions have been granted, the status should change to Granted for Tonic AI.
Navigate to Enterprise applications and then select Textual. From here, you can assign the users or groups that should have access to Textual.
After you complete the configuration in Azure, you uncomment and configure the required environment variables in Textual.
For Kubernetes, in values.yaml:
For Docker, in .env:
To create a new dataset and then upload a file to it, use textual.create_dataset. To add files to the dataset, use dataset.upload_then_add_file.
For example:
Textual creates the dataset, scans the uploaded file, and redacts the detected values.
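The call order can be sketched as follows. The method names create_dataset and upload_then_add_file come from the text above, but the argument lists are assumptions, so check them against the SDK reference. The stand-in classes only record calls so that the sketch runs without a Textual instance. They are not the SDK.

```python
def create_dataset_with_file(textual, dataset_name, file_path):
    """Create a dataset, add one file to it, and return the dataset object."""
    dataset = textual.create_dataset(dataset_name)
    # Assumption: upload_then_add_file accepts the path of the file to upload.
    dataset.upload_then_add_file(file_path)
    return dataset

# Hypothetical stand-ins that record calls instead of contacting Textual.
class _FakeDataset:
    def __init__(self, name):
        self.name = name
        self.files = []

    def upload_then_add_file(self, file_path):
        self.files.append(file_path)

class _FakeTextual:
    def create_dataset(self, name):
        return _FakeDataset(name)

dataset = create_dataset_with_file(_FakeTextual(), "my_dataset", "example.txt")
print(dataset.name, dataset.files)
```

In real use, you would pass your Textual client object in place of the stand-in.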
To get the current status of the files in the current dataset, use dataset.describe:
The response includes:
The name and identifier of the dataset
The number of files in the dataset
The number of files that are waiting to be processed (scanned and redacted)
The number of files that had errors during processing
For example:
To get the redacted content in JSON format for a dataset, use textual.get_dataset:
For example:
The response looks something like:
Use these instructions to set up Okta as your SSO provider for Tonic Textual.
You complete the following configuration steps within Okta:
Create a new application. Choose the OIDC - OpenID Connect method with the Single-Page Application option.
Click Next, then fill out the fields with the values below:
App integration name: The name to use for the Textual application. For example, Textual, Textual-Prod, Textual-Dev.
Grant type: Implicit (hybrid)
Sign-in redirect URIs: <base-url>/sso/callback
Sign-out redirect URIs: <base-url>/sso/logout
Base URIs: The URL to your Textual instance
Controlled access: Configure as needed to limit Textual access to the appropriate users
After saving the above, navigate to the General Settings page for the application and make the following changes:
Grant type: Check Implicit (Hybrid) and Allow ID Token with implicit grant type.
Login initiated by: Either Okta or App
Application visibility: Check Display application icon to users
Initiate login URI: <base-url>
After you complete the configuration in Okta, uncomment and configure the following environment variables in Textual.
For Kubernetes, in values.yaml:
For Docker, in .env:
You first open the file so that Textual can read it, then make the call for Textual to read the file.
The response includes:
The file name
The identifier of the job that processed the file. You use this identifier to retrieve a transformed version of the file.
To identify the file, you use the job identifier that you received from textual.start_file_redaction. You can also specify whether to redact, synthesize, or ignore specific entity types. By default, all of the values are redacted.
Before you make the call to download the file, you specify the path to download the file content to.
To send an individual .txt or .csv file to Textual, you use textual.start_file_redaction.
After you use textual.start_file_redaction to send the file to Textual, you can retrieve a transformed version of the file.