Configuring synthesis options
When Textual generates replacement values, those values are always consistent. Consistency means that the same original value always produces the same replacement value. You can also enable consistency with some Tonic Structural output values.
For some entity types, you can configure additional options for how Tonic Textual generates the replacement values.
You can also set whether to use the new synthesis process.
If you also use Tonic Structural, then you can configure Textual to enable selected synthesized values to be consistent between the two applications.
For example, a given source telephone number can produce the same replacement telephone number in both Structural and Textual.
To enable this consistency, you configure a statistics seed value as the value of the Textual setting SOLAR_STATISTICS_SEED. A statistics seed is a signed 32-bit integer.
The value must match a Structural statistics seed, either:
The value of the Structural environment setting TONIC_STATISTICS_SEED.
A statistics seed configured for an individual Structural workspace.
The current statistics seed value is displayed on the System Settings page.
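To illustrate why a shared seed can produce matching output in two applications, here is a minimal sketch of seeded, deterministic replacement. It is not the algorithm that Textual or Structural uses. The helper names `validate_seed` and `replace_digits_consistently` are hypothetical, and the HMAC-based derivation is an assumption chosen only for illustration.

```python
import hashlib
import hmac

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def validate_seed(seed: int) -> int:
    """Check that the seed fits in a signed 32-bit integer."""
    if not INT32_MIN <= seed <= INT32_MAX:
        raise ValueError(f"seed {seed} is not a signed 32-bit integer")
    return seed


def replace_digits_consistently(original: str, seed: int) -> str:
    """Hypothetical helper: derive the replacement only from (seed, original).

    Because nothing else feeds into the result, any application that uses
    the same seed and the same source value gets the same replacement.
    """
    key = validate_seed(seed).to_bytes(4, "big", signed=True)
    digest = hmac.new(key, original.encode("utf-8"), hashlib.sha256).digest()
    # Turn the digest into a stream of decimal digits, padded so that there
    # are always enough digits to cover the original value.
    digits = iter(str(int.from_bytes(digest, "big")).zfill(len(original)))
    # Replace each digit and keep separators such as "-" in place.
    return "".join(next(digits) if ch.isdigit() else ch for ch in original)
```

In this sketch, calling `replace_digits_consistently("123-456-6789", 42)` twice returns the same string both times, while a seed outside the signed 32-bit range is rejected.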
Textual has developed an updated synthesis process that is currently implemented for the following entity types:
URLs
Names
Custom entity types
In particular, the new synthesis process improves the display of the synthesized values in PDF files. The values better match the available space and the original font.
To configure whether to use the new process:
On the dataset details page, click Settings.
On the Dataset Settings page, under PDF Settings, the New PDF synthesis mode (experimental) setting determines which process to use. To use the new process, toggle the setting to the on position.
Click Save Dataset.
Location values include the following types:
Location
Location Address
Location State
Location Zip
You can select whether to generate HIPAA or non-HIPAA addresses. Address values can be consistent with values generated in Structural.
For each location type other than Location State, you can specify whether to use a realistic replacement value. For Location State, based on HIPAA guidelines, both the Synthesis option and the Off option pass through the value.
For location types that include zip codes, you can also specify how to generate the new zip code values.
In the entity types list, to display the location synthesis options, click Options.
Under Address generator type, select the type of address generator to use:
If you configured a Textual statistics seed that matches a Structural statistics seed, then the generated address values are consistent with values generated in Structural. A given address value produces the same output value in both applications.
For example, in both Textual and Structural, a source address value 123 Main Street might be replaced with 234 Oak Avenue.
By default, Textual replaces a location value with a realistic corresponding value. For example, "Main Street" might be replaced with "Fourth Avenue".
To instead scramble the values, uncheck Replace with realistic values.
By default, to generate a new zip code, Textual selects a real zip code that starts with the same three digits as the original zip code. For a low population area, Textual instead selects a random zip code from the United States.
To instead replace the last two digits of the zip code with zeros, check Replace zeroes for zip codes. For a low population area, Textual instead replaces all of the digits in the zip code with zeros.
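The two zip code behaviors described above can be sketched as follows. This is an illustration under stated assumptions, not Textual's implementation: `replace_zip` is a hypothetical function, the caller supplies the list of real zip codes and the set of low population three-digit prefixes, and the fallback when no real zip code shares the prefix is an assumption.

```python
import random
from typing import Optional, Sequence, Set


def replace_zip(
    zip_code: str,
    real_zips: Sequence[str],
    low_population_prefixes: Set[str],
    zero_out: bool = False,
    rng: Optional[random.Random] = None,
) -> str:
    """Hypothetical sketch of the default and zero-out zip code behaviors."""
    if rng is None:
        rng = random.Random()
    prefix = zip_code[:3]
    low_population = prefix in low_population_prefixes

    if zero_out:
        # Zero the last two digits, or every digit for a low population area.
        return "00000" if low_population else prefix + "00"

    if low_population:
        # Low population area: pick any real zip code.
        return rng.choice(real_zips)

    # Default: pick a real zip code that shares the first three digits.
    candidates = [z for z in real_zips if z.startswith(prefix)]
    # Assumption: fall back to any real zip code if none share the prefix.
    return rng.choice(candidates or real_zips)
```

For example, with `real_zips = ["30301", "30305", "98101"]` and `low_population_prefixes = {"036"}` (placeholder values), the source value `30309` becomes `30300` when `zero_out` is true, and otherwise becomes either `30301` or `30305`.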
By default, when you select the Synthesis option for Date/Time and Date of Birth values, Textual shifts the datetime values to a value that occurs within 7 days before or after the original value.
To customize how Textual sets the new values, you can:
Set a different range within which Textual sets the new values
Indicate whether to scramble date values that Textual cannot parse
Add additional date formats for Textual to recognize
In the entity types list, to display the datetime synthesis options, click Options.
By default, Textual adjusts the dates to values that are within 7 days before or after the original date.
To change the range, in the # of Days To Shift +/- field, enter the number of days before and after the original date within which the replacement datetime value must occur. For example, if you enter 10, then the replacement datetime value must occur within 10 days before or after the original value.
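As a rough illustration of what the shift range means, the following sketch moves a datetime by a random whole number of days inside the configured range. The function name `shift_datetime` is hypothetical, and shifting by whole days (rather than by hours or seconds) is an assumption; the document does not state the granularity that Textual uses.

```python
import random
from datetime import datetime, timedelta
from typing import Optional


def shift_datetime(
    original: datetime,
    max_days: int = 7,
    rng: Optional[random.Random] = None,
) -> datetime:
    """Return a datetime within max_days before or after the original."""
    if rng is None:
        rng = random.Random()
    # randint is inclusive on both ends, so the offset is in [-max_days, max_days].
    offset = rng.randint(-max_days, max_days)
    return original + timedelta(days=offset)
```

With `max_days=10`, a source value of 17 January 2024 always lands between 7 January 2024 and 27 January 2024, inclusive.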
Textual can parse datetime values that use either a format in Default supported datetime formats in Textual or a format that you add.
The Scramble Unrecognized Dates checkbox indicates how Textual should handle datetime values that it does not recognize.
By default, the checkbox is checked, and Textual scrambles those values.
To instead pass through the values without changing them, uncheck Scramble Unrecognized Dates.
By default, Textual is able to recognize datetime values that use a format from Default supported datetime formats in Textual.
Under Additional Date Formats, you can add other datetime formats that you know are present in your data.
To add a format, type the format in the field, then click +.
To remove a format, click its delete icon.
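The interaction between recognized formats, additional formats, and the Scramble Unrecognized Dates option can be sketched as follows. This is a simplified illustration, not Textual's parser. The function names are hypothetical, the format strings use Python `strptime` syntax rather than the pattern syntax that Textual accepts, and the scramble shown here (random digits for digits, random letters for letters) is an assumption about what scrambling looks like.

```python
import random
import string
from datetime import datetime
from typing import Optional, Sequence

# Python equivalents of a few default formats, for illustration only.
DEFAULT_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %B %Y"]


def parse_datetime(text: str, formats: Sequence[str]) -> Optional[datetime]:
    """Try each known format in turn; return None if none of them match."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def scramble(text: str, rng: random.Random) -> str:
    """Replace digits and letters at random; keep punctuation and spacing."""
    out = []
    for ch in text:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            out.append(rng.choice(string.ascii_letters))
        else:
            out.append(ch)
    return "".join(out)


def handle_unparsed(
    text: str,
    scramble_unrecognized: bool = True,
    rng: Optional[random.Random] = None,
) -> str:
    """Scramble an unrecognized value, or pass it through unchanged."""
    if not scramble_unrecognized:
        return text
    return scramble(text, rng if rng is not None else random.Random())
```

In this sketch, `20240117` is not recognized by `DEFAULT_FORMATS`, so it would be scrambled or passed through. Adding `"%Y%m%d"` to the list, which mirrors adding an entry under Additional Date Formats, makes the same value parse as 17 January 2024.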
By default, Textual supports the following datetime formats.
Format | Example
yyyy/M/d | 2024/1/17
yyyy-M-d | 2024-1-17
yyyyMMdd | 20240117
yyyy.M.d | 2024.1.17
yyyy, MMM d | 2024, Jan 17
yyyy-M | 2024-1
yyyy/M | 2024/1
d/M/yyyy | 17/1/2024
d-MMM-yyyy | 17-Jan-2024
dd-MMM-yy | 17-Jan-24
d-M-yyyy | 17-1-2024
d/MMM/yyyy | 17/Jan/2024
d MMMM yyyy | 17 January 2024
d MMM yyyy | 17 Jan 2024
d MMMM, yyyy | 17 January, 2024
ddd, d MMM yyyy | Wed, 17 Jan 2024
M/d/yyyy | 1/17/2024
M/d/yy | 1/17/24
M-d-yyyy | 1-17-2024
MMddyyyy | 01172024
MMMM d, yyyy | January 17, 2024
MMM d, ''yy | Jan 17, '24
MM-yyyy | 01-2024
MMMM, yyyy | January, 2024
yyyy-M-d HH:mm | 2024-1-17 15:45
d-M-yyyy HH:mm | 17-1-2024 15:45
MM-dd-yy HH:mm | 01-17-24 15:45
d/M/yy HH:mm:ss | 17/1/24 15:45:30
d/M/yyyy HH:mm:ss | 17/1/2024 15:45:30
yyyy/M/d HH:mm:ss | 2024/1/17 15:45:30
yyyy-M-dTHH:mm:ss | 2024-1-17T15:45:30
yyyy/M/dTHH:mm:ss | 2024/1/17T15:45:30
yyyy-M-d HH:mm:ss'Z' | 2024-1-17 15:45:30Z
yyyy-M-d'T'HH:mm:ss'Z' | 2024-1-17T15:45:30Z
yyyy-M-d HH:mm:ss.fffffff | 2024-1-17 15:45:30.1234567
yyyy-M-dd HH:mm:ss.FFFFFF | 2024-1-17 15:45:30.123456
yyyy-M-dTHH:mm:ss.fff | 2024-1-17T15:45:30.123
HH:mm | 15:45
HH:mm:ss | 15:45:30
HHmmss | 154530
hh:mm:ss tt | 03:45:30 PM
HH:mm:ss'Z' | 15:45:30Z
By default, when you select the Synthesis option for Age values, Textual shifts the age value to a value that is within seven years before or after the original value. For age values that it cannot synthesize, it scrambles the value.
In the entity types list, to display the age synthesis options, click Options.
To configure the synthesis:
In the Range of Years +/- for the Shifted Age field, enter the number of years before and after the original value to use as the range for the synthesized value.
By default, Textual scrambles age values that it cannot parse. To instead pass through the value unchanged, uncheck Scramble Unrecognized Ages.
For Phone Number values, you can choose whether to generate a realistic phone number. If you do, then the generated values can be consistent with values generated in Structural.
In the entity types list, to display the phone number synthesis options, click Options.
From the Phone number generator type dropdown list:
To replace each phone number with a randomly generated number, select Random Number.
If you also configured a Textual statistics seed that matches a Structural statistics seed, then the synthesized values are consistent with values generated in Structural. A given source telephone number produces the same output telephone number in both applications.
For example, in both Textual and Structural, 123-456-6789 might be replaced with 154-567-8901.
The Replace invalid numbers with valid numbers checkbox determines how Textual handles invalid telephone numbers in the data.
To replace invalid telephone numbers with valid telephone numbers, check the checkbox.
If you do not check the checkbox, then Textual randomly replaces the numeric characters.
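Replacing only the numeric characters, as described for the unchecked case, might look like the following sketch. The function name `replace_numeric_characters` is hypothetical and this is not Textual's implementation. It shows that the separators and overall shape of the value survive while every digit is replaced.

```python
import random
import string
from typing import Optional


def replace_numeric_characters(
    text: str,
    rng: Optional[random.Random] = None,
) -> str:
    """Swap each digit for a random digit; leave all other characters alone."""
    if rng is None:
        rng = random.Random()
    return "".join(
        rng.choice(string.digits) if ch.isdigit() else ch for ch in text
    )
```

An invalid source value such as `555-01` keeps its length and its hyphen, so the output is still not a valid telephone number. That is the trade-off compared with checking Replace invalid numbers with valid numbers.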
HIPAA-compliant address generator. This option generates values similar to those generated by the corresponding generator in Structural.
Non-HIPAA address generator. This option generates values similar to those generated by the corresponding generator in Structural.
The formats must use a .
To generate a realistic telephone number, select US Phone Number. The US Phone Number option generates values similar to those generated by the corresponding generator in Structural.