Working with document-based data

For document-based data connectors - currently MongoDB and Amazon DynamoDB - Database View and Table View are replaced by Collection View.

"Collection" is the term that Structural uses to refer to MongoDB collections and DynamoDB tables.

Structural also allows you to run collection scans to identify the data structure.

Performing scans on collections

Required workspace permission: Run collection scan

When you first connect to a MongoDB or Amazon DynamoDB database, Tonic Structural performs a scan to determine the available fields in each collection, the field types, and how prevalent the fields are. It performs this scan at the same time as the initial sensitivity scan.

For each collection, Structural creates a hybrid document, which is a superset of all of the fields contained in the collection documents.
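The idea of a hybrid document can be illustrated with a small Python sketch. This is not Structural's implementation. The function names, the type labels, and the dotted path format are all assumptions made for illustration. It only shows how a superset of field and type combinations can be gathered from a set of documents.

```python
def type_name(value):
    """Return a simple type label for a value (labels are illustrative only)."""
    if value is None:
        return "Null"
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, int):
        return "Int"
    if isinstance(value, float):
        return "Double"
    if isinstance(value, str):
        return "String"
    if isinstance(value, dict):
        return "Object"
    if isinstance(value, list):
        return "Array"
    return type(value).__name__


def collect_paths(doc, prefix=""):
    """Yield a (path, type) pair for every field in a nested document."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        yield path, type_name(value)
        if isinstance(value, dict):
            yield from collect_paths(value, path)


def build_hybrid(documents):
    """Return the set of unique (path, type) pairs across all documents."""
    hybrid = set()
    for doc in documents:
        hybrid.update(collect_paths(doc))
    return hybrid


docs = [
    {"name": "Ana", "age": 31},
    {"name": "Ben", "age": "unknown", "address": {"city": "Lyon"}},
]
hybrid = build_hybrid(docs)
for path, kind in sorted(hybrid):
    print(path, kind)
```

In this sample, the age field appears twice in the result, once as an integer and once as a string, because each combination of field and type is tracked separately.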

Configuring the collection scan

By default, for each collection:

The scan includes all of the documents in the collection, and continues until the scan is finished.
Every unique path (field+data type) in the collection is added to the hybrid document.

To change the default scan behavior, use the following environment settings. You can add these settings manually to the Environment Settings list on Structural Settings.

Note that these settings, including settings that include MONGO in the name, apply to both MongoDB and Amazon DynamoDB.

Configuring how schemas are scanned

The following options control the number of documents that Structural scans in a collection.

These options allow you to limit the number of scanned documents when the additional documents do not add fields to the hybrid document.

For large homogeneous collections, where all or most documents have the same structure, configuring these options can improve performance.

TONIC_DOCUMENT_SCAN_MAX_DOCS_COUNT

The maximum number of documents to scan for each schema in a collection. For example, if this is 10, then Structural scans up to 10 documents, and ignores the remaining documents. When this value is empty, Structural scans all of the documents.

TONIC_DOCUMENT_SCAN_MAX_TIME_SECONDS

The maximum amount of time in seconds to scan a schema. For example, if this is 360, then Structural scans a schema for up to 360 seconds. When this value is empty, Structural continues the scan until it is complete.

If you set both options, then the scan completes when it reaches either limit. For example, if the maximum document count is 10 and the maximum scan time is 360 seconds, then the scan completes either after 10 documents or after 360 seconds, whichever comes first.
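The "whichever comes first" behavior can be sketched in Python as follows. This is an illustration only, not Structural's scan code. The function name and parameters are hypothetical, and None stands in for an empty setting value.

```python
import time


def scan_documents(documents, max_docs=None, max_seconds=None, clock=time.monotonic):
    """Scan documents until either limit is reached. None means no limit."""
    start = clock()
    scanned = 0
    for doc in documents:
        # Stop when the document count limit is reached.
        if max_docs is not None and scanned >= max_docs:
            break
        # Stop when the time limit is reached.
        if max_seconds is not None and clock() - start >= max_seconds:
            break
        scanned += 1  # a real scan would inspect the fields of doc here
    return scanned


# Both limits set: the count limit is reached long before 360 seconds pass.
print(scan_documents(range(1000), max_docs=10, max_seconds=360))
```

With neither limit set, the loop runs over every document, which corresponds to the default behavior described above.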

Configuring how fields are collapsed in the hybrid document

Typically, the number of unique fields in a collection is small relative to the number of documents. However, in some cases the number of fields is similar to or greater than the number of documents. This most commonly occurs when documents have "data as keys", such as keys that are ObjectIds, UUIDs, or incrementing integers.

In these cases, adding every unique field to the hybrid document can result in a large hybrid document that has an undesirable structure.

Structural offers configuration options to "collapse" fields within the hybrid document. This shrinks the size of the hybrid document. It also allows you to assign a generator to the collapsed group instead of to each unique key.

By default, Structural does not collapse fields.

Collapsing fields when the key is an ObjectId

To enable this, set the environment setting TONIC_MONGO_OBJECT_ID_COLLAPSE_THRESHOLD to the number of ObjectId keys that an object can contain before Structural collapses the object schema into a single key.

For example, if this is 10, then any object that has 10 or more ObjectId keys is collapsed into a single key.

A negative value indicates to not collapse the keys.

The default value is -1.
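As a rough illustration of the threshold, the following Python sketch collapses ObjectId-shaped keys once an object contains enough of them. It assumes that an ObjectId key is a 24-character hexadecimal string. The placeholder key name and the choice to keep one representative value are hypothetical. The source does not describe how Structural represents the collapsed key.

```python
import re

# Assumption: an ObjectId key is written as 24 hexadecimal characters.
OBJECT_ID = re.compile(r"^[0-9a-fA-F]{24}$")


def collapse_object_id_keys(obj, threshold, placeholder="{ObjectId}"):
    """Collapse ObjectId-shaped keys into one placeholder key.

    A negative threshold disables collapsing. The placeholder name is
    hypothetical and used only for this illustration.
    """
    id_keys = [key for key in obj if OBJECT_ID.match(key)]
    if threshold < 0 or not id_keys or len(id_keys) < threshold:
        return dict(obj)
    collapsed = {k: v for k, v in obj.items() if not OBJECT_ID.match(k)}
    collapsed[placeholder] = obj[id_keys[0]]  # keep one representative value
    return collapsed


# Twelve ObjectId-like keys plus one regular key.
scores = {f"{i:024x}": i for i in range(12)}
scores["updatedAt"] = "2024-01-01"

print(len(collapse_object_id_keys(scores, threshold=10)))  # collapsed
print(len(collapse_object_id_keys(scores, threshold=-1)))  # left as-is
```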

Collapsing fields when the key matches a custom pattern

To enable Structural to collapse fields, you provide a regular expression to identify the fields that can be collapsed into the same field. You then configure the number of matches that must exist before Structural collapses the fields.

To configure how the fields are collapsed, use the following environment settings:

TONIC_DOCUMENT_COLLAPSE_FIELDS_REGEX

The regular expression that identifies the fields that can be collapsed into a single field. By default, this value is empty.

TONIC_DOCUMENT_COLLAPSE_FIELDS_REGEX_THRESHOLD

The number of fields that match the regular expression before Structural collapses the fields into a single field. For example, if this is 5, then after Structural finds 5 fields that match the regular expression, it collapses all of the matching fields into a single field. A negative value indicates to not collapse the fields. The default value is -1.

For example:

To collapse keys that are integer values, use the regular expression [0-9]+ or \d+
To collapse keys that are UUIDs, use the regular expression [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}
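Before you set these values, it can help to try the regular expression against a sample of your keys. The Python sketch below does that. It assumes that the pattern must match the entire key, which the source does not state, and the helper names are hypothetical. Note that the UUID pattern as written matches lowercase hexadecimal digits only.

```python
import re

INTEGER_KEY = re.compile(r"\d+")
UUID_KEY = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def matching_keys(keys, pattern):
    """Return the keys that the pattern matches in full (an assumption)."""
    return [key for key in keys if pattern.fullmatch(key)]


def should_collapse(keys, pattern, threshold):
    """True when the number of matching keys reaches a non-negative threshold."""
    return threshold >= 0 and len(matching_keys(keys, pattern)) >= threshold


keys = ["0", "17", "2048", "name", "3f2b8c1e-9d4a-4c6b-8e1f-0a1b2c3d4e5f", "v2"]

print(matching_keys(keys, INTEGER_KEY))
print(matching_keys(keys, UUID_KEY))
print(should_collapse(keys, INTEGER_KEY, threshold=3))
```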

Viewing the most recent scans for each collection

On Privacy Hub, the Latest Collection Scan table shows the most recent scans on each scanned collection.

The Build Schema option runs a new scan on the collection.

Starting a collection scan

When the source database has a new collection, then on Collection View, you are prompted to run a scan either on that collection or on all collections.

Using Collection View

For MongoDB and Amazon DynamoDB, Collection View replaces Database View and Table View. From Collection View, you can view the fields in a selected collection. You can then assign a collection mode to the collection, and assign generators to fields.

Selecting the collection to view

From the Collection dropdown list, select the collection to view.

Assigning a collection mode to the collection

Collection mode is the document-based equivalent of table mode. The collection mode determines at the collection level how Structural uses the collection data to generate the destination database.

Available collection modes

By default, the collection mode is De-Identify. In this mode, Structural uses the assigned generators to transform the source database into the destination database.

For MongoDB and DynamoDB, the only other options are Truncate and Preserve Destination.

Truncate means that only the collection structure is included in the destination database. The collection has no data in the destination database.
Preserve Destination means that Structural does not change the data that is currently in the destination database.

Assigning the collection mode

Required workspace permission: Assign table modes

To assign the collection mode:

Click the Collection Mode dropdown list.
On the panel, click the current collection mode.
From the dropdown list, select the mode to use.

Selecting the type of view

You can view a collection either as a hybrid document or as single documents. From the View dropdown list, select the view to use.

Hybrid document view

The default view is Hybrid Document. For the hybrid document view, the key list reflects all of the permutations of every field from every document. For example, a field might sometimes be a datetime value and sometimes a string. Hybrid document view lists both types.

Single document view

Single Document view displays a single document at a time. You can then page through up to 100 documents. For each document, you see the structure for that particular document.

Information on the field list

For each field, Collection View always displays:

The field name and type.
For fields that you configured as primary or foreign keys, a key icon.
The assigned generator.
An example value. For the hybrid view, you can use the magnifying glass icon to display additional example values.

For the hybrid document view, there is also a Field Freq column. Field Freq shows the percentage of documents that contain that permutation of field and type.

For example, you might see that a field is Null 33% of the time and contains a numeric value 67% of the time. Or a field value is an Int32 value 3% of the time and an Int64 value 6% of the time. The percentages apply to the first 100 documents.
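The arithmetic behind these percentages can be sketched in Python. This is an illustration under simplifying assumptions, not how Structural computes the column. It looks only at top-level fields, uses Python type names as labels, and rounds to whole percentages.

```python
from collections import Counter


def field_frequencies(documents):
    """Percentage of documents that contain each (field, type) pair."""
    if not documents:
        return {}
    counts = Counter()
    for doc in documents:
        for key, value in doc.items():
            label = "Null" if value is None else type(value).__name__
            counts[(key, label)] += 1
    total = len(documents)
    return {pair: round(100 * n / total) for pair, n in counts.items()}


docs = [{"score": None}, {"score": 4}, {"score": 9}]
print(field_frequencies(docs))
```

For these three sample documents, the score field is Null in one of three documents (33%) and an integer in two of three (67%).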

Toggling between source and preview data

Required workspace permission:

Source data: Preview source data
Destination data: Preview destination data

The Preview toggle at the top right of Collection View allows you to choose whether to display original source data or the transformed data. You can switch back and forth to see exactly how Tonic Structural transforms the data based on the collection and field configuration.

By default, the Preview toggle is in the on position, and the displayed data reflects the selected collection mode and the assigned generators. For collections that use Truncate mode, the preview data is empty. Truncated collections do not have data in the destination database.

To display the original source data, toggle Preview to the off position.

Filtering collection fields

In the single document view, you can filter the fields by either the field name or the field value.

In the hybrid document view, you can filter the fields based on either the field name or field properties.

Filtering single document view by field name or value

You can filter single document view to only display fields that have specific text in either the field name or the field value.

To filter by value, toggle Search by Value to the on position.

After you select the filter type, in the search field, type text that is in the field name or value. As you type, Structural filters the list to only include fields that contain the filter text.

Filtering hybrid view by field name

To filter hybrid view by field name, in the search field, begin to type text that is in the field name. As you type, Structural filters the list to only include fields with names that include the filter text.

Filtering hybrid view by field properties

From the hybrid document view, you can filter the fields based on field properties.

To display the Filters panel, click Filters.

Searching for a filter

To search for a filter or a filter value, in the search field, start to type the value. The search looks for text in the individual settings.

Adding a filter

To add a filter, depending on the filter type, either check the checkbox or select a filter option. As you add filters, Structural applies them to the field list.

Above the list, Structural displays tags for the selected filters.

Clearing the selected filters

To clear all of the currently selected filters, click Clear All.

Filters panel filters

The Filters panel in hybrid view includes the following filters.

At-risk fields

An at-risk field:

Is marked as sensitive.
Is assigned the Passthrough generator.

To only display at-risk fields, on the Filters panel, check At-Risk Field.

When you check At-Risk Field, Structural makes the following changes under Privacy Settings:

Sets the sensitivity filter to Sensitive.
Sets the protection status filter to Not protected.
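The at-risk definition combines the two conditions. A minimal Python sketch of that logic follows. The Field class and its attribute names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """Hypothetical stand-in for a field in the hybrid document."""
    name: str
    sensitive: bool
    generator: str = "Passthrough"


def is_at_risk(field):
    """A field is at risk when it is sensitive and still uses Passthrough."""
    return field.sensitive and field.generator == "Passthrough"


fields = [
    Field("email", sensitive=True),
    Field("full_name", sensitive=True, generator="Name"),
    Field("page_views", sensitive=False),
]
print([f.name for f in fields if is_at_risk(f)])
```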

Sensitivity

You can filter the fields based on the field sensitivity.

On the Filters panel, under Privacy Settings, the sensitivity filter is by default set to All, which indicates to display both sensitive and non-sensitive fields.

To only display sensitive fields, click Sensitive.
To only display non-sensitive fields, click Not sensitive.

Note that when you check At-Risk Field, Structural automatically selects Sensitive.

Protection status

You can filter the fields based on whether they have any generator other than Passthrough assigned.

On the Filters panel, under Privacy Settings, the field protection filter is by default set to All, which indicates to display both protected and not protected fields.

To only display fields that have an assigned generator other than Passthrough, click Protected.
To only display fields that are assigned the Passthrough generator, click Not protected.

Note that when you check At-Risk Field, Structural automatically selects Not protected.

Recommended generators

When Structural detects that a field is sensitive, it can also determine a recommended generator.

For example, when it detects a name value, it also recommends the Name generator.

You can filter the fields to display the fields that have recommended generators.

On the Filters panel, under Recommended Generators, check the checkbox next to the recommended generator for which to display the fields that have that recommendation.

Field data type

You can filter the fields by the field data type. For example, you might only display fields that contain either numeric or integer values.

To only display fields that have specific data types, on the Filters panel, under Database Data Types, check the checkbox for each data type to include.

The list of data types only includes data types that are present in the currently displayed fields and that are compatible with other applied filters.

To search for a specific data type, in the Filters search field, begin to type the data type.

Unresolved schema changes

When the source database schema changes, you might need to update the configuration to reflect those changes. If you do not resolve the schema changes, then the data generation might fail. The data generation fails if there are unresolved conflicting changes, or if you configure Structural to always fail data generation when there are any unresolved changes.

For more information about schema changes, go to Viewing and resolving schema changes.

To only display fields that have unresolved schema changes, on the Filters panel, check Unresolved Schema Changes.

Sensitivity type

For detected sensitive fields, the sensitivity type indicates the type of data that was detected. Examples of sensitivity types include First Name, Address, and Email.

To only display fields that contain specific sensitivity types, on the Filters panel, under Sensitivity Type, check the checkbox for each sensitivity type to include.

The list of sensitivity types only includes sensitivity types that are present in the currently displayed fields.

To search for a specific sensitivity type, in the Filters search field, type the sensitivity type.

Sensitivity confidence

When the Structural sensitivity scan identifies a value as belonging to a sensitivity type, it also determines how confident it is in that determination.

You can filter the fields based on the confidence level.

To only display fields that have a specific confidence level, on the Filters panel, under Sensitivity confidence, check the checkbox next to each confidence level to include.

Primary or foreign keys

You can filter the field list to indicate whether to include:

Fields that are not primary or foreign keys.
Fields that are foreign keys.
Fields that are primary keys.

On the Filters panel, under Field Type:

To display fields that are neither a primary key nor a foreign key, check Non-keyed.
To display fields that are primary keys, check Primary key.
To display fields that are foreign keys, check Foreign key.

Commenting on fields

Required license: Professional or Enterprise

You can add comments to fields. For example, you might use a comment to explain why you selected a particular generator or marked a field as sensitive or not sensitive.

Adding a new comment

If a field does not have any comments, then to add a comment:

Click the comment icon.
In the comment field, type the comment text.
Click Comment.

Replying to an existing comment

When a field has existing comments, the comment icon is green. To add comments:

Click the comment icon. The comments panel shows the previous comments. Each comment includes the comment user and timestamp.
In the comment field, type the comment text.
Click Reply.

Indicating whether a field is sensitive

Required workspace permission: Configure column sensitivity

On the field configuration panel, the sensitivity toggle at the top right indicates whether the field is marked as sensitive.

To mark a field as sensitive, toggle the setting to the Sensitive position.

To mark a field as not sensitive, toggle the setting to the Not Sensitive position.

Assigning a generator to a field and type

Required workspace permission: Configure column generators

You can assign a generator to each combination of field and type. For example, depending on the document, the data type for a field might be either string or integer. You can indicate to use the Character Scramble generator when the field type is a string and the Random Integer generator when the field type is integer.
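The effect of assigning generators per field and type can be sketched in Python as a lookup keyed on both values. The two functions below are simple stand-ins written for this example. They are not the Character Scramble or Random Integer generators themselves, and the field name is hypothetical.

```python
import random
import string

rng = random.Random(0)  # seeded so that the sketch is repeatable


def scramble(text):
    """Stand-in: replace each letter or digit with a random one of the same kind."""
    out = []
    for ch in text:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            out.append(rng.choice(string.ascii_letters))
        else:
            out.append(ch)  # keep punctuation and spacing
    return "".join(out)


def random_integer(_value):
    """Stand-in: ignore the source value and return a random integer."""
    return rng.randint(0, 9999)


# One assignment for each combination of field name and value type.
assignments = {
    ("account", "str"): scramble,
    ("account", "int"): random_integer,
}


def transform(doc):
    """Apply the assigned stand-in, if any, to each top-level field."""
    result = {}
    for key, value in doc.items():
        generator = assignments.get((key, type(value).__name__))
        result[key] = generator(value) if generator else value
    return result


print(transform({"account": "AB-123"}))
print(transform({"account": 42}))
```

A value with no matching assignment, such as a null value, passes through this sketch unchanged.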

In hybrid document view, the Null type reflects when the field value is Null. You do not assign a generator to it.

To assign a generator:

Click the generator value for the field.
On the configuration panel, from the Generator Type dropdown list, select the generator.
Configure the generator options. For details about the available configuration options for each generator, go to the Generator reference.

Disabling examples for sparse collections

By default, Structural retrieves 100 documents. It then uses the data in these documents to populate example values in the hybrid document.

For sparsely populated collections, where less common fields are not present in those 100 documents, Structural retrieves extra documents until it has example values for all fields. For very sparsely populated collections, this might cause the collection view to load slowly, because it must retrieve many documents.

To disable examples for sparse collections, set the environment setting TONIC_MONGO_DISABLE_EXTRA_EXAMPLES to true. You can add this setting manually to the Environment Settings list on Structural Settings.

Note that this setting applies to both MongoDB and Amazon DynamoDB.

When this setting is true, fields that do not have a retrieved value use a dummy default value that is based on the data type.
