Managing file groups in a file connector workspace

Required workspace permission: Manage file connector file groups

To identify the source files to transform, you create file groups. A file group is a set of source files that have an identical format and structure.

For local files, you always select each file individually.

For cloud storage and file mounts, you can select individual files and folders. You can also filter by file extension and automatically select files based on prefix patterns.

File group restrictions

The selected files must use a supported file and content type.

Within a file group:

  • All files must use the same format. For example, you cannot have both CSV and XML content in the same file group.

  • All files must have the same structure. For example, for a file group that contains CSV files, the content in all of the files must contain the same columns and use the same delimiters.

  • You can combine .txt and .gzip files with other files, as long as the file content in all of the files has the same format and structure. For example, a file group can contain a .txt file, a .csv file, and a .gzip file, as long as they all contain CSV content that has the same structure.

After you add the first file to the file group, Tonic Structural does not allow you to select files that do not have the same format.
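
Before you upload files, you can check locally whether the candidates share the same structure and belong in the same file group. The following is a minimal sketch, assuming two hypothetical CSV files and a comma delimiter; it is not part of Structural.

```python
# Quick local check that candidate files share the same CSV header before you
# add them to one file group. The file names and delimiter are assumptions.
import csv

def csv_header(path, delimiter=","):
    """Return the header row of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f, delimiter=delimiter))

candidates = ["customers_2023.csv", "customers_2024.csv"]  # hypothetical files
headers = {path: csv_header(path) for path in candidates}

reference = headers[candidates[0]]
for path, header in headers.items():
    if header == reference:
        print(f"{path}: matches the reference structure")
    else:
        print(f"{path}: different columns - use a separate file group")
```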

Viewing the list of file groups

On the workspace management view for a file connector workspace, to display the list of file groups, click File Groups.

For each file group, the file group list includes:

  • The name of the file group.

  • The type of file that the file group contains.

  • The number of files in the file group.

  • For cloud storage, the number of prefix patterns.

  • When the file group was most recently modified.

To filter the file group list, begin to type the file group name. As you type, the list is filtered to only include the matching file groups.

Displaying details for a file group

On the File Groups view, to display the details for a file group, click the file group name.

The file group details view includes:

  • The type of content in the files.

  • For cloud storage files, any file type filters.

  • A list of files that were manually selected. For local file workspaces, you select all files individually. For cloud storage workspaces, you can select files individually or use prefix patterns. The Individual Files tab lists the individually selected files.

  • For cloud storage files, a Prefix Patterns tab that contains the list of prefix patterns.

To filter the file or prefix path list, in the filter field, begin to type the name. As you type, the list is updated to only include matching items.

Creating, editing, and deleting file groups

Creating a file group

On the File Groups view, to create and populate a new file group:

  1. Click Create group.

  2. Under File group name, enter a name for the file group.

  3. Select the file group files:

    • Selecting local files

    • Selecting cloud storage or file mount files

  4. Click Save.

Adding files to a file group

You can add files to an existing file group. The added files must have the same format and structure as the files that are already in the group.

To add files to a file group:

  1. Either:

    • On the file group details view, click Add Files.

    • On the File Groups view, click the + icon for the file group.

  2. Change the file selection:

    • Selecting local files

    • Selecting cloud storage or file mount files

  3. Click Save.

Previewing added files

When you select an individual file to add, Structural automatically displays a preview of the file content. You can use the preview to verify the file content.

Note that for Parquet files, there is no file preview.

To hide the preview, click Hide Preview. To restore the preview, click Show Preview.

For local files file groups, you cannot preview existing files that you added previously. For cloud storage file groups, you can preview existing files.

Deleting files from a file group

For a local files file group, you can delete any file from the group.

For a cloud storage file group, you can only delete files that you selected manually. You cannot delete files that were added by a prefix pattern. You can only change or remove the prefix pattern.

To remove a file from a file group, on the file group details view, click the delete icon for the file.

To remove multiple files:

  1. Check the checkbox for each file to remove.

  2. Click Actions, then click Delete Files.

For local files, when you delete the file from the file group, Structural also deletes the file from the Structural application database.

Deleting a file group

To delete a file group, on the File Groups view, click the delete icon for the file group.

For local files, when you delete a file group, Structural also deletes the file group's files from the Structural application database.

Selecting local files

For local files, you always select the individual files to include in the file group.

To add files to the file group from a local file system, either:

  • Drag and drop files from your file system to the add files panel.

  • Click Select files to upload, then navigate to and select the files.

To remove a selected file, click its delete icon.

Selecting cloud storage or file mount files

For a cloud storage or file mount file group, you can select individual files and folders. When you select a folder, the file group automatically includes the files in that folder.

You can filter the available files based on file type. You can also automatically select files and folders based on their fully qualified path.

Finally, you can configure whether the data generation process only transforms files that were added since the previous data generation.

Navigating to and selecting files and folders

Under Select folders and files to add to file group, navigate to the folder where the file is located.

For cloud storage, you can only view and select from buckets and files that the associated IAM user (for Amazon S3) or Google Cloud Platform credentials (for GCS) is granted access to. For more information about the required permissions, go to Before you create a file connector workspace.

You can use the search field to search for a particular bucket, folder, or file. For the bucket list, you can search based on any text in the bucket name. For a folder or file, the search text is matched against the beginning of the folder or file name.

The file browser can only display a limited number of items. Structural warns you when the number of items reaches the limit. You must use the file filter to locate items that are not displayed.

To select a file, click its name or the file checkbox.

To select a folder, click the folder checkbox. When you select a folder, Structural automatically adds a prefix filter for the folder.

Filtering files by file extension

You use the File types dropdown list to filter the file extensions for the files to include. You can select multiple file types, as long as files of the selected types are compatible with each other. For example, you cannot filter the files to include both .json and .csv files.

By default, the selected file type is All file types, and the files are not filtered by file extension.

To add a file extension filter:

  1. Click the File extensions dropdown list.

  2. Click each file type to include.

Using prefix patterns for automatic file and folder selection

Prefix patterns allow you to automatically select files based on paths. A prefix pattern is a fully qualified path. When you select a folder, Structural automatically adds the folder as a prefix pattern.

You can specify more than one prefix pattern.

To add a prefix pattern:

  1. In the Prefix pattern field, type the path for which to include folders and files.

  2. Click the add icon.

To remove a prefix pattern, click its delete icon.

When you add a prefix pattern, Structural automatically selects the files that match both the prefix pattern and any file extension filters.
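
Structural performs this selection for you. The sketch below only illustrates how a prefix pattern and a file extension filter combine, using hypothetical Amazon S3 object keys.

```python
# Illustrative only: how a prefix pattern combines with a file extension filter.
# The object keys, prefix, and extension list are hypothetical examples.
object_keys = [
    "s3://example-bucket/invoices/2024/jan.csv",
    "s3://example-bucket/invoices/2024/feb.csv",
    "s3://example-bucket/invoices/readme.txt",    # wrong extension
    "s3://example-bucket/archive/2023/dec.csv",   # outside the prefix
]

prefix_pattern = "s3://example-bucket/invoices/"  # fully qualified path prefix
allowed_extensions = (".csv",)                    # file extension filter

selected = [
    key for key in object_keys
    if key.startswith(prefix_pattern) and key.endswith(allowed_extensions)
]
print(selected)
# ['s3://example-bucket/invoices/2024/jan.csv',
#  's3://example-bucket/invoices/2024/feb.csv']
```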

Configuring whether to only process new files

The first time you generate data from a file connector workspace, Structural processes all of the files.

For subsequent data generations, the Only process new files configuration indicates whether Structural processes all of the files in the file group, or only processes files that were added to the file group since the most recent data generation.

For example, a folder in the file group might receive new files on a regular basis. In that case, you would only want Structural to process the new files, and ignore the files that it already processed.

By default, Only process new files is toggled to the on position, and Structural only processes new files. To always process all of the files in the file group, toggle Only process new files to the off position.
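
As a rough illustration of the difference between the two positions of the toggle, the sketch below compares a hypothetical set of previously processed files against the current file group contents. Structural tracks this state itself; the file names and the tracking shown here are assumptions.

```python
# Conceptual sketch of the "Only process new files" toggle. The file keys and
# the way previously processed files are tracked are hypothetical.
previously_processed = {
    "s3://example-bucket/invoices/2024/jan.csv",
    "s3://example-bucket/invoices/2024/feb.csv",
}
current_files = {
    "s3://example-bucket/invoices/2024/jan.csv",
    "s3://example-bucket/invoices/2024/feb.csv",
    "s3://example-bucket/invoices/2024/mar.csv",  # added since the last run
}

only_process_new_files = True  # the toggle described above

to_process = (current_files - previously_processed
              if only_process_new_files
              else current_files)
print(sorted(to_process))
# ['s3://example-bucket/invoices/2024/mar.csv']
```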

Configuring delimiters and file settings for .csv files

For files that contain CSV content, you use the delimiter and file settings fields to provide information about the file structure. Structural uses these settings to read and write the files.

After you save the file group, you cannot change most of these settings. You can change the Quote spaces setting.

Files that have a different delimiter configuration must be in a different file group.

File header row

If the file contains a header row, then toggle First row is column header to the on position.

Managing spaces

The following settings control how Structural processes spaces:

  • Quote spaces - Indicates whether to enclose spaces in quotes in the output files. You can change this setting after you save the file group.

  • Trim whitespace - Indicates whether to trim whitespace from before or after the values when the file is uploaded.

Encoding

To specify the type of encoding that the file uses:

  1. Toggle Specify encoding to the on position.

  2. From the dropdown list, select the type of encoding to use.

If you do not specify the encoding, then Structural attempts to determine the encoding automatically. If Structural cannot identify the encoding, then the default encoding is UTF-8.
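
As a rough sketch of that fallback, assuming a hypothetical local file, reading with the specified encoding or with UTF-8 when no encoding is provided looks like this; Structural's own detection logic is not shown.

```python
# Minimal sketch: use the specified encoding if there is one, otherwise UTF-8.
# The file name is hypothetical.
def read_lines(path, specified_encoding=None):
    encoding = specified_encoding or "utf-8"
    with open(path, encoding=encoding) as f:
        return f.read().splitlines()

lines = read_lines("orders.csv", specified_encoding="latin-1")
```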

Delimiters and special characters

The following configurations identify the delimiter and special characters for the file group.

  • Column Delimiter - The character that separates column values. The default is a comma.

  • Escape Character - The character that is used to escape special characters. The default is a double quote.

  • Quoting Character - The character that is used to quote text values. The default is a double quote.

  • Null Character - The value that indicates a null value. The default is \N.
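
For reference, the following is a minimal parsing sketch that uses Python's standard csv module with the default settings listed above; the file name is hypothetical, and Structural's own reader is not shown.

```python
# Minimal sketch of how the default settings map onto CSV parsing: comma
# delimiter, double-quote quoting and escaping, and \N for null values.
import csv

NULL_CHARACTER = r"\N"

with open("orders.csv", newline="", encoding="utf-8") as f:  # hypothetical file
    reader = csv.reader(
        f,
        delimiter=",",     # Column Delimiter
        quotechar='"',     # Quoting Character
        doublequote=True,  # a doubled quote acts as the escape, matching a
                           # double-quote Escape Character
    )
    for row in reader:
        values = [None if value == NULL_CHARACTER else value for value in row]
        print(values)
```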

Options to skip first and last rows

The following options allow you to omit rows from the beginning or end of the file. By default, Structural does not omit any rows.

  • Skip First N Rows - The number of rows to omit from the beginning of the file.

  • Skip Last N Rows - The number of rows to omit from the end of the file.
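
A small illustration of these two options, using hypothetical values:

```python
# Drop the configured number of rows from the beginning and end of the file.
skip_first_n_rows = 1   # Skip First N Rows
skip_last_n_rows = 2    # Skip Last N Rows

rows = list(range(10))  # stand-in for parsed rows
kept = rows[skip_first_n_rows:len(rows) - skip_last_n_rows]
print(kept)             # [1, 2, 3, 4, 5, 6, 7]
```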

For files that contain CSV content, the preview can also help you to check that the CSV delimiter settings are correct. If the settings are correct, the preview should show a readable table with the correct columns.
