Managing file groups in a file connector workspace

Required workspace permission: Manage file connector file groups

To identify the source files to transform, you create file groups. A file group is a set of source files that have an identical format and structure.

For local files, you always select each file individually.

For cloud storage (Amazon S3 or Google Cloud Storage), you can select individual files and folders. You can also filter by file extension and automatically select files based on prefix patterns.

File group requirements and restrictions

Content and file types

A file group can contain files that contain CSV, XML, JSON, or Parquet content, including:

  • .csv, .tsv, .xml, .json, and .parquet files.

  • .txt files that contain CSV, XML, or JSON content.

  • .gzip files that contain compressed CSV, XML, or JSON content. .gzip files are only supported in workspaces that use files from cloud storage. They are not supported in workspaces that use local files.

  • For Parquet files:

    • The files must use plain encoding.

    • The files must be uncompressed. For example, you cannot select a .snappy.parquet file.

    • You cannot select Parquet files that contain the following data types:

      • HalfFloat

      • Struct

      • Union

      • Dictionary

      • Map

      • List

      • FixedSizeList

      • Arrays of any type

The file connector can read files that are ASCII encoded.

File group restrictions

Within a file group:

  • All files must use the same format. For example, you cannot have both CSV and XML content in the same file group.

  • All files must have the same structure. For example, for a file group that contains CSV files, the content in all of the files must contain the same columns and use the same delimiters.

  • You can combine .txt and .gzip files with other files, as long as the content in all of the files has the same format and structure. For example, a file group can contain a .txt file, a .csv file, and a .gzip file, as long as they all contain CSV content that has the same structure.

After you add the first file to the file group, Tonic Structural does not allow you to select files that do not have the same format.
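As an illustration of the same-format rule, the following Python sketch checks whether a set of file names could belong to one file group, based only on their extensions. It is not how Structural validates files. The names `FORMAT_BY_EXTENSION`, `implied_formats`, and `can_share_group` are hypothetical, and treating .tsv as CSV content is an assumption. The sketch does not check structure, such as matching columns and delimiters.

```python
from pathlib import PurePosixPath

# Extensions whose content format is implied by the file name.
# Assumption: .tsv is treated as delimited (CSV) content.
FORMAT_BY_EXTENSION = {
    ".csv": "CSV",
    ".tsv": "CSV",
    ".xml": "XML",
    ".json": "JSON",
    ".parquet": "Parquet",
}

# Extensions that can hold CSV, XML, or JSON content.
# The name alone does not reveal the format.
CONTENT_DEPENDENT = {".txt", ".gzip"}


def implied_formats(file_names):
    """Return the set of formats that the file extensions imply."""
    formats = set()
    for name in file_names:
        ext = PurePosixPath(name).suffix.lower()
        if ext in CONTENT_DEPENDENT:
            continue  # the content would need to be inspected
        if ext not in FORMAT_BY_EXTENSION:
            raise ValueError(f"unsupported file type: {name}")
        formats.add(FORMAT_BY_EXTENSION[ext])
    return formats


def can_share_group(file_names):
    """Return True when the extensions do not imply more than one format."""
    return len(implied_formats(file_names)) <= 1
```

For example, `can_share_group(["a.csv", "b.txt", "c.gzip"])` returns `True`, because the .txt and .gzip files might also contain CSV content. `can_share_group(["a.csv", "b.xml"])` returns `False`.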

Viewing the list of file groups

On the workspace management view for a file connector workspace, to display the list of file groups, click File Groups.

For each file group, the file group list includes:

  • The name of the file group.

  • The type of file that the file group contains.

  • The number of files in the file group.

  • For cloud storage, the number of prefix patterns.

  • When the file group was most recently modified.

To filter the file group list, begin typing the file group name. As you type, the list is filtered to only include the matching file groups.

Displaying details for a file group

On the File Groups view, to display the details for a file group, click the file group name.

The file group details view includes:

  • The type of content in the files.

  • For cloud storage files, any file type filters.

  • A list of files that were manually selected. For local file workspaces, you select all files individually. For cloud storage workspaces, you can select files individually or use prefix filters. The Individual Files tab lists the individually selected files.

  • For cloud storage files, a Prefix Patterns tab that contains the list of prefix patterns.

To filter the file or prefix path list, in the filter field, begin to type the name. As you type, the list is updated to only include matching items.

Creating, editing, and deleting file groups

Creating a file group

On the File Groups view, to create and populate a new file group:

  1. Click Create group.

  2. Under File group name, enter a name for the file group.

  3. Click Save.

Adding files to a file group

You can add files to an existing file group. The added files must have the same format and structure as the files that are already in the group.

To add files to a file group:

  1. Either:

    • On the file group details view, click Add Files.

    • On the File Groups view, click the + icon for the file group.

  2. Select the files to add.

  3. Click Save.

Previewing added files

When you select an individual file to add, Structural automatically displays a preview of the file content. You can use the preview to verify the file content.

For files that contain CSV content, the preview also can help you to check that the CSV delimiter settings are correct. If the settings are correct, the preview should show a readable table with the correct columns.

Note that for Parquet files, there is no file preview.

To hide the preview, click Hide Preview. To restore the preview, click Show Preview.

For file groups that contain local files, you cannot preview files that you added previously. For cloud storage file groups, you can preview existing files.

Deleting files from a file group

For a file group that contains local files, you can delete any file from the group.

For a cloud storage file group, you can only delete files that you selected manually. You cannot delete files that were added by a prefix pattern. You can only change or remove the prefix pattern.

To remove a file from a file group, on the file group details view, click the delete icon for the file.

To remove multiple files:

  1. Check the checkbox for each file to remove.

  2. Click Actions, then click Delete Files.

For local files, when you delete the file from the file group, Structural also deletes the file from the Structural application database.

Deleting a file group

To delete a file group, on the File Groups view, click the delete icon for the file group.

For local files, when you delete a file group, Structural also deletes the file group's files from the Structural application database.

Selecting local files

For local files, you always select the individual files to include in the file group.

To add files to the file group from a local file system, either:

  • Drag and drop files from your file system to the add files panel.

  • Click Select files to upload, then navigate to and select the files.

To remove a selected file, click its delete icon.

Selecting cloud storage files

For a cloud storage (Amazon S3 or GCS) file group, you can select individual files and folders. When you select a folder, the file group automatically includes the files in that folder.

You can filter the available files based on file type. You can also automatically select files and folders based on their fully qualified path.

Finally, you can configure whether the data generation process only transforms files that were added since the previous data generation.

Navigating to and selecting files and folders

Under Select folders and files to add to file group, navigate to the folder in Amazon S3 or GCS where the file is located.

You can only view and select from buckets and files that the associated IAM user (for Amazon S3) or Google Cloud Platform credentials (for GCS) is granted access to. For more information about the required permissions, go to Before you create a file connector workspace.

You can use the search field to search for a particular bucket, folder, or file. For the bucket list, you can search based on any text in the bucket name. For a folder or file, the search text is matched against the beginning of the folder or file name.

The file browser can only display a limited number of items. Structural warns you when the number of items reaches the limit. You must use the file filter to locate items that are not displayed.

To select a file, click its name or the file checkbox.

To select a folder, click the folder checkbox. When you select a folder, Structural automatically adds a prefix filter for the folder.

Filtering files by file extension

You use the File types dropdown list to filter the file extensions for the files to include. You can select multiple file types, as long as the selected types are compatible with each other. For example, you cannot filter the files to include both .json and .csv files.

By default, the selected file type is All file types, and the files are not filtered by file extension.

To add a file extension filter:

  1. Click the File types dropdown list.

  2. Click each file type to include.

Using prefix patterns for automatic file and folder selection

Prefix patterns allow you to automatically select files based on paths. A prefix pattern is a fully qualified path. When you select a folder, Structural automatically adds the folder as a prefix pattern.

You can specify more than one prefix pattern.

To add a prefix pattern:

  1. In the Prefix pattern field, type the path for which to include folders and files.

  2. Click the add icon.

To remove a prefix pattern, click its delete icon.

When you add a prefix pattern, Structural automatically selects the files that match both the prefix pattern and any file extension filters.
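The following Python sketch shows one way to picture how a prefix pattern and a file extension filter combine. It assumes that a prefix pattern is matched as a plain "starts with" comparison against the full path, which is how object storage prefixes commonly work. Structural's actual matching rules might differ. The function name `select_keys` and the sample paths are hypothetical.

```python
def select_keys(keys, prefix_patterns, extensions=None):
    """Return the keys that match any prefix pattern and the extension filter.

    extensions=None models "All file types" (no extension filtering).
    """
    selected = []
    for key in keys:
        # A key must start with at least one of the prefix patterns.
        if not any(key.startswith(prefix) for prefix in prefix_patterns):
            continue
        # When a filter is set, a key must also end with a listed extension.
        if extensions is not None and not key.lower().endswith(tuple(extensions)):
            continue
        selected.append(key)
    return selected


# Hypothetical fully qualified paths in a bucket named my-bucket.
keys = [
    "my-bucket/sales/2024/jan.csv",
    "my-bucket/sales/2024/feb.csv",
    "my-bucket/sales/2024/notes.json",
    "my-bucket/hr/staff.csv",
]

print(select_keys(keys, ["my-bucket/sales/"], [".csv"]))
# ['my-bucket/sales/2024/jan.csv', 'my-bucket/sales/2024/feb.csv']
```

In this sketch, the .json file is excluded by the extension filter, and the file under my-bucket/hr/ is excluded because it does not match the prefix.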

Configuring whether to only process new files

The first time you generate data from a file connector workspace, Structural processes all of the files.

For subsequent data generations, the Only process new files configuration indicates whether Structural processes all of the files in the file group, or only processes files that were added to the file group since the most recent data generation.

For example, a scheduled process might regularly add new files to a folder that is in the file group. In that case, you would want Structural to process only the new files, and to ignore the files that it already processed.

By default, Only process new files is toggled to the on position, and Structural only processes new files. To always process all of the files in the file group, toggle Only process new files to the off position.
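The effect of the toggle can be sketched in Python as follows. This is only a model of the behavior described above. How Structural tracks which files it already processed is not covered here, so the `already_processed` set and the function name `files_to_process` are hypothetical.

```python
def files_to_process(group_files, already_processed, only_new=True):
    """Pick the files for one data generation run.

    group_files: the files that are currently in the file group.
    already_processed: files handled by earlier runs (empty on the first run).
    only_new: models the Only process new files toggle (on by default).
    """
    if only_new:
        # Skip anything that an earlier run already handled.
        return [f for f in group_files if f not in already_processed]
    # Toggle is off: process every file in the group.
    return list(group_files)
```

On the first run, `already_processed` is empty, so every file is returned regardless of the toggle. This matches the statement that the first data generation processes all of the files.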

Configuring delimiters and file settings for .csv files

For files that contain CSV content, you use the delimiter and file settings fields to provide information about the file structure:

  • First row is column header - Whether the file contains a header row. If the file contains a header row, then toggle this setting to the on position.

  • Quote spaces - Whether to enclose spaces in quotes in the output files.

  • Trim whitespace - Whether to trim whitespace from before or after the values when the file is uploaded.

  • Column Delimiter - The file delimiter. The default is a comma.

  • Escape Character - The character that is used to escape characters. The default is a double quote.

  • Quoting Character - The character that is used to quote text. The default is the double quote.

  • Null Character - How null values are indicated. The default is \N.

  • Skip First N Rows - The number of rows to omit from the beginning of the file.

  • Skip Last N Rows - The number of rows to omit from the end of the file.

Structural uses these settings to read and write the files.

After you save the file group, you cannot change most of these settings. You can change the Quote spaces setting.

Files that have a different delimiter configuration must be in a different file group.
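To show how these settings interact when a file is read, the following sketch maps them onto Python's standard csv module. It is an illustration, not Structural's parser. The function name `read_rows` and its parameter names are hypothetical, and the sketch assumes that the header row is taken after the first N rows are skipped.

```python
import csv
import io


def read_rows(text, delimiter=",", quote_char='"', escape_char='"',
              null_marker="\\N", has_header=True, skip_first=0, skip_last=0,
              trim=False):
    """Parse CSV text with settings that mirror the fields listed above."""
    # When the escape and quote characters match (the documented defaults),
    # Python's csv module expresses that as doubled quotes, not an escapechar.
    same = escape_char == quote_char
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        quotechar=quote_char,
        doublequote=same,
        escapechar=None if same else escape_char,
    )
    rows = list(reader)
    # Skip First N Rows and Skip Last N Rows.
    rows = rows[skip_first:len(rows) - skip_last]
    # Assumption: the header is the first row that remains after skipping.
    header = rows.pop(0) if has_header and rows else None
    cleaned = []
    for row in rows:
        if trim:
            row = [value.strip() for value in row]
        # Replace the null marker with None.
        cleaned.append([None if value == null_marker else value for value in row])
    return header, cleaned


# Pipe-delimited sample with a leading title line and a trailing total line.
sample = 'report 2024\nid|name\n1|"a ""quoted"" value"\n2|\\N\ntotal|2\n'
header, rows = read_rows(sample, delimiter="|", skip_first=1, skip_last=1)
print(header)  # ['id', 'name']
print(rows)    # [['1', 'a "quoted" value'], ['2', None]]
```

Because a different delimiter or quoting character changes how every row is split, a file with a different configuration would parse incorrectly under these settings, which is consistent with the requirement to place it in a separate file group.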
