The file connector uses text files for the source and destination data.
The recommended option is to have the files in cloud storage. The files can be in either:
Amazon S3
MinIO
Google Cloud Storage (GCS)
Within a file connector workspace, you create file groups. A file group is a set of files that have the same format and structure. The file group points to the files in the cloud storage location.
Tonic Structural treats each file group as a table. Within the file groups, you configure the generators to apply to the columns.
When you run data generation, Structural writes the output files to the configured destination file storage location for the workspace.
In the destination location, each source file is represented by a corresponding destination file that has the same name.
You can also use files from a local file system.
For a file connector workspace that uses local files, when you add files to a file group, Structural encrypts the selected files. It then stores the files in the large file store of the Structural application database.
When you remove a file from a file group, remove an entire file group, or delete the workspace, Structural also removes those stored files.
When you run data generation, Structural also writes the output files to the large file store of the Structural application database. You can then download the most recently generated files.
The Tonic Structural file connector allows you to use files as the source data, and to write transformed versions of the files as destination data.
You select files from either:
An S3 bucket
A MinIO object store
Google Cloud Storage (GCS)
A local file system
You can also view an overview of the file connector and how to manage file groups.
For a file group that is based on files from a local file system, you can download the most recently generated files.
The download is a .zip file that contains the generated files.
To download all of the most recently generated files for a file group, either:
On the File Groups view, click the download icon for the file group.
On the file group details view, to download all of the generated files, click Export Files.
To download the most recently generated files for specific source files:
On the file group details view, check the checkbox for each file to include.
Click Actions, then click Export Files.
You also can download generated files for a specific data generation job.
On the job details view, when files are available to download, the Data available for file groups panel displays.
To download the files for that job for a selected file group:
Click Download Results.
From the list, select the file group. Use the filter field to filter the list by the file group name.
For a file connector workspace that reads files from and writes files to Amazon S3 or Google Cloud Storage, make sure to set up the appropriate permissions so that Tonic Structural can locate the source files and write the destination files.
You can also set up permissions to protect buckets that contain files that you do not want used in a workspace.
If you have a custom gateway endpoint configured for Amazon S3, then you must identify that endpoint to Structural.
You can also enable MinIO instead of Amazon S3 as a source of file connector files.
On Structural Cloud, in the workspace configuration, you must configure either an assumed role or AWS credentials.
On a self-hosted instance, you can also have Structural get the credentials from the environment. Structural uses either:
The credentials set in the following environment settings:
TONIC_AWS_ACCESS_KEY_ID - An AWS access key that is associated with an IAM user or role.
TONIC_AWS_SECRET_ACCESS_KEY - The secret key that is associated with the access key.
TONIC_AWS_REGION - The AWS Region to send the authentication request to.
The credentials for the IAM role on the host machine.
The policy that is associated with your IAM role or IAM user must have the following permissions:
If the source and destination S3 buckets are in different accounts, or are in an account that is different from the account or instance profile that Structural uses, then the configuration must include cross-account permissions. For assistance with this, contact support@tonic.ai.
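The exact permission list is not reproduced here. As a rough sketch only (the bucket names and statement IDs are placeholders, and your actual requirements may differ), a policy along the following lines grants read access to a source bucket and write access to a destination bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadSourceFiles",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetObject"],
      "Resource": [
        "arn:aws:s3:::my-source-bucket",
        "arn:aws:s3:::my-source-bucket/*"
      ]
    },
    {
      "Sid": "WriteDestinationFiles",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:PutObject", "s3:DeleteObject"],
      "Resource": [
        "arn:aws:s3:::my-destination-bucket",
        "arn:aws:s3:::my-destination-bucket/*"
      ]
    }
  ]
}
```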
If you configured a custom gateway endpoint for Amazon S3, then you set that endpoint as the value of the TONIC_AWS_S3_OVERRIDE_URL environment setting.
When you configure a custom URL, you can also configure Structural to trust the server certificate. To do this, set the TONIC_AWS_S3_TRUST_SERVER_CERT environment setting to true.
You can add these settings to the Environment Settings list on Structural Settings.
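For example (the endpoint URL is a placeholder), the two settings might look like this in the Environment Settings list:

```
TONIC_AWS_S3_OVERRIDE_URL=https://s3-gateway.example.com
TONIC_AWS_S3_TRUST_SERVER_CERT=true
```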
To get access to Google Cloud Storage, Structural uses the Google Cloud Platform credentials that you provide in the workspace configuration.
The service account that you specify must have the following permissions.
storage.buckets.list - Allows Structural to see the list of buckets when it creates a file group.
If the service account does not have this permission, then on the file group creation panel, users must type the name of the bucket where a file is located.
For buckets that contain source files, the following permissions allow Structural to get the list of files within buckets, and to retrieve the actual files and file content.
storage.objects.get
storage.objects.list
If the permissions are assigned globally, then Structural can list and retrieve files from any bucket. If the permissions are assigned to individual buckets, the file group creation view displays a list of all buckets. However, if you select a bucket for which the service account does not have the permissions, Structural returns an error.
For buckets that contain destination files, the following permissions allow Structural to see and get access to the bucket content and to create the generated files. This includes deleting and overwriting existing files that are regenerated.
storage.buckets.get
storage.objects.get
storage.objects.list
storage.objects.create
storage.objects.delete
If the permissions are assigned globally, then Structural can write files to any bucket. If the permissions are assigned to individual buckets, then Structural can only write files to those buckets.
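As a sketch (the role title is hypothetical), the destination-bucket permissions listed above could be bundled into a custom role definition file for use with gcloud iam roles create:

```yaml
# Hypothetical custom role that carries the destination-bucket permissions
title: Structural File Connector Writer
stage: GA
includedPermissions:
  - storage.buckets.get
  - storage.objects.get
  - storage.objects.list
  - storage.objects.create
  - storage.objects.delete
```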
To use MinIO as a source for file connector files, you set the TONIC_AWS_S3_OVERRIDE_URL environment setting to your MinIO endpoint.
When you set a MinIO endpoint URL, you can also configure Structural to trust the server certificate. To do this, set the TONIC_AWS_S3_TRUST_SERVER_CERT environment setting to true.
You can add these settings to the Environment Settings list on Structural Settings.
Note that if you configure TONIC_AWS_S3_OVERRIDE_URL to point to a MinIO endpoint, then you cannot create a file connector workspace that connects to Amazon S3.
Required license: Professional or Enterprise
File connector workspaces do not support workspace inheritance.
In a file connector workspace, you can only use the following table modes:
De-Identify
Truncate
When a file group is assigned Truncate mode, Tonic Structural ignores that file group during data generation.
The Conditional generator can only be used in a file group that contains CSV files.
Otherwise, the available generators are based on the assigned data type:
For CSV data, each column is assigned the text data type.
For JSON, the table contains a single column that is assigned the json data type.
For XML and HTML, the table contains a single column that is assigned the xml data type.
File connector workspaces do not support subsetting.
File connector workspaces do not support upsert.
For file connector workspaces, you cannot write the destination data to container artifacts.
For file connector workspaces, you cannot write the destination data to an Ephemeral snapshot.
For file connector workspaces, there is no option to run post-job scripts after a job.
You can create webhooks that are triggered by data generation jobs.
On the workspace details view for a file connector workspace, you:
Identify the type of storage. After you add a file group to the workspace, you cannot change the storage type.
Indicate where to write the transformed files.
If needed, provide credentials to access the cloud storage.
On the workspace creation view:
Under Connection Type, under File/Blob Storage, click Files.
Select the type of file storage where the source files are located.
To choose files from Amazon S3, click Amazon S3.
To choose files from MinIO, make sure that the TONIC_AWS_S3_OVERRIDE_URL environment setting points to your MinIO endpoint, then click Amazon S3.
To choose files from GCS, click Google Cloud Storage.
To upload files from a local file system, click Local Filesystem.
After you add a file group to the workspace, you cannot change the storage type.
For cloud storage workspaces, under Output location, provide the path to the folder where Structural writes the transformed files.
For a file connector workspace that writes files to Amazon S3, under AWS Credentials, you configure how Structural obtains the credentials to connect to Amazon S3.
Under AWS Credentials, click the type of credentials to use. The options are:
Environment - Only available on self-hosted instances. Indicates to use either:
The credentials for the IAM role on the host machine.
The credentials set in the following environment settings:
TONIC_AWS_ACCESS_KEY_ID - An AWS access key that is associated with an IAM user or role.
TONIC_AWS_SECRET_ACCESS_KEY - The secret key that is associated with the access key.
TONIC_AWS_REGION - The AWS Region to send the authentication request to.
Assumed role - Indicates to use the specified assumed role.
User credentials - Indicates to use the provided user credentials.
To provide an assumed role, click Assume role, then:
In the Role ARN field, provide the Amazon Resource Name (ARN) for the role.
In the Session Name field, provide the role session name.
If you do not provide a session name, then Structural automatically generates a default unique value. The generated value begins with TonicStructural.
In the Duration (in seconds) field, provide the maximum length in seconds of the session.
The default is 3600, indicating that the session can be active for up to 1 hour.
The provided value must be less than the maximum session duration that is allowed for the role.
By default, Structural uses the same assumed role to both retrieve the source files and write the output files. To provide a different assumed role for the output location:
Toggle Set different credentials for output to the on position.
In the Role ARN field, provide the ARN for the role.
In the Session Name field, provide the role session name.
If you do not provide a session name, then Structural automatically generates a default unique value. The generated value begins with TonicStructural.
In the Duration (in seconds) field, provide the maximum length in seconds of the session.
The default is 3600, indicating that the session can be active for up to 1 hour.
The provided value must be less than the maximum session duration that is allowed for the role.
For each assumed role, Structural generates the external ID that is used in the assume role request. Your role’s trust policy must be configured to condition on your unique external ID.
Here is an example trust policy:
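The policy below is a sketch only. The principal ARN and external ID are placeholders that you replace with the values for your environment; Structural generates the external ID for each assumed role.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "STRUCTURAL_GENERATED_EXTERNAL_ID" }
      }
    }
  ]
}
```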
To provide the credentials, under AWS Credentials:
In the AWS Access Key field, enter the AWS access key that is associated with an IAM user or role.
In the AWS Secret Key field, enter the secret key that is associated with the access key.
From the AWS Region dropdown list, select the AWS Region to send the authentication request to.
By default, Structural uses the same AWS credentials to both retrieve the source files and write the output files. To provide different AWS credentials for the output location:
Toggle Set different credentials for output to the on position.
In the AWS Access Key field, enter the AWS access key that is associated with an IAM user or role.
In the AWS Secret Key field, enter the secret key that is associated with the access key.
From the AWS Region dropdown list, select the AWS Region to send the authentication request to.
In the AWS Session Token field, you can optionally provide a session token for a temporary set of credentials. You can provide a session token regardless of whether you use the same or different credentials for the source and output.
To write files to a folder in Google Cloud Storage, you must provide Google Cloud Platform credentials in the workspace configuration.
Under GCP Credentials:
For Service Account File, select the service account file (JSON file) for the source files.
In the GCP Project ID field, provide the identifier of the project that contains the source files.
Under AWS credentials, you provide the MinIO credentials. The MinIO credentials consist of an access key and a secret key.
To provide the credentials, you can either:
Use the credentials set in the following environment settings (self-hosted only):
TONIC_AWS_ACCESS_KEY_ID - A MinIO access key.
TONIC_AWS_SECRET_ACCESS_KEY - The secret key that is associated with the access key.
Provide the access key and secret key manually
To use the credentials from the environment settings, under AWS Credentials, click Environment.
To provide the credentials manually:
Under AWS Credentials, click User credentials.
In the AWS Access Key field, enter the MinIO access key.
In the AWS Secret Key field, enter the secret key that is associated with the access key.
By default, Structural uses the same credentials to both retrieve the source files and write the output files. To provide different MinIO credentials for the output location:
Toggle Set different credentials for output to the on position.
In the AWS Access Key field, enter the MinIO access key.
In the AWS Secret Key field, enter the secret key that is associated with the access key.
When the source files come from a local file system, Tonic Structural writes the output files to the large file store in the Structural application database. You can then download the most recently generated files.
When the TONIC_AWS_S3_OVERRIDE_URL environment setting points to a MinIO endpoint, then when you select Amazon S3 as the source, you create a MinIO workspace.
A file group can contain files with CSV, XML, JSON, Parquet, or Avro content.
The file connector can read files that are ASCII encoded.
Files can include the following types:
.csv, .tsv, .xml, .json, .parquet, and .avro files.
.txt files that contain CSV, XML, or JSON content.
.gzip files that contain compressed CSV, XML, or JSON content. .gzip files are only supported in workspaces that use files from cloud storage. They are not supported in workspaces that use local files.
For Parquet files:
The files must use plain encoding.
The files must be uncompressed. For example, you cannot select a .snappy.parquet file.
You cannot select files with the following Parquet data types:
HalfFloat
Struct
Union
Dictionary
Map
List
FixedSizeList
Arrays of any type
For Avro files, you cannot select files with the following Avro data types:
Map
Record
Required workspace permission: Manage file connector file groups
To identify the source files to transform, you create file groups. A file group is a set of source files that have an identical format and structure.
For local files, you always select each file individually.
For cloud storage (Amazon S3, MinIO, or Google Cloud Storage), you can select individual files and folders. You can also filter by file extension and automatically select files based on prefix patterns.
The selected files must use a supported file and content type.
Within a file group:
All files must use the same format. For example, you cannot have both CSV and XML content in the same file group.
All files must have the same structure. For example, for a file group that contains CSV files, the content in all of the files must contain the same columns and use the same delimiters.
You can combine .txt and .gzip files with other files, as long as the file content in all of the files has the same format and structure. For example, a file group can contain a .txt file, a .csv file, and a .gzip file, as long as they all contain CSV content that has the same structure.
After you add the first file to the file group, Tonic Structural does not allow you to select files that do not have the same format.
On the workspace management view for a file connector workspace, to display the list of file groups, click File Groups.
For each file group, the file group list includes:
The name of the file group.
The type of file that the file group contains.
The number of files in the file group.
For cloud storage, the number of prefix patterns.
When the file group was most recently modified.
To filter the file group list, begin typing the file group name. As you type, the list is filtered to only include the matching file groups.
On the File Groups view, to display the details for a file group, click the file group name.
The file group details view includes:
The type of content in the files.
For cloud storage files, any file type filters.
A list of files that were manually selected. For local file workspaces, you select all files individually. For cloud storage workspaces, you can select files individually or use prefix filters. The Individual Files tab lists the individually selected files.
For cloud storage files, a Prefix Patterns tab that contains the list of prefix patterns.
To filter the file or prefix path list, in the filter field, begin typing the name. As you type, the list is updated to only include matching items.
On the File Groups view, to create and populate a new file group:
Click Create group.
Under File group name, enter a name for the file group.
Select the file group files:
For CSV content, configure the delimiters and file settings.
Click Save.
You can add files to an existing file group. The added files must have the same format and structure as the files that are already in the group.
To add files to a file group:
Either:
On the file group details view, click Add Files.
On the File Groups view, click the + icon for the file group.
Change the file selection:
Click Save.
When you select an individual file to add, Structural automatically displays a preview of the file content. You can use the preview to verify the file content.
For files that contain CSV content, the preview also can help you to check that the CSV delimiter settings are correct. If the settings are correct, the preview should show a readable table with the correct columns.
Note that for Parquet files, there is no file preview.
To hide the preview, click Hide Preview. To restore the preview, click Show Preview.
For file groups that use local files, you cannot see a preview for existing files that you added previously. For file groups that use cloud storage, you can preview existing files.
For a file group that uses local files, you can delete any file from the group.
For a cloud storage file group, you can only delete files that you selected manually. You cannot delete files that were added by a prefix pattern. You can only change or remove the prefix pattern.
To remove a file from a file group, on the file group details view, click the delete icon for the file.
To remove multiple files:
Check the checkbox for each file to remove.
Click Actions, then click Delete Files.
For local files, when you delete the file from the file group, Structural also deletes the file from the Structural application database.
To delete a file group, on the File Groups view, click the delete icon for the file group.
For local files, when you delete a file group, Structural also deletes the file group's files from the Structural application database.
For local files, you always select the individual files to include in the file group.
To add files to the file group from a local file system, either:
Drag and drop files from your file system to the add files panel.
Click Select files to upload, then navigate to and select the files.
To remove a selected file, click its delete icon.
For a cloud storage file group, you can select individual files and folders. When you select a folder, the file group automatically includes the files in that folder.
You can filter the available files based on file type. You can also automatically select files and folders based on their fully qualified path.
Finally, you can configure whether the data generation process only transforms files that were added since the previous data generation.
Under Select folders and files to add to file group, navigate to the folder where the file is located. You can only view and select from buckets and files that the associated IAM user (for Amazon S3) or Google Cloud Platform credentials (for GCS) is granted access to. For more information about the required permissions, go to Before you create a file connector workspace.
You can use the search field to search for a particular bucket, folder, or file. For the bucket list, the search matches any text in the bucket name. For a folder or file, the search text is matched against the beginning of the folder or file name.
The file browser can only display a limited number of items. Structural warns you when the number of items reaches the limit. Use the file filter to locate items that are not displayed.
To select a file, click its name or the file checkbox.
To select a folder, click the folder checkbox. When you select a folder, Structural automatically adds a prefix filter for the folder.
You use the File types dropdown list to filter the file extensions for the files to include. You can select multiple file types, as long as files of the selected types are compatible with each other. For example, you cannot filter the files to include both .json and .csv files.
By default, the selected file type is All file types, and the files are not filtered by file extension.
To add a file extension filter:
Click the File extensions dropdown list.
Click each file type to include.
Prefix patterns allow you to automatically select files based on paths. A prefix pattern is a fully qualified path. When you select a folder, Structural automatically adds the folder as a prefix pattern.
You can specify more than one prefix pattern.
To add a prefix pattern:
In the Prefix pattern field, type the path for which to include folders and files.
Click the add icon.
To remove a prefix pattern, click its delete icon.
When you add a prefix pattern, Structural automatically selects the files that match both the prefix pattern and any file extension filters.
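For example (the bucket and folder names are placeholders, and the exact path format may vary by storage type), a prefix pattern that selects the files under an invoices folder might look like:

```
my-bucket/invoices/2024/
```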
The first time you generate data from a file connector workspace, Structural processes all of the files.
For subsequent data generations, the Only process new files configuration indicates whether Structural processes all of the files in the file group, or only processes files that were added to the file group since the most recent data generation.
For example, for a folder that is in the file group, you might have a regular process that adds new files on a regular basis. In that case, you would only want Structural to process new files, and ignore all of the files that it processed before.
By default, Only process new files is toggled to the on position, and Structural only processes new files. To always process all of the files in the file group, toggle Only process new files to the off position.
For files that contain CSV content, you use the delimiter and file settings fields to provide information about the file structure:
Structural uses these settings to read and write the files.
After you save the file group, you cannot change most of these settings. You can change the Quote spaces setting.
Files that have a different delimiter configuration must be in a different file group.
If the file contains a header row, then toggle First row is column header to the on position.
To configure how to process spaces:
The Quote spaces setting indicates whether to enclose spaces in quotes in the output files. You can change this setting after you save the file group.
The Trim whitespace setting indicates whether to trim whitespace from before or after the values when the file is uploaded.
To specify the type of encoding that the file uses:
Toggle Specify encoding to the on position.
From the dropdown list, select the type of encoding to use.
If you do not specify the encoding, then Structural attempts to determine the encoding automatically. If Structural cannot identify the encoding, then the default encoding is UTF-8.
The following configurations identify the delimiter and special characters for the file group.
Column Delimiter - The character that separates column values. The default is a comma.
Escape Character - The character that is used to escape special characters. The default is a double quote.
Quoting Character - The character that is used to quote text. The default is the double quote.
Null Character - How null values are indicated. The default is \N.
The following options allow you to omit rows from the beginning or end of the file. By default, Structural does not omit any rows.
Skip First N Rows - The number of rows to omit from the beginning of the file.
Skip Last N Rows - The number of rows to omit from the end of the file.
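The delimiter settings above correspond to standard CSV parsing options. As an illustrative sketch only (not Structural's implementation), here is how the defaults map onto Python's csv module, with \N treated as a null value and the first row treated as the column header:

```python
import csv
import io

# Sample content that uses the default settings: comma delimiter,
# double-quote quoting, and \N for null, with one header row.
data = 'name,city\n"Smith, Jane",\\N\n'

reader = csv.reader(
    io.StringIO(data),
    delimiter=",",      # Column Delimiter
    quotechar='"',      # Quoting Character
    doublequote=True,   # Escape Character: a doubled double quote
)

header = next(reader)  # First row is column header
# Convert the \N null marker into an actual null value
rows = [[None if value == "\\N" else value for value in row] for row in reader]
print(header)  # ['name', 'city']
print(rows)    # [['Smith, Jane', None]]
```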