Writing data generation output to a container repository
Last updated
Last updated
Requires Kubernetes.
For self-hosted Docker deployments, you can install and configure a separate Kubernetes cluster to use. For more information, go to Setting up a Kubernetes cluster to use to write output data to container artifacts.
For information about required Kubernetes permissions, go to Required access to write destination data to container artifacts.
Not compatible with upsert.
Not compatible with Preserve Destination or Incremental table modes.
Only supported for PostgreSQL, MySQL, and SQL Server.
You can configure a workspace to write destination data to a container repository instead of a database server.
When it writes data generation output to a repository, Structural writes the destination data to a container volume. From the list of container artifacts, you can copy the volume digest, and download a Docker Compose file that provides connection settings for the database on the volume. Structural generates the Compose file when you make the request to download it. For more information about getting access to the container artifacts, go to Viewing and downloading container artifacts.
You can also use the data volume to start a Tonic Ephemeral database. However, if the data is larger than 10 GB, we recommend that you write the data to an Ephemeral user snapshot instead. For information about writing to an Ephemeral snapshot, go to Writing data generation output to a Tonic Ephemeral snapshot.
For an overview of writing destination data to container artifacts, you can also view the video tutorial.
Under Destination Settings, to indicate to write the destination data to container artifacts, click Container Repository.
For a Structural instance that is deployed on Docker, unless you set up a separate Kubernetes cluster, the Container Repository option is hidden.
You can switch between writing to a database server and writing to a container repository at any time. Structural preserves the configuration details for both options. When you run data generation, it uses the currently selected option for the workspace.
From the Database Image dropdown list, select the image to use to create the container artifacts.
Select an image version that is compatible with the version of the database that is used in the workspace.
For a MySQL workspace, you can provide a customization file that helps to ensure that the temporary destination database is configured correctly.
To provide the customization details:
Toggle Use customization to the on position.
In the text area, paste the contents of the customization file.
To provide the location where Structural publishes the container artifacts:
In the Registry field, type the path to the container registry where Structural publishes the data volume.
In the Repository Path field, provide the path within the registry where Structural publishes the data volume.
You next provide the credentials that Structural uses to read from and write to the registry.
When you provide the registry, Structural detects whether the registry is from Amazon Elastic Container Registry (Amazon ECR), Google Artifact Registry (GAR), or a different container solution.
It displays the appropriate fields based on the registry type.
For a registry other than an Amazon ECR or a GAR registry, the credentials can be either a username and access token, or a secret.
The option to use a secret is not available on Structural Cloud.
In general, the credentials must be for a user that has read and write permissions for the registry.
The secret is the name of a Kubernetes secret that lives on the pod that the Structural worker runs on. The secret type must be kubernetes.io/dockerconfigjson
. The Kubernetes documentation provides information on how to create a registry credentials secret.
To use a username and access token:
Click Access token.
In the Username field, provide the username.
In the Access Token field, provide the access token.
To use a secret:
Click Secret name.
In the Secret Name field, provide the name of the secret.
For ACR, the provided credentials must be for a service principal that has sufficient permissions on the registry.
For Structural, the service principal must at least have the permissions that are associated with the AcrPush role.
Structural only supports Google Artifact Registry (GAR). It does not support Google Container Registry (GCR).
For a GAR registry, you upload a service account file, which is a JSON file that contains credentials that provide access to Google Cloud Platform (GCP).
The associated service account must have the Artifact Registry Writer role.
For Service Account File, to search for and select the file, click Browse.
For an Amazon ECR registry, you can either:
Provide the AWS access and secret key that is associated with the IAM user that will connect to the registry
Provide an assumed role
(Self-hosted only) Use the credentials configured in the Structural environment settings TONIC_AWS_ACCESS_KEY_ID
and TONIC_AWS_SECRET_ACCESS_KEY
.
(Self-hosted only) If Structural is deployed in Amazon Elastic Kubernetes Service (Amazon EKS), then you can use the AWS credentials that live on the EC2 instance.
To provide an AWS access key and secret key:
Click Access Keys.
In the Access Key field, enter an AWS access key that is associated with an IAM user or role.
In the Secret Key field, enter the secret key that is associated with the access key.
To provide an assumed role:
Click Assume Role.
In the Role ARN field, provide the Amazon Resource Name (ARN) for the role.
In the Session Name field, provide the role session name.
If you do not provide a session name, then Structural automatically generates a default unique value. The generated value begins with TonicStructural
.
In the Duration (in seconds) field, provide the maximum length in seconds of the session.
The default is 3600
, indicating that the session can be active for up to 1 hour.
The provided value must be less than the maximum session duration that is allowed for the role.
For the assumed role, Structural generates the external ID that is used in the assume role request. Your role’s trust policy must be configured to condition on your unique external ID.
Here is an example trust policy:
On a self-hosted instance, to use the credentials configured in the environment settings, click Environment Variables.
On a self-hosted instance, to use the AWS credentials from the EC2 instance, click Instance Profile.
The IAM user must have permission to list, push, and pull images from the registry. The following example policy includes the required permissions.
For additional security, a repository name filter allows you to limit access to only the repositories that are used in Structural. You need to make sure that the repositories that you create for Structural match the filter.
For example, you could prefix Structural repository names with tonic-
. In the policy, you include a filter based on the tonic-
prefix:
In the Tags field, provide the tag values to apply to the container artifacts. You can also change the tag configuration for individual data generation jobs.
Use commas to separate the tags.
A tag cannot contain spaces. Structural provides the following built-in values for you to use in tags:
{workspaceId}
- The identifier of the workspace.
{workspaceName}
- The name of the workspace.
{timestamp}
- The timestamp when the data generation job that created the artifact completed.
{jobId}
- The identifier of the data generation job that created the artifact.
For example, the following creates a tag that contains the workspace name, job identifier, and timestamp:
{workspaceName}_{jobId}_{timestamp}
To also tag the artifacts as latest, check the Tag as "latest" in your repository checkbox.
You can also optionally configure custom resource values for the Kubernetes pods. You can specify the ephemeral storage, memory, and CPU millicores.
To provide custom resources:
Toggle Set custom pod resources to the on position.
Under Storage Size:
In the field, provide the number of megabytes or gigabytes of storage.
From the dropdown list, select the unit to use.
The storage can be between 32MB and 25GB.
Under Memory Size:
In the field, provide the number of megabytes or gigabytes of RAM.
From the dropdown list, select the unit to use.
The memory can be between 512MB and 4 GB.
Under Processor Size:
In the field, provide the number of millicores.
From the dropdown list, select the unit.
The processor size can be between 250m and 1000m.
Only available for PostgreSQL and SQL Server. Not available for MySQL.
In the Custom Database Name field, provide the name to use for the destination database.
If you do not provide a custom database name, then the destination database uses the same name as the source database.
In the Custom Password field, provide the password for the destination database user.
If you do not provide a password, then Structural generates a password.
If your Kubernetes nodes are configured with taints, then on a self-hosted instance, you can configure the tolerations that enable the datapacker pods to be scheduled on the nodes. The datapacker pod hosts the temporary database that Structural uses during the data generation.
For an overview of taints and tolerations, go to the Kubernetes documentation.
To configure the tolerations, you configure the following environment settings. You can add these settings to the Environment Settings list on Structural Settings.
CONTAINERIZATION_POD_NODE_TOLERATION_KEY
- The toleration key value to apply to the datapacker pods. This setting is required. If you do not configure this setting, then Structural ignores the other settings.
CONTAINERIZATION_POD_NODE_TOLERATION_VALUES
- A comma-separated list of toleration values to apply to the datapacker pods.
CONTAINERIZATION_POD_NODE_TOLERATION_EFFECT
- The toleration effect to apply to the datapacker pods.
CONTAINERIZATION_POD_NODE_TOLERATION_OPERATOR
- The toleration operator to apply to the datapacker pods.