Writing data generation output to a container repository

Requires Kubernetes.

For self-hosted Docker deployments, you can install and configure a separate Kubernetes cluster to use. For more information, go to Setting up a Kubernetes cluster to use to write output data to container artifacts.

For information about required Kubernetes permissions, go to Required access to write destination data to container artifacts.

Not compatible with upsert.

Not compatible with Preserve Destination or Incremental table modes.

Only supported for PostgreSQL, MySQL, and SQL Server.

You can configure a workspace to write destination data to a container repository instead of a database server.

When it writes data generation output to a repository, Structural writes the destination data to a container volume. From the list of container artifacts, you can copy the volume digest, and download a Docker Compose file that provides connection settings for the database on the volume. Structural generates the Compose file when you make the request to download it. For more information about getting access to the container artifacts, go to Viewing and downloading container artifacts.

You can also use the data volume to start a Tonic Ephemeral database. However, if the data is larger than 10 GB, we recommend that you write the data to an Ephemeral user snapshot instead. For information about writing to an Ephemeral snapshot, go to Writing data generation output to a Tonic Ephemeral snapshot.

For an overview of writing destination data to container artifacts, you can also view the video tutorial.

Indicating to write destination data to container artifacts

To write the destination data to container artifacts, under Destination Settings, click Container Repository.

For a Structural instance that is deployed on Docker, unless you set up a separate Kubernetes cluster, the Container Repository option is hidden.

You can switch between writing to a database server and writing to a container repository at any time. Structural preserves the configuration details for both options. When you run data generation, it uses the currently selected option for the workspace.

Identifying the base image to use to create the container artifacts

From the Database Image dropdown list, select the image to use to create the container artifacts.

Select an image version that is compatible with the version of the database that is used in the workspace.

Providing a customization file for MySQL

For a MySQL workspace, you can provide a customization file that helps to ensure that the temporary destination database is configured correctly.

To provide the customization details:

  1. Toggle Use customization to the on position.

  2. In the text area, paste the contents of the customization file.

Setting the location for the container artifacts

To provide the location where Structural publishes the container artifacts:

  1. In the Registry field, type the path to the container registry where Structural publishes the data volume.

  2. In the Repository Path field, provide the path within the registry where Structural publishes the data volume.

Providing the credentials to write to the registry

Next, provide the credentials that Structural uses to read from and write to the registry.

When you provide the registry, Structural detects whether the registry is from Amazon Elastic Container Registry (Amazon ECR), Google Artifact Registry (GAR), or a different container solution.

It displays the appropriate fields based on the registry type.
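
As a rough illustration of how a registry type can be inferred from the registry path, the following Python sketch classifies a registry by its host name. It relies on the public host name formats for Amazon ECR (`<account_id>.dkr.ecr.<region>.amazonaws.com`) and GAR (`<location>-docker.pkg.dev`). The function name `classify_registry` is hypothetical, and the detection logic that Structural actually uses is not described here, so treat this only as an approximation.

```python
import re

# Host name patterns based on the public AWS and Google Cloud formats.
# Assumption: Structural's own detection rules may differ from these.
ECR_HOST = re.compile(r"^\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com(\.cn)?$")
GAR_HOST = re.compile(r"^[a-z0-9-]+-docker\.pkg\.dev$")


def classify_registry(registry: str) -> str:
    """Return 'ecr', 'gar', or 'other' for a registry path (hypothetical helper)."""
    # Drop an optional scheme, then keep only the host portion of the path.
    host = re.sub(r"^https?://", "", registry.strip()).split("/", 1)[0].lower()
    if ECR_HOST.match(host):
        return "ecr"
    if GAR_HOST.match(host):
        return "gar"
    return "other"


print(classify_registry("123456789012.dkr.ecr.us-east-1.amazonaws.com"))
print(classify_registry("us-central1-docker.pkg.dev/my-project/my-repo"))
print(classify_registry("myregistry.azurecr.io"))
```

A check like this can help you predict which credential fields to expect before you fill in the Registry field.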

Fields for registries other than Amazon ECR or GAR

For a registry other than an Amazon ECR or a GAR registry, the credentials can be either a username and access token, or a secret.

The option to use a secret is not available on Structural Cloud.

In general, the credentials must be for a user that has read and write permissions for the registry.

The secret is the name of a Kubernetes secret that lives on the pod that the Structural worker runs on. The secret type must be kubernetes.io/dockerconfigjson. The Kubernetes documentation provides information on how to create a registry credentials secret.
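
To show what a secret of type kubernetes.io/dockerconfigjson contains, the following Python sketch builds the JSON document that is stored under the secret's `.dockerconfigjson` key. The registry host, username, and token are placeholder values, and the helper name `docker_config_json` is hypothetical. To create the secret itself, follow the Kubernetes documentation.

```python
import base64
import json


def docker_config_json(registry: str, username: str, token: str) -> str:
    """Build the JSON payload for the `.dockerconfigjson` key (hypothetical helper)."""
    # The `auth` field is base64("username:token"), as in a Docker config file.
    auth = base64.b64encode(f"{username}:{token}".encode("utf-8")).decode("ascii")
    config = {
        "auths": {
            registry: {"username": username, "password": token, "auth": auth}
        }
    }
    return json.dumps(config)


# Placeholder values for illustration only.
payload = docker_config_json("registry.example.com", "ci-user", "example-token")
print(payload)
```

In a Secret manifest, this payload is base64-encoded again when it is placed in the `data` field.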

To use a username and access token:

  1. Click Access token.

  2. In the Username field, provide the username.

  3. In the Access Token field, provide the access token.

To use a secret:

  1. Click Secret name.

  2. In the Secret Name field, provide the name of the secret.

Azure Container Registry (ACR) permission requirements

For ACR, the provided credentials must be for a service principal that has sufficient permissions on the registry.

For Structural, the service principal must at least have the permissions that are associated with the AcrPush role.

Providing a service file for GAR

Structural only supports Google Artifact Registry (GAR). It does not support Google Container Registry (GCR).

For a GAR registry, you upload a service account file, which is a JSON file that contains credentials that provide access to Google Cloud Platform (GCP).

The associated service account must have the Artifact Registry Writer role.

For Service Account File, to search for and select the file, click Browse.

Amazon ECR registries

For an Amazon ECR registry, you can do one of the following:

  • Provide the AWS access key and secret key that are associated with the IAM user that will connect to the registry.

  • (Self-hosted only) Use the credentials configured in the Structural environment settings TONIC_AWS_ACCESS_KEY_ID and TONIC_AWS_SECRET_ACCESS_KEY.

  • (Self-hosted only) If Structural is deployed in Amazon Elastic Kubernetes Service (Amazon EKS), then you can use the AWS credentials that live on the EC2 instance.

On Structural Cloud, you must provide an AWS access key and secret key.

On a self-hosted instance, you can choose the source of the credentials. The default is Access Keys.

  • To provide an AWS access key and secret key, click Access Keys.

  • To use the credentials configured in the environment settings, click Environment Variables.

  • To use the AWS credentials from the EC2 instance, click Instance Profile.

The IAM user must have permission to list, push, and pull images from the registry. The following example policy includes the required permissions.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ManageTonicRepositoryContents",
      "Effect": "Allow",
      "Action": [
        "ecr:DescribeRepositories",
        "ecr:ListImages",
        "ecr:DescribeImages",
        "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload",
        "ecr:PutImage"
      ],
      "Resource": [
        "arn:aws:ecr:<region>:<account_id>:repository/<optional name filter>"
      ]
    },
    {
      "Sid": "GetAuthorizationToken",
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken"
      ],
      "Resource": "*"
    }
  ]
}

For additional security, a repository name filter allows you to limit access to only the repositories that are used in Structural. You need to make sure that the repositories that you create for Structural match the filter.

For example, you could prefix Structural repository names with tonic-. In the policy, you include a filter based on the tonic- prefix:

"Resource": [
  "arn:aws:ecr:<region>:<account_id>:repository/tonic-*"
]
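
Before you create a repository, you might want to confirm that its name falls under the filter. The following Python sketch approximates the wildcard match with `fnmatch.fnmatchcase`. This is only an approximation: IAM resource patterns support `*` and `?`, while `fnmatch` also interprets `[...]` character sets. The region, account ID, and helper names are placeholders for illustration.

```python
from fnmatch import fnmatchcase


def repository_arn(region: str, account_id: str, name: str) -> str:
    """Build an ECR repository ARN (or ARN pattern) from its parts."""
    return f"arn:aws:ecr:{region}:{account_id}:repository/{name}"


def covered_by_filter(arn: str, resource_pattern: str) -> bool:
    """Approximate the IAM wildcard match between an ARN and a Resource pattern."""
    return fnmatchcase(arn, resource_pattern)


# Placeholder region and account ID.
pattern = repository_arn("us-east-1", "123456789012", "tonic-*")
for name in ("tonic-orders", "orders"):
    arn = repository_arn("us-east-1", "123456789012", name)
    print(name, covered_by_filter(arn, pattern))
```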

Providing tags for the container artifacts

In the Tags field, provide the tag values to apply to the container artifacts. You can also change the tag configuration for individual data generation jobs.

Use commas to separate the tags.

A tag cannot contain spaces. Structural provides the following built-in values for you to use in tags:

  • {workspaceId} - The identifier of the workspace.

  • {workspaceName} - The name of the workspace.

  • {timestamp} - The timestamp when the data generation job that created the artifact completed.

  • {jobId} - The identifier of the data generation job that created the artifact.

For example, the following creates a tag that contains the workspace name, job identifier, and timestamp:

{workspaceName}_{jobId}_{timestamp}
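
To illustrate how a comma-separated list of tag templates might expand, here is a minimal Python sketch. The function `expand_tags`, the sample values, and the timestamp format are all assumptions made for illustration. The exact values and formats that Structural substitutes are not specified here.

```python
def expand_tags(tags_field: str, values: dict[str, str]) -> list[str]:
    """Expand {name} placeholders in a comma-separated tag list (hypothetical helper)."""
    tags = []
    for template in tags_field.split(","):
        tag = template.strip()
        if not tag:
            continue  # skip empty entries such as a trailing comma
        for name, value in values.items():
            tag = tag.replace("{" + name + "}", value)
        # A tag cannot contain spaces, so reject any expanded tag that has one.
        if any(ch.isspace() for ch in tag):
            raise ValueError(f"tag contains whitespace: {tag!r}")
        tags.append(tag)
    return tags


# Sample values; the timestamp format shown is an assumption.
sample = {
    "workspaceId": "ws-001",
    "workspaceName": "sales",
    "jobId": "42",
    "timestamp": "20240101T000000Z",
}
print(expand_tags("{workspaceName}_{jobId}_{timestamp}, nightly", sample))
```

The sketch raises an error for a tag that contains whitespace after expansion, for example when a workspace name includes a space. How Structural handles that case is not covered here.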

To also tag the artifacts as latest, check the Tag as "latest" in your repository checkbox.

Specifying custom resources for the Kubernetes pods

You can optionally configure custom resource values for the Kubernetes pods. You can specify the ephemeral storage, memory, and CPU millicores.

To provide custom resources:

  1. Toggle Set custom pod resources to the on position.

  2. Under Storage Size:

    1. In the field, provide the number of megabytes or gigabytes of storage.

    2. From the dropdown list, select the unit to use.

    The storage can be between 32 MB and 25 GB.

  3. Under Memory Size:

    1. In the field, provide the number of megabytes or gigabytes of RAM.

    2. From the dropdown list, select the unit to use.

    The memory can be between 512 MB and 4 GB.

  4. Under Processor Size:

    1. In the field, provide the number of millicores.

    2. From the dropdown list, select the unit.

    The processor size can be between 250m and 1000m.
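
The ranges above can be checked before you enter values. The following Python sketch is one way to do that. It assumes that 1 GB equals 1024 MB, which the text above does not state, and the names `LIMITS`, `to_mb`, and `validate_pod_resources` are hypothetical.

```python
# Assumption: binary units (1 GB = 1024 MB). The documented ranges do not say.
UNIT_TO_MB = {"MB": 1, "GB": 1024}

# Inclusive (minimum, maximum) for each resource, taken from the ranges above.
LIMITS = {
    "storage_mb": (32, 25 * 1024),
    "memory_mb": (512, 4 * 1024),
    "cpu_millicores": (250, 1000),
}


def to_mb(amount: int, unit: str) -> int:
    """Convert an amount in MB or GB to megabytes."""
    return amount * UNIT_TO_MB[unit]


def validate_pod_resources(storage_mb: int, memory_mb: int, cpu_millicores: int) -> list[str]:
    """Return a list of error messages; an empty list means all values are in range."""
    requested = {
        "storage_mb": storage_mb,
        "memory_mb": memory_mb,
        "cpu_millicores": cpu_millicores,
    }
    errors = []
    for name, value in requested.items():
        low, high = LIMITS[name]
        if not low <= value <= high:
            errors.append(f"{name}={value} is outside the range {low}-{high}")
    return errors


print(validate_pod_resources(to_mb(10, "GB"), to_mb(2, "GB"), 500))
print(validate_pod_resources(16, 512, 2000))
```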

Setting a custom database name

Only available for PostgreSQL and SQL Server. Not available for MySQL.

In the Custom Database Name field, provide the name to use for the destination database.

If you do not provide a custom database name, then the destination database uses the same name as the source database.

Configuring the required tolerations for datapacker node taints

If your Kubernetes nodes are configured with taints, then on a self-hosted instance, you can configure the tolerations that enable the datapacker pods to be scheduled on the nodes. The datapacker pod hosts the temporary database that Structural uses during the data generation.

For an overview of taints and tolerations, go to the Kubernetes documentation.

To configure the tolerations, set the following environment settings. You can add these settings to the Environment Settings list on Structural Settings.

  • CONTAINERIZATION_POD_NODE_TOLERATION_KEY - The toleration key value to apply to the datapacker pods. This setting is required. If you do not configure this setting, then Structural ignores the other settings.

  • CONTAINERIZATION_POD_NODE_TOLERATION_VALUES - A comma-separated list of toleration values to apply to the datapacker pods.

  • CONTAINERIZATION_POD_NODE_TOLERATION_EFFECT - The toleration effect to apply to the datapacker pods.

  • CONTAINERIZATION_POD_NODE_TOLERATION_OPERATOR - The toleration operator to apply to the datapacker pods.
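
To make the relationship between these settings and a Kubernetes toleration more concrete, the following Python sketch builds toleration entries in the shape used by a pod spec (`key`, `operator`, `value`, `effect`). The mapping is an assumption: it creates one toleration per listed value and defaults the operator to `Equal`, which is the Kubernetes default. How Structural itself translates the settings is not described here, and the function name and sample values are hypothetical.

```python
PREFIX = "CONTAINERIZATION_POD_NODE_TOLERATION_"


def build_tolerations(settings: dict[str, str]) -> list[dict[str, str]]:
    """Map the environment settings to pod-spec style tolerations (assumed mapping)."""
    key = settings.get(PREFIX + "KEY", "").strip()
    if not key:
        # Without a key, the other settings are ignored.
        return []
    operator = settings.get(PREFIX + "OPERATOR", "").strip() or "Equal"
    effect = settings.get(PREFIX + "EFFECT", "").strip()
    raw_values = settings.get(PREFIX + "VALUES", "")
    values = [v.strip() for v in raw_values.split(",") if v.strip()]

    base = {"key": key, "operator": operator}
    if effect:
        base["effect"] = effect
    # With the Exists operator, a toleration does not carry a value.
    if operator == "Exists" or not values:
        return [base]
    return [{**base, "value": value} for value in values]


# Sample settings for illustration only.
example = {
    PREFIX + "KEY": "dedicated",
    PREFIX + "VALUES": "tonic, datapacker",
    PREFIX + "EFFECT": "NoSchedule",
    PREFIX + "OPERATOR": "Equal",
}
for toleration in build_tolerations(example):
    print(toleration)
```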
