Google BigQuery is a cloud-based enterprise data warehouse.
Tonic Structural can move data from one dataset to another within the same BigQuery project. Structural also can move data between different BigQuery projects.
Structural uses service accounts to access both the source and destination databases.
During workspace creation, under Connection Type, click BigQuery.
In the Source Settings section, you provide the connection information for the source database:
In the Project ID field, provide the identifier of the project that contains the source database.
In the Dataset Name field, provide the name of the dataset that contains the source database.
For Service Account File, select the service account file (JSON file) for the source database.
To test the connection to the source database, click Test Source Connection.
By default, data generation is not blocked for schema changes that do not conflict with your workspace configuration.
To block data generation when there are any schema changes, regardless of whether they conflict with your workspace configuration, toggle Block data generation on schema changes to the on position.
By default, Tonic Structural copies views from the source database to the destination database. You do not assign generators to view data.
To enable de-identification of view data, toggle the De-identify views option to the on position.
You cannot de-identify views that are written in Google Legacy SQL.
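A quick way to recognize a Legacy SQL view is the dialect of its query definition. As an illustration only, with hypothetical project, dataset, and column names, a Legacy SQL view query typically references tables with square brackets, while a GoogleSQL (standard SQL) view query uses backticks:

-- Legacy SQL view query (cannot be de-identified)
SELECT name FROM [my-project:my_dataset.customers]

-- GoogleSQL (standard SQL) view query (supported)
SELECT name FROM `my-project.my_dataset.customers`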
When De-identify views is enabled:
Structural includes views in the lists on Privacy Hub, Database View, and Table View. To identify an item as a view, Structural appends (view) to the name. Structural does not display views that are written in Google Legacy SQL.
You can assign table modes to the views, and generators to the view columns.
When you run data generation:
Structural applies the table modes and generators to the view data. The data generation does not include views that are written in Google Legacy SQL.
For each view, Structural writes the de-identified data to a table called <view name>_tonic_table.
Structural also creates a view that has the same name and metadata as the view in the source data, but is populated from the destination table.
For example, for a source view called customers, in the destination database, Structural:
Uses the assigned table mode and generators to create a table called customers_tonic_table.
Creates a customers view that uses the schema and metadata from the source customers view, but is populated by the data from customers_tonic_table.
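For illustration only, assuming a destination dataset named dest_dataset in a project named my-project, and a simple customers schema with two columns (all names here are hypothetical), the destination objects follow this pattern:

-- Native table that holds the de-identified view data
CREATE TABLE `my-project.dest_dataset.customers_tonic_table` (
  customer_id INT64,
  full_name STRING
);

-- View with the source view's name, populated from the destination table
CREATE VIEW `my-project.dest_dataset.customers` AS
SELECT customer_id, full_name
FROM `my-project.dest_dataset.customers_tonic_table`;

This is only a sketch of the resulting structure. Structural creates the actual objects during data generation.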
Note that destination tables cannot be external tables. They must be native BigQuery tables.
In the Destination Settings section, you provide the connection information for the destination database:
In the Project ID field, provide the identifier of the project that contains the destination database.
In the Dataset Name field, provide the name of the dataset that contains the destination database.
For Service Account File, select the service account file (JSON file) for the destination database.
To test the connection to the destination database, click Test Destination Connection.
Tonic Structural uses service accounts to access both the source and destination datasets. Even if the source and destination are in the same BigQuery project, Structural recommends that you create separate service accounts for the source and destination.
When you create the service account for the source dataset, you should use the "Viewer" role to grant the service account access to the project. Structural does not require write permissions to the source dataset.
When you create the service account for the destination dataset, you should use the "Editor" role to grant the service account access to the project.
Required license: Professional or Enterprise
A Google BigQuery workspace cannot have a hidden dataset as its source or destination dataset.
Post-job scripts run inside of transactions. They are limited to statements that are supported within transactions.
Google BigQuery workspaces cannot use the following table modes:
Incremental
Google BigQuery workspaces cannot use the following generators:
Cross Table Sum
Tonic Structural cannot process STRUCT or INTERVAL types in Google BigQuery.
In the destination database, Structural creates external, snapshot, and cloned tables as normal tables.
You cannot assign generators to partitioned tables that require a partition filter.
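A table requires a partition filter when it was created or altered with the require_partition_filter option. For illustration, with hypothetical names, such a table might be defined as follows:

-- Queries against this table must include a filter on the partitioning column
CREATE TABLE `my-project.my_dataset.events` (
  event_ts TIMESTAMP,
  payload STRING
)
PARTITION BY DATE(event_ts)
OPTIONS (require_partition_filter = TRUE);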
The environment setting TONIC_GRPC_ENABLED indicates whether to use GRPC-based endpoints to access Google BigQuery.
If TONIC_GRPC_ENABLED is true, then you can leave all of the columns set to Passthrough.
If TONIC_GRPC_ENABLED is false, then you must truncate the tables.
Google BigQuery workspaces do not support subsetting.
However, for tables that use the De-Identify table mode, you can provide a WHERE clause to filter the table. For details, go to Using table filtering for data warehouses and Spark-based data connectors.
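For example, with hypothetical column names, a filter expression for a De-Identify table might look like the following, which limits data generation to recent United States rows:

created_at >= TIMESTAMP '2024-01-01' AND region = 'US'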
Google BigQuery workspaces do not support upsert.
For Google BigQuery workspaces, you cannot write the destination data to a container repository.
For Google BigQuery workspaces, you cannot write the destination data to an Ephemeral snapshot.
If you enable view de-identification, then you might need to resynchronize out-of-sync view schemas.
Tonic Structural uses the schema metadata from the INFORMATION_SCHEMA metadata tables to display a view’s schema and to manage its schema changes.
For performance reasons, Structural assumes this schema metadata is accurate.
However, schema metadata for BigQuery views might be out of sync with the actual results of querying the view. This can occur for various reasons, but is typically because the query definition is written in such a way that changes to referenced tables result in changes to the schema of the query results.
Data generation jobs ignore any views that have out-of-date schema metadata, and issue a warning for those views.
Before you can de-identify the affected views, you must manually resynchronize the schema metadata in BigQuery.
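One way to spot out-of-sync metadata is to compare the recorded schema for the view against the columns that the view query actually returns. For example, with hypothetical dataset and view names, the recorded schema can be read from INFORMATION_SCHEMA.COLUMNS:

-- Schema metadata that Structural relies on for the view
SELECT column_name, data_type
FROM `my-project.my_dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'customers'
ORDER BY ordinal_position;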
For logical views, schema metadata is not automatically updated during the lifecycle of the view.
If the underlying schema of the logical view has changed, then to trigger a refresh, you must either change or recreate the view definition.
To trigger a schema refresh, use the BigQuery user interface to modify the view query, then save the view.
No-op changes do trigger schema metadata refreshes.
However, saving the view without modifying the query does not trigger a refresh.
To recreate the view, run the following command:
"CREATE OR REPLACE VIEW '{project}.{dataset}.{view}' OPTIONS({options}) AS ({view_query})"
The full DDL for a view is in the following table: {dataset}.INFORMATION_SCHEMA.TABLES
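For example, with hypothetical project, dataset, and view names, you can retrieve the stored DDL for a single view and use it as the basis for the CREATE OR REPLACE VIEW statement:

SELECT ddl
FROM `my-project.my_dataset.INFORMATION_SCHEMA.TABLES`
WHERE table_name = 'customers';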
For materialized views, refreshes automatically synchronize the schema metadata.
For details on how to configure automatic refreshes and trigger manual refreshes of materialized views, go to the BigQuery documentation.