Partitioning a column

Partitioning allows the value of a column to be based on the values of other related columns. It is one way to generate more realistic destination values.

Generators that support partitioning

The following generators support partitioning:

Note that partitioning cannot be configured as part of a generator preset. You can only configure partitioning when you configure a specific column.

How partitioning works

Choose the columns to partition by

To enable partitioning, select one or more columns to partition by from the Partition by dropdown list.

You can only choose columns whose generator is set to Passthrough or Categorical.
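To illustrate, the eligibility rule can be expressed as a small check. This is a hypothetical sketch, not Structural's API: the dictionary that maps column names to generator names and the validate_partition_by function are assumptions made for the example.

```python
# HYPOTHETICAL model: each column name maps to the name of its assigned generator.
ALLOWED_PARTITION_GENERATORS = {"Passthrough", "Categorical"}


def validate_partition_by(generators: dict[str, str], partition_by: list[str]) -> list[str]:
    """Return the partition-by columns that are NOT eligible, that is,
    columns whose generator is neither Passthrough nor Categorical."""
    return [
        col
        for col in partition_by
        if generators.get(col) not in ALLOWED_PARTITION_GENERATORS
    ]


generators = {"Occupation": "Categorical", "Region": "Passthrough", "Age": "Continuous"}
print(validate_partition_by(generators, ["Occupation", "Region"]))  # []
print(validate_partition_by(generators, ["Occupation", "Age"]))     # ['Age']
```

An empty result means that every selected column can be used to partition by.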

Generating a distribution of column values for each partition

For each value or combination of values in the partitioning columns, Tonic Structural generates a distribution of values for the original column.

For example, suppose you assign the Continuous generator to an Income column and partition it by an Occupation column. For each Occupation value, Structural generates a distribution of Income values. In other words, it generates a separate range of incomes for each occupation, such as Doctor or Construction Worker.
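To make the idea concrete, here is a minimal sketch that groups source Income values by Occupation and summarizes each group. The data is invented. The sketch also assumes that each partition's distribution can be approximated by a mean and standard deviation. Structural's Continuous generator uses its own model, which this example does not reproduce.

```python
from collections import defaultdict
from statistics import fmean, pstdev

# Illustrative source rows: (Occupation, Income). HYPOTHETICAL data.
source_rows = [
    ("Doctor", 210_000), ("Doctor", 250_000), ("Doctor", 190_000),
    ("Construction Worker", 52_000), ("Construction Worker", 61_000),
]


def fit_partitions(rows):
    """Group Income values by Occupation and summarize each group as
    (mean, population standard deviation).

    ASSUMPTION: this normal-style summary stands in for the Continuous
    generator's actual distribution model.
    """
    groups = defaultdict(list)
    for occupation, income in rows:
        groups[occupation].append(income)
    return {occ: (fmean(vals), pstdev(vals)) for occ, vals in groups.items()}


dists = fit_partitions(source_rows)
print(dists["Construction Worker"])  # (56500.0, 4500.0)
```

Each occupation gets its own summary, so a Doctor income and a Construction Worker income are modeled independently.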

If you choose multiple columns, then Structural generates a distribution for each combination of column values. For example, suppose you partition by both Occupation and Region. Structural creates a distribution of Income values for each combination of occupation and region. So there is one distribution for Doctor and Northeast, and a different distribution for Doctor and Southeast.
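One way to picture a multi-column partition is to key each group by a tuple of the partitioning values. The following sketch does this with hypothetical rows. The partition_values helper is invented for illustration and does not represent how Structural stores partitions internally.

```python
from collections import defaultdict

# HYPOTHETICAL source rows.
source_rows = [
    {"Occupation": "Doctor", "Region": "Northeast", "Income": 240_000},
    {"Occupation": "Doctor", "Region": "Northeast", "Income": 260_000},
    {"Occupation": "Doctor", "Region": "Southeast", "Income": 200_000},
    {"Occupation": "Construction Worker", "Region": "Southeast", "Income": 48_000},
]


def partition_values(rows, partition_by, target):
    """Collect the target column's values for each combination of
    partition-by values. The key is a tuple, so ("Doctor", "Northeast")
    and ("Doctor", "Southeast") are separate partitions."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in partition_by)
        groups[key].append(row[target])
    return dict(groups)


groups = partition_values(source_rows, ["Occupation", "Region"], "Income")
print(groups[("Doctor", "Northeast")])  # [240000, 260000]
print(groups[("Doctor", "Southeast")])  # [200000]
```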

Choosing a value from the appropriate distribution

In the destination database, Structural sets the value of the partitioned column to a value from the appropriate distribution. The distribution that Structural uses is based on the values of the partitioning columns in the destination database, not their original values in the source database.

To continue our example, assume that the Occupation column uses the Categorical generator. During data generation, Structural assigns each record a random occupation value drawn from the existing values in the column. For one of the records, the occupation value is Doctor in the source database and Construction Worker in the destination database.

For the Income column for that record, Structural assigns a value from the distribution of income values for the Construction Worker occupation. In other words, it assigns an income value that is realistic for the destination occupation value based on the source data.
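The following end-to-end sketch shows the ordering that matters. It first picks the destination occupation, and then draws the income from that occupation's partition. Several assumptions apply and none of them come from Structural. A uniform draw between each partition's source minimum and maximum stands in for the real distribution. The Categorical step is simplified to a random choice among the source values. The data and the generate function are hypothetical.

```python
import random
from collections import defaultdict

# HYPOTHETICAL source rows: (Occupation, Income).
source_rows = [
    ("Doctor", 210_000), ("Doctor", 250_000),
    ("Construction Worker", 52_000), ("Construction Worker", 61_000),
]


def generate(rows, seed=0):
    """Return (source_occupation, destination_occupation, destination_income)
    for each record."""
    rng = random.Random(seed)

    # Build each partition's income range from the SOURCE data.
    values = defaultdict(list)
    for occ, income in rows:
        values[occ].append(income)
    ranges = {occ: (min(v), max(v)) for occ, v in values.items()}

    source_occupations = [occ for occ, _ in rows]
    destination = []
    for src_occ, _ in rows:
        # Simplified stand-in for the Categorical generator.
        dest_occ = rng.choice(source_occupations)
        # The income comes from the DESTINATION occupation's partition,
        # not from the occupation that this record had in the source.
        low, high = ranges[dest_occ]
        destination.append((src_occ, dest_occ, round(rng.uniform(low, high))))
    return destination


for record in generate(source_rows, seed=1):
    print(record)
```

If a record changes from Doctor to Construction Worker, its income falls within the Construction Worker range. It does not fall within the Doctor range.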
