Partitioning a column
Partitioning allows the value of a column to be based on the values of other related columns. It is one way to generate more realistic destination values.
The following generators support partitioning:
Note that partitioning cannot be configured as part of a generator preset. You can only configure partitioning for a specific column.
To enable partitioning, choose one or more columns from the Partition by dropdown list.
For each value or combination of values in the partitioning columns, Tonic generates a distribution of values for the original column.
For example, suppose you assign the Continuous generator to an Income column and partition it by an Occupation column. For each Occupation value, Tonic generates a distribution of Income values. In other words, it generates a range of incomes for each occupation, such as Doctor or Construction Worker.
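To make the idea of "one distribution per partition value" concrete, here is a minimal sketch in Python. It is not Tonic's implementation: the helpers `fit_partitioned` and `sample_value` are hypothetical, the rows are illustrative dicts, and each partition is assumed to be modeled as a normal distribution (mean and population standard deviation), which stands in for whatever model the Continuous generator actually uses.

```python
import random
import statistics
from collections import defaultdict

# Hypothetical source rows; the values are illustrative, not real data.
SOURCE_ROWS = [
    {"Occupation": "Doctor", "Income": 190_000},
    {"Occupation": "Doctor", "Income": 210_000},
    {"Occupation": "Doctor", "Income": 230_000},
    {"Occupation": "Construction Worker", "Income": 45_000},
    {"Occupation": "Construction Worker", "Income": 50_000},
    {"Occupation": "Construction Worker", "Income": 55_000},
]


def fit_partitioned(rows, partition_col, value_col):
    """Summarize value_col separately for each value of partition_col.

    Assumption: each partition is modeled as a normal distribution
    (mean, population standard deviation). This is a stand-in, not the
    model that the Continuous generator actually uses.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_col]].append(row[value_col])
    return {
        key: (statistics.mean(values), statistics.pstdev(values))
        for key, values in groups.items()
    }


def sample_value(distributions, key, rng):
    """Draw one synthetic value from the distribution for one partition key."""
    mean, stdev = distributions[key]
    return rng.gauss(mean, stdev)


if __name__ == "__main__":
    dists = fit_partitioned(SOURCE_ROWS, "Occupation", "Income")
    rng = random.Random(42)
    for occupation, (mean, stdev) in dists.items():
        sampled = sample_value(dists, occupation, rng)
        print(f"{occupation}: mean={mean}, sampled={sampled:.0f}")
```

Because each occupation has its own mean and spread, a sampled Doctor income lands near 210,000 while a sampled Construction Worker income lands near 50,000, instead of both being drawn from one pooled Income distribution.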
If you choose multiple columns, then Tonic generates a distribution for each combination of column values. For example, if you partition by both Occupation and Region, Tonic creates a distribution of income values for each combination of occupation and region. So there is one distribution for Doctor and Northeast, and a different distribution for Doctor and Southeast.
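With several partitioning columns, the partition key is the combination of their values. A minimal sketch, assuming rows are plain Python dicts and using a hypothetical helper `group_by_combination` (not part of any Tonic API), treats that combination as a tuple key:

```python
from collections import defaultdict

# Hypothetical source rows with two partitioning columns; illustrative only.
SOURCE_ROWS = [
    {"Occupation": "Doctor", "Region": "Northeast", "Income": 230_000},
    {"Occupation": "Doctor", "Region": "Northeast", "Income": 250_000},
    {"Occupation": "Doctor", "Region": "Southeast", "Income": 180_000},
    {"Occupation": "Doctor", "Region": "Southeast", "Income": 200_000},
    {"Occupation": "Construction Worker", "Region": "Northeast", "Income": 60_000},
]


def group_by_combination(rows, partition_cols, value_col):
    """Collect value_col for each distinct combination of partition_cols.

    The key is a tuple such as ("Doctor", "Northeast"), so every
    combination gets its own group and, later, its own distribution.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in partition_cols)
        groups[key].append(row[value_col])
    return dict(groups)


if __name__ == "__main__":
    groups = group_by_combination(SOURCE_ROWS, ["Occupation", "Region"], "Income")
    for key, incomes in groups.items():
        print(key, incomes)
```

Doctor and Northeast and Doctor and Southeast end up in separate groups. In this sketch, a combination with no source rows, such as Construction Worker and Southeast, simply has no group.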
In the destination database, Tonic sets the value of the partitioned column to a value from the appropriate distribution. The distribution that Tonic uses is based on the value of the partitioning columns in the destination database, not the original value of the partitioning columns in the source database.
To continue the example, assume that the Occupation column uses the Categorical generator. During data generation, Tonic assigns each record a random occupation value chosen from the existing values in the column. For one of the records, the occupation value is Doctor in the source database and Construction Worker in the destination database.
For the Income column for that record, Tonic assigns a value from the distribution of income values for the Construction Worker occupation. In other words, it assigns an income that, based on the source data, is realistic for the destination occupation.
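The key point is that the distribution lookup uses the destination value of the partitioning column. The sketch below illustrates this under stated assumptions: a shuffle of the existing occupation values stands in for the Categorical generator, and each occupation's "distribution" is simplified to a uniform draw between the minimum and maximum source incomes for that occupation. The helper `generate_destination` is hypothetical and is not how Tonic is implemented.

```python
import random
from collections import defaultdict

# Hypothetical source rows; the values are illustrative only.
SOURCE_ROWS = [
    {"Occupation": "Doctor", "Income": 210_000},
    {"Occupation": "Doctor", "Income": 240_000},
    {"Occupation": "Construction Worker", "Income": 48_000},
    {"Occupation": "Construction Worker", "Income": 55_000},
    {"Occupation": "Teacher", "Income": 62_000},
]


def generate_destination(source_rows, rng):
    """Generate destination rows where Income follows the destination Occupation."""
    # Source incomes observed for each occupation.
    incomes_by_occupation = defaultdict(list)
    for row in source_rows:
        incomes_by_occupation[row["Occupation"]].append(row["Income"])

    # Stand-in for the Categorical generator: shuffle the existing
    # occupation values so each record gets a random occupation.
    occupations = [row["Occupation"] for row in source_rows]
    rng.shuffle(occupations)

    destination = []
    for occupation in occupations:
        # Key the lookup on the DESTINATION occupation, not the source value.
        incomes = incomes_by_occupation[occupation]
        income = round(rng.uniform(min(incomes), max(incomes)))
        destination.append({"Occupation": occupation, "Income": income})
    return destination


if __name__ == "__main__":
    dest = generate_destination(SOURCE_ROWS, random.Random(7))
    for src, dst in zip(SOURCE_ROWS, dest):
        print(f"{src['Occupation']} -> {dst['Occupation']}: income {dst['Income']}")
```

If a record that was a Doctor in the source becomes a Construction Worker in the destination, its income is drawn from the Construction Worker range, so the destination row stays internally consistent.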
The partitioning option works well when you partition by only one or two columns.
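One likely reason for this limit is arithmetic: each additional partitioning column multiplies the number of value combinations, so the same source rows are spread across more, smaller partitions. The sketch below, which uses a hypothetical helper `partition_sizes` and a made-up, perfectly balanced table, counts rows per combination as columns are added. Real data is usually skewed, so some combinations can end up with even fewer rows than this even split suggests.

```python
from collections import Counter
from itertools import product

# Hypothetical, perfectly balanced source table:
# 2 occupations x 2 regions x 2 age bands = 8 rows.
SOURCE_ROWS = [
    {"Occupation": o, "Region": r, "AgeBand": a}
    for o, r, a in product(
        ["Doctor", "Construction Worker"], ["Northeast", "Southeast"], ["30-39", "40-49"]
    )
]


def partition_sizes(rows, partition_cols):
    """Count the source rows that fall into each combination of partition values."""
    return Counter(tuple(row[col] for col in partition_cols) for row in rows)


if __name__ == "__main__":
    for cols in (["Occupation"], ["Occupation", "Region"], ["Occupation", "Region", "AgeBand"]):
        sizes = partition_sizes(SOURCE_ROWS, cols)
        print(f"{len(cols)} column(s): {len(sizes)} partitions, "
              f"rows in smallest partition: {min(sizes.values())}")
    # 1 column(s): 2 partitions, rows in smallest partition: 4
    # 2 column(s): 4 partitions, rows in smallest partition: 2
    # 3 column(s): 8 partitions, rows in smallest partition: 1
```

With three columns, every distribution in this toy table would be built from a single source row, which leaves nothing to model beyond copying that row's value.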
To model more complex relationships across several columns, use the AI Synthesizer instead of partitioning.