Calculated or related values

These generators calculate values based on a script, formula, or on other values in the database.

For information about the API data models for these generators, go to Calculated or related values in the column attributes.

AI

Generates data values based on a prompt that you provide.

In the AI Prompt field, enter the prompt to use to generate the values.

If you do not provide an AI prompt, then the generator uses the column name as a guide to generate the values.

You can configure a group key to link the column to other columns.

Character Sequence

Populates a column with a value that uses a specific pattern. The pattern can include both random characters of a specific type and specific characters.

For example, you could specify that the column value consists of three random uppercase letters followed by a dash and then 5 random numbers.

In the Pattern field, provide the pattern to use for the value. For more complex patterns, use the Regular Expression generator.

The pattern can include the following to indicate specific types of random characters:

  • - Random digit.

  • @ - Random lowercase letter.

  • ^ - Random uppercase letter.

  • - Random digit or letter. The letter can be either uppercase or lowercase.

  • $ - Random digit or lowercase letter.

  • % - Random digit or uppercase letter.

All other characters are included as entered.

Here are some example patterns and possible output.

Pattern
Example output

###-##-####

232-66-7439

***-##

A0c-34

^@@@-##:###

Cght-87:485

Column From Another Table

Populates the column with values from a column in a different table.

For example, you configure table1.column_1 to be populated with values in table2.column_2.

To configure the generator:

  1. From the Source Table dropdown list, select the table that contains the column to get the values from.

  2. From the Source Column dropdown list, select the source table column to get the values from.

  3. Optionally, you can provide criteria to determine how to select the value, based on matching column values between the two tables. For example, to populate orders.product_name from products.product_name, use a value from a product row where products.product_id is equal to orders.product_id in the row that is being populated. To enable the criteria fields, check Select values based on the following criteria. If the box is not checked, then Fabricate selects a random value from the source column.

  4. To configure the criteria:

    1. From the source table column dropdown list, select the source table column.

    2. From the current table column dropdown list, select the current table column.

You can configure a group key to link the column to other columns.

Foreign Key

Makes the column a foreign key to another table. For example, a product-id column might be a foreign key to a products table.

You can optionally configure how often each value from the linked table appears in the current table.

Selecting the referenced table and column

At a minimum, for a foreign key column, you configure where to find the value to use to populate the column.

  1. From the References Table dropdown list, select the table that the column references.

  2. From the References Column dropdown list, select the column in the reference table to use to populate the value in the current column.

Configuring the relative prevalence of rows from the referenced table

To determine how frequently each row from the referenced table is referenced in the current table, one option is to identify a column from the referenced table that sets the relative weight for the row.

From the Relative Prevalence dropdown list, select the column to use to determine the relative weight. When you use a relative prevalence column, the number of columns does not change.

The column with the weight information is often a column for which the value is generated using the SQL generator. The column is usually excluded from the data export.

For example, when you reference customers from the customers table, you might want most customers to be from California, a smaller number of customers to be from New York, and no customers from other states.

To do that, in the customers table, you might add a weight column that uses the SQL generator, then provide SQL that checks the value of the state where the customer is located:

  • If the state is California, set weight to 10.

  • If the state is New York, set weight to 5.

  • Otherwise, set weight to 0.

You then select that column from the Relative Prevalence dropdown list.

Configuring the number of table rows based on the referenced table

Instead of a relative prevalence of rows from the referenced table, you can instead set a cardinality distribution that determines the number of rows in the current table that reference each row in the referenced table.

For example, you might configure the current table to include between 1 and 5 rows for each customer in the customers table.

When you configure a cardinality distribution, the number of rows in the table can vary, and the row count is replaced with information about how the number of rows is determined. For example:

Example of row count information when the number of rows is calculated

To configure a cardinality distribution:

  1. Check Base the number of rows in this table on this foreign key relationship.

  2. From the Distribution dropdown list, select the method to use to distribute the values among the rows.

  3. Configure the options for that distribution method.

Fixed distribution

In the Exactly field, specify the number of times each row in the referenced column appears in the current table.

For example, you might indicate to use each product from a products table exactly 5 times in an orders table.

From column distribution

From the column dropdown list, select the column from the referenced table that contains the number of rows to generate in the current table.

For this type of distribution, you might create a SQL column for which the value is calculated based on another column. The column is usually excluded from the data export.

For example, in a customers table that populates the customer_id for an orders table, you might create an order_count column where:

  • When the customer is located in California, set the value between 1 and 5.

  • When the customer is located in New York, set the value between 1 and 10 rows.

  • For other states, set the value to 0.

In the orders table, each customer from California appears between 1 and 5 times, based on the value of order_count for that customer. Each customer from New York appears between 1 and 10 times. Customers in other states do not appear.

Normal distribution

For a normal distribution:

  1. In the Minimum field, set the minimum number of times to use each row.

  2. In the Mean field, set the mean number of times to use each row.

  3. In the Standard Deviation field, set the standard deviation from the mean for the number of times to use each row.

  4. In the Maximum field, set the maximum number of times to use each row.

Uniform distribution

For a uniform distribution:

  1. In the Minimum field, set the minimum number of times to use each row.

  2. In the Maximum field, set the maximum number of times to use each row.

Until column <= 0

The until column <= 0 distribution type is used when a referenced record reflects a limited amount that can be consumed. After that amount is consumed, you can no longer refer to that record.

For example, a loan_payments table includes a foreign key to a loans table. Each loan has multiple payments against it, but the payments cannot continue after the loan is paid off. So in the loan_payments table, you could add a column that calculates the remaining balance on the loan after the most recent payment. When the remaining balance for the loan reaches zero, then the loan is paid off and is no longer used to add records to the loan_payments table.

You could do something similar for values such as a limited amount of time (add rental payments until the months remaining is 0) or a limited distance (add miles traveled until the distance remaining is 0).

From the column dropdown, select the column that contains the remaining amount to consume.

List

Populates the column from a provided list of values. You can optionally provide a weight for each value.

To provide specific values, use the Values text area.

  • Provide each value on a separate line.

  • To weight the values, append |<n>, where <n> is the weighting factor. The lower the number, the lower the weight. For example, a value with a weight of 1 appears the least number of times. A value with a weight of 3 appears 3 times as often.

Instead of providing specific values, in the AI field, you can provide a text description. For example, "All of the colors of the rainbow weighted evenly."

You can also provide a group key to link the column to other columns.

Rank

Assigns a rank value based on the values in 2 other columns.

To determine the ranking, the generator partitions the data by the values in a specified column. It then orders the values based on the value in another column.

For example, to populate a rank column, the data is partitioned by a product column. For each value of product, it determines how often each value of color occurs.

For example, for rows that contain the product value product1, in the color column:

  • yellow occurs 20 times

  • red occurs 17 times

  • green occurs 12 times

For a row that contains product1 and yellow, rank is 1.

For a row that contains product1 and red, rank is 2.

For a row that contains product1 and green, rank is 3.

To configure the generator:

  1. From the Partition By dropdown list, select the column to use to partition the data.

  2. From the Order By dropdown list, select the column to use to set the ranking order.

Regular Expression

Produces a value that matches a regular expression that you provide. It uses the Peri-compatible regular expression syntax.

In the Pattern field, provide the regular expression to use.

Here are some examples of regular expressions and matching output:

Pattern
Example output

\d{3}-\d{2}-\d{4}

232-66-7439

[A-Z]{3}-\d{1,2}

ABC-34

[A-Z]{3}-\d{2}:\d{3,5}

ABC-34:4853

SQL

Uses a SQL expression to generate the value. The expression can refer to other columns and other tables.

In the SQL Expression field, provide the SQL expression. You can provide any expression that is supported by SQLite. You can also use the Fabricate custom functions.

Alternatively, you can have Fabricate generate the expression for you. In the AI field, provide a text description of how to generate the values, then click Generate. For example, "The value of email-address from the users table where user-id matches user-id in this table".

Sum From Another Table

Populates a column with the sum of column values from rows in another table. To identify the rows to include, you provide join criteria.

To configure the generator:

  1. From the Source Table dropdown list, select the table that contains the column to sum the values for.

  2. From the Column To Sum dropdown list, select the column that contains the values to sum.

  3. Under Join based on the following criteria, to indicate how Fabricate identifies the rows to include:

    1. From the first dropdown list, select the column from the source table.

    2. From the second dropdown list, select the column from the current table that the first column must match.

For example, an order_items table contains a price column.

In the orders table, you add an order_amount column. The value of order_amount is the sum of order_items.price where order_items.order_id is equal to orders.order_id.

Sum Of Previous Rows

Adds together the values of a specified numeric column in rows that are before the current row.

To configure the generator:

  1. From the Column to Sum dropdown list, select the column for which to add together the values. The column must contain numeric values.

  2. From the Group By dropdown list, select the column to use the limit the rows that are used. The sum only includes rows for which the grouping column has the same value as the current row.

  3. From the Order by dropdown list, select the column to use to sort the rows. This determines the rows that are included in the sum.

For example, a previous-total-quantity column sums the value of the quantity column.

The sum is grouped by the product column, meaning that each sum only includes rows that have the same value of product as the current row. For example, if product for the current row is shirt, then the sum only includes rows where product is shirt.

The rows are sorted by date, to ensure that the sum only includes rows for entries that occurred before the current row.

Value From Previous Row

Populates a column with a value or values from another column in the previous row or rows.

To configure the generator:

  1. From the Source Column dropdown list, select the column from which to pull the value.

  2. From the Format dropdown list, select whether to use only the value from the previous row, or the values from all of the previous rows.

    1. previous value - Populate the current column with the single value from the previous row.

    2. list of all prior values - Populate the current column with a column-separated list of values from all of the previous rows.

    3. json array of all prior values - Populate the current column with a JSON array of values from all of the previous rows.

  3. You can limit the previous values to rows for which a value for a selected column matches the value in the current row. From the Partition By dropdown list, select the column to use. For example, the previous-product column is populated from the product column in the previous row. When color is the partition column, the value is populated from the closest previous row that has the same value of color as the current row.

  4. From the Order By column, select the column to use to sort the rows. If you selected a partition column, then the sorting is within each partition. The sorting determines which rows are previous to the current row. For example, the rows might be sorted by a datetime column, to ensure that the previous row is the most recent row before the current row.

Last updated