# Configuring the model parameters

On the model configuration view, the **Advanced** tab at the left contains the options that Tonic uses during the model training and data generation process.

By default, models are tabular. A tabular model focuses on the relationships between columns.

However, a model might be event driven, meaning that it captures relationships across rows as well as across columns. For example, you might want to track financial transactions over time for each user.

For an event driven model, you specify:

- The column that identifies the entity that each row belongs to. For example, to track activity for users, you might use a column that contains a user name or identifier.
- The column to use to sort the rows. This column contains a date, a datetime, or a numeric representation of a datetime value.
- Optionally, columns that provide conditions for sampling the data. When you sample the data, you specify the values of these columns to use in the generated events. For example, you might condition the data on a region column. When you sample the data, you can then specify the regions for which to generate events.
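To illustrate how an event driven model views the data, the following Python sketch groups rows into one sequence per primary entity and sorts each sequence by the order column. The `build_sequences` helper and the `user_id`, `ts`, and `amount` columns are hypothetical; this is not a description of how Tonic is implemented internally.

```python
from collections import defaultdict
from typing import Any


def build_sequences(
    rows: list[dict[str, Any]], entity_col: str, order_col: str
) -> dict[Any, list[dict[str, Any]]]:
    """Group rows by the primary entity column and sort each group by the order column."""
    groups: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row[entity_col]].append(row)
    # Sort each entity's events so that its sequence follows the order column.
    return {entity: sorted(events, key=lambda r: r[order_col]) for entity, events in groups.items()}


# Hypothetical transaction rows: user_id is the primary entity, ts is the order column.
transactions = [
    {"user_id": "u1", "ts": 3, "amount": 20.0},
    {"user_id": "u2", "ts": 1, "amount": 5.0},
    {"user_id": "u1", "ts": 1, "amount": 50.0},
    {"user_id": "u1", "ts": 2, "amount": 12.5},
]
sequences = build_sequences(transactions, entity_col="user_id", order_col="ts")
print([r["ts"] for r in sequences["u1"]])  # [1, 2, 3]
```

Each entity becomes one ordered sequence of events, which is the unit that the event driven model learns from.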

To indicate that a model is event driven:

1. From the **Model** drop-down list, select **Event Driven**.
2. From the **Primary Entity** drop-down list, select the column that identifies the entity for each row.
3. From the **Order** drop-down list, select the column to use to sort the rows. The order column can be a numeric column, a date column, or a datetime column.
4. Under **Condition On**, configure the list of columns for conditional sampling:
   1. To add a column, begin to type the column name. From the list of matching columns, select the column to add. You can only use categorical columns. The columns should also contain static data, meaning that the value does not change from one event to the next for the same entity. For example, for a transaction, the account type is static, because the transaction does not affect it. The transaction type and the remaining balance are dynamic, because they are specific to an individual transaction.
   2. To remove a column, click its delete icon.
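Before you add a column under **Condition On**, you might want to confirm that it is static for each entity. The following Python sketch is a hypothetical check, not a Tonic feature: it reports whether every entity has exactly one distinct value in a column. The `is_static_per_entity` helper and the column names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Any


def is_static_per_entity(rows: list[dict[str, Any]], entity_col: str, column: str) -> bool:
    """Return True if every entity has exactly one distinct value in `column`."""
    values_by_entity: dict[Any, set] = defaultdict(set)
    for row in rows:
        values_by_entity[row[entity_col]].add(row[column])
    return all(len(values) == 1 for values in values_by_entity.values())


rows = [
    {"user_id": "u1", "account_type": "checking", "txn_type": "debit"},
    {"user_id": "u1", "account_type": "checking", "txn_type": "credit"},
    {"user_id": "u2", "account_type": "savings", "txn_type": "debit"},
]
print(is_static_per_entity(rows, "user_id", "account_type"))  # True: a good condition column
print(is_static_per_entity(rows, "user_id", "txn_type"))  # False: changes per transaction
```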


The parameters under **General Parameters** are common to all models:

1. In the **Epochs** field, enter the number of times that the training process goes over the data.
2. Use the **Early Stopping** toggle to indicate whether to use early stopping for model training. If **Early Stopping** is turned on, then the model training does not have to run the full number of epochs. It stops when the model begins to overfit to the training data. If **Early Stopping** is turned off, then the model training runs the full number of configured epochs.
3. In the **Batch Size** field, enter the number of examples to use during each training step. The default is 500. A higher value can make the training more stable, but might require more epochs to converge to similar results.
4. In the **Reconstruction Loss Factor** field, enter the weight of the reconstruction loss in the model's loss function. The default is 2. The loss function for a variational autoencoder (VAE) is essentially the sum of a reconstruction loss and a regularization term. A higher value can help to produce decoded samples that are closer to the encoded samples, but can also make the latent representations more complicated and reduce the diversity of the synthetic samples.
5. In the **Latent Dimension** field, enter the dimension of the latent representation. The default is 128. The latent dimension represents the complexity of the data. If the specified value is much higher than the dimensionality of the problem that you want to analyze, it can reduce the quality of the results.
6. In the **Maximum Categorical Dimension** field, enter the maximum dimension for columns that have categorical or location encoding. The default is 35. If a column contains more distinct categories than this value, the most frequent categories are each encoded as a distinct one-hot vector, and the remaining categories are combined into a single one-hot vector. This limit prevents the model size from becoming extremely large and generally improves data quality.
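The following Python sketch illustrates two of these parameters: how **Batch Size** determines the number of training steps in each epoch, and how a common patience-based form of early stopping decides when to stop. The patience rule, the `patience` value, and the validation losses are assumptions for illustration; the exact early stopping criterion that Tonic uses is not described here.

```python
import math


def steps_per_epoch(num_rows: int, batch_size: int = 500) -> int:
    """Number of training steps needed to go over every row once."""
    return math.ceil(num_rows / batch_size)


def epochs_run(val_losses: list[float], max_epochs: int, patience: int = 3) -> int:
    """Return how many epochs run before patience-based early stopping triggers.

    val_losses[i] is a hypothetical validation loss measured after epoch i + 1.
    """
    best = math.inf
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch  # Stop early: the model no longer improves on held-out data.
    return min(len(val_losses), max_epochs)


print(steps_per_epoch(10_000))  # 20
# The validation loss starts to rise after epoch 3, a typical sign of overfitting.
print(epochs_run([1.0, 0.8, 0.7, 0.75, 0.78, 0.81, 0.9], max_epochs=100))  # 6
```

With early stopping turned off, training would instead run all `max_epochs` epochs regardless of the validation loss.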
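To show what the **Reconstruction Loss Factor** weights, the following sketch computes a textbook VAE loss: the factor times a squared-error reconstruction loss, plus the KL divergence between a diagonal Gaussian posterior and a standard normal prior as the regularization term. Squared error is a stand-in chosen for illustration; the reconstruction loss that Tonic applies to each column type is not described here.

```python
import math


def vae_loss(x, x_recon, mu, log_var, reconstruction_loss_factor=2.0):
    """Weighted VAE loss: factor * reconstruction loss + KL regularization term."""
    # Squared-error reconstruction loss (a stand-in for the real per-column losses).
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    # KL divergence between N(mu, exp(log_var)) and the standard normal prior N(0, 1).
    kl = 0.5 * sum(m ** 2 + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))
    return reconstruction_loss_factor * reconstruction + kl


x = [1.0, 2.0]        # encoded sample
x_recon = [1.5, 2.0]  # decoded sample
mu = [0.0, 1.0]       # latent mean
log_var = [0.0, 0.0]  # latent log-variance
print(vae_loss(x, x_recon, mu, log_var))  # 1.0
```

Raising the factor makes reconstruction errors cost more relative to the KL term, which is why a higher value pushes decoded samples closer to the inputs at the expense of a simpler, more diverse latent space.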
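The following Python sketch illustrates the effect of **Maximum Categorical Dimension** on a single column. It assumes that when a column has more distinct values than the limit, the (limit - 1) most frequent categories keep their own one-hot position and all other categories share the last position, so every vector has at most the limit's length. That split, the `<other>` label, and the helper names are hypothetical.

```python
from collections import Counter

OTHER = "<other>"  # Hypothetical label for the shared one-hot position.


def fit_categories(values, max_categorical_dimension=35):
    """Return the one-hot positions (slots) for a categorical column."""
    counts = Counter(values)
    if len(counts) <= max_categorical_dimension:
        return sorted(counts)  # Every category gets its own slot.
    top = [cat for cat, _ in counts.most_common(max_categorical_dimension - 1)]
    return top + [OTHER]


def one_hot(value, slots):
    """Encode a value; categories without their own slot map to the shared slot.

    Raises ValueError for an unseen value when there is no shared slot.
    """
    index = slots.index(value) if value in slots else slots.index(OTHER)
    return [1 if i == index else 0 for i in range(len(slots))]


states = ["CA"] * 5 + ["NY"] * 4 + ["TX"] * 3 + ["WA", "OR"]
slots = fit_categories(states, max_categorical_dimension=3)
print(slots)  # ['CA', 'NY', '<other>']
print(one_hot("TX", slots))  # [0, 0, 1]
```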

For an event driven model, to configure the **RNN-VAE Parameters**:

1. In the **Maximum Sequence Length** field, enter the maximum number of steps in a sequence that Tonic considers when it trains the event model. The default is 20. Longer source sequences are truncated to the maximum length, and the resulting synthetic sequences have a length of up to this value. Long sequences take longer to process, and can reduce the quality of the results.
2. In the **Maximum Order Dimension** field:
   - If the order column is numeric, then Tonic discretizes it. Set **Maximum Order Dimension** to the number of intervals to divide the order column into.
   - If the order column is a date or datetime, set **Maximum Order Dimension** to the maximum number of distinct dates that the model considers. For datetime values, the time is ignored. If the number of distinct dates in the data exceeds **Maximum Order Dimension**, then the model training fails.
3. In the **RNN Encoder Hidden Size** field, enter the size of the RNN internal state for the encoder network. The default is 256.
4. In the **RNN Decoder Hidden Size** field, enter the size of the RNN internal state for the decoder network. The default is 256.
5. In the **RNN Decoder Fully Connected Size** field, enter the size of the decoder's fully connected layer, which determines the complexity of that layer. The default is 128. At each time step, the hidden state passes through the fully connected layer to generate a sample.
6. In the **Sequence Length Loss Factor** field, enter the weight of the sequence length term in the model's loss function. The default is 128. This factor indicates how important it is to predict the sequence length. When you increase this value, Tonic uses more of the model's capacity to capture the statistical properties of the sequence lengths.
7. In the **Order Column Loss Factor** field, enter the weight of the order column term in the model's loss function. The default is 128. This factor determines how important it is to predict the values of the order column. As with the sequence length loss factor, when you increase this factor, the synthetic order column values become more realistic. The scale differs from the sequence length loss factor because the order column values use a different encoding.
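The following Python sketch shows the effect of **Maximum Sequence Length** on per-entity sequences. It assumes that the earliest events are kept and later events are dropped; the source does not state which part of a long sequence Tonic keeps. The `truncate_sequences` helper and the sample data are hypothetical.

```python
def truncate_sequences(sequences, max_sequence_length=20):
    """Cap every entity's event sequence at max_sequence_length events.

    Assumption: sequences are already sorted by the order column, and the
    earliest events are kept.
    """
    return {entity: events[:max_sequence_length] for entity, events in sequences.items()}


# Hypothetical sequences: u1 has 25 events, u2 has 7.
sequences = {"u1": list(range(25)), "u2": list(range(7))}
capped = truncate_sequences(sequences)
print({entity: len(events) for entity, events in capped.items()})  # {'u1': 20, 'u2': 7}
```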
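The following Python sketch illustrates both cases for **Maximum Order Dimension**. For a numeric order column, it assumes equal-width intervals, which is one common discretization scheme; the scheme that Tonic uses is not described here. For a date or datetime order column, it counts distinct dates while ignoring the time, and raises an error when the count exceeds the limit, mirroring the training failure that the documentation describes. The helper names are hypothetical.

```python
from datetime import date, datetime


def discretize_order(values, max_order_dimension):
    """Assign each numeric order value to one of max_order_dimension equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # All values fall into a single bin.
    width = (hi - lo) / max_order_dimension
    # min() places the maximum value in the last bin rather than in an extra bin.
    return [min(int((v - lo) / width), max_order_dimension - 1) for v in values]


def count_order_dates(values, max_order_dimension):
    """Count distinct dates, ignoring the time of day, and fail if there are too many."""
    # Check datetime first, because datetime is a subclass of date.
    distinct = {v.date() if isinstance(v, datetime) else v for v in values}
    if len(distinct) > max_order_dimension:
        raise ValueError(
            f"{len(distinct)} distinct dates exceed Maximum Order Dimension ({max_order_dimension})"
        )
    return len(distinct)


print(discretize_order([0, 2.5, 5, 7.5, 10], max_order_dimension=4))  # [0, 1, 2, 3, 3]
timestamps = [datetime(2023, 1, 1, 9, 30), datetime(2023, 1, 1, 17, 0), date(2023, 1, 2)]
print(count_order_dates(timestamps, max_order_dimension=5))  # 2
```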

For a tabular model, to configure the **VAE Parameters**:

1. In the **Encoder Layer Sizes** field, type a comma-separated list of non-negative integers to specify the number of layers and the size of each layer for the encoder. The default is 256,256,256, which indicates that there are three layers, and that the size of each layer is 256. More layers or larger layers increase the expressive capacity of the model. However, a larger model needs a larger dataset to produce good results.
2. In the **Decoder Layer Sizes** field, type a comma-separated list of non-negative integers to specify the number of layers and the size of each layer for the decoder. The default is 256,256,256, which indicates that there are three layers, and that the size of each layer is 256. More layers or larger layers increase the expressive capacity of the model. However, a larger model needs a larger dataset to produce good results.
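The following Python sketch parses a layer sizes value such as the default 256,256,256, and then counts the weights and biases in the corresponding stack of fully connected layers, to show how quickly the capacity grows as you add or widen layers. The helper names and the input dimension of 100 are hypothetical, and a real encoder or decoder has additional layers (for example, the projection to the latent space) that this count ignores.

```python
def parse_layer_sizes(text: str) -> list[int]:
    """Parse a comma-separated value such as "256,256,256" into a list of layer sizes."""
    sizes = [int(part) for part in text.split(",")]  # int() tolerates surrounding spaces.
    if any(size < 0 for size in sizes):
        raise ValueError("layer sizes must be non-negative integers")
    return sizes


def dense_parameter_count(input_dim: int, layer_sizes: list[int]) -> int:
    """Count the weights and biases in a stack of fully connected layers."""
    total, previous = 0, input_dim
    for size in layer_sizes:
        total += previous * size + size  # Weight matrix plus bias vector.
        previous = size
    return total


encoder_sizes = parse_layer_sizes("256,256,256")
print(len(encoder_sizes), encoder_sizes)  # 3 [256, 256, 256]
print(dense_parameter_count(100, encoder_sizes))  # 157440
```

A model with this many parameters can fit a small dataset too closely, which is why larger layer sizes call for more training data.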
