Reviewing the training results
The model details page displays the list of training jobs that were run against the model. The job list can include information about the job itself, as well as about the model configuration that was in place when the job was run.
Model details page with training jobs list
For jobs that are running, queued, or failed, you can view the job details. For queued and running jobs, you can cancel the job.
For completed jobs, you can view a visual summary of the results for a specific job, and compare jobs.
You can configure the columns to include in the jobs list. By default, the jobs list includes:
- The job identifier
- The job status
- The model version that the job ran against. When you change either the query or the column types, Djinn updates the model version. If the change makes the model configuration match an existing version, Djinn assigns that existing version number to the model. Djinn assigns a new version number only when the configuration does not match any existing version. For training jobs that ran before model versioning was introduced, the model version is always 0.
- When the job was submitted
- When the job was completed
- The general model parameter values that were used
For a tabular model, you can also display the tabular-specific parameters. For an event-driven model, you can also display the event-specific parameters.
To manage the displayed columns, click the list icon at the top right of the table. The column list contains the full list of available columns, and indicates whether each column is currently displayed.
To change whether a column is currently displayed, click the column name.
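The version numbering described for the Model version column can be pictured with a small sketch. This is not Djinn's implementation. The `VersionRegistry` class, its method name, and the choice to start new versions at 1 are all hypothetical and are used only to illustrate the rule that a configuration matching an existing version reuses that version number.

```python
import json


class VersionRegistry:
    """Hypothetical registry that maps model configurations to version numbers."""

    def __init__(self):
        self._versions = {}  # canonical configuration string -> version number
        # Assumption: 0 is reserved for jobs that predate versioning,
        # so new versions start at 1.
        self._next_version = 1

    def version_for(self, query, column_types):
        # Serialize the configuration deterministically so that identical
        # configurations always produce the same lookup key.
        key = json.dumps(
            {"query": query, "column_types": column_types}, sort_keys=True
        )
        if key not in self._versions:
            # Only a configuration that has not been seen gets a new number.
            self._versions[key] = self._next_version
            self._next_version += 1
        return self._versions[key]


registry = VersionRegistry()
first = registry.version_for("SELECT * FROM customers", {"tenure": "numeric"})
changed = registry.version_for("SELECT * FROM customers", {"tenure": "categorical"})
reverted = registry.version_for("SELECT * FROM customers", {"tenure": "numeric"})
```

In this sketch, `changed` receives a new version number, while `reverted` receives the same number as `first` because the configuration matches an existing version.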
You can use the job status to filter the list. To filter the list:
1. In the Training Status column heading, click the filter icon.
2. On the filter panel, check the checkbox for each status to include.
As you check and uncheck the checkboxes, Djinn updates the list.
You can use the following columns to sort the list:
- Model version
- Job submission
- Job completion
To sort by a column, click the column heading. To reverse the sort order, click the column heading again.
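As a rough illustration of what the status filter and column sort do to the jobs list, here is a minimal sketch. The job records, field names, and status labels are made up for the example and do not reflect how Djinn stores jobs.

```python
from datetime import datetime

# Hypothetical job records; the field names and status labels are assumptions.
jobs = [
    {"id": "job-a", "status": "Completed", "model_version": 2,
     "submitted": datetime(2023, 1, 3, 9, 0)},
    {"id": "job-b", "status": "Failed", "model_version": 1,
     "submitted": datetime(2023, 1, 1, 9, 0)},
    {"id": "job-c", "status": "Running", "model_version": 3,
     "submitted": datetime(2023, 1, 4, 9, 0)},
    {"id": "job-d", "status": "Completed", "model_version": 1,
     "submitted": datetime(2023, 1, 2, 9, 0)},
]


def filter_by_status(job_list, statuses):
    """Keep only the jobs whose status is one of the checked statuses."""
    return [job for job in job_list if job["status"] in statuses]


def sort_jobs(job_list, column, descending=False):
    """Sort by a column; descending=True mimics a second click on the heading."""
    return sorted(job_list, key=lambda job: job[column], reverse=descending)


visible = filter_by_status(jobs, {"Completed", "Failed"})
by_submission = sort_jobs(visible, "submitted")
by_version_desc = sort_jobs(visible, "model_version", descending=True)
```

The sketch leaves out the job completion column because jobs that are still running or queued have no completion time, and sorting on that column would need a rule for missing values.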
The Model Synthesis Report for a completed model training job provides a visual summary of the training results. It allows you to see how well the values in the generated data correspond to those in the original data. This indicates how realistic the generated data is.
Djinn produces a Model Synthesis Report for each completed model training job.
From the model details page, to display the Model Synthesis Report for a previous training job, click the Synthesis Report option for that job. The option is only available for completed jobs.
From the job details page, to display the Model Synthesis Report for the job, click Synthesis Report.
The Model Synthesis Report contains the following sets of visualizations:
- Categorical - For each categorical column, the Categorical section shows the distribution of each value in both the original data and the generated data. For example, the possible values for a payment-method column are Check, Electronic Transfer, and Credit Card. In the Categorical section, the visualization for payment-method shows the number of original and generated records that have each value. The closer the value counts match, the more realistic the generated data.
- Continuous - For each numeric column, the Continuous section shows the distribution of values in the original data and the generated data. The closer the distributions match, the more realistic the generated data.
- Correlations - The Correlations section contains a correlation matrix for the original data and a correlation matrix for the generated data. Each correlation matrix shows how the values in each numeric column correspond to the values in the other numeric columns. For example, as the tenure for a customer increases, does their bill amount also increase? The correlation is displayed using a color code that represents a value between -1 and 1. A value of -1 indicates that an increase in one value always corresponds to a decrease in the other value. A value of 0 indicates that there is no correlation between the values. A value of 1 indicates that an increase in one value always corresponds to an increase in the other value. The blocks that correlate a column to itself always have a correlation of 1. The more similar the two matrices are, the more realistic the generated data.
- Measure of Privacy - The Measure of Privacy section shows how closely each generated record matches the most similar original record. It also plots how closely each original record matches the most similar other original record. While the overall shape of the data should be similar between the original and generated data, the generated data should not replicate actual records.
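To make the Correlations and Measure of Privacy sections more concrete, the following sketch computes comparable quantities by hand. It assumes a Pearson correlation coefficient and a Euclidean distance between numeric records. The report does not state which measures Djinn uses, so both choices, along with the function names and sample data, are assumptions for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Correlate every numeric column with every other numeric column."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def closest_distances(records, reference, exclude_self=False):
    """For each record, the distance to the most similar reference record.

    Set exclude_self=True when comparing the original data to itself, so that
    a record is not matched against its own position in the list.
    """
    distances = []
    for i, record in enumerate(records):
        candidates = [
            other for j, other in enumerate(reference)
            if not (exclude_self and i == j)
        ]
        distances.append(min(math.dist(record, other) for other in candidates))
    return distances


# Made-up sample data: tenure and bill amount per customer.
original = {"tenure": [1, 2, 3, 4], "bill": [10, 20, 30, 40]}
matrix = correlation_matrix(original)

original_rows = [(1, 10), (2, 20), (3, 30)]
generated_rows = [(1, 10), (2, 21)]
generated_to_original = closest_distances(generated_rows, original_rows)
original_to_original = closest_distances(
    original_rows, original_rows, exclude_self=True
)
```

In this sample, the first generated row has a distance of 0 to an original row, which means that it replicates an actual record. That is the kind of result that the Measure of Privacy section is meant to reveal.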
To compare the model configuration and results for multiple jobs:
1. Check the checkbox for each job to include in the comparison.
2. Click Compare Jobs.
The comparison page displays a panel for each job.
Comparison of model training jobs
At the top of the panel are the job start and end times.
Below that are tabs that summarize the results and contain the configuration that was in place when the training job ran:
- Parameters shows the model version and the model parameter values
- Schema contains the data schema
- Query contains the query used to produce the model data
From the actions menu at the top right of the panel, you can:
- Display the job details
- View the Model Synthesis Report for the job