A benchmark is a set of questions that can optionally include the expected answers. For a Tonic Validate development project, a benchmark is one way to provide the questions for a run.
A run assesses how your RAG system answers the benchmark questions. If your benchmark includes answers, then Validate compares the answers from the benchmark with the answers from your RAG system.
To create and update benchmarks, you can use either the Validate application or the Validate SDK.
To display your list of benchmarks, in the Validate navigation menu, click Benchmarks.
For each benchmark, the Benchmarks page displays:
The name of the benchmark
The number of questions in the benchmark
You create a benchmark from the Benchmarks page.
To create a benchmark from the Benchmarks page:
Click Create A New Benchmark.
In the Name field, enter a name for the benchmark.
Click Save.
You can update the name and questions for an existing benchmark.
To update a benchmark:
On the Benchmarks page, either:
Click the benchmark name.
Click the options menu for the benchmark, then click Edit.
On the Edit Benchmark panel, to change the benchmark name, in the Name field, enter the new name.
You can also add, update, and delete the questions in the benchmark.
To save the changes, click Save.
To delete a benchmark, on the Benchmarks page:
Click the options menu for the benchmark.
In the options menu, click Delete.
A benchmark consists of a set of questions. For each question, you can optionally provide the expected response.
To add a question to a benchmark:
Click Add Q&A.
In the Question field, type the text of the question.
Optionally, in the Answer field, type the text of the expected answer. If you do not provide an answer, then Validate cannot calculate an answer similarity score for the question.
Click Finish Editing.
To update an existing question:
Click the edit icon for the question.
Update the Question and Answer fields.
Click Finish Editing.
To delete a question from a benchmark, click the delete icon for the question.
To delete all of the questions, click Clear All.
To use a benchmark from the UI in the Validate SDK, call get_benchmark.
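A minimal sketch of that call, assuming the tonic_validate Python package is installed and that the ValidateApi constructor accepts an API key. The helper name fetch_benchmark is hypothetical, and the API key and benchmark ID are placeholders for values from your own Validate account. Check the signatures against your installed SDK version.

```python
def fetch_benchmark(api_key: str, benchmark_id: str):
    """Retrieve a benchmark that was created in the Validate UI.

    fetch_benchmark is an illustrative helper, not part of the SDK.
    """
    # Imported inside the function so that this module still loads when the
    # SDK is missing. Assumes the tonic_validate package is installed.
    from tonic_validate import ValidateApi

    # Assumption: ValidateApi takes the API key as its first argument.
    validate_api = ValidateApi(api_key)

    # get_benchmark looks up the benchmark by its ID.
    return validate_api.get_benchmark(benchmark_id)
```

For example, fetch_benchmark("your-api-key", "benchmark-id") would return the benchmark object for that ID, which you can then use as the question source for a run.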
You can use the Validate SDK to create a benchmark from a list of questions and answers.
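As a rough sketch, assuming the tonic_validate package is installed and that its Benchmark class accepts questions and answers keyword arguments as parallel lists. The helper build_benchmark and the sample question are illustrative only.

```python
# Illustrative sample data: each entry is a (question, expected answer) pair.
EXAMPLE_QA_PAIRS = [
    ("What is the capital of France?", "Paris"),
]


def build_benchmark(qa_pairs):
    """Build an SDK Benchmark from (question, answer) pairs.

    build_benchmark is a hypothetical helper, not part of the SDK.
    """
    # Imported lazily so the module loads without the SDK installed.
    from tonic_validate import Benchmark

    # Split the pairs into two parallel lists, keeping their order.
    questions = [question for question, _ in qa_pairs]
    answers = [answer for _, answer in qa_pairs]

    # Assumption: Benchmark takes `questions` and `answers` keyword arguments.
    return Benchmark(questions=questions, answers=answers)
```

Calling build_benchmark(EXAMPLE_QA_PAIRS) creates the benchmark object locally. Nothing is sent to the Validate application at this point.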
To upload this benchmark to the UI, use the new_benchmark method in the ValidateApi class.
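The upload step might look like the following sketch. It assumes the tonic_validate package is installed, that Benchmark accepts questions and answers lists, and that new_benchmark takes the benchmark followed by a display name. The helper upload_benchmark is hypothetical, and the API key is a placeholder.

```python
def upload_benchmark(api_key, questions, answers, benchmark_name):
    """Create a benchmark locally, then upload it to the Validate UI.

    upload_benchmark is an illustrative helper, not part of the SDK.
    """
    # Imported lazily so the module loads without the SDK installed.
    from tonic_validate import Benchmark, ValidateApi

    # Build the benchmark from parallel lists of questions and answers.
    benchmark = Benchmark(questions=questions, answers=answers)

    # Assumption: ValidateApi takes the API key as its first argument.
    validate_api = ValidateApi(api_key)

    # Assumption: new_benchmark takes the benchmark and the name to show in
    # the UI. Whatever it returns is passed back to the caller unchanged.
    return validate_api.new_benchmark(benchmark, benchmark_name)
```

After the upload, the benchmark should appear on the Benchmarks page under the name that you passed in.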