Managing benchmarks in Validate

A benchmark is a set of questions that can optionally include the expected answers. For a Tonic Validate development project, a benchmark is one way to provide the questions for a run.

A run assesses how your RAG system answers the benchmark questions. If your benchmark includes answers, then Validate compares the answers from the benchmark with the answers from your RAG system.

To create and update benchmarks, you can use either the Validate application or the Validate SDK.

Managing benchmarks

Displaying the list of benchmarks

To display your list of benchmarks, in the Validate navigation menu, click Benchmarks.

For each benchmark, the Benchmarks page displays:

The name of the benchmark
The number of questions in the benchmark

Creating a benchmark

To create a benchmark in the Validate application, on the Benchmarks page:

Click Create A New Benchmark.
In the Name field, enter a name for the benchmark.
Add questions to the benchmark, as described in Adding a question to a benchmark.
Click Save.

Updating a benchmark

You can update the name and questions for an existing benchmark.

To update a benchmark:

On the Benchmarks page, either:
- Click the benchmark name.
- Click the options menu for the benchmark, then click Edit.

On the Edit Benchmark panel, to change the benchmark name, enter the new name in the Name field.
You can also add, update, and delete the benchmark questions, as described in Configuring benchmark questions.
To save the changes, click Save.

Deleting a benchmark

To delete a benchmark, on the Benchmarks page:

Click the options menu for the benchmark.
In the options menu, click Delete.

Configuring benchmark questions

Adding a question to a benchmark

A benchmark consists of a set of questions. For each question, you can optionally provide the expected answer.

To add a question to a benchmark:

Click Add Q&A.
In the Question field, type the text of the question.
Optionally, in the Answer field, type the text of the expected answer. If you do not provide an answer, then Validate cannot calculate an answer similarity score for the question.
Click Finish Editing.
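Because the answer is optional, only some questions in a benchmark may be eligible for an answer similarity score. As a rough illustration only (the QAPair class and questions_with_answers function below are hypothetical and are not part of Validate or its SDK), this Python sketch models each benchmark entry as a question with an optional expected answer, and lists the questions that have an answer to compare against. It treats a blank answer the same as a missing one, which is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QAPair:
    """Hypothetical model of one benchmark entry: a question plus an optional expected answer."""
    question: str
    answer: Optional[str] = None


def questions_with_answers(pairs: List[QAPair]) -> List[str]:
    """Return the questions that have a non-blank expected answer.

    Only these questions have an expected answer to compare with the
    RAG system's answer, so only these could get an answer similarity score.
    """
    return [p.question for p in pairs if p.answer is not None and p.answer.strip()]


benchmark_pairs = [
    QAPair("What is the capital of France?", "Paris"),
    QAPair("Who painted the Mona Lisa?"),  # no expected answer provided
]
print(questions_with_answers(benchmark_pairs))  # ['What is the capital of France?']
```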

Updating a benchmark question

To update an existing question:

Click the edit icon for the question.
Update the Question and Answer fields.
Click Finish Editing.

Deleting questions from a benchmark

To delete a question from a benchmark, click the delete icon for the question.

To delete all of the questions, click Clear All.

Using benchmarks from the UI

To use a benchmark that you created in the Validate application from the Validate SDK, call the get_benchmark method of ValidateApi and pass the benchmark ID:

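from tonic_validate import ValidateApi

# Replace with your Validate API key
validate_api = ValidateApi("your-api-key")
# Replace with the ID of a benchmark that you created in the Validate application
benchmark = validate_api.get_benchmark("benchmark_id")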

Using the Validate SDK to manage benchmarks

You can use the Validate SDK to create a benchmark from a list of questions and answers.

from tonic_validate import Benchmark
benchmark = Benchmark(
    questions=["What is the capital of France?"],
    answers=["Paris"]
)
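Questions and answers are often kept in a file rather than typed inline. The sketch below shows one way to build the two parallel lists from CSV text with Python's standard csv module. The load_qa_lists name and the question/answer column names are assumptions for illustration, not a Validate convention. Because it is not clear from this page how the SDK handles a missing answer for a single question, the sketch rejects rows with a blank answer instead of guessing.

```python
import csv
import io
from typing import List, TextIO, Tuple


def load_qa_lists(source: TextIO) -> Tuple[List[str], List[str]]:
    """Read 'question' and 'answer' columns from CSV text into two parallel lists.

    Hypothetical helper: the column names are an assumption, not a Validate convention.
    Rows with a blank question are skipped; a row with a blank answer raises ValueError.
    """
    questions: List[str] = []
    answers: List[str] = []
    reader = csv.DictReader(source)
    for row in reader:
        question = (row.get("question") or "").strip()
        answer = (row.get("answer") or "").strip()
        if not question:
            continue  # ignore rows without a question
        if not answer:
            # reader.line_num is the source line number of the row just read
            raise ValueError(f"line {reader.line_num}: question has no answer")
        questions.append(question)
        answers.append(answer)
    return questions, answers


csv_text = """question,answer
What is the capital of France?,Paris
What is the capital of Japan?,Tokyo
"""
questions, answers = load_qa_lists(io.StringIO(csv_text))
print(questions)
print(answers)
```

The resulting lists can then be passed to Benchmark(questions=questions, answers=answers), in the same way as the inline lists in the example above.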

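To upload the benchmark to Validate so that it is available in the application, call the new_benchmark method of ValidateApi, and pass the benchmark and a name for it:

from tonic_validate import Benchmark, ValidateApi

# Replace with your Validate API key
validate_api = ValidateApi("your-api-key")
benchmark = Benchmark(
    questions=["What is the capital of France?"],
    answers=["Paris"]
)
# Upload the benchmark under the name "benchmark_name"
validate_api.new_benchmark(benchmark, "benchmark_name")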
