Validate components and tools

Validate components

Runs

A run represents an assessment of a RAG system's responses to a set of questions, based on the RAG system's configuration at a given point in time.

For each response, the run includes:

  • The question and, optionally, the corresponding ideal answer. A benchmark is one option for providing the questions.

  • The LLM's response and the context that the RAG system retrieved.

  • Metadata in the form of key-value pairs that you specify. For example, "Model": "GPT-4".

  • Scores for the responses, calculated using your chosen metrics.

The run also includes overall scores for the given metrics.
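The pieces of a run listed above can be sketched as a plain data structure. This is an illustration only; the field names here are assumptions for readability, not the Validate SDK's actual schema:

```python
# Illustrative sketch of the data a run contains.
# Field names are assumptions for illustration, not the SDK's real schema.
run = {
    # User-specified key-value metadata for the run.
    "metadata": {"Model": "GPT-4"},
    # Overall scores for the run, one per chosen metric.
    "overall_scores": {"answer_similarity": 4.5},
    # One entry per question/response pair.
    "items": [
        {
            "question": "What is the capital of France?",
            "ideal_answer": "Paris",  # optional; may come from a benchmark
            "llm_response": "The capital of France is Paris.",
            "retrieved_context": ["France's capital city is Paris."],
            "scores": {"answer_similarity": 5.0},
        }
    ],
}
```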

Projects

Projects are collections of runs that allow you to see how your run performance changes over time. Before you can upload runs to the Validate application, you must first create a project.

Benchmarks

In Validate, a benchmark is a collection of questions, with or without answers. The answers represent the ideal responses to the given questions.

A benchmark is one way to provide the questions for Validate to use to evaluate your RAG system.
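Conceptually, a benchmark is just a list of questions, optionally paired with ideal answers. A minimal sketch (the structure is an assumption for illustration, not the SDK's actual Benchmark type):

```python
# Illustrative sketch of a benchmark: questions paired with ideal answers.
# This plain structure is an assumption, not the Validate SDK's Benchmark class.
questions = [
    "What is the capital of France?",
    "Who wrote Hamlet?",
]
ideal_answers = [
    "Paris",
    "William Shakespeare",
]

# Pair each question with its ideal answer (answers are optional in Validate).
benchmark = list(zip(questions, ideal_answers))
```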

Metrics

Metrics are used to score each item in a benchmark. Validate provides several metrics, each of which measures a different aspect of a RAG system's performance. For more information about metrics, go to the metrics section.
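To connect per-response scores to the run's overall scores: a simple way to think of an overall score is as the mean of the per-response scores for a metric. This is a conceptual sketch, not necessarily how the SDK aggregates every metric:

```python
# Conceptual sketch: an overall run score as the mean of per-response scores
# for one metric. How the Validate SDK actually aggregates each metric may differ.
item_scores = [4.0, 5.0, 3.0]  # hypothetical per-response scores for one metric
overall_score = sum(item_scores) / len(item_scores)
```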

Validate tools

Validate SDK (tonic-validate)

You must use the Validate SDK to create runs and upload them to the Validate application.

You can also use the SDK to:

  • Manage benchmarks and projects

  • Calculate RAG metrics outside the context of a Tonic Validate project
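As a loose illustration of scoring a response outside of a project, the following sketch computes a naive word-overlap score between an LLM response and an ideal answer. This is NOT one of Validate's actual metrics (those are computed by the SDK); it only illustrates the idea of scoring a response against an ideal answer:

```python
# Loose illustration only: a naive word-overlap score between an LLM response
# and an ideal answer. This is not an actual Validate metric.
def overlap_score(response: str, ideal_answer: str) -> float:
    """Return the fraction of the ideal answer's words found in the response."""
    ideal_words = set(ideal_answer.lower().split())
    response_words = set(response.lower().split())
    if not ideal_words:
        return 0.0
    return len(ideal_words & response_words) / len(ideal_words)
```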

Validate application

You can use the Validate application to manage benchmarks and projects.

You must use the Validate application to view the results of Tonic Validate runs, including:

  • Overall scores

  • Visualizations of the metrics for the run
