Validate components and tools

Validate components

Projects

A development project is designed for use during RAG system development. It is a collection of runs that shows how performance on a given set of questions changes over time.

A production monitoring project allows you to monitor the performance of a production RAG system over time. You configure the RAG system to automatically send the project the questions that your users ask, the answers that the RAG system provides, and the associated context.
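As a rough illustration of the three pieces of data that a production RAG system sends for each user question, the sketch below packages them into a JSON payload. The `LoggedInteraction` class, its field names, and `to_payload` are hypothetical names chosen for this example. They are not part of the Validate SDK, which provides its own logging calls.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class LoggedInteraction:
    """One user interaction (hypothetical structure, not the SDK's class)."""
    question: str        # the question the user asked
    answer: str          # the answer the RAG system provided
    context: list[str]   # the context the RAG system retrieved


def to_payload(interaction: LoggedInteraction) -> str:
    """Serialize the interaction to JSON so that it can be sent for scoring."""
    return json.dumps(asdict(interaction))


interaction = LoggedInteraction(
    question="What is the refund policy?",
    answer="Refunds are available within 30 days.",
    context=["Our policy allows refunds within 30 days of purchase."],
)
payload = to_payload(interaction)
```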

For more information, go to Managing projects in Validate.

Metrics

Metrics are used to score the RAG system responses to questions.

For a development project, Validate calculates metric scores for the benchmark questions that are provided for the project.

For a production monitoring project, Validate calculates metric scores for the questions that users ask the RAG system. The RAG system sends the questions to Validate.

Each metric that Validate calculates evaluates a different aspect of a RAG system. For more information, go to About RAG metrics.

Runs

For a Validate development project, a run represents an assessment of the RAG responses to a set of questions based on the RAG system configuration at a given point in time.

For each response, the run includes:

  • The question and, optionally, the corresponding ideal answer. A benchmark is one option for providing the questions.

  • The LLM's response and the context that the RAG system retrieved.

  • Metadata in the form of key-value pairs that you specify. For example, "Model": "GPT-4"

  • Scores for the response, calculated with the metrics that you chose.

The run also includes overall scores for the given metrics.
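The contents of a run can be pictured as a simple data structure. The sketch below is illustrative only: `ResponseRecord`, `RunRecord`, and the metric name `example_metric` are hypothetical and do not come from the Validate SDK. It also assumes that an overall score is the mean of the per-response scores, which the text above does not specify.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class ResponseRecord:
    """One scored response in a run (hypothetical structure)."""
    question: str
    llm_answer: str
    retrieved_context: list[str]
    scores: dict[str, float]            # metric name -> score for this response
    ideal_answer: Optional[str] = None  # the ideal answer is optional


@dataclass
class RunRecord:
    """A run: scored responses plus user-specified key-value metadata."""
    responses: list[ResponseRecord]
    metadata: dict[str, str] = field(default_factory=dict)

    def overall_scores(self) -> dict[str, float]:
        # Assumption: the overall score for a metric is the mean of the
        # per-response scores for that metric.
        metric_names = {name for r in self.responses for name in r.scores}
        return {
            name: mean(r.scores[name] for r in self.responses if name in r.scores)
            for name in sorted(metric_names)
        }


run = RunRecord(
    responses=[
        ResponseRecord(
            question="What is the capital of France?",
            llm_answer="Paris",
            retrieved_context=["Paris is the capital of France."],
            scores={"example_metric": 5.0},
            ideal_answer="Paris",
        ),
        ResponseRecord(
            question="What is the capital of Spain?",
            llm_answer="Barcelona",
            retrieved_context=["Madrid is the capital of Spain."],
            scores={"example_metric": 1.0},
        ),
    ],
    metadata={"Model": "GPT-4"},
)
overall = run.overall_scores()
```

Keeping the metadata on the run rather than on each response reflects the idea that a run captures the RAG system configuration at one point in time.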

For more information, go to Viewing and managing runs.

Benchmarks

For a Validate development project, a benchmark is a collection of questions, each with or without a corresponding response. When provided, the responses represent the ideal answers to the questions.

A benchmark is one way to provide the questions for Validate to use to evaluate your RAG system.
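A benchmark can be thought of as a list of questions paired with optional ideal answers. The following is a minimal sketch under that reading. `BenchmarkItem` and `build_benchmark` are hypothetical names for illustration, not the SDK's benchmark class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BenchmarkItem:
    """A benchmark question with an optional ideal answer (hypothetical)."""
    question: str
    ideal_answer: Optional[str] = None


def build_benchmark(
    questions: list[str],
    answers: Optional[list[str]] = None,
) -> list[BenchmarkItem]:
    """Pair each question with its ideal answer, if answers are supplied."""
    if answers is None:
        # Questions without responses are allowed.
        return [BenchmarkItem(question=q) for q in questions]
    if len(answers) != len(questions):
        raise ValueError("answers must have the same length as questions")
    return [BenchmarkItem(question=q, ideal_answer=a)
            for q, a in zip(questions, answers)]


with_answers = build_benchmark(
    ["What is the capital of France?"],
    ["Paris"],
)
questions_only = build_benchmark(["What is the capital of Spain?"])
```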

For more information, go to Managing benchmarks in Validate.

Validate tools

Validate SDK (tonic-validate)

You must use the Validate SDK to:

  • Create Validate runs for a development project

  • Provide questions and RAG system responses to Validate for a development project run

  • Send questions and RAG system responses from your RAG system to a production monitoring project

You can also use the SDK to:

  • Manage projects

  • Manage benchmarks for a development project

  • Calculate RAG metrics outside the context of a Validate project

Validate application

You can use the Validate application to manage benchmarks and projects.

You must use the Validate application to view:

  • The results of Validate runs for a development project

  • The metric scores over time for a production monitoring project

Last updated 1 year ago
