Starting a Validate run

To start a run, use ValidateScorer to score the responses from your RAG system.

Next, to upload the run with ValidateApi, use an API key to connect to Tonic Validate, then specify the project identifier.

from tonic_validate import ValidateScorer, ValidateApi, Benchmark

# Function to simulate getting a response and context from your LLM
# Replace this with your actual function call
def get_rag_response(question):
    return {
        "llm_answer": "Paris",
        "llm_context_list": ["Paris is the capital of France."]
    }

benchmark = Benchmark(questions=["What is the capital of France?"], answers=["Paris"])
# Score the responses for each question and answer pair
scorer = ValidateScorer()
run = scorer.score(benchmark, get_rag_response)

# Upload the run to your project in the Validate UI
validate_api = ValidateApi("your-api-key")
validate_api.upload_run("your-project-id", run)

Configuring the run

When you create a run, you specify the LLM evaluator and the metrics to calculate on the responses that are logged during the run.

For the LLM evaluator, we currently support the following models:

  • OpenAI, including Azure's OpenAI service

  • Gemini

  • Anthropic

To change your model, use the model_evaluator argument to pass the model string to ValidateScorer.

For example:

scorer = ValidateScorer(model_evaluator="gpt-4")
scorer = ValidateScorer(model_evaluator="gemini/gemini-1.5-pro-latest")
scorer = ValidateScorer(model_evaluator="claude-3")

If you use Azure, then instead of the model name, you pass in your deployment name. For information on how to set up an Azure deployment, go to Using Azure's OpenAI service. For example:

scorer = ValidateScorer(model_evaluator="my-azure-deployment-name")

Configuring the run metrics

By default, the following RAG metrics are calculated:

  • Answer similarity - Note that if you do not provide expected answers to your benchmark questions, then Tonic Validate cannot determine answer similarity.

  • Augmentation precision

  • Answer consistency

When you create a run, if you only pass in the LLM evaluator, then the run calculates all of these metrics.

To specify the metrics to calculate during the run, pass a list of the metrics:

from tonic_validate import ValidateScorer
from tonic_validate.metrics import AnswerConsistencyMetric, AugmentationAccuracyMetric

scorer = ValidateScorer([
    AnswerConsistencyMetric(),
    AugmentationAccuracyMetric()
])
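You can also combine a custom metric list with a specific LLM evaluator. The following is a minimal sketch, based on the arguments shown above, that assumes ValidateScorer accepts the metrics list and the model_evaluator argument together:

from tonic_validate import ValidateScorer
from tonic_validate.metrics import AnswerConsistencyMetric, AugmentationAccuracyMetric

# Sketch: calculate only two metrics and use GPT-4 as the LLM evaluator
# (assumes the metrics list and model_evaluator can be passed together)
scorer = ValidateScorer(
    [AnswerConsistencyMetric(), AugmentationAccuracyMetric()],
    model_evaluator="gpt-4"
)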

Providing the RAG system responses and retrieved context to Validate

To provide the RAG system response, create a callback function that returns:

  • A string that contains the RAG system's response

  • A list of strings that contains the context that the RAG system retrieved

In the example below, the callback returns these as the llm_answer and llm_context_list entries of a dictionary.

# Function to simulate getting a response and context from your LLM
# Replace this with your actual function call
def get_rag_response(question):
    return {
        "llm_answer": "Paris",
        "llm_context_list": ["Paris is the capital of France."]
    }
     
# Score the responses
scorer = ValidateScorer()
run = scorer.score(benchmark, get_rag_response)
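The callback above returns hardcoded values. As a sketch of what a real callback might look like, the version below wraps a LlamaIndex query engine. The query_engine object and its response attributes are assumptions for illustration; see the end-to-end LlamaIndex example for a complete walkthrough.

# Sketch of a callback that wraps a LlamaIndex query engine
# (assumes query_engine was built and configured elsewhere)
def get_rag_response(question):
    response = query_engine.query(question)
    context = [node.text for node in response.source_nodes]
    return {
        "llm_answer": response.response,
        "llm_context_list": context
    }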

When you log the RAG system answer and retrieved context, the metrics for the answer and retrieved context are calculated on your machine, then sent to the Validate application.

This can take some time, because each metric requires at least one call to the LLM evaluator's API.
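Before you upload the run, you can inspect the scores locally. This is a minimal sketch that assumes the run object returned by score exposes an overall_scores attribute:

# Sketch: review the aggregate metric scores locally before uploading
# (assumes the run object exposes an overall_scores attribute)
run = scorer.score(benchmark, get_rag_response)
print(run.overall_scores)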

Using parallelism to speed up runs

To speed up your runs, you can pass parallelism arguments to the score function. These arguments create additional threads that score your metrics faster:

scorer.score(
    benchmark,
    get_rag_response,
    scoring_parallelism=2,
    callback_parallelism=2
)
  • scoring_parallelism controls the number of threads that are used to score the responses. For example, if scoring_parallelism is set to 2, then for scoring, 2 threads can call the LLM's API simultaneously.

  • callback_parallelism controls the number of threads that are used to call the callback that you provided. For example, if callback_parallelism is 2, then two threads can call your callback function simultaneously.
