A development project is designed for use while you develop a RAG system. It is a collection of runs that lets you see how performance on a given set of questions changes over time.
A production monitoring project allows you to monitor the performance of a production RAG system over time. You configure the RAG system to automatically send each user question, the answer that the RAG system provided, and the associated context to the production monitoring project.
For more information, go to Managing projects in Validate.
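As a rough illustration of the three pieces of data that a production RAG system sends to a production monitoring project, the sketch below packages a question, an answer, and the retrieved context into a JSON string. It uses only the Python standard library. The `LoggedInteraction` class, its field names, and the `to_payload` function are hypothetical and are not part of the Validate SDK. The actual logging call is described in the SDK documentation.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class LoggedInteraction:
    """Hypothetical record of one production exchange (not a Validate SDK type)."""

    question: str       # the question that the user asked
    answer: str         # the answer that the RAG system provided
    context: List[str]  # the context that the RAG system retrieved


def to_payload(interaction: LoggedInteraction) -> str:
    """Serialize the interaction to JSON so that it can be sent or stored."""
    return json.dumps(asdict(interaction))


example = LoggedInteraction(
    question="What is the refund window?",
    answer="Refunds are accepted within 30 days.",
    context=["Our policy allows refunds within 30 days of purchase."],
)
payload = to_payload(example)
```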
Metrics are used to score the RAG system responses to questions.
For a development project, Validate calculates metric scores for the benchmark questions that are provided for the project.
For a production monitoring project, Validate calculates metric scores for the questions that users ask the RAG system. The RAG system sends the questions to Validate.
Validate calculates several metrics, each of which assesses a different aspect of a RAG system. For more information about metrics, go to the metrics section.
For a Validate development project, a run represents an assessment of the RAG responses to a set of questions based on the RAG system configuration at a given point in time.
For each response, the run includes:
The question and, optionally, the corresponding ideal answer. A benchmark is one option for providing the questions.
The LLM's response and the context that the RAG system retrieved.
Metadata in the form of key-value pairs that you specify. For example, "Model": "GPT-4".
Scores for the response, calculated using your chosen metrics.
The run also includes overall scores for the given metrics.
For more information, go to Viewing and managing runs.
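To make the contents of a run more concrete, the following minimal sketch models the items listed above as plain Python data classes. The `RunItem` and `Run` classes, the field names, and the metric name `example_metric` are hypothetical and do not come from the Validate SDK. The sketch also assumes that an overall score is the mean of the per-response scores for a metric. Validate might aggregate scores differently.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class RunItem:
    """Hypothetical model of what a run stores for one response."""

    question: str
    llm_response: str
    retrieved_context: List[str]
    scores: Dict[str, float]            # metric name -> score for this response
    ideal_answer: Optional[str] = None  # the ideal answer is optional
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Run:
    """Hypothetical model of a run: a list of scored responses."""

    items: List[RunItem]

    def overall_scores(self) -> Dict[str, float]:
        # Assumption: the overall score for a metric is the mean of the
        # per-response scores that are available for that metric.
        names = sorted({name for item in self.items for name in item.scores})
        return {
            name: mean(item.scores[name] for item in self.items if name in item.scores)
            for name in names
        }


run = Run(
    items=[
        RunItem(
            question="What is the refund window?",
            llm_response="30 days.",
            retrieved_context=["Refunds are accepted within 30 days."],
            scores={"example_metric": 4.0},
            ideal_answer="Refunds are accepted within 30 days of purchase.",
            metadata={"Model": "GPT-4"},
        ),
        RunItem(
            question="Do you ship internationally?",
            llm_response="Yes, to most countries.",
            retrieved_context=["We ship to most countries worldwide."],
            scores={"example_metric": 5.0},
        ),
    ]
)
overall = run.overall_scores()
```

Comparing the `overall` values of two such runs that use the same questions is one way to picture how a development project tracks performance over time.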
For a Validate development project, a benchmark is a collection of questions with or without responses. The responses represent the ideal answers to the given questions.
A benchmark is one way to provide the questions for Validate to use to evaluate your RAG system.
For more information, go to Managing benchmarks in Validate.
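The shape of a benchmark, a list of questions that each have an optional ideal answer, can be sketched as follows. The `BenchmarkEntry` class and the `build_benchmark` helper are hypothetical names used for illustration only. They are not the benchmark type that the Validate SDK provides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BenchmarkEntry:
    """Hypothetical benchmark entry: a question and an optional ideal answer."""

    question: str
    ideal_answer: Optional[str] = None


def build_benchmark(
    questions: List[str],
    ideal_answers: Optional[List[str]] = None,
) -> List[BenchmarkEntry]:
    """Pair each question with its ideal answer, if ideal answers are provided."""
    if ideal_answers is None:
        # A benchmark can contain questions only.
        return [BenchmarkEntry(question=q) for q in questions]
    if len(ideal_answers) != len(questions):
        raise ValueError("Provide exactly one ideal answer per question.")
    return [
        BenchmarkEntry(question=q, ideal_answer=a)
        for q, a in zip(questions, ideal_answers)
    ]


questions_only = build_benchmark(["What is the refund window?"])
with_answers = build_benchmark(
    ["What is the refund window?"],
    ["Refunds are accepted within 30 days of purchase."],
)
```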
You must use the Validate SDK to:
You can also use the SDK to:
Manage projects
Calculate RAG metrics outside the context of a Validate project
You can use the Validate application to manage benchmarks and projects.
You must use the Validate application to view: