RAG metrics summary
RAG metrics include the following types of scores:
Score name | Input | Formula | What does it measure? | Evaluated components |
---|---|---|---|---|
Answer consistency | Retrieved context, LLM answer | (Count of main points in answer that can be attributed to context) / (Count of main points in the answer) | Whether the LLM answer contains information that does not come from the context. | Prompt builder, LLM |
Answer contains PII | LLM answer, List of PII types | Calculated by Textual | Whether the LLM answer contains personally identifiable information (PII) of the specified types. Requires a Tonic Textual API key. | Prompt builder, LLM |
Answer match | LLM answer, Text string, Case-sensitivity flag | Compare LLM answer to text string | Whether the answer matches the provided text string. | LLM |
Answer similarity | Question, Reference answer, LLM answer | Score between 0 and 5 | How well the reference answer matches the LLM answer. Cannot be used for production monitoring projects. | All components |
Augmentation accuracy | Retrieved context, LLM answer | (Count of retrieved context in LLM answer) / (Count of retrieved context) | Whether all of the context is in the LLM answer. | Prompt builder, LLM |
Augmentation precision | Question, Retrieved context, LLM answer | (Count of relevant retrieved context in LLM answer) / (Count of relevant retrieved context) | Whether the relevant context is in the LLM answer. | Prompt builder, LLM |
Binary | Callback | User-defined | Returns a true or false value based on a callback function that you provide. Cannot be used for production monitoring projects. | User-defined |
Contains text | LLM answer, Text string | Text.in(LLM answer) | Whether the response contains the provided text string. | LLM |
Context contains PII | Retrieved context, List of PII types | Calculated by Textual | Whether the context used for the response contains PII of the specified types. Requires a Tonic Textual API key. | Prompt builder |
Context length | Retrieved context, Minimum length, Maximum length | (Minimum length) <= len(Context) <= (Maximum length) | Whether the length of a context item falls within the specified range. | Prompt builder |
Duplication | LLM answer | Returns 1 or 0 based on whether there is duplicate information | Whether the response contains duplicate information. | LLM |
Hate speech content | LLM answer | Returns 1 or 0 based on whether there is hate speech | Whether the response contains hate speech. | LLM |
Latency | Target length of time | (Run time) <= (Target time) | Whether the response is returned within the provided target time. | Entire system |
Regex | LLM answer, Regular expression, Expected number of matches | Runs a regex search and then counts the matches. Returns true if the number of matches is equal to the expected match count. | Whether the response contains the expected number of matches for the provided regular expression. | LLM |
Response length | LLM answer, Minimum length, Maximum length | (Minimum length) <= len(LLM response) <= (Maximum length) | Whether the response length falls within the specified range. | LLM |
Retrieval precision | Question, Retrieved context | (Count of relevant retrieved context) / (Count of retrieved context) | Whether the retrieved context is relevant to answering the given question. | Chunker, Embedder, Retriever |
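
Several of the formulas in the table are simple deterministic checks. The following Python sketch illustrates how they could be computed by hand. It is not the Tonic Validate API: every function and parameter name here is hypothetical and chosen only for illustration. For the ratio-based scores, the sketch assumes that the per-item judgments (for example, whether a retrieved context item is relevant) are already available as booleans, because the table does not describe how those judgments are produced.

```python
import re
from typing import Sequence


def answer_match(llm_answer: str, text: str, case_sensitive: bool = True) -> bool:
    """Compare the whole LLM answer to a text string."""
    if case_sensitive:
        return llm_answer == text
    return llm_answer.lower() == text.lower()


def contains_text(llm_answer: str, text: str) -> bool:
    """Check whether the text string appears anywhere in the LLM answer."""
    return text in llm_answer


def length_in_range(item: str, min_length: int, max_length: int) -> bool:
    """(Minimum length) <= len(item) <= (Maximum length)."""
    return min_length <= len(item) <= max_length


def regex_count_matches(llm_answer: str, pattern: str, expected_matches: int) -> bool:
    """Run a regex search, count the matches, and compare to the expected count."""
    match_count = sum(1 for _ in re.finditer(pattern, llm_answer))
    return match_count == expected_matches


def within_target_time(run_time: float, target_time: float) -> bool:
    """(Run time) <= (Target time)."""
    return run_time <= target_time


def ratio_score(judgments: Sequence[bool]) -> float:
    """(Count of items judged true) / (Count of items).

    Assumption: an empty list scores 0.0. The table does not say
    how that case is handled.
    """
    if not judgments:
        return 0.0
    return sum(judgments) / len(judgments)
```

For example, if three of four retrieved context items are judged relevant to the question, `ratio_score([True, False, True, True])` gives a retrieval precision of 0.75 under these assumptions.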