RAG metrics summary

RAG metrics include the following types of scores:

| Score name | Input | Formula | What does it measure? | Evaluated components |
| --- | --- | --- | --- | --- |
| Answer consistency, Answer consistency binary | Retrieved context, LLM answer | (Count of main points in answer that can be attributed to context) / (Count of main points in the answer) | Whether the LLM answer contains information that does not come from the context. | Prompt builder, LLM |
| Answer contains PII | LLM answer, List of PII types | Calculated by Textual | Whether the LLM answer contains personally identifiable information (PII) of the specified types. Requires a Tonic Textual API key. | Prompt builder, LLM |
| Answer match | LLM answer, Text string, Case-sensitivity flag | Compare LLM answer to text string | Whether the LLM answer matches the provided text string. | LLM |
| Answer similarity score | Question, Reference answer, LLM answer | Score between 0 and 5 | How well the reference answer matches the LLM answer. Cannot be used for production monitoring projects. | All components |
| Augmentation accuracy | Retrieved context, LLM answer | (Count of retrieved context in LLM answer) / (Count of retrieved context) | Whether all of the context is in the LLM answer. | Prompt builder, LLM |
| Augmentation precision | Question, Retrieved context, LLM answer | (Count of relevant retrieved context in LLM answer) / (Count of relevant retrieved context) | Whether the relevant context is in the LLM answer. | Prompt builder, LLM |
| Binary | Callback | User-defined | Returns a true or false value based on a callback function that you provide. Cannot be used for production monitoring projects. | User-defined |
| Contains text | LLM answer, Text string | (Text string) in (LLM answer) | Whether the LLM answer contains the provided text string. | LLM |
| Context contains PII | Retrieved context, List of PII types | Calculated by Textual | Whether the context used for the LLM answer contains PII of the specified types. Requires a Tonic Textual API key. | Prompt builder |
| Context length | Retrieved context, Minimum length, Maximum length | (Minimum length) <= len(Context) <= (Maximum length) | Whether the length of a context item falls within the specified range. | Prompt builder |
| Duplication | LLM answer | Returns 1 or 0 based on whether there is duplicate information | Whether the LLM answer contains duplicate information. | LLM |
| Hate speech content | LLM answer | Returns 1 or 0 based on whether there is hate speech | Whether the LLM answer contains hate speech. | LLM |
| Latency | Target length of time | (Run time) <= (Target time) | Whether the run time is within the provided target time. | Entire system |
| Regex | LLM answer, Regular expression, Expected number of matches | Runs a regex search and then counts the matches. Returns true if the number of matches is equal to the expected match count. | Whether the LLM answer contains the expected number of matches for the provided regular expression. | LLM |
| Response length | LLM answer, Minimum length, Maximum length | (Minimum length) <= len(LLM answer) <= (Maximum length) | Whether the length of the LLM answer falls within the specified range. | LLM |
| Retrieval precision | Question, Retrieved context | (Count of relevant retrieved context) / (Count of retrieved context) | Whether the retrieved context is relevant to answering the given question. | Chunker, Embedder, Retriever |
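
The ratio formulas above (answer consistency, augmentation accuracy, augmentation precision, and retrieval precision) all divide one count by another. The following is a minimal sketch of that arithmetic only, not the product's implementation. The function names are hypothetical, and the true/false judgments (whether a context item is relevant, whether it appears in the answer, whether a main point can be attributed to context) are supplied by hand here; how those judgments are produced is outside the scope of the sketch.

```python
from typing import Sequence


def ratio_score(flags: Sequence[bool]) -> float:
    """Return the fraction of True flags (hypothetical helper)."""
    if not flags:
        # The ratio is undefined when there is nothing to count.
        raise ValueError("ratio is undefined for an empty list of judgments")
    return sum(flags) / len(flags)


def retrieval_precision(context_is_relevant: Sequence[bool]) -> float:
    # (Count of relevant retrieved context) / (Count of retrieved context)
    return ratio_score(context_is_relevant)


def augmentation_accuracy(context_in_answer: Sequence[bool]) -> float:
    # (Count of retrieved context in LLM answer) / (Count of retrieved context)
    return ratio_score(context_in_answer)


def augmentation_precision(
    context_is_relevant: Sequence[bool],
    context_in_answer: Sequence[bool],
) -> float:
    # Only relevant context items count toward the denominator.
    used = [
        in_answer
        for relevant, in_answer in zip(context_is_relevant, context_in_answer)
        if relevant
    ]
    return ratio_score(used)


def answer_consistency(point_is_attributed: Sequence[bool]) -> float:
    # (Main points attributable to context) / (Main points in the answer)
    return ratio_score(point_is_attributed)


# Four retrieved context items, judged by hand for this example.
relevant = [True, True, False, True]
in_answer = [True, False, True, False]

print(retrieval_precision(relevant))                 # 3 of 4 items are relevant
print(augmentation_accuracy(in_answer))              # 2 of 4 items appear in the answer
print(augmentation_precision(relevant, in_answer))   # 1 of 3 relevant items appears
print(answer_consistency([True, True, False]))       # 2 of 3 main points attributed
```

Raising an error on an empty list is a design choice made for this sketch; a real evaluator might instead return a default score when no context is retrieved.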

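The deterministic scores in the table (answer match, contains text, context length, response length, regex, and latency) can be illustrated with plain standard-library Python. This is a sketch under stated assumptions rather than the product's code: the function names are hypothetical, "match" is assumed to mean exact string equality, and length is assumed to be a character count.

```python
import re
import time
from typing import Callable


def answer_match(answer: str, text: str, case_sensitive: bool = True) -> bool:
    # Assumption: "match" means exact equality, optionally ignoring case.
    if case_sensitive:
        return answer == text
    return answer.casefold() == text.casefold()


def contains_text(answer: str, text: str) -> bool:
    # (Text string) in (LLM answer)
    return text in answer


def length_in_range(value: str, min_length: int, max_length: int) -> bool:
    # (Minimum length) <= len(value) <= (Maximum length), counted in characters.
    return min_length <= len(value) <= max_length


def regex_match_count(answer: str, pattern: str, expected: int) -> bool:
    # Count non-overlapping matches and compare with the expected count.
    found = sum(1 for _ in re.finditer(pattern, answer))
    return found == expected


def within_latency(run: Callable[[], object], target_seconds: float) -> bool:
    # (Run time) <= (Target time), measured with a monotonic clock.
    start = time.perf_counter()
    run()
    return time.perf_counter() - start <= target_seconds


answer = "The capital of France is Paris."
print(answer_match("Paris", "paris", case_sensitive=False))
print(contains_text(answer, "Paris"))
print(length_in_range(answer, 10, 100))
print(regex_match_count("a1 b22 c333", r"\d+", 3))
print(within_latency(lambda: None, 1.0))
```

The same `length_in_range` helper covers both the context length and response length rows, because the two formulas differ only in which string is measured.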
