RAG metrics summary

RAG metrics include the following types of scores:

| Score name | Input | Formula | What does it measure? | Evaluated components |
| --- | --- | --- | --- | --- |
| Answer consistency | Retrieved context, LLM answer | (Count of main points in answer that can be attributed to context) / (Count of main points in the answer) | Whether the LLM answer contains information that does not come from the context. | Prompt builder, LLM |
| Answer contains PII | LLM answer, List of PII types | Calculated by Textual | Whether the LLM answer contains personally identifiable information (PII) of the specified types. Requires a Tonic Textual API key. | Prompt builder, LLM |
| Answer match | LLM answer, Text string, Case-sensitivity flag | Compare LLM answer to text string | Whether the answer matches the provided text string. | LLM |
| Answer similarity | Question, Reference answer, LLM answer | Score between 0 and 5 | How well the reference answer matches the LLM answer. Cannot be used for production monitoring projects. | All components |
| Augmentation accuracy | Retrieved context, LLM answer | (Count of retrieved context in LLM answer) / (Count of retrieved context) | Whether all of the context is in the LLM answer. | Prompt builder, LLM |
| Augmentation precision | Question, Retrieved context, LLM answer | (Count of relevant retrieved context in LLM answer) / (Count of relevant retrieved context) | Whether the relevant context is in the LLM answer. | Prompt builder, LLM |
| Binary | Callback | User-defined | Returns a true or false value based on a callback function that you provide. Cannot be used for production monitoring projects. | User-defined |
| Contains text | LLM answer, Text string | Text.in(LLM answer) | Whether the response contains the provided text string. | LLM |
| Context contains PII | Retrieved context, List of PII types | Calculated by Textual | Whether the context used for the response contains PII of the specified types. Requires a Tonic Textual API key. | Prompt builder |
| Context length | Retrieved context, Minimum length, Maximum length | (Minimum length) <= len(Context) <= (Maximum length) | Whether the length of a context item falls within the specified range. | Prompt builder |
| Duplication | LLM answer | Returns 1 or 0 based on whether there is duplicate information | Whether the response contains duplicate information. | LLM |
| Hate speech content | LLM answer | Returns 1 or 0 based on whether there is hate speech | Whether the response contains hate speech. | LLM |
| Latency | Target length of time | (Run time) <= (Target time) | Whether the response takes longer than the provided target time. | Entire system |
| Regex | LLM answer, Regular expression, Expected number of matches | Runs a regex search and then counts the matches. Returns true if the number of matches is equal to the expected match count. | Whether the response contains the expected number of matches for the provided regular expression. | LLM |
| Response length | LLM answer, Minimum length, Maximum length | (Minimum length) <= len(LLM response) <= (Maximum length) | Whether the response length falls within the specified range. | LLM |
| Retrieval precision | Question, Retrieved context | (Count of relevant retrieved context) / (Count of retrieved context) | Whether the retrieved context is relevant to answering the given question. | Chunker, Embedder, Retriever |
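
The deterministic formulas above (text containment, text match, regex match count, length range, target time, and the count ratios) can be illustrated with a minimal plain-Python sketch. This is not the product's API: every function name and parameter below is hypothetical and chosen only for illustration. The sketch also assumes that "matches" means whole-string equality, that lengths are measured in characters, and that the caller has already judged each item (for example, whether a context item is relevant) and passes the results as true or false flags.

```python
import re
from typing import Sequence

# Hypothetical helpers that mirror the table's formulas; not a real library API.


def contains_text(answer: str, text: str) -> bool:
    """True if the text string appears anywhere in the answer."""
    return text in answer


def answer_match(answer: str, text: str, case_sensitive: bool = True) -> bool:
    """Compare the whole answer to the text string (assumed: equality)."""
    if case_sensitive:
        return answer == text
    return answer.casefold() == text.casefold()


def regex_match_count(answer: str, pattern: str, expected: int) -> bool:
    """True if the number of non-overlapping matches equals the expected count."""
    return len(re.findall(pattern, answer)) == expected


def length_in_range(text: str, minimum: int, maximum: int) -> bool:
    """(Minimum length) <= len(text) <= (Maximum length), in characters."""
    return minimum <= len(text) <= maximum


def within_target_time(run_time: float, target_time: float) -> bool:
    """(Run time) <= (Target time), both in the same unit."""
    return run_time <= target_time


def ratio_score(flags: Sequence[bool]) -> float:
    """Share of items flagged true, such as relevant items / retrieved items."""
    if not flags:
        raise ValueError("at least one item is required to compute a ratio")
    return sum(flags) / len(flags)


answer = "Paris is the capital of France."
print(contains_text(answer, "capital"))                      # True
print(answer_match("Paris", "paris", case_sensitive=False))  # True
print(regex_match_count(answer, r"\b[A-Z]\w+", 2))           # True (Paris, France)
print(length_in_range(answer, 10, 50))                       # True (31 characters)
print(ratio_score([True, True, False, True]))                # 0.75
```

The ratio helper leaves the judging step to the caller on purpose: deciding whether a main point is attributable to the context, or whether a context item is relevant, is the hard part of those scores and is outside the scope of this sketch.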
