RAG metrics summary

RAG metrics include the following types of scores:

| Score name | Input | Formula | What does it measure? | Evaluated components |
|---|---|---|---|---|
| | Retrieved context, LLM answer | (Count of main points in answer that can be attributed to context) / (Count of main points in the answer) | Whether the LLM answer contains information that does not come from the context. | Prompt builder, LLM |
| | LLM answer, List of PII types | Calculated by Textual | Whether the LLM answer contains personally identifiable information (PII) of the specified types. Requires a Tonic Textual API key. | Prompt builder, LLM |
| | LLM answer, Text string, Case-sensitivity flag | Compare LLM answer to text string | Whether the answer matches the provided text string. | LLM |
| | Question, Reference answer, LLM answer | Score between 0 and 5 | How well the reference answer matches the LLM answer. Cannot be used for production monitoring projects. | All components |
| | Retrieved context, LLM answer | (Count of retrieved context in LLM answer) / (Count of retrieved context) | Whether all of the context is in the LLM answer. | Prompt builder, LLM |
| | Question, Retrieved context, LLM answer | (Count of relevant retrieved context in LLM answer) / (Count of relevant retrieved context) | Whether the relevant context is in the LLM answer. | Prompt builder, LLM |
| | Callback | User-defined | Returns a true or false value based on a callback function that you provide. Cannot be used for production monitoring projects. | User-defined |
| | LLM answer, Text string | Text.in(LLM answer) | Whether the response contains the provided text string. | LLM |
| | Retrieved context, List of PII types | Calculated by Textual | Whether the context used for the response contains PII of the specified types. Requires a Tonic Textual API key. | Prompt builder |
| | Retrieved context, Minimum length, Maximum length | (Minimum length) <= len(Context) <= (Maximum length) | Whether the length of a context item falls within the specified range. | Prompt builder |
| | LLM answer | Returns 1 or 0 based on whether there is duplicate information | Whether the response contains duplicate information. | LLM |
| | LLM answer | Returns 1 or 0 based on whether there is hate speech | Whether the response contains hate speech. | LLM |
| | Target length of time | (Run time) <= (Target time) | Whether the response takes longer than the provided target time. | Entire system |
| | LLM answer, Regular expression, Expected number of matches | Runs a regex search and then counts the matches. Returns true if the number of matches is equal to the expected match count. | Whether the response contains the expected number of matches for the provided regular expression. | LLM |
| | LLM answer, Minimum length, Maximum length | (Minimum length) <= len(LLM response) <= (Maximum length) | Whether the response length falls within the specified range. | LLM |
| | Question, Retrieved context | (Count of relevant retrieved context) / (Count of retrieved context) | Whether the context retrieved is relevant to answer the given question. | Chunker, Embedder, Retriever |
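
Several of the formulas listed above are simple enough to sketch in plain Python. The block below is an illustration only, not the product's implementation: every function name is hypothetical, lengths are assumed to be measured in characters, "matches" is assumed to mean exact string equality, and the per-item relevance or attribution judgments that feed the ratio scores are assumed to already be available as booleans.

```python
import re
import time
from typing import Callable, Sequence


def ratio_score(flags: Sequence[bool]) -> float:
    """(Count of items judged True) / (Count of items).

    Covers the ratio-style formulas, for example relevant retrieved
    context items divided by all retrieved context items.
    Raising on an empty list is an assumption of this sketch.
    """
    if not flags:
        raise ValueError("need at least one item to score")
    return sum(flags) / len(flags)


def contains_text(answer: str, text: str) -> bool:
    """Whether the response contains the provided text string."""
    return text in answer


def answer_match(answer: str, text: str, case_sensitive: bool = True) -> bool:
    """Whether the answer matches the text string (assumed: exact equality)."""
    if case_sensitive:
        return answer == text
    return answer.lower() == text.lower()


def length_in_range(text: str, min_len: int, max_len: int) -> bool:
    """(Minimum length) <= len(text) <= (Maximum length), in characters."""
    return min_len <= len(text) <= max_len


def regex_match_count_ok(answer: str, pattern: str, expected: int) -> bool:
    """Run a regex search, count the matches, compare to the expected count."""
    return sum(1 for _ in re.finditer(pattern, answer)) == expected


def within_target_time(run: Callable[[], object], target_seconds: float) -> bool:
    """(Run time) <= (Target time) for a single call to `run`."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start <= target_seconds


if __name__ == "__main__":
    # Three of four retrieved context items were judged relevant.
    print(ratio_score([True, False, True, True]))  # 0.75
    print(length_in_range("hello", 1, 5))          # True
```

The ratio scores depend on a judgment for each item (is this context item relevant, is this main point attributable to the context). The sketch takes those judgments as inputs and does not show how they are produced.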
