RAG metrics summary
RAG metrics include the following types of scores:
| Score name | Input | Formula | What does it measure? | Evaluated components |
|---|---|---|---|---|
| | | (Count of main points in answer that can be attributed to context) / (Count of main points in the answer) | Whether the LLM answer contains information that does not come from the context. | |
| | | Calculated by Textual | Whether the LLM answer contains personally identifiable information (PII) of the specified types. Requires a Tonic Textual API key. | |
| | | Compare LLM answer to text string | Whether the answer matches the provided text string. | LLM |
| | | Score between 0 and 5 | How well the reference answer matches the LLM answer. Cannot be used for production monitoring projects. | All components |
| | | (Count of retrieved context in LLM answer) / (Count of retrieved context) | Whether all of the context is in the LLM answer. | |
| | | (Count of relevant retrieved context in LLM answer) / (Count of relevant retrieved context) | Whether the relevant context is in the LLM answer. | |
| | | User-defined | Returns a true or false value based on a callback function that you provide. Cannot be used for production monitoring projects. | User-defined |
| | | Text.in(LLM answer) | Whether the response contains the provided text string. | LLM |
| | | Calculated by Textual | Whether the context used for the response contains PII of the specified types. Requires a Tonic Textual API key. | Prompt builder |
| | | (Minimum length) <= len(Context) <= (Maximum length) | Whether the length of a context item falls within the specified range. | Prompt builder |
| | | Returns 1 or 0 based on whether there is duplicate information | Whether the response contains duplicate information. | LLM |
| | | Returns 1 or 0 based on whether there is hate speech | Whether the response contains hate speech. | LLM |
| | | (Run time) <= (Target time) | Whether the response completes within the provided target time. | Entire system |
| | | Runs a regex search and then counts the matches. Returns true if the number of matches is equal to the expected match count. | Whether the response contains the expected number of matches for the provided regular expression. | LLM |
| | | (Minimum length) <= len(LLM response) <= (Maximum length) | Whether the response length falls within the specified range. | LLM |
| | | (Count of relevant retrieved context) / (Count of retrieved context) | Whether the retrieved context is relevant to the given question. | |
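
Several of the formulas in the table are simple enough to sketch in plain Python. The following is only an illustration of the arithmetic, not the product's implementation: every function name here is hypothetical, and for the ratio scores the per-item judgments (for example, whether a retrieved context item is relevant) are assumed to have been made already and passed in as booleans.

```python
import re
from typing import Sequence


def ratio_score(flags: Sequence[bool]) -> float:
    """Fraction of items flagged True.

    Covers the count-ratio formulas in the table, for example
    (count of relevant retrieved context) / (count of retrieved context).
    The flags themselves are assumed to be decided upstream.
    """
    if not flags:
        raise ValueError("at least one item is required")
    return sum(flags) / len(flags)


def within_length(text: str, min_len: int, max_len: int) -> bool:
    """(Minimum length) <= len(text) <= (Maximum length)."""
    return min_len <= len(text) <= max_len


def contains_text(answer: str, text: str) -> bool:
    """Whether the answer contains the provided text string."""
    return text in answer


def regex_count_matches(answer: str, pattern: str, expected: int) -> bool:
    """Run a regex search, count the matches, compare to the expected count."""
    found = sum(1 for _ in re.finditer(pattern, answer))
    return found == expected


def within_target_time(run_time_s: float, target_time_s: float) -> bool:
    """(Run time) <= (Target time)."""
    return run_time_s <= target_time_s


# Hypothetical example values, chosen only for illustration.
answer = "Founded in 1998, relaunched in 2015."
print(ratio_score([True, True, False, True]))   # 0.75
print(within_length(answer, 10, 200))           # True
print(contains_text(answer, "relaunched"))      # True
print(regex_count_matches(answer, r"\d{4}", 2)) # True
print(within_target_time(1.8, 2.0))             # True
```

The ratio helper rejects an empty list rather than returning 0, because a ratio over zero items is undefined; a real scorer may handle that case differently.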