End-to-end example using LlamaIndex

This example uses six Paul Graham essays about startup founders, taken from his blog. From these essays, we build a RAG system that uses the default LlamaIndex configuration.

from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./paul_graham_essays").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Query the index and return the answer along with the retrieved context
def get_llama_response(prompt):
    response = query_engine.query(prompt)
    context = [x.text for x in response.source_nodes]
    return {
        "llm_answer": response.response,
        "llm_context_list": context
    }
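Validate only requires that the callback return a dictionary with an `llm_answer` string and an `llm_context_list` of retrieved passages. A minimal stub, with no LlamaIndex dependency, illustrates that contract; the canned answer and context strings here are placeholders, not real retrieval output:

```python
# Stub callback with the same return shape as get_llama_response.
# The hard-coded answer and context are placeholders for illustration only.
def get_stub_response(prompt):
    return {
        "llm_answer": f"Stubbed answer to: {prompt}",
        "llm_context_list": ["First retrieved passage.", "Second retrieved passage."],
    }

result = get_stub_response("What did the author work on?")
print(result["llm_answer"])
print(len(result["llm_context_list"]))
```

Any retrieval pipeline that returns a dictionary in this shape can be scored the same way.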

We load the question and answer list and use it to create a Tonic Validate benchmark.

import json
from tonic_validate import Benchmark

qa_pairs = []
with open("question_and_answer_list.json", "r") as qa_file:
    qa_pairs = json.load(qa_file)[:10]

question_list = [qa_pair['question'] for qa_pair in qa_pairs]
answer_list = [qa_pair['answer'] for qa_pair in qa_pairs]

benchmark = Benchmark(questions=question_list, answers=answer_list)
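The code above assumes that question_and_answer_list.json contains a list of objects with question and answer keys. The sketch below shows that assumed structure with a made-up pair (the actual file contents are not reproduced here):

```python
import json

# Assumed structure of question_and_answer_list.json: a list of
# {"question": ..., "answer": ...} objects. This pair is illustrative only.
sample = json.loads("""
[
  {"question": "What is the first rule for startup founders?",
   "answer": "Make something people want."}
]
""")

question_list = [qa["question"] for qa in sample]
answer_list = [qa["answer"] for qa in sample]
print(question_list[0])
```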

Next, we connect to Validate using an API token that we generated in the Validate application. We use this connection to create a new project and benchmark.

from tonic_validate import ValidateApi
validate_api = ValidateApi("api-key-here")

Next, we create a run and score it.

from tonic_validate import ValidateScorer

# Score the responses
scorer = ValidateScorer()
run = scorer.score(benchmark, get_llama_response)
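Conceptually, the scorer iterates over the benchmark, calls your callback once per question, and computes metrics from the returned answer and context. The dependency-free sketch below illustrates that loop; `score_run` and `fake_response` are hypothetical stand-ins, not the real ValidateScorer internals:

```python
def score_run(questions, answers, get_response):
    """Hypothetical sketch of a scoring loop: call the RAG callback for
    each benchmark question and collect the pieces a metric would need."""
    results = []
    for question, reference in zip(questions, answers):
        response = get_response(question)
        results.append({
            "question": question,
            "reference_answer": reference,
            "llm_answer": response["llm_answer"],
            "llm_context_list": response["llm_context_list"],
        })
    return results

# Toy callback standing in for get_llama_response
def fake_response(prompt):
    return {"llm_answer": "placeholder answer", "llm_context_list": ["ctx"]}

run_items = score_run(["Q1", "Q2"], ["A1", "A2"], fake_response)
print(len(run_items))  # one scored item per benchmark question
```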

After you execute this code, you can upload your results to the Validate application and view them there.

from tonic_validate import ValidateApi
# Upload the run
validate_api = ValidateApi("your-api-key")
validate_api.upload_run("your-project-id", run)

The metrics are automatically calculated and logged to Validate. The distribution of the scores over the benchmark is also graphed.
