---
title: Data for evaluating different RAG configurations
date: 2024-02-20 12:00:00 -0
categories: [LlamaIndex, Learning]
tags: [RAGs, LlamaIndex, Evaluation]
---
In any AI project, it's crucial to define the problem clearly and to continually assess how well your solution addresses it. Regular evaluation and feedback loops are essential for maximizing effectiveness, and the data you choose for building and evaluating your product is a key factor: it should be relevant to the problem at hand and allow a proper evaluation of your solution.
In my case, I wanted a simple dataset to explore various ideas and measure their impact on overall performance. Since there is no direct equivalent of MNIST for RAGs, I decided to create my own dataset from family tree information, perhaps influenced by binge-watching "The Crown" over the last few weeks.
This dataset is well suited to evaluating RAGs: its documents are interconnected and its family tree structures are complex, which makes it a good benchmark for comparing different RAG configurations. RAGs are known to excel at answering questions found directly in the indexed documents but to struggle with meta-level questions. My plan is therefore to compare their performance on simple queries like "Who is person X?" versus more complex ones like "How many cousins does person X have?"
Now, let me explain how I created this dataset. I defined two classes, [Person and Family](https://github.com/bubl-ai/llamaindex-project/blob/main/bubls/bubls/synthetic_data/family_tree.py).
- Person: Contains relevant information such as name, birthday, first-degree relatives, and life status. This class makes it easy to organize and retrieve that information.
- Family: A collection of Persons forming a family tree. This class allows retrieving information about an individual and their immediate relatives. A rough sketch of both classes is shown below.
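The actual implementation lives in the repo linked above; purely as an illustration, here is a minimal sketch of how these two classes could be structured. The attribute and method names are my assumptions, not necessarily those used in `family_tree.py`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Person:
    """One individual's basic facts and first-degree relatives (attribute names are assumed)."""
    name: str
    birthday: str
    alive: bool = True
    parents: List[str] = field(default_factory=list)
    siblings: List[str] = field(default_factory=list)
    spouse: Optional[str] = None
    children: List[str] = field(default_factory=list)


class Family:
    """A collection of Person objects forming a family tree."""

    def __init__(self) -> None:
        self.members: Dict[str, Person] = {}

    def add_person(self, person: Person) -> None:
        self.members[person.name] = person

    def immediate_relatives(self, name: str) -> dict:
        """Return the first-degree relatives of a given family member."""
        p = self.members[name]
        return {
            "parents": p.parents,
            "siblings": p.siblings,
            "spouse": p.spouse,
            "children": p.children,
        }
```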
I developed code that integrates these classes and initializes a fictional family, [The Williams](https://github.com/bubl-ai/llamaindex-project/blob/main/builders/family_tree_synthetic_data/williams_family.py). To create the dataset, I designed two "GPT assistants": a [Biographer](https://github.com/bubl-ai/llamaindex-project/blob/main/bubls/bubls/openai_assistants/biographer.py) and a [QA Generator](https://github.com/bubl-ai/llamaindex-project/blob/main/bubls/bubls/openai_assistants/qa_generator.py). The Biographer generates a biography for each individual based on that person and their relatives, and the QA Generator creates question-answer pairs about each biography. The dataset is available on our [Hugging Face profile](https://huggingface.co/datasets/bubl-ai/williams_family_tree).
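I won't reproduce the assistants' code here; as a simplified sketch of the same idea using plain chat completions (the prompts, model name, and function names below are assumptions, not the repo's actual assistants):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def write_biography(person_summary: str) -> str:
    """Biographer: turn a structured summary of a person and their relatives into a narrative."""
    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",  # model choice is an assumption
        messages=[
            {"role": "system", "content": "You are a biographer. Write a short, factual biography."},
            {"role": "user", "content": person_summary},
        ],
    )
    return response.choices[0].message.content


def generate_qa_pairs(biography: str) -> str:
    """QA Generator: produce question-answer pairs grounded only in the given biography."""
    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[
            {"role": "system", "content": "Create question-answer pairs answerable only from the provided text."},
            {"role": "user", "content": biography},
        ],
    )
    return response.choices[0].message.content
```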
The dataset is organized into two folders:
- Biographies: Contains the narratives woven from our predefined family structure.
- Test Questions: Contains pairs of questions and answers derived from the biographies, serving as a test set for evaluating different Retrieval-Augmented Generation (RAG) configurations.
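If you want to pull the dataset yourself, the snippet below downloads the full repository from the Hugging Face Hub. It's a minimal sketch assuming the standard `huggingface_hub` workflow rather than a specific `datasets` configuration.

```python
from huggingface_hub import snapshot_download

# Download the whole dataset repo (biographies and test questions) to a local cache directory.
local_dir = snapshot_download(
    repo_id="bubl-ai/williams_family_tree",
    repo_type="dataset",
)
print(local_dir)  # path to the downloaded biography and question-answer files
```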
Without further delay, let's dive into the experimentation phase. There, we'll test various RAG customizations and configurations and assess their performance to gain insight into their strengths and weaknesses.
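As a preview, a minimal LlamaIndex baseline could look like the sketch below: index the biographies with default settings and ask one direct question and one meta-level question. The directory path and the example name are assumptions, and the imports follow the llama_index 0.10+ layout.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Point the reader at the downloaded biographies (path is an assumption).
documents = SimpleDirectoryReader("data/biographies").load_data()

# Default configuration: this is the baseline that later experiments will tweak.
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# A fact stated directly in one document vs. a question that requires reasoning across documents.
print(query_engine.query("Who is John Williams?"))  # example name is made up
print(query_engine.query("How many cousins does John Williams have?"))
```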