# A FastAPI-based Web Application for Retrieval-Augmented Generation (RAG) using Qdrant and Local LlamaFile
This project updates an existing Azure-based Retrieval-Augmented Generation (RAG) web application by integrating the Qdrant vector database and a local LlamaFile model. The updates include:
- Replacing Azure services with a local LlamaFile for text generation.
- Generating and storing embeddings in Qdrant.
- Using FastAPI to verify the functionality of the RAG implementation via an interactive web interface.
## Features

- **Qdrant Integration**: Qdrant serves as the vector database for storing and querying text embeddings.
- **Local LlamaFile**: A lightweight, local alternative to Azure for generating responses.
- **FastAPI Interface**: An intuitive API for interacting with the RAG setup.
- **Easy Verification**: Access the `/docs` URL to test the `/ask` endpoint.
## Prerequisites

Ensure you have the following installed:
- Python 3.8+
- Required Python dependencies (listed in `requirements.txt`)
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/APsenpai42/llamafile-qdrant-rag
  cd llamafile-qdrant-rag
  ```
- Create a virtual environment and activate it:

  ```bash
  python -m venv .venv
  source .venv/bin/activate  # On Windows, use .venv\Scripts\activate
  ```
- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Run the local LlamaFile server.
- Update the `.env` file to point to your local LlamaFile server (a quick connectivity check is sketched after these steps). Example:

  ```bash
  LLAMAFILE_API_URL=http://127.0.0.1:8080  # Update with your LlamaFile server URL
  LLAMAFILE_API_KEY="your-llamafile-api-key"  # Replace with your actual API key if required
  ```
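Once the server is running and `.env` is in place, a quick connectivity check can save debugging time later. This is a minimal sketch, assuming the llamafile server exposes the OpenAI-compatible `/v1/chat/completions` route (true for recent llamafile builds, though the path and accepted model name can vary by version) and that `requests` and `python-dotenv` are installed:

```python
import os

import requests
from dotenv import load_dotenv

# Read LLAMAFILE_API_URL and LLAMAFILE_API_KEY from the .env file.
load_dotenv()
base_url = os.getenv("LLAMAFILE_API_URL", "http://127.0.0.1:8080")
api_key = os.getenv("LLAMAFILE_API_KEY", "")

# Assumption: the llamafile server speaks the OpenAI-compatible chat API;
# the exact path and model name handling can differ between llamafile versions.
response = requests.post(
    f"{base_url}/v1/chat/completions",
    headers={"Authorization": f"Bearer {api_key}"} if api_key else None,
    json={
        "model": "LLaMA_CPP",  # most llamafile builds accept any model name here
        "messages": [{"role": "user", "content": "Reply with the word: ready"}],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```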
## Qdrant Setup

For this project, we are using an in-memory Qdrant instance, so no separate Qdrant server is required. The embeddings are generated and loaded automatically when the application starts.
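For reference, this is roughly what an in-memory Qdrant setup looks like with the `qdrant-client` package. The collection name, vector size, and sample vectors below are illustrative placeholders, not this project's actual values:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

# ":memory:" runs Qdrant entirely in-process; data is lost when the app stops.
client = QdrantClient(":memory:")

client.create_collection(
    collection_name="documents",  # illustrative name
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),  # size must match the embedding model
)

# Store an embedding alongside its source text.
client.upsert(
    collection_name="documents",
    points=[PointStruct(id=1, vector=[0.1] * 384, payload={"text": "Qdrant stores vectors."})],
)

# Retrieve the nearest neighbours for a query embedding.
hits = client.search(collection_name="documents", query_vector=[0.1] * 384, limit=3)
print(hits)
```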
## Usage

- Start the application using Uvicorn:

  ```bash
  uvicorn main:app --reload
  ```
- Open your browser and navigate to `http://127.0.0.1:8000`.
- Interact with the `/ask` endpoint by providing a query.
## Verification

- Use the `/ask` endpoint to send queries and verify responses (a minimal client sketch follows below).
- Confirm that embeddings are correctly stored and retrieved from Qdrant.
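A minimal client sketch for exercising the endpoint from Python. The request shape is an assumption (a POST with a JSON `question` field); check the interactive `/docs` page for the endpoint's actual method and schema:

```python
import requests

# Hypothetical payload; the "question" field name is an assumption, see /docs for the real schema.
response = requests.post(
    "http://127.0.0.1:8000/ask",
    json={"question": "What does this project do?"},
    timeout=60,
)
response.raise_for_status()
print(response.json())
```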