This project demonstrates how to process, chunk, and index documents using LangChain to build a vector store for retrieving information about AI agents. It supports multiple document sources (text files, web pages, and PDFs) and uses OpenAI embeddings for similarity search.
- Load documents from:
  - Text files
  - Web pages
  - PDF files
- Process documents by chunking with customizable size and overlap.
- Index documents in a vector store using OpenAI embeddings.
- Perform a similarity search to retrieve the most relevant documents for a query (a minimal end-to-end sketch follows this list).
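The whole flow is small enough to sketch. The snippet below is an illustrative outline, not the repository's exact code: the file paths, URL, and query are placeholders, it assumes the OpenAI API key is available in the environment (see the .env step below), and it assumes the supporting packages (openai, pypdf, and langchain-text-splitters) are installed alongside the libraries listed under requirements.

```python
# Illustrative RAG pipeline sketch: load -> chunk -> embed -> index -> search.
from langchain_community.document_loaders import TextLoader, WebBaseLoader, PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Load documents from text, web, and PDF sources (placeholder paths/URL).
docs = []
docs += TextLoader("data/agents_notes.txt").load()
docs += WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/").load()
docs += PyPDFLoader("data/agents_paper.pdf").load()

# Split documents into chunks with customizable size and overlap.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# Index the chunks in a Chroma vector store using OpenAI embeddings.
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())

# Retrieve the most relevant chunks for a query.
for doc in vectorstore.similarity_search("What is an AI agent?", k=3):
    print(doc.page_content[:200])
```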
- Python 3.7+
- Libraries (see the note on pip package names after this list):
  - langchain-community
  - dotenv
  - bs4
  - chromadb
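Note that two of these libraries are published on PyPI under different names than they are imported with: dotenv is installed as python-dotenv, and bs4 is provided by beautifulsoup4. If you ever need to recreate the requirements file by hand, a minimal version consistent with this list might look like the following (the repository's actual requirements.txt may pin versions or include extra packages such as openai):

```text
langchain-community
python-dotenv      # imported as `dotenv`
beautifulsoup4     # imported as `bs4`
chromadb
```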
- Clone the repository:
git clone https://github.com/aditya10avg/RAG-Pipeline---LangChain-.git
cd RAG-Pipeline---LangChain-
- Create a virtual environment and activate it:
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Set up your .env file with your OpenAI API key:
OPEN_AI_API_KEY=your_openai_api_key
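Note that the variable is named OPEN_AI_API_KEY here rather than OPENAI_API_KEY, which is the name the OpenAI client and LangChain look up by default, so the key has to be read from the environment and passed in explicitly. A minimal sketch of how that might be done with python-dotenv (assuming the scripts hand the key to the embeddings class this way) is:

```python
import os

from dotenv import load_dotenv
from langchain_community.embeddings import OpenAIEmbeddings

# Load OPEN_AI_API_KEY from the .env file into the process environment.
load_dotenv()

# Pass the key explicitly, since automatic lookup only checks OPENAI_API_KEY.
embeddings = OpenAIEmbeddings(openai_api_key=os.getenv("OPEN_AI_API_KEY"))
```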