The Content Engine is a powerful tool designed to analyze and compare multiple PDF documents, specifically Form 10-K filings from multinational companies. Leveraging Retrieval Augmented Generation (RAG) techniques, it retrieves, assesses, and generates insights from these documents. Users can query and compare critical content such as risk factors, revenue figures, and business differences.
- PDF Parsing: Efficiently extracts and processes content from Form 10-K filings.
- Embedding: Utilizes local embedding models to generate document embeddings for quick and effective comparison.
- Vector Search: Implements a local vector database for storing and retrieving document content.
- Local LLM: Integrates a local instance of a Large Language Model (LLM) to provide contextual insights based on user queries.
- Interactive Chatbot: Engage with the system through a user-friendly chatbot interface powered by Streamlit.
- Comparison: Seamlessly compare business data, revenue, risk factors, and more between documents.
- Backend Framework: LangChain or LlamaIndex
- Frontend Framework: Streamlit
- Vector Store: FAISS (Other options: ChromaDB, Pinecone, Weaviate)
- Embedding Model: Sentence-BERT for generating document embeddings.
- Local LLM: meta/llama-2-70b-chat for contextual insights.
The system is designed to analyze and compare the following Form 10-K filings:
- Alphabet Inc. Form 10-K
- Tesla, Inc. Form 10-K
- Uber Technologies, Inc. Form 10-K
- What are the risk factors associated with Google and Tesla?
- What is the total revenue for Google Search?
- What are the differences in the business of Tesla and Uber?
Ensure you have the following installed:
- Python 3.8 or later: Download it from here.
- Required Libraries: Check the
requirements.txt
file for all necessary libraries and dependencies. - Add API: Include the REPLICATE_API_TOKEN in your
.env
file, which can be generated from Replicate.
-
Clone this repository to your local machine:
git clone https://github.com/Aman-Vishwakarma1729/Content-Engine cd Content-Engine
-
Install the required dependencies:
pip install -r requirements.txt
-
Run the application using Streamlit:
streamlit run app.py
Happy analyzing