
DocChat is an AI-powered Multi-Agent RAG system using Docling for structured document parsing and BM25 + vector search retrievers to retrieve fact-checked answers from PDFs, DOCX, and text files, preventing hallucinations. 🚀


HaileyTQuach/docchat-docling


DocChat 📝🤖

🚀 AI-powered Multi-Agent RAG system for intelligent document querying with fact verification

DocChat Cover Image


📌 Overview

DocChat is a multi-agent Retrieval-Augmented Generation (RAG) system that helps users query long, complex documents and get accurate, fact-verified answers. General-purpose chatbots such as ChatGPT or DeepSeek can hallucinate responses and struggle with structured data; DocChat instead retrieves, verifies, and corrects answers before delivering them.

💡 Key Features:
✅ Multi-Agent System – A Research Agent generates answers, while a Verification Agent fact-checks them.
✅ Hybrid Retrieval – Combines BM25 keyword search with vector search to find the most relevant content.
✅ Handles Multiple Documents – Selects the most relevant document even when multiple files are uploaded.
✅ Scope Detection – Reduces hallucinations by rejecting queries the uploaded documents cannot answer.
✅ Fact Verification – Checks responses against the source documents before presenting them to the user.
✅ Web Interface with Gradio – Provides document upload and question answering in the browser.


🎥 Demo Video

📹 Click here to watch the DocChat demo


πŸ› οΈ How DocChat Works

1️⃣ Query Processing & Scope Analysis

  • Users upload documents and ask a question.
  • DocChat analyzes query relevance and determines if the question is within scope.
  • If the query is irrelevant, DocChat rejects it instead of generating hallucinated responses.
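
The scope check above can be illustrated with a minimal sketch. It assumes a simple word-overlap heuristic; the real DocChat scope analysis is not shown in this README and may well use an LLM or retrieval scores instead. The names `in_scope`, `STOPWORDS`, and `min_overlap` are hypothetical.

```python
import re

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "was", "of", "in", "what", "to",
             "and", "for", "on", "how", "does", "do"}

def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def in_scope(question: str, chunks: list[str], min_overlap: float = 0.5) -> bool:
    """Return True if some chunk contains at least `min_overlap` of the
    question's content words (assumed heuristic, not DocChat's own logic)."""
    content_words = tokenize(question) - STOPWORDS
    if not content_words:
        return False
    best = max(
        (len(content_words & tokenize(chunk)) / len(content_words) for chunk in chunks),
        default=0.0,
    )
    return best >= min_overlap

chunks = [
    "Revenue grew 12 percent in 2023 driven by cloud services.",
    "The board approved a new dividend policy in March.",
]
print(in_scope("What was revenue growth in 2023?", chunks))       # relevant question
print(in_scope("Who won the football match yesterday?", chunks))  # unrelated question
```

A question that fails this gate would be rejected with an out-of-scope message rather than passed to the answer-generation step.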

2️⃣ Multi-Agent Research & Retrieval

  • Docling parses documents into a structured format (Markdown, JSON).
  • LangChain & ChromaDB handle hybrid retrieval (BM25 + vector embeddings).
  • Even when multiple documents are uploaded, DocChat finds the most relevant sections dynamically.
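
To make the hybrid retrieval idea concrete, here is a self-contained sketch that needs no external libraries. It is not DocChat's code: it scores chunks with a hand-written Okapi BM25, uses term-count cosine similarity as a stand-in for real vector embeddings, and merges the two rankings with weighted reciprocal rank fusion, which is one common way to combine retrievers. The function names, the `weights`, and the constant `c` are illustrative assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document for the query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def cosine_scores(query: str, docs: list[str]) -> list[float]:
    """Cosine similarity over term counts (stand-in for embedding similarity)."""
    q = Counter(tokenize(query))
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    out = []
    for d in docs:
        v = Counter(tokenize(d))
        dot = sum(q[t] * v[t] for t in q)
        norm = q_norm * math.sqrt(sum(x * x for x in v.values()))
        out.append(dot / norm if norm else 0.0)
    return out

def hybrid_rank(query: str, docs: list[str],
                weights: tuple[float, float] = (0.4, 0.6), c: int = 60) -> list[int]:
    """Return document indices, best first, via weighted reciprocal rank fusion."""
    fused = [0.0] * len(docs)
    for w, scores in zip(weights, (bm25_scores(query, docs), cosine_scores(query, docs))):
        order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
        for rank, i in enumerate(order, start=1):
            fused[i] += w / (c + rank)
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)

docs = [
    "Docling converts PDF files into structured Markdown.",
    "BM25 ranks documents by keyword overlap with the query.",
    "Gradio provides the web interface for uploads.",
]
best = hybrid_rank("How does BM25 rank documents?", docs)[0]
print(docs[best])
```

Fusing ranks rather than raw scores avoids having to normalize BM25 and similarity values onto a common scale.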

3️⃣ Answer Generation & Verification

  • Research Agent generates an answer using retrieved content.
  • Verification Agent cross-checks the response against the source document.
  • If verification fails, a self-correction loop re-runs retrieval and research.
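
The generate, verify, and retry cycle above can be sketched as a small control loop. This is an assumption-laden outline, not the project's implementation: the agents are passed in as plain callables, and `answer_with_verification`, `Result`, and `max_attempts` are hypothetical names. The stub agents in the usage part only exist to exercise the loop.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Result:
    answer: str
    verified: bool
    attempts: int

def answer_with_verification(
    question: str,
    retrieve: Callable[[str], list[str]],
    research: Callable[[str, list[str]], str],
    verify: Callable[[str, list[str]], bool],
    max_attempts: int = 3,
) -> Result:
    """Re-run retrieval and research until the draft passes verification
    or the attempt budget is used up."""
    answer = ""
    for attempt in range(1, max_attempts + 1):
        context = retrieve(question)
        answer = research(question, context)
        if verify(answer, context):
            return Result(answer, True, attempt)
    # No draft passed: return the last one, flagged as unverified.
    return Result(answer, False, max_attempts)

# Stub agents: the first draft contradicts the source, the second matches it.
chunks = ["The contract term is 24 months."]
drafts = iter(["The contract term is 12 months.", "The contract term is 24 months."])

result = answer_with_verification(
    "How long is the contract term?",
    retrieve=lambda q: chunks,
    research=lambda q, ctx: next(drafts),
    verify=lambda ans, ctx: any(ans in c for c in ctx),
)
print(result)
```

Capping the number of attempts keeps a persistently failing query from looping forever; the caller can decide how to present an unverified result.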

4️⃣ Response Finalization

  • If the answer passes verification, it is displayed to the user.
  • If the question is out of scope, DocChat informs the user instead of hallucinating.

🎯 Why Use DocChat Instead of ChatGPT or DeepSeek?

| Feature | ChatGPT/DeepSeek ❌ | DocChat ✅ |
|---|---|---|
| Retrieves from uploaded documents | ❌ No | ✅ Yes |
| Handles multiple documents | ❌ No | ✅ Yes |
| Extracts structured data from PDFs | ❌ No | ✅ Yes |
| Prevents hallucinations | ❌ No | ✅ Yes |
| Fact-checks answers | ❌ No | ✅ Yes |
| Detects out-of-scope queries | ❌ No | ✅ Yes |

🚀 DocChat is built for enterprise-grade document intelligence, research, and compliance workflows.


📦 Installation

1️⃣ Clone the Repository

git clone https://github.com/HaileyTQuach/docchat-docling.git docchat
cd docchat

2️⃣ Set Up Virtual Environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Set Up API Keys

DocChat requires an OpenAI API key for processing. Add it to a .env file:

OPENAI_API_KEY=your-api-key-here
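
How `app.py` reads this file is not shown in this README. As a rough illustration, the sketch below parses a `.env` file with the standard library only and fails early if the key is missing or still the placeholder. The helper names `load_env_file` and `require_openai_key` are hypothetical; in practice, projects commonly use `load_dotenv()` from the python-dotenv package for the same purpose.

```python
import os
from pathlib import Path

def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines and add them to os.environ.
    Variables already set in the environment are not overwritten."""
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("'\"")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded

def require_openai_key() -> str:
    """Return the API key, or raise if it is missing or left as the placeholder."""
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key or key == "your-api-key-here":
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file.")
    return key
```

Calling `load_env_file()` followed by `require_openai_key()` at startup surfaces a missing key immediately instead of at the first API request.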

5️⃣ Run the Application

python app.py

The server listens on 0.0.0.0:7860. On the machine running it, open http://localhost:7860 in a browser.

🖥️ Usage Guide

1️⃣ Upload one or more documents (PDF, DOCX, TXT, Markdown).

2️⃣ Enter a question related to the document.

3️⃣ Click "Submit" – DocChat retrieves, analyzes, and verifies the response.

4️⃣ Review the answer & verification report for confidence.

5️⃣ If the question is out of scope, DocChat will inform you instead of fabricating an answer.

🤝 Contributing

Want to improve DocChat? Feel free to:

  • Fork the repo
  • Create a new branch (feature-xyz)
  • Commit your changes
  • Submit a PR (Pull Request)

We welcome contributions from AI/NLP enthusiasts, researchers, and developers! 🚀


📜 License

This project is licensed under a custom non-commercial license – see LICENSE for details.


💬 Contact & Support

📧 Email: [[email protected]]
