🤖 CustomGPT - Chat with your Data 📚

CustomGPT is a multilingual chatbot that extracts and processes text from PDF documents and lets you interact with that text conversationally.
It combines NLP tooling with a large language model to support conversations in multiple languages, making it suitable for businesses, educational institutions, or individuals who work with PDF documents of varying layouts.

📖 Introduction

CustomGPT harnesses the power of conversational AI to enhance the way organizations or individuals handle document-based information.
By automatically extracting and analyzing text from PDFs and facilitating dynamic interactions through its chatbot interface, CustomGPT transforms static data into actionable insights.
Combining document processing with a dialogue system lets users question their documents directly instead of searching through them by hand.

Screenshot

✨ Features

PDF Text Extraction: Utilizes PyPDF2 for efficient text extraction from PDFs, handling multiple layouts and formats.
Advanced Text Processing: Integrates tokenizers and spaCy text splitters for text segmentation, and uses a spaCy language detection module to identify the language of the text.
Multilingual Support: Runs multiple instances of the transformer-based large language model Mistral-7B-Instruct-v0.2 through the Hugging Face API and supports interactions in the following languages:
- English 🇬🇧
- Spanish 🇪🇸
- French 🇫🇷
- German 🇩🇪
- Italian 🇮🇹
- Ukrainian 🇺🇦
- Russian 🇷🇺
- Chinese 🇨🇳
- Japanese 🇯🇵
Interactive User Interface: Offers a user-friendly command-line interface that may later evolve into a graphical interface.
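
The PDF text extraction feature can be illustrated with a minimal sketch. It assumes PyPDF2 2.0 or later (which provides `PdfReader`); the function names `extract_pdf_text` and `normalize_whitespace` are hypothetical and are not taken from this repository's source, so treat this as one possible approach rather than the project's actual implementation.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Tidy whitespace that PDF extraction tends to leave behind."""
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)    # drop spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one blank line in a row
    return text.strip()


def extract_pdf_text(path: str) -> str:
    """Return the text of every page of the PDF at `path`, separated by blank lines."""
    # Imported lazily so normalize_whitespace stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # Fall back to "" for pages where no text could be extracted (e.g. scanned images).
    pages = [page.extract_text() or "" for page in reader.pages]
    return normalize_whitespace("\n\n".join(pages))
```

Calling `extract_pdf_text("data/example.pdf")` (a hypothetical file name) would return the cleaned text of that document, ready to be split into chunks.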

🚀 Getting Started

⚙️ Installation

Step 1: clone the repo

git clone https://github.com/wayzeek/CustomGPT.git

Step 2: navigate to the directory

cd CustomGPT

Step 3: install dependencies

bash install.sh

Step 4: activate the virtual environment

source .venv/bin/activate

Step 5: start the application

python3 main.py

🔍 Usage

Process PDFs

Step 1: add your PDFs to the data directory
Step 2: launch the application

python3 main.py

Step 3: select whether or not your PDFs are structured with Markdown-style headings (chapters, titles, ...)
Step 4: choose the chunk size, i.e. the average size of your paragraphs
Step 5: wait, then enjoy chatting with your data!
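
To make the two choices above more concrete, here is a rough sketch of both splitting strategies. It assumes that "structured" text carries `#`-style headings after extraction, and it swaps in a simple regex sentence splitter where the project itself uses spaCy text splitters. The names `split_by_headings` and `chunk_by_size` are hypothetical and the chunk size is measured in characters here, which may differ from the unit the application uses.

```python
import re

# Assumption: headings look like Markdown, e.g. "# Chapter 1" or "## Section".
HEADING = re.compile(r"^#{1,6} +(.+)$")


def split_by_headings(text):
    """Split text into (title, body) sections at Markdown-style headings."""
    sections = []
    title, lines = "", []

    def flush():
        # Skip the empty section that precedes the first heading, if any.
        if title or any(line.strip() for line in lines):
            sections.append((title, "\n".join(lines).strip()))

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    flush()
    return sections


def chunk_by_size(text, chunk_size):
    """Greedily pack whole sentences into chunks of up to chunk_size characters.

    A single sentence longer than chunk_size is kept whole as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > chunk_size:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A smaller chunk size gives the chatbot more, shorter passages to retrieve from; a larger one keeps more surrounding context together in each passage.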

🤝 Contributing

Fork the repo
Create your feature branch (git checkout -b feature/amazingFeature)
Commit your changes (git commit -am 'Add some amazingFeature')
Push to the branch (git push origin feature/amazingFeature)
Open a pull request

🏆 Credits

This is a solo project, made entirely by me.

⚖️ License

MIT License - see the LICENSE file for details
