Scientific Paper Generator and Database Management

This project provides a framework for managing scientific document databases and generating structured scientific texts. It leverages OpenAI models and the LangChain framework to streamline querying, organizing, and utilizing research materials effectively.

Features

Document Parsing and Storage: Extracts content from PDFs and stores it in a searchable vector database.
Contextual Querying: Uses OpenAI embeddings to find relevant documents for a given query.
Scientific Text Generation: Creates structured, context-based scientific texts written in the style expected by top-tier conferences.
Flexible API Integration: Uses the OpenAI API for both embedding and text generation.

Installation

Clone the Repository
```
git clone <repository-url>
cd Paper_RAG
```
Install Dependencies
Ensure Python 3.8+ is installed. Then, run:
```
pip install -r requirements.txt
```
Set Up Your OpenAI API Key
At runtime, the script prompts you for your OpenAI API key. Alternatively, set it as an environment variable:
```
export OPENAI_API_KEY="your-api-key"
```
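
The two options above (environment variable first, interactive prompt as a fallback) can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `get_openai_api_key` is hypothetical, and the real scripts may handle the key differently.

```python
import os
from getpass import getpass


def get_openai_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, prompting only if it is unset.

    Hypothetical helper: the name and behavior are assumptions for illustration.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the typed key so it is not echoed to the terminal.
        key = getpass("Enter your OpenAI API key: ").strip()
    if not key:
        raise ValueError("No OpenAI API key provided.")
    return key
```

Reading the environment first keeps non-interactive runs (for example, scheduled jobs) from blocking on a prompt.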

Usage

Preparing the Database
First, place your documents in their respective directories:
- Primary context documents: data/pdf/
- Supplemental scientific articles: data/articles/
Then process the documents and populate the vector database:
```
python create_database.py
```
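
Before building the database, it can help to confirm that both input directories actually contain PDFs. The sketch below uses only the standard library. The helper names `find_pdfs` and `check_input_dirs` are hypothetical and not part of create_database.py.

```python
from pathlib import Path


def find_pdfs(directory):
    """Return the PDF files directly inside `directory`, sorted by name."""
    folder = Path(directory)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )


def check_input_dirs(base="data"):
    """Map each expected input directory name to the number of PDFs found."""
    base = Path(base)
    return {name: len(find_pdfs(base / name)) for name in ("pdf", "articles")}


if __name__ == "__main__":
    for name, count in check_input_dirs().items():
        print(f"data/{name}/: {count} PDF file(s)")
```

A count of zero for either directory suggests the files are in the wrong place or have an unexpected extension.
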
Querying the Database
Run the script with your query:
```
python query_database.py "Your research question or topic"
```
The script will:
- Search the database for relevant documents.
- Generate a scientific text based on the found documents and the provided query.
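
To illustrate the search step, here is a minimal pure-Python sketch of ranking text chunks by cosine similarity to a query vector. It is a stand-in for what a vector store does at scale, not the project's implementation: the vectors are hand-made toy values rather than OpenAI embeddings, and the metric a real store uses depends on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy 3-dimensional "embeddings"; real ones come from an embedding model.
chunks = [
    ("Qubits and superposition", [0.9, 0.1, 0.0]),
    ("Gradient descent basics", [0.0, 0.2, 0.9]),
    ("Quantum machine learning", [0.7, 0.1, 0.6]),
]

print(top_k([1.0, 0.0, 0.0], chunks, k=2))
```

The retrieved chunks are what gets passed to the generation step as context.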

Code Structure

create_database.py: Handles document loading, splitting, and storage into the vector database.
query_database.py: Facilitates querying the database and generating responses using OpenAI models.
Utility Functions:
- load_documents: Loads PDF documents.
- split_text: Splits documents into manageable chunks for processing.
- save_to_chroma: Saves document embeddings to the Chroma vector database.
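
The idea behind splitting documents into chunks can be shown with a simplified character-based splitter. This is a sketch under stated assumptions: the project's own split_text is not shown in this README and may use a LangChain text splitter instead, and the name `chunk_text` and the default sizes here are hypothetical.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters so that a sentence
    cut at a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Larger chunks give the model more context per retrieved passage, while smaller chunks make retrieval more precise; the overlap trades some storage for fewer broken sentences.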

Prompt Template

The generated scientific text adheres to the following template:

Using exclusively the following contexts:

Primary context (your thesis document): {context}

Supplemental context (scientific articles from top-tier conferences): {external_context}

Compose a scientifically structured text on the topic below, suitable for submission to top-tier scientific conferences. The text should:

- Clearly and accurately explain the concept or technique.
- Include relevant analysis or discussion based on the context.
- Be well-structured and coherent.

Specific topic: {question}
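
As an illustration, the three placeholders can be filled with plain Python string formatting. This is a sketch, not the project's code: `build_prompt` and the `---` separator between chunks are assumptions, and the actual script may build the prompt through LangChain instead.

```python
PROMPT_TEMPLATE = """Using exclusively the following contexts:

Primary context (your thesis document): {context}

Supplemental context (scientific articles from top-tier conferences): {external_context}

Compose a scientifically structured text on the topic below, suitable for submission to top-tier scientific conferences. The text should:

- Clearly and accurately explain the concept or technique.
- Include relevant analysis or discussion based on the context.
- Be well-structured and coherent.

Specific topic: {question}"""


def build_prompt(primary_chunks, article_chunks, question, separator="\n\n---\n\n"):
    """Join retrieved chunks and substitute them into the template.

    Hypothetical helper; the separator is an illustrative choice.
    """
    return PROMPT_TEMPLATE.format(
        context=separator.join(primary_chunks),
        external_context=separator.join(article_chunks),
        question=question,
    )
```

Keeping the primary and supplemental contexts in separate placeholders lets the model distinguish your own document from the external articles.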

Example Workflow

Populate the directories with relevant PDFs.
Create the database:
```
python create_database.py
```

Query for a topic:
```
python query_database.py "Explain the implications of quantum computing in AI."
```

Requirements

Python 3.8+
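Libraries: see requirements.txt. The code relies on:
- langchain_community
- langchain_openai
- PyPDFLoader (document loader from langchain_community; requires the pypdf package)
- Chroma (vector store; requires the chromadb package)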

Troubleshooting

Permission Errors: Ensure you have sufficient permissions to delete or create directories.
Corrupt PDF Files: Validate your PDFs manually if they fail to load.
API Issues: Check your OpenAI API key and rate limits.
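
For the corrupt-PDF case, a quick first check is whether each file starts with the `%PDF-` header that PDF files begin with. The sketch below is a rough screen only, with hypothetical helper names: passing it does not prove a file is valid, and some readers accept files that fail it.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file starts with the PDF magic bytes."""
    try:
        with open(path, "rb") as handle:
            return handle.read(5) == b"%PDF-"
    except OSError:
        # Unreadable files (missing, permission denied) are treated as suspect.
        return False


def find_suspect_pdfs(directory):
    """List *.pdf files in `directory` that do not start with the PDF header."""
    return sorted(p for p in Path(directory).glob("*.pdf") if not looks_like_pdf(p))
```

Running `find_suspect_pdfs("data/pdf")` before building the database narrows down which files to inspect by hand.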

Future Enhancements

Add support for non-PDF formats.
Improve error handling for database operations.
Extend the prompt template for different scientific disciplines.

Author: Afonso Carvalho
License: MIT
