pgvector + python

What do you need to run this project?

Docker
A set of articles in HTML format
- Articles should be in /rag/articles
  - Currently there are 3 folders:
    - arxiv
    - ieeexplore
    - scientificdirect

Each folder represents a database. Place each article in the folder for its database so that parser.py knows how to read it.
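As a rough illustration of why the folder matters, the sketch below derives an article's database from its path and rejects misplaced files. It is not taken from parser.py: the helper name `source_of` and the file name `7465730.html` are hypothetical, and only the three folder names come from the list above.

```python
from pathlib import Path

# Folder names from the list above; everything else here is illustrative.
KNOWN_SOURCES = {"arxiv", "ieeexplore", "scientificdirect"}


def source_of(html_path: Path, articles_root: Path) -> str:
    """Return the database folder an article sits in, or raise if it is misplaced."""
    relative = html_path.relative_to(articles_root)
    if len(relative.parts) < 2:
        raise ValueError(f"{html_path} is not inside a database folder")
    source = relative.parts[0]
    if source not in KNOWN_SOURCES:
        raise ValueError(f"unknown database folder: {source}")
    return source


root = Path("/rag/articles")
print(source_of(root / "ieeexplore" / "7465730.html", root))  # ieeexplore
```

A check like this could run before parsing, so that a file dropped into the wrong place fails early instead of being read with the wrong rules.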

To download the HTML files, we use a semi-automated process. The file download_html.js contains a small script that downloads the HTML of a page for you. To use it:

- Open an article (it has to be in one of the databases listed above).
- Open your browser's console (e.g. Ctrl + Shift + J in Chrome).
- Copy and paste the script from download_html.js into the console.

For instance, if you go to https://ieeexplore.ieee.org/document/7465730 and follow the steps above, you will download the HTML file. Then place the file in the proper folder, in this case /rag/articles/ieeexplore.
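If it helps to double-check where a downloaded file belongs, the sketch below maps an article URL to its target folder. The helper `target_folder` is hypothetical, and the arXiv and ScienceDirect hostnames are assumptions; only the IEEE Xplore host appears in this README.

```python
from urllib.parse import urlparse

# Assumed host-to-folder mapping; adjust the hosts to match your sources.
HOST_TO_FOLDER = {
    "arxiv.org": "arxiv",
    "ieeexplore.ieee.org": "ieeexplore",
    "www.sciencedirect.com": "scientificdirect",
}


def target_folder(article_url: str, articles_root: str = "/rag/articles") -> str:
    """Return the folder a page downloaded from article_url should be placed in."""
    host = urlparse(article_url).hostname or ""
    try:
        folder = HOST_TO_FOLDER[host]
    except KeyError:
        raise ValueError(f"unsupported host: {host!r}") from None
    return f"{articles_root}/{folder}"


print(target_folder("https://ieeexplore.ieee.org/document/7465730"))
# /rag/articles/ieeexplore
```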

How to run this

First, clone the repo and create the .env file using the commands below:

git clone https://github.com/bruno-braga/pgvector-docker.git
cd pgvector-docker
cp .env.example .env

Don't forget to set the OPENAI_API_KEY environment variable in the .env file.
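A quick way to confirm the key is set before starting the containers is sketched below. It assumes the .env file uses plain KEY=VALUE lines; the helpers `read_env` and `has_api_key` are hypothetical and not part of the repository.

```python
def read_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(text: str) -> bool:
    """True when OPENAI_API_KEY is present and non-empty."""
    return bool(read_env(text).get("OPENAI_API_KEY"))


# Example with inline contents; in practice, pass the text read from .env.
print(has_api_key("# secrets\nOPENAI_API_KEY=\n"))  # False
```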

Then, inside the folder, start the containers:

docker-compose up -d

Once the containers are up, list them:

docker ps

Find the ID of the Python container, then open a shell in it and populate the database:

docker exec -it <python_container_id> /bin/bash
cd /app/rag
python populate_database.py

(Don't forget to add the articles to the proper folder according to their database, e.g. /articles/scientificdirect.)

Folder structure & System "Architecture"

├── app/
│   ├── rag/
│   │   ├── articles/
│   │   │   └── scientificdirect/
│   │   ├── api.py
│   │   ├── app.py  
│   │   ├── db.py
│   │   ├── parser.py
│   │   ├── populate_database.py
│   │   ├── populatedb.png
│   │   ├── prompt_template.yml
│   │   ├── prompt.py
│   │   ├── README.md
│   │   ├── strategy.png
│   │   └── strategy_png.png
│   └── src/
│       ├── .pytest_cache/
│       ├── database/
│       ├── Http/
│       ├── models/
│       ├── services/
│       │   ├── embedding_service.py
│       │   ├── prompt_service.py
│       │   └── prompt_template.yml
│       ├── static/
│       ├── tests/
│       ├── api.py
│       └── runner.py

The /rag folder is where the first experiments happened. Two files in it are currently in use:

populate_database.py
parser.py

populate_database.py reads the /articles folder and populates the database. parser.py reads the HTML files and extracts the text from them.
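To give a feel for the text-extraction step, here is a minimal sketch using only the standard library. It is an assumption about the general approach, not the code in parser.py, which may use a different library and per-database rules; `TextExtractor` and `html_to_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)


print(html_to_text("<h1>Title</h1><p>Hello <b>world</b></p><script>var x;</script>"))
# Title Hello world
```

The extracted text would then be split into passages and embedded before being stored; those steps depend on the repository's own services and are left out here.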

/src is where the project itself lives: a Flask app with a Vue.js frontend.

- /database returns a db singleton
- /Http is where we have our controllers (Blueprints in Flask)
- /models is where we have our models
- /services is where we have our services
- /static is where we have our static files
- /api.py is our main entry point
- /runner.py is our main runner
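For readers unfamiliar with the term, a "db singleton" means every caller receives the same database object. The sketch below shows one common way to get that behaviour in Python. The `Database` class, the `get_db` function, and the `DATABASE_URL` variable are all hypothetical and are not taken from /database.

```python
import os
from functools import lru_cache


class Database:
    """Stand-in for a real connection wrapper; holds only the connection string."""

    def __init__(self, dsn: str):
        self.dsn = dsn


@lru_cache(maxsize=1)
def get_db() -> Database:
    """Create the Database on first call and return the same instance afterwards."""
    # DATABASE_URL is an assumed variable name, used here only for illustration.
    return Database(os.environ.get("DATABASE_URL", ""))


print(get_db() is get_db())  # True
```

Caching the factory keeps construction lazy, so importing the module does not open a connection by itself.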

RAG Pipeline

The diagram in RAG_pipe.png, at the repository root, illustrates the RAG pipeline as a sequence diagram.
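As a generic illustration of what a RAG pipeline does (retrieve the passages closest to the question, then build a prompt from them), here is a toy sketch in plain Python. It is not this project's pipeline: the two-dimensional vectors are made up, the similarity search runs in memory rather than in pgvector, and no model is called.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, store, k=2):
    """Return the k passages whose vectors are most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Combine the retrieved passages and the question into a single prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy store of (passage, embedding) pairs; real embeddings have far more dimensions.
store = [
    ("pgvector adds a vector type to PostgreSQL.", [1.0, 0.0]),
    ("Flask serves the API.", [0.0, 1.0]),
    ("Vectors can be compared by cosine distance.", [0.9, 0.1]),
]

passages = retrieve([1.0, 0.0], store, k=2)
print(build_prompt("What is pgvector?", passages))
```

In a real setup, the query vector would come from an embedding model and the ranking would be done by the database.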
