NOTE: The example commands in this README.md are written for Docker Compose v2.
Below are example installation commands for the prerequisites on Ubuntu. If the prerequisites are already installed in your environment, please skip this section. If you want to install them in another environment, please follow the official documentation.
- Docker and Docker Compose: Install Docker Engine
- NVIDIA Container Toolkit (nvidia-docker2): Installation Guide
# Set up the repository
$ sudo apt update
$ sudo apt install ca-certificates curl gnupg lsb-release
$ sudo mkdir -p /etc/apt/keyrings
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
$ echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker and Docker Compose
$ sudo apt update
$ sudo apt install docker-ce docker-ce-cli containerd.io docker-compose-plugin
If `sudo docker run hello-world` works, the installation succeeded.
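As an additional check, the following minimal sketch reports whether the `docker` binary is on the PATH and whether the Compose v2 plugin responds to `docker compose version`. The function names `missing_tools` and `compose_v2_available` are hypothetical helpers for illustration and are not part of this repository.

```python
import shutil
import subprocess


def missing_tools(tools=("docker",)):
    """Return the names in `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def compose_v2_available():
    """Return True if `docker compose version` exits successfully."""
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "compose", "version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Missing tools:", missing_tools() or "none")
    print("Docker Compose v2 available:", compose_v2_available())
```

This only confirms that the commands can be invoked. It does not verify GPU support through the NVIDIA Container Toolkit.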
NOTE: The JSON file created by this step is already included in the repo, so this step is optional.
Please run the following command from inside the "core" container. It generates papers.json under the data directory and takes 10-20 minutes.
$ poetry run python3 src/scripts/parse_cvf_page.py
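To sanity-check the generated file, a minimal sketch like the one below could be used. It assumes that papers.json holds a JSON list or object at the top level and that the default path `data/papers.json` is correct relative to the working directory. Neither assumption is confirmed by this README, and `count_entries` is a hypothetical helper.

```python
import json
from pathlib import Path


def count_entries(path="data/papers.json"):
    """Load a JSON file and return the number of top-level entries.

    Works for a top-level list or object; the actual schema of
    papers.json is an assumption here.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    return len(data)
```

Calling `count_entries()` after the script finishes gives a quick indication of whether the page was parsed at all.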
Please run the following command from inside the "core" container.
$ poetry run python3 src/scripts/download_papers.py
We use the following APIs to generate summaries:
- Mathpix API: Converts PDFs into LaTeX-format text.
- OpenAI API: Provides access to the LLM (GPT).
To use the above APIs, the following environment variables must be set:
- MATHPIX_API_ID
- MATHPIX_API_KEY
- OPENAI_API_KEY
Please run the following command to create an envs.env file, then replace the sample values with your actual ones.
$ cp environments/envs.env.sample environments/envs.env
Values written in the envs.env file are automatically loaded by Docker and set as environment variables in the container.
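Before running the API-dependent steps, it may help to confirm that the three variables are actually visible inside the container. The sketch below is one way to do that. `missing_env_vars` is a hypothetical helper and not part of the repository's scripts.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = ("MATHPIX_API_ID", "MATHPIX_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

The check only tests for presence. It cannot tell whether the values are still the placeholders from the sample file or whether the keys are valid.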
In this step, we convert the PDFs to LaTeX format using the Mathpix API. This makes it possible to preserve the original structure of the papers.
Please run the following command from inside the "core" container.
$ poetry run python3 src/scripts/convert_to_latex.py
Now we are ready to generate summaries using the LLM (GPT). Please run the following command from inside the "core" container.
$ poetry run python3 src/scripts/generate_summaries.py
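The steps in this README run in a fixed order: parse (optional), download, convert, summarize. As a rough sketch, they could be chained from a single Python script run inside the "core" container. `pipeline_commands` and `run_pipeline` are hypothetical helpers, not part of the repository. Only the script paths come from the commands in this README.

```python
import subprocess

# Script paths as given in this README, in execution order.
SCRIPTS = [
    "src/scripts/parse_cvf_page.py",  # optional: papers.json ships with the repo
    "src/scripts/download_papers.py",
    "src/scripts/convert_to_latex.py",
    "src/scripts/generate_summaries.py",
]


def pipeline_commands(include_parse=False):
    """Return the argv lists for each step, skipping the parse step by default."""
    scripts = SCRIPTS if include_parse else SCRIPTS[1:]
    return [["poetry", "run", "python3", script] for script in scripts]


def run_pipeline(include_parse=False):
    """Run each step in order; stop at the first step that fails."""
    for command in pipeline_commands(include_parse):
        subprocess.run(command, check=True)
```

Because `check=True` raises on a non-zero exit code, a failure in an earlier step (for example, a missing API key during conversion) prevents later steps from running on incomplete data.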