This project implements a small search engine for scientific articles. For now it processes only the articles' abstracts.
- It can scrape new data from arXiv or use the existing data provided in this repository.
- Processes the data (tokenizing, lemmatizing/stemming, and building an inverted index).
- Supports three search algorithms:
  - Boolean Retrieval
  - Vector Space Model
  - Okapi BM25
- Users can apply basic filters while searching for documents:
  - By author name
  - By publication date
- Users can install the `requirements.txt` in their local environment and execute the `main.py` script. This script automatically downloads all the important `nltk` data into the user's `$HOME` directory.
- Users can use the containerized version of the application from my docker repository by pulling the `information_retrieval` tag.
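
To make the processing and ranking steps listed above more concrete, here is a minimal sketch of an inverted index with Boolean (AND) retrieval and Okapi BM25 scoring. It is an illustration only, not the code in this repository: the function names (`tokenize`, `build_index`, `boolean_and`, `bm25`), the sample documents, and the parameter values `k1=1.5` and `b=0.75` are all assumptions, and a plain regex tokenizer stands in for the NLTK-based tokenizing and lemmatizing/stemming.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumerics (stand-in for NLTK processing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return (index, lengths): term -> {doc_id: term frequency}, doc_id -> token count."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, lengths


def boolean_and(index, query):
    """Return the ids of documents that contain every query term."""
    postings = [set(index.get(term, {})) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


def bm25(index, lengths, query, k1=1.5, b=0.75):
    """Rank documents by Okapi BM25 score; k1 and b are assumed defaults."""
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        # Smoothed IDF variant that stays non-negative for frequent terms.
        idf = math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id, tf in postings.items():
            # Length normalization: longer-than-average documents are penalized.
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical abstracts, used only for this example.
docs = {
    "a": "Neural ranking models for document retrieval",
    "b": "Probabilistic retrieval with BM25 ranking",
    "c": "Graph algorithms for sparse matrices",
}
index, lengths = build_index(docs)
print(boolean_and(index, "ranking retrieval"))
print(bm25(index, lengths, "bm25 ranking"))
```

Boolean retrieval only answers whether a document matches, while BM25 orders the matches by term frequency, term rarity, and document length. Filters such as author name or publication date could be applied to the returned document ids before or after scoring.
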
If someone would like to contribute to the project or suggest improvements, they can submit a PR or open an Issue.