Information Retrieval Project

What does this project do?

This project implements a small search engine for scientific articles. For now it processes only the abstract of each article.

It can scrape new data from arXiv or use the existing data provided in this repository.
It processes the data (tokenizing, lemmatizing/stemming, and building an inverted index).
It supports three search algorithms:
1. Boolean Retrieval
2. Vector Space Model
3. Okapi BM25
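
As a rough illustration of how an inverted index can serve both Boolean retrieval and Okapi BM25 ranking, here is a minimal sketch in plain Python. It is not the code from this repository: the function names, the regex tokenizer (used in place of NLTK lemmatizing/stemming), the sample abstracts, and the parameter values k1=1.5 and b=0.75 are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (no stemming here)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}; also return document lengths."""
    index = defaultdict(dict)
    doc_len = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, doc_len


def boolean_and(index, query):
    """Return the ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], {}))
    for term in terms[1:]:
        result &= set(index.get(term, {}))
    return result


def bm25_scores(index, doc_len, query, k1=1.5, b=0.75):
    """Rank documents with Okapi BM25 (k1 and b are assumed defaults)."""
    n_docs = len(doc_len)
    if n_docs == 0:
        return []
    avg_len = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        df = len(postings)
        # Smoothed IDF variant that never goes negative.
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        for doc_id, tf in postings.items():
            norm = tf + k1 * (1 - b + b * doc_len[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample abstracts, for illustration only.
docs = {
    "a": "neural networks for image classification",
    "b": "graph neural networks",
    "c": "bayesian inference for time series",
}
index, doc_len = build_inverted_index(docs)
matches = boolean_and(index, "neural networks")
ranking = bm25_scores(index, doc_len, "graph neural")
```

Here `matches` holds the documents containing both query terms, while `ranking` orders documents by BM25 score, so a short document matching both "graph" and "neural" ranks above a longer one matching only "neural".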
Users can apply basic filters when searching for documents:
1. By author name
2. By publication date
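
The two filters above could be applied to a list of search results roughly as follows. This is only a sketch: the `Article` record, its field names, the `filter_articles` helper, and the sample entries are hypothetical and do not reflect the data schema actually used in this repository.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Article:
    # Hypothetical record layout, assumed for this example.
    title: str
    authors: List[str]
    published: date


def filter_articles(
    articles: List[Article],
    author: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[Article]:
    """Keep articles matching a case-insensitive author substring
    and an inclusive publication date range."""
    kept = []
    for art in articles:
        if author is not None and not any(
            author.lower() in name.lower() for name in art.authors
        ):
            continue
        if start is not None and art.published < start:
            continue
        if end is not None and art.published > end:
            continue
        kept.append(art)
    return kept


# Made-up sample results, for illustration only.
results = [
    Article("Paper one", ["Ada Lovelace"], date(2021, 5, 1)),
    Article("Paper two", ["Alan Turing", "Ada Lovelace"], date(2023, 2, 10)),
    Article("Paper three", ["Grace Hopper"], date(2022, 8, 20)),
]
by_author = filter_articles(results, author="lovelace")
recent = filter_articles(results, start=date(2022, 1, 1))
```

Filtering after ranking keeps the retrieval code independent of the metadata, at the cost of scoring documents that are later discarded.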

Users can install the dependencies listed in requirements.txt in their local environment and run the main.py script. The script automatically downloads the required NLTK data into the user's $HOME directory.
Alternatively, users can run the containerized version of the application from my Docker repository by pulling the information_retrieval tag.

If someone would like to contribute to the project or suggest improvements, they can submit a PR or open an Issue.
