svanomm/super-search

super-search

Semantic indexing tool for PDF repositories

Motivation

This project is motivated by existing text search tools such as Agent Ransack (FileLocator), which can quickly run keyword searches across thousands of files.

To-Do List:

  • Use PyMuPDF for faster file reads
  • Add support for text file reading
  • Implement custom chunking algorithm
  • Switch to approximate NN search (pynndescent) for fast queries
  • Add BM25 search as a complement to semantic search (in progress)
  • Test multiprocessing for faster PDF reads (in progress)
  • Add a GUI
  • Create Windows executable
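
To illustrate the BM25 item in the list above, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not this project's implementation: the function names, the whitespace tokenizer, the sample chunks, and the parameter defaults (`k1=1.5`, `b=0.75`) are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical text chunks standing in for indexed PDF content.
chunks = [
    "semantic search over pdf files",
    "keyword search with bm25 ranking",
    "building a gui for windows",
]
scores = bm25_scores("bm25 keyword search", chunks)
best = max(range(len(chunks)), key=scores.__getitem__)
print(chunks[best])
```

Keyword scores like these could be combined with semantic similarity scores, for example by ranking with each method and merging the two rankings, though how the project intends to combine them is not stated here.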
