super-search

Semantic indexing tool for PDF repositories
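The README does not say which embedding model or index structure the tool uses. As a minimal sketch, assume each text chunk has already been turned into a vector by some embedding model. Semantic search then means ranking chunks by cosine similarity to the query vector. The brute-force version below scores every chunk exactly; an approximate nearest-neighbor index trades a little accuracy to avoid this full scan. The chunk IDs and 3-d vectors are made-up placeholders.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Exact (brute-force) search: score every chunk and return the k most similar."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; a real index would hold model-generated vectors.
index = {
    "report.pdf#chunk0": [0.9, 0.1, 0.0],
    "report.pdf#chunk1": [0.1, 0.8, 0.2],
    "notes.txt#chunk0": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.0], index, k=2))
```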

Motivation

This project is motivated by existing text-search tools such as Agent Ransack (FileLocator), which can quickly keyword-search thousands of files.

To-Do List:

~~Use PyMuPDF for faster file reads~~
~~Add support for text file reading~~
~~Implement custom chunking algorithm~~
~~Switch to approximate NN search (pynndescent) for fast queries~~
Add BM25 search as a complement to semantic search (in progress)
Test multiprocessing for faster PDF reads (in progress)
Add a GUI
Create Windows executable
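The project's custom chunking algorithm is not described here. For orientation only, this is a generic baseline that such an algorithm typically improves on: a sliding window of words in which consecutive chunks overlap, so a sentence cut at a boundary still appears whole in one chunk. The function name and the `size`/`overlap` defaults are hypothetical.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


print(chunk_words("one two three four five six seven", size=3, overlap=1))
# ['one two three', 'three four five', 'five six seven']
```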
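BM25 is listed as in progress, and the README does not say which implementation will be used. As a reference point, this is a minimal pure-Python sketch of standard Okapi BM25 over pre-tokenized documents. It uses the common defaults `k1=1.5` and `b=0.75` and the non-negative IDF variant used by Lucene. Tokenization here is a naive whitespace split, which is an assumption.

```python
import math
from collections import Counter


class BM25:
    """Okapi BM25 ranking over pre-tokenized documents."""

    def __init__(self, docs: list[list[str]], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(doc) for doc in docs]  # term frequencies per document
        self.lengths = [len(doc) for doc in docs]
        self.avgdl = sum(self.lengths) / len(docs) if docs else 0.0
        # Document frequency: number of documents containing each term.
        df = Counter(term for doc in docs for term in set(doc))
        n = len(docs)
        # Non-negative IDF: log((N - df + 0.5) / (df + 0.5) + 1).
        self.idf = {t: math.log((n - d + 0.5) / (d + 0.5) + 1) for t, d in df.items()}

    def score(self, query: list[str], i: int) -> float:
        """BM25 score of document i for the query terms."""
        tf, dl = self.tfs[i], self.lengths[i]
        total = 0.0
        for term in query:
            f = tf.get(term, 0)
            if f == 0:
                continue  # term absent from this document contributes nothing
            # Saturate term frequency and normalize by document length.
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / denom
        return total

    def rank(self, query: list[str]) -> list[int]:
        """Document indices sorted from best to worst match."""
        return sorted(range(len(self.tfs)), key=lambda i: self.score(query, i), reverse=True)


docs = [
    "invoice total due march".split(),
    "semantic search over pdf files".split(),
    "keyword search tools like agent ransack".split(),
]
bm = BM25(docs)
print(bm.rank("keyword search".split()))  # [2, 1, 0]
```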
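The README does not specify how BM25 results will be combined with semantic results. One common, parameter-light option is reciprocal rank fusion (RRF). Each document scores the sum of `1 / (k + rank)` over every ranked list it appears in, with `k = 60` as the conventional constant. Because RRF uses only rank positions, BM25 scores and cosine similarities never need to be put on a common scale. This is a sketch of that technique, not the project's chosen method.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several best-first rankings of document IDs into one fused ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


semantic = ["a", "b", "c"]  # hypothetical semantic-search ranking
keyword = ["c", "a", "d"]   # hypothetical BM25 ranking
print(reciprocal_rank_fusion([semantic, keyword]))  # ['a', 'c', 'b', 'd']
```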


svanomm/super-search

Folders and files

Latest commit

History

Repository files navigation

super-search

Motivation

To-Do List:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages