A high-performance text indexing engine for searching large documents or corpora (currently a work in progress).

mohamedrhoulam/rustsearch

rustsearch

A high-performance text indexing engine for searching large documents or corpora, implemented in Rust and inspired by PISA, the C++ text search engine.

Overview

Workflow

The following workflow is inspired by the PISA Index Building Pipeline (Mallia et al., 2019):

  • Collection Processing

    • Load documents
    • Extract contents
    • Tokenize
    • Filter (Stemming + Stopword removal)
  • Forward Index

    • Term Lexicon
    • Document Lexicon
  • Inverted Index

    • Document reordering
    • Compression
  • Index Compression

  • Query Processing
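The Collection Processing stage above (tokenize, then filter with stopword removal and stemming) could look roughly like the following minimal Rust sketch. It is an illustration, not rustsearch's actual code: `tokenize`, `stem` and `process` are hypothetical names, the stopword list is a toy, and `stem` is a naive suffix stripper standing in for a real stemming algorithm such as Porter2.

```rust
use std::collections::HashSet;

/// Split text on non-alphanumeric characters and lowercase each token.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Placeholder stemmer (assumption): strips a few common English suffixes
/// when enough of the word remains. A real pipeline would use Porter2 or similar.
fn stem(token: &str) -> String {
    for suffix in ["ing", "ed", "es", "s"] {
        if token.len() > suffix.len() + 2 && token.ends_with(suffix) {
            return token[..token.len() - suffix.len()].to_string();
        }
    }
    token.to_string()
}

/// Tokenize, drop stopwords, then stem the remaining tokens.
fn process(text: &str, stopwords: &HashSet<&str>) -> Vec<String> {
    tokenize(text)
        .into_iter()
        .filter(|t| !stopwords.contains(t.as_str()))
        .map(|t| stem(&t))
        .collect()
}

fn main() {
    let stopwords = HashSet::from(["the", "a", "of", "and", "is"]);
    let terms = process("The indexing of searched documents is fast!", &stopwords);
    println!("{:?}", terms); // ["index", "search", "document", "fast"]
}
```

Stopwords are removed before stemming so that the stopword list can be written as plain surface forms.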
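For the Forward Index stage, one simple layout stores each document as a sequence of term IDs, next to a term lexicon (term to ID and back) and a document lexicon (doc ID to document name). The sketch below assumes the tokens have already been filtered; `ForwardIndex`, `term_id` and `add_document` are hypothetical names, and assigning IDs in order of first appearance is just one possible choice.

```rust
use std::collections::HashMap;

/// Hypothetical forward index: each document as a sequence of term IDs,
/// plus the two lexicons that map IDs back to strings.
#[derive(Default)]
struct ForwardIndex {
    term_lexicon: HashMap<String, u32>, // term -> term ID
    terms: Vec<String>,                 // term ID -> term
    doc_lexicon: Vec<String>,           // doc ID -> document name
    docs: Vec<Vec<u32>>,                // doc ID -> term IDs in document order
}

impl ForwardIndex {
    /// Return the ID for `term`, assigning the next free ID on first sight.
    fn term_id(&mut self, term: &str) -> u32 {
        if let Some(&id) = self.term_lexicon.get(term) {
            return id;
        }
        let id = self.terms.len() as u32;
        self.terms.push(term.to_string());
        self.term_lexicon.insert(term.to_string(), id);
        id
    }

    /// Add an already-processed document and return its doc ID.
    fn add_document(&mut self, name: &str, tokens: &[&str]) -> u32 {
        let ids: Vec<u32> = tokens.iter().map(|t| self.term_id(t)).collect();
        let doc_id = self.docs.len() as u32;
        self.doc_lexicon.push(name.to_string());
        self.docs.push(ids);
        doc_id
    }
}

fn main() {
    let mut fwd = ForwardIndex::default();
    fwd.add_document("doc-a", &["rust", "search", "engine"]);
    fwd.add_document("doc-b", &["search", "index"]);
    println!("{:?}", fwd.docs); // [[0, 1, 2], [1, 3]]
}
```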
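The Inverted Index stage turns the forward index around: for every term ID it keeps a posting list of (doc ID, term frequency) pairs sorted by doc ID, and document reordering permutes the doc IDs before inversion so that similar documents get nearby IDs. The sketch below is a hedged illustration with hypothetical `invert` and `reorder_by_name` functions. It reorders by document name (as in URL ordering) only because that is the simplest strategy to show; it is not necessarily what rustsearch does, and more sophisticated techniques such as recursive graph bisection exist.

```rust
use std::collections::BTreeMap;

/// One posting: a document ID and the term's frequency in that document.
type Posting = (u32, u32);

/// Invert a forward index (doc ID -> term IDs) into term ID -> postings.
fn invert(docs: &[Vec<u32>]) -> BTreeMap<u32, Vec<Posting>> {
    let mut index: BTreeMap<u32, Vec<Posting>> = BTreeMap::new();
    for (doc_id, terms) in docs.iter().enumerate() {
        // Count term frequencies inside this document.
        let mut tf: BTreeMap<u32, u32> = BTreeMap::new();
        for &t in terms {
            *tf.entry(t).or_insert(0) += 1;
        }
        // Documents are visited in increasing ID order, so lists stay sorted.
        for (t, f) in tf {
            index.entry(t).or_default().push((doc_id as u32, f));
        }
    }
    index
}

/// Reorder documents by name so that similar names (e.g. URLs from the same
/// site) receive adjacent doc IDs. Returns the permuted names and documents.
fn reorder_by_name(names: &[String], docs: &[Vec<u32>]) -> (Vec<String>, Vec<Vec<u32>>) {
    let mut order: Vec<usize> = (0..docs.len()).collect();
    order.sort_by(|&a, &b| names[a].cmp(&names[b]));
    let new_names: Vec<String> = order.iter().map(|&i| names[i].clone()).collect();
    let new_docs: Vec<Vec<u32>> = order.iter().map(|&i| docs[i].clone()).collect();
    (new_names, new_docs)
}

fn main() {
    let names: Vec<String> = ["site-b/1", "site-a/2", "site-a/1"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let docs = vec![vec![0, 1, 0], vec![1, 2], vec![0, 2]];
    let (names, docs) = reorder_by_name(&names, &docs);
    let index = invert(&docs);
    println!("{:?}", names);
    println!("{:?}", index);
}
```

Reordering matters mainly for the compression step: clustering similar documents makes the gaps between consecutive doc IDs in a posting list smaller.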
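Both the Compression step and the Index Compression stage exploit the fact that a sorted posting list can be stored as small gaps between consecutive doc IDs (d-gaps). A minimal sketch, assuming variable-byte (VByte) coding as the codec; production engines typically use faster block-based codecs, and which codec rustsearch uses is not stated here. `encode` and `decode` are hypothetical names, and this variant marks the last byte of each value with the high bit.

```rust
/// Encode sorted doc IDs as d-gaps in variable-byte form:
/// 7 payload bits per byte, high bit set on the final byte of each value.
fn encode(doc_ids: &[u32]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0u32;
    for &id in doc_ids {
        let mut gap = id - prev; // assumes doc_ids is sorted ascending
        prev = id;
        while gap >= 128 {
            out.push((gap & 0x7f) as u8); // low 7 bits, continuation byte
            gap >>= 7;
        }
        out.push(gap as u8 | 0x80); // terminator byte
    }
    out
}

/// Decode the bytes produced by `encode` back into absolute doc IDs.
fn decode(bytes: &[u8]) -> Vec<u32> {
    let mut ids = Vec::new();
    let (mut value, mut shift, mut prev) = (0u32, 0u32, 0u32);
    for &b in bytes {
        value |= ((b & 0x7f) as u32) << shift;
        if b & 0x80 != 0 {
            prev += value; // undo the d-gap
            ids.push(prev);
            value = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    ids
}

fn main() {
    let postings = [3, 7, 300, 301];
    let bytes = encode(&postings);
    println!("{} doc IDs -> {} bytes", postings.len(), bytes.len());
    assert_eq!(decode(&bytes), postings);
}
```

For the posting list `[3, 7, 300, 301]` the gaps are 3, 4, 293 and 1, which encode to 5 bytes instead of the 16 bytes needed for four raw `u32` values.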
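Query Processing is not described further here, so the following is only an assumed example of one common approach: term-at-a-time BM25 scoring over the posting lists of the query terms, returning the top-k documents. `bm25_top_k` is a hypothetical function, k1 = 0.9 and b = 0.4 are illustrative parameter values, and the IDF uses the non-negative variant ln(1 + (N - df + 0.5) / (df + 0.5)).

```rust
use std::collections::HashMap;

/// Rank documents for a query with BM25, term-at-a-time.
/// `postings` holds one (doc ID, term frequency) list per query term;
/// `doc_lens` is indexed by doc ID and assumed non-empty.
fn bm25_top_k(postings: &[Vec<(u32, u32)>], doc_lens: &[u32], k: usize) -> Vec<(u32, f64)> {
    let (k1, b) = (0.9, 0.4); // illustrative values; tune per collection
    let n = doc_lens.len() as f64;
    let avgdl = doc_lens.iter().map(|&l| l as f64).sum::<f64>() / n;
    let mut acc: HashMap<u32, f64> = HashMap::new(); // doc ID -> partial score
    for list in postings {
        let df = list.len() as f64;
        let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
        for &(doc, tf) in list {
            let tf = tf as f64;
            let norm = k1 * (1.0 - b + b * doc_lens[doc as usize] as f64 / avgdl);
            *acc.entry(doc).or_insert(0.0) += idf * tf * (k1 + 1.0) / (tf + norm);
        }
    }
    let mut ranked: Vec<(u32, f64)> = acc.into_iter().collect();
    // Highest score first; ties broken by doc ID for deterministic output.
    ranked.sort_by(|x, y| y.1.total_cmp(&x.1).then(x.0.cmp(&y.0)));
    ranked.truncate(k);
    ranked
}

fn main() {
    // Two query terms over a three-document collection with lengths 3, 5, 2.
    let postings = vec![vec![(0, 1), (2, 2)], vec![(1, 1), (2, 1)]];
    for (doc, score) in bm25_top_k(&postings, &[3, 5, 2], 2) {
        println!("doc {doc}: {score:.3}");
    }
}
```

A term-at-a-time accumulator is the easiest variant to follow; document-at-a-time traversal with dynamic pruning (for example WAND or MaxScore) avoids scoring postings that cannot reach the top-k.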
