Could ack speed up with a per-directory cache of some kind? #333

Open
qpwo opened this issue Jan 29, 2021 · 3 comments
@qpwo

qpwo commented Jan 29, 2021

I hope this isn't too naive but I couldn't find anything on it. I have a directory with hundreds of MBs of source code and every search takes a long time. Would it be possible to make a saved cache index for a directory and update it for updated files when you make a new search?
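To make the idea concrete, here is a minimal sketch of such a cache in Python. It is not anything ack does today. The cache file name `.search_cache.json`, the word-level tokenizing, and the function names `refresh_cache` and `candidates` are all assumptions made for illustration. On each run it re-reads only files whose modification time or size differs from what the cache recorded.

```python
import json
import os
import re

CACHE_NAME = ".search_cache.json"  # hypothetical cache file name
TOKEN = re.compile(r"\w+")


def refresh_cache(root):
    """Re-tokenize only files whose mtime or size changed since the last run.

    Returns (cache, rescanned): the updated cache and the relative paths
    that actually had to be re-read.
    """
    cache_path = os.path.join(root, CACHE_NAME)
    try:
        with open(cache_path, encoding="utf-8") as fh:
            old = json.load(fh)
    except (OSError, ValueError):
        old = {}  # no cache yet, or an unreadable one: start from scratch

    new, rescanned = {}, []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if path == cache_path:
                continue  # never index the cache itself
            try:
                st = os.stat(path)
            except OSError:
                continue  # e.g. a broken symlink
            rel = os.path.relpath(path, root)
            entry = old.get(rel)
            if entry and entry["mtime"] == st.st_mtime_ns and entry["size"] == st.st_size:
                new[rel] = entry  # unchanged: reuse the cached word list
                continue
            with open(path, encoding="utf-8", errors="replace") as fh:
                words = sorted({w.lower() for w in TOKEN.findall(fh.read())})
            new[rel] = {"mtime": st.st_mtime_ns, "size": st.st_size, "words": words}
            rescanned.append(rel)

    # Deleted files drop out because `new` is built from the walk alone.
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(new, fh)
    return new, rescanned


def candidates(cache, word):
    """Files whose cached word list contains `word` (case-insensitive)."""
    word = word.lower()
    return sorted(rel for rel, entry in cache.items() if word in entry["words"])
```

A cache like this can only narrow down which files might contain a whole word. A regex search would still need to scan the candidate files, and an edit that keeps both size and mtime unchanged would go unnoticed.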

@petdance
Collaborator

I've played with that in my head for quite a while, but never actually tried it.

The indexing tools at https://beyondgrep.com/more-tools/ may give you ideas about what you could use. Also, if you're mostly looking for functions and variables, using ctags may do 90% of what you're looking for.

@petdance
Collaborator

petdance commented Feb 3, 2021

Aside from the cache question: How many hundreds of MBs do you have, and how long are searches taking? One thing we've had trouble with over the years is that some folks have systems where ack takes far longer than we would expect it to, and we haven't been able to figure out why. I'm wondering if you might be in that situation as well. See #194 for example.

@n1vux
Contributor

n1vux commented Sep 28, 2021

Tuning the OS filecache reservation and/or switching from spinning iron-oxide to SSD can greatly improve read speed.

I have doubts about one tool having both cached-index mode and grep mode, and I wouldn't want to give up extemporaneous usage of ack.
Alas, the existing tools that build inverted indexes in Perl do not appear to be under maintenance (Plucene and relatives).
(There's a newer one that uses BDB or PG, uh, no.)

I've installed swish-e for both cached-index search and natural-language spanning-lines use cases.
While the name expands to "Simple Web Indexing System for Humans - Enhanced" and it rather expects you'll expose it locally with CGI and Nginx or Apache, it has a command-line interface and API too. (It's in the Ubuntu package manager and probably available for any platform of interest. It has a Perl API that seems newer than any P?Luc(y|ene).) While it's designed for natural language, its recommended self-demo is searching its own source code.

I've even experimented with scanning files matched by swish-e with ack:
swish-e -x '%p\n' -w 'constance near5 snow near40 doane' | egrep -v '^#' | ack -x -C30 -iw 'snow|doane|constan[tc]e?|daniel' --pager='less -iR'
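
The same two-stage idea (an index narrows the file list, then a regex scans only those files) can be sketched in Python. This is an illustration, not a replacement for the pipeline above. The function names `candidate_paths` and `scan` are hypothetical, the `#` filter simply mirrors the `egrep -v '^#'` step, and the word-boundary wrapping only approximates what `ack -iw` does.

```python
import re


def candidate_paths(index_output):
    """Keep one path per line, dropping '#' lines (and blank lines)."""
    return [
        line
        for line in index_output.splitlines()
        if line and not line.startswith("#")
    ]


def scan(paths, pattern, context=0):
    """Scan only the candidate files for whole-word, case-insensitive matches.

    Returns a list of (path, line_number, context_lines) tuples, where
    context_lines holds up to `context` lines on each side of the match.
    """
    rx = re.compile(r"\b(?:%s)\b" % pattern, re.IGNORECASE)
    hits = []
    for path in paths:
        try:
            with open(path, encoding="utf-8", errors="replace") as fh:
                lines = fh.read().splitlines()
        except OSError:
            continue  # the index may list files that have since vanished
        for i, line in enumerate(lines):
            if rx.search(line):
                lo = max(0, i - context)
                hi = min(len(lines), i + context + 1)
                hits.append((path, i + 1, lines[lo:hi]))
    return hits
```

The index answers "which files are worth opening", and the scan answers "where in those files", so the slow part runs over far fewer bytes.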

Before I can make full use of it I need to figure out how to capture metadata about a document along with its contents, and what metadata schema I need ... ugh. I should remember from 20+ years ago, in late Web 1.0 when I was buying a bleeding-edge indexing engine, that Information Retrieval is NOT as easy as it looks!

petdance changed the title from "Cache index for directory?" to "Could ack speed up with a per-directory cache of some kind?" on Jun 6, 2022