Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Occasionally sort database file by frequency counts #10

Open
YodaEmbedding opened this issue Nov 19, 2023 · 0 comments
Open

perf: Occasionally sort database file by frequency counts #10

YodaEmbedding opened this issue Nov 19, 2023 · 0 comments

Comments

@YodaEmbedding
Copy link
Owner

Currently, we seek to a target entry's row and update the count / access time in-place.

If many entries are accessed, this can "fragment" the database, making the sort no longer be roughly one-pass O(n) and instead turn into O(n log n). (Tim sort's best-case complexity on partially sorted data is O(n).)

Possible optimization: Every once in a while, "defragment" the database with correctly sorted entries. This means that future frece prints (which read/sort everything) will be faster since the data will be already mostly sorted.

Downsides: increases code complexity, and I don't think it will benefit most users (including me). One alternative is running a manual frece print command to just write a new sorted database. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant