Experimenting with different hash functions #992

Draft: 7 commits to merge into base: main


prvyk (Contributor) commented Feb 2, 2025

I'm not sure what the motivation behind the current hash function is. It's not a function I recognize (granted, my CS education has a lot of gaps), and it seems a bit odd: it looks at half the array but adds in the length? Why 40343 and not another prime? Maybe it's a cache-locality thing? git blame says it has been the same since the FASTER initial commit, and I couldn't find any documentation about it in the research PDFs. It reminds me of FNV but isn't FNV.

I've implemented FNV for comparison purposes. Running Resp.benchmark locally seems to show that FNV performs better**. Maybe there's a better benchmark, or a different consideration behind the choice of this hash? [Edit: An FNV implementation that actually works was slower than the current function, so I'm trying a different variant with just unchecked.]

  • The goal of this PR is not to merge code but to ask a question while having code available for testing and benchmarking. Even assuming a different hash function would be better, there would still be work to do testing all the options.

** The FNV implementation uses unchecked. Merely adding unchecked to the existing function seems to improve performance a bit in local testing, but not as much as FNV.
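
For readers who don't know FNV, here is a minimal sketch of the 64-bit FNV-1a variant. It is written in Python for illustration and is not the code from this PR; which FNV variant the PR uses is not stated above, so FNV-1a is an assumption. The offset basis and prime are the published FNV constants. The explicit 64-bit mask stands in for the wraparound that unchecked arithmetic gives on a 64-bit integer.

```python
# Minimal FNV-1a (64-bit) sketch. Illustrative only: this is NOT the
# PR's implementation, and the choice of the FNV-1a variant is an assumption.

MASK64 = (1 << 64) - 1                    # emulates 64-bit wraparound
FNV_OFFSET_BASIS_64 = 0xCBF29CE484222325  # published FNV 64-bit offset basis
FNV_PRIME_64 = 0x100000001B3              # published FNV 64-bit prime


def fnv1a_64(data: bytes) -> int:
    """Return the 64-bit FNV-1a hash of `data`."""
    h = FNV_OFFSET_BASIS_64
    for byte in data:
        h ^= byte                         # FNV-1a: XOR the byte in first...
        h = (h * FNV_PRIME_64) & MASK64   # ...then multiply, keeping 64 bits
    return h
```

Unlike the function described above, this variant reads every byte of the input and does not mix in the length.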

prvyk marked this pull request as draft February 2, 2025 19:28
prvyk added 3 commits February 2, 2025 21:51
Sometimes faster, sometimes slower, overall a tiny bit slower compared to unchecked function.
prvyk changed the title from "[DRAFT] fnv hash for benchmarking comparison" to "Experimenting with different hash functions" Feb 2, 2025

badrishc (Contributor) commented Feb 2, 2025

Speed is important, but more important is how evenly the hash function distributes keys across buckets. The current hash function used in Garnet is derived from research done in my group a long time ago, and it showed a fairly even distribution at high speed for workloads such as YCSB. To see this, run FasterYcsbBenchmark in the FASTER repo and invoke the DumpDistribution method.

We're happy to consider alternatives if there are better options out there, on both the speed and even-spread metrics.
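
As a rough illustration of the "even spread" metric, the sketch below counts how many keys land in each bucket and computes a chi-squared statistic against a perfectly uniform spread (lower is more even). It is a Python sketch, not the DumpDistribution method from the FASTER repo. The helper names `bucket_counts` and `chi_squared`, the `user<N>` key format, the bucket count, and the use of FNV-1a as the hash under test are all assumptions made for the example.

```python
# Hypothetical bucket-distribution check. Not the FASTER DumpDistribution
# method; all names, key formats, and sizes here are illustrative assumptions.
from typing import Callable, Iterable, List

MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a, used here only as an example hash under test."""
    h = 0xCBF29CE484222325
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & MASK64
    return h


def bucket_counts(keys: Iterable[bytes],
                  hash_fn: Callable[[bytes], int],
                  num_buckets: int) -> List[int]:
    """Count how many keys fall into each of `num_buckets` buckets."""
    counts = [0] * num_buckets
    for key in keys:
        counts[hash_fn(key) % num_buckets] += 1
    return counts


def chi_squared(counts: List[int]) -> float:
    """Chi-squared distance from a perfectly even spread (0.0 = perfectly even)."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


if __name__ == "__main__":
    # Assumed key shape: "user" followed by a number.
    keys = [f"user{i}".encode() for i in range(100_000)]
    counts = bucket_counts(keys, fnv1a_64, 1024)
    print("min bucket:", min(counts), "max bucket:", max(counts))
    print("chi-squared:", chi_squared(counts))
```

Any candidate hash can be passed as `hash_fn`, so the same key set can be run through the current function and an alternative and the two spreads compared side by side with the speed numbers.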

2 participants