Overhaul ustring hash collision handling (#2870)
This patch ensures that two ustrings will never have the same hash value. The basic strategy is that the usual 64-bit hash of the characters provides an initial guess. If no other string has already used that hash (as is almost always the case), we use it. However, if a different string already took that hash value, we rehash, and repeat until we find an unused hash. We maintain a "reverse hash map" that relates hashes to ustrings to help figure this out efficiently.

(Note: This adds at least another 16 bytes per unique ustring to maintain the reverse map. Do we care? Probably few applications are in the business of creating more than a few million distinct strings.)

Because we are recording the collisions as they happen, this also greatly simplifies the recently-added ustring::hash_collisions() function and reduces its cost to almost nothing, because there is no complicated search when it is called.
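The strategy described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class `UniqueHashTable`, its `intern()` method, the free functions `default_hash()` and `rehash()`, and the use of `std::hash` and a splitmix64-style mixing step are all assumptions made for the example. The sketch also stores full `std::string` copies in the reverse map and does no locking, both of which a real string table would presumably handle differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for the "usual 64-bit hash of the characters" (assumption:
// std::hash is used here only for illustration).
inline uint64_t default_hash(const std::string& s) {
    return static_cast<uint64_t>(std::hash<std::string>{}(s));
}

// Derive the next candidate hash from a taken one (splitmix64-style mix;
// an assumption, the patch may rehash differently).
inline uint64_t rehash(uint64_t h) {
    h += 0x9e3779b97f4a7c15ULL;
    h = (h ^ (h >> 30)) * 0xbf58476d1ce4e5b9ULL;
    h = (h ^ (h >> 27)) * 0x94d049bb133111ebULL;
    return h ^ (h >> 31);
}

// Hypothetical table that hands out a unique hash per distinct string.
class UniqueHashTable {
public:
    using HashFn = std::function<uint64_t(const std::string&)>;

    explicit UniqueHashTable(HashFn initial_hash = default_hash)
        : initial_hash_(std::move(initial_hash)) {}

    // Return the unique hash for s, assigning one on first sight.
    uint64_t intern(const std::string& s) {
        uint64_t h = initial_hash_(s);  // initial guess
        bool collided = false;
        for (;;) {
            auto it = reverse_map_.find(h);
            if (it == reverse_map_.end()) {
                // Unused hash: claim it for s.
                bool inserted = reverse_map_.emplace(h, s).second;
                assert(inserted);
                (void)inserted;
                if (collided)
                    ++collisions_;  // record the collision as it happens
                return h;
            }
            if (it->second == s)
                return h;  // s was interned earlier with this hash
            // A different string owns h: rehash and try again.
            collided = true;
            h = rehash(h);
        }
    }

    // Cheap to answer because collisions were counted at insertion time.
    std::size_t hash_collisions() const { return collisions_; }

private:
    HashFn initial_hash_;
    std::unordered_map<uint64_t, std::string> reverse_map_;  // hash -> string
    std::size_t collisions_ = 0;
};
```

Because entries are never removed and the probe sequence depends only on the string, looking up a string that was moved off its initial hash follows the same chain and lands on the same value. Passing a deliberately bad initial hash (for example, a lambda that always returns the same number) is a simple way to exercise the collision path.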
Showing 2 changed files with 113 additions and 41 deletions.