default string hash produces a lot of consecutive ints for (gensym) symbols #1520
Comments
A simple fix for this is to add some extra hash mixing at the end of the string hash. We already do this for tuples and other structures to improve the hashing quality; for strings we are still just using a very basic DJB hash.
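To illustrate the idea, here is a minimal sketch of a DJB-style string hash followed by a final mixing step. It is not the code from the actual commit: the function names (`djb_hash`, `mix32`, `djb_hash_mixed`) and the particular finalizer (a multiply by the 32-bit golden-ratio constant plus a fold of the high bits) are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain DJB-style hash: hash = hash * 33 + byte, seeded with 5381.
 * Strings that differ only in their last byte get adjacent hashes. */
static uint32_t djb_hash(const uint8_t *str, size_t len) {
    uint32_t hash = 5381;
    for (size_t i = 0; i < len; i++) {
        hash = (hash << 5) + hash + str[i];
    }
    return hash;
}

/* Hypothetical finalizer (an assumption, not the project's mixer):
 * multiply by floor(2^32 / phi), then fold the high half into the
 * low half so that low-order bits depend on the whole value. */
static uint32_t mix32(uint32_t h) {
    h *= 0x9E3779B9u;
    h ^= h >> 16;
    return h;
}

/* DJB hash with the extra mixing applied once at the end. */
static uint32_t djb_hash_mixed(const uint8_t *str, size_t len) {
    return mix32(djb_hash(str, len));
}
```

Both steps of `mix32` are invertible on 32-bit values, so the mixing does not introduce any new collisions between full hashes; it only changes how neighbouring inputs are spread out.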
Pushed 5d1bd8a that uses our existing
It's my own fault but some of my tests seem to have been relying on internal details and they now fail with 5d1bd8a. As a specific example, as might be expected, the return value of things like With the change I now get:
before the change this was:
Time to rewrite some tests perhaps (^^;

Update: tests have been updated. Found and fixed some tests that were problematic for other reasons, so perhaps there was a net gain :)
Running Bauble's test suite before and after the new hash function:
Not bad!
Hurray for ⌊(2^32)⁄𝜙⌋.
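For reference, the constant being cheered here works out as follows, with $\varphi$ the golden ratio. This is the standard value used in multiplicative ("Fibonacci") hashing; whether the commit uses it in exactly this form is not shown in this thread.

```latex
\varphi = \frac{1 + \sqrt{5}}{2} \approx 1.6180339887,
\qquad
\left\lfloor \frac{2^{32}}{\varphi} \right\rfloor
  = \left\lfloor \frac{4294967296}{1.6180339887\ldots} \right\rfloor
  = 2654435769
  = \mathtt{0x9E3779B9}
```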
While debugging #1519 I noticed that the symbol cache is very densely packed with gensyms, which made the hash collision that breaks `symbol/slice` far more likely than it "should" be. (The application that actually triggered that bug had to traverse 4072 full buckets during the lookup in question, even though I think the resizing tries to keep the cache only half full.)

A cheap workaround for this particular issue would be to hash strings in reverse order, which seems worth it given how much of the symbol cache consists of gensyms during compilation, but I didn't want to do that in case someone proposes an altogether better hash function.
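A small sketch of why the reverse-order workaround would help, assuming a DJB-style hash (multiply by 33, add the byte) and gensym-like names that differ only in a trailing counter. The function names and the sample names `_00001` and `_00002` are hypothetical, picked only to show the effect.

```c
#include <stddef.h>
#include <stdint.h>

/* Forward DJB-style hash: the last byte is added with weight 1, so
 * names that differ only in their final character hash to adjacent
 * integers. */
static uint32_t djb_forward(const uint8_t *str, size_t len) {
    uint32_t hash = 5381;
    for (size_t i = 0; i < len; i++) {
        hash = hash * 33 + str[i];
    }
    return hash;
}

/* Same recurrence, walking the string from the end. The final
 * character is now consumed first and ends up scaled by
 * 33^(len - 1), which spreads a trailing counter much further apart. */
static uint32_t djb_reverse(const uint8_t *str, size_t len) {
    uint32_t hash = 5381;
    for (size_t i = len; i > 0; i--) {
        hash = hash * 33 + str[i - 1];
    }
    return hash;
}
```

For the six-byte names `_00001` and `_00002`, the forward hashes differ by exactly 1, while the reversed hashes differ by 33^5 = 39135393. The reversal only moves the weak spot to names that differ in their first character, which is why a general mixing step is the more thorough fix.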