Fix rehash functionality #2266
Conversation
Hello Matthew Alp, thanks for contributing a PR to our project! We use the Oracle Contributor Agreement to make the copyright of contributions clear. We don't have a record of you having signed this yet, based on your email address matthew -(dot)- alp -(at)- shopify -(dot)- com. You can sign it at that link. If you think you've already signed it, please comment below and we'll check.

Matthew Alp has signed the Oracle Contributor Agreement (based on email address matthew -(dot)- alp -(at)- shopify -(dot)- com) so can contribute to this repository.
Could you untag the spec and commit that change to show which spec now passes?
hashNode.execute(PackedArrayStrategy.getKey(store, n), compareByIdentity));
    }
}
PackedArrayStrategy.promoteToBuckets(getContext(), hash, store, size);
This is fairly inefficient: it means `rehash` will represent a Hash with #entries <= 3 less efficiently than before. It would be best to keep it as a packed array representation if possible.
We weren't sure of the best way to solve this. Multiple nested loops could get pretty verbose when exploded. A temporary set to see if a value has already been included seemed like it was basically the bucket strategy anyway.
A nested loop seems OK; there is already no `@ExplodeLoop` on this method anyway. I think what's important is not so much the speed of `rehash` but that the resulting Hash still uses the more efficient packed array representation for other calls later on.
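To make the trade-off concrete, here is a hypothetical sketch (in Ruby, not TruffleRuby's actual Java code) of deduplicating a small packed key/value array with a plain nested loop, rather than promoting it to the bucket representation. The `dedup_packed` helper and its last-occurrence-wins policy are illustrative assumptions, not the PR's implementation.

```ruby
# Hypothetical sketch: remove duplicate keys from a small "packed" array of
# [key, value] pairs using a nested scan, keeping the last occurrence of each
# key (as a later insertion would win in a Hash literal). This avoids falling
# back to a bucket/set-based representation for tiny hashes.
def dedup_packed(pairs)
  result = []
  pairs.each_with_index do |(key, value), i|
    # Keep this pair only if no later pair has an equal key with the same hash.
    later_dup = pairs[(i + 1)..].any? { |k, _| k.hash == key.hash && k.eql?(key) }
    result << [key, value] unless later_dup
  end
  result
end

dedup_packed([[:a, 1], [:b, 2], [:a, 3]])  # => [[:b, 2], [:a, 3]]
```

For the packed representation's small fixed capacity (at most a few entries), the quadratic scan is cheap, which matches the point above that `rehash` speed matters less than preserving the representation.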
entry.getKey(),
entry.getHashed(),
bucketEntry.getKey(),
bucketEntry.getHashed())) { // If the bucket contains a single entry, we never set the flag
Could you avoid this duplicated call by changing the loop above to be a do/while loop?
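To illustrate the suggestion in general terms (the actual method under review is Java; this is a hypothetical Ruby analogue, and `first_even_*`/`queue` are invented names): a pre-test loop often forces the same call to appear both before and inside the loop, while a post-test loop (Ruby's `begin ... end while`, the analogue of Java's do/while) lets the call appear exactly once.

```ruby
# Duplicated-call shape: queue.shift appears twice.
def first_even_duplicated(queue)
  n = queue.shift          # call duplicated here...
  until n.nil? || n.even?
    n = queue.shift        # ...and again inside the loop
  end
  n
end

# do/while shape: begin...end while runs the body once before testing,
# so the call appears only once.
def first_even_do_while(queue)
  begin
    n = queue.shift
  end while n && n.odd?
  n
end

first_even_do_while([1, 3, 4, 5])  # => 4
```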
@@ -890,6 +885,7 @@ protected RubyHash rehashBuckets(RubyHash hash,
    Arrays.fill(entries, null);

    Entry entry = hash.firstInSequence;
    Entry previousEntry = null;
`previousEntry` typically means the previous Entry in lookup, not in sequence. Could you clarify by renaming it to `previousInSequence`?
bucketEntry.getKey(),
bucketEntry.getHashed())) { // If the bucket contains a single entry, we never set the flag
    if (previousEntry != null) {
        previousEntry.setNextInSequence(entry.getNextInSequence());
What about `entry.getNextInSequence().getPreviousInSequence()`? Won't that still refer to `entry`?
I think we should use `BucketsStrategy.removeFromSequenceChain` here, then it should be fine.
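The bug being pointed out can be modeled with a minimal doubly-linked list (a hypothetical Ruby sketch; `Node` and `remove_from_sequence` are illustrative names mirroring the sequence-chain idea, not TruffleRuby code): unlinking an entry must update both the previous entry's next pointer and the next entry's previous pointer, which is why a single helper like `removeFromSequenceChain` is safer than patching only one side.

```ruby
# Minimal model of a doubly-linked "sequence chain".
Node = Struct.new(:value, :prev_in_seq, :next_in_seq)

# Unlink node from the chain, fixing BOTH neighbours' pointers. Updating only
# prev_in_seq.next_in_seq would leave next_in_seq.prev_in_seq still pointing
# at the removed node, which is the stale-pointer problem raised above.
def remove_from_sequence(node)
  node.prev_in_seq.next_in_seq = node.next_in_seq if node.prev_in_seq
  node.next_in_seq.prev_in_seq = node.prev_in_seq if node.next_in_seq
end

a = Node.new(:a)
b = Node.new(:b)
c = Node.new(:c)
a.next_in_seq = b; b.prev_in_seq = a
b.next_in_seq = c; c.prev_in_seq = b

remove_from_sequence(b)
a.next_in_seq.value  # => :c
c.prev_in_seq.value  # => :a
```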
@@ -13,7 +13,7 @@ Bug fixes:
* Fix `Thread.handle_interrupt` to defer non-pure interrupts until the end of the `handle_interrupt` block (#2219).
* Clear and restore errinfo on entry and normal return from methods in C extensions (#2227).
* Fix extra whitespace in squiggly heredoc with escaped newline (#2238, @wildmaples and @norswap).

* Fix `Hash#rehash` to remove duplicate keys after modifications (#2266, @MattAlp)

Compatibility:
Broken empty line here.

I think it would be nice if the spec was updated to also cover the bucket hash rehash, if it's not currently covering it.
We made progress on this today.
@MattAlp do you have time to work on this soon? Not a problem if you don't! Don't feel pressured! But we'd probably have to move it along a bit ourselves if you don't soon.
I worry about baking TruffleRuby implementation details into the spec. Two better ideas:
I wouldn't worry about that. Consider ruby/spec a test suite, not a specification.
Working on #2263.

`Hash#rehash` currently recalculates hashes, but does not remove duplicate keys as described in the spec. This PR addresses this by unlinking duplicate entries within a hash when a hash collision and equal keys are detected.
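The behavior described above can be demonstrated from plain Ruby: after a key object is mutated so that two entries become equal, `Hash#rehash` should collapse them to a single entry.

```ruby
# Hash#rehash should remove duplicate keys created by mutating a key in place.
a = [1]
b = [2]
h = { a => :first, b => :second }

b.replace([1])   # now a.eql?(b) and a.hash == b.hash: two entries, one key
h.rehash         # recomputes hashes and drops the duplicate entry

h.size      # => 1
h.key?([1]) # => true
```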