Fix rehash functionality #2266

MattAlp · 2021-02-21T19:12:36Z

Working on #2263. Hash#rehash currently recalculates hashes, but does not remove duplicates as described in the spec. This PR addresses this by unlinking duplicate entries within a hash when a hash collision and equal keys are detected.

graalvmbot · 2021-02-21T19:23:05Z

Hello Matthew Alp, thanks for contributing a PR to our project!

We use the Oracle Contributor Agreement to make the copyright of contributions clear. We don't have a record of you having signed this yet, based on your email address matthew -(dot)- alp -(at)- shopify -(dot)- com. You can sign it at that link.

If you think you've already signed it, please comment below and we'll check.

graalvmbot · 2021-02-22T12:43:14Z

Matthew Alp has signed the Oracle Contributor Agreement (based on email address matthew -(dot)- alp -(at)- shopify -(dot)- com) so can contribute to this repository.

eregon · 2021-02-22T12:52:53Z

Could you untag the spec and commit that change to show which spec now passes?

jt untag spec/ruby/core/hash/rehash_spec.rb

eregon · 2021-02-22T12:58:46Z

src/main/java/org/truffleruby/core/hash/HashNodes.java

-                            hashNode.execute(PackedArrayStrategy.getKey(store, n), compareByIdentity));
-                }
-            }
+            PackedArrayStrategy.promoteToBuckets(getContext(), hash, store, size);


This is fairly inefficient: it means rehash will represent a Hash with #entries<=3 less efficiently than before. It would be best to keep it as a packed array representation if possible.

We weren't sure the best way to solve this - multiple nested loops? Could get pretty verbose when exploded? Or a temporary set to see if a value has already been included? Seemed like that was basically the bucket strategy anyway?

A nested loop seems OK, there is already no @ExplodeLoop on this method anyway.
I think what's important is not so much the speed of rehash but that the resulting Hash still uses the more efficient packed array representation for other calls later on.
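The nested-loop idea could look roughly like the sketch below. It uses a hypothetical flat `Object[]` of `[hash, key, value]` triples as a stand-in for the packed store. The class `PackedRehashSketch`, its layout constant, and the choice to keep the first of two equal keys are all assumptions for illustration, not TruffleRuby's actual `PackedArrayStrategy` code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical model of a small packed store laid out as [hash, key, value] triples.
// This is not TruffleRuby's PackedArrayStrategy, only an illustration.
final class PackedRehashSketch {
    static final int ELEMENTS_PER_ENTRY = 3;

    /** Recomputes every hash, drops later entries whose key equals an earlier one,
     *  compacts the survivors to the front, and returns the new entry count. */
    static int rehash(Object[] store, int size) {
        int newSize = 0;
        for (int n = 0; n < size; n++) {
            // Read the entry before any slot at or below n is overwritten.
            Object key = store[n * ELEMENTS_PER_ENTRY + 1];
            Object value = store[n * ELEMENTS_PER_ENTRY + 2];
            int hash = Objects.hashCode(key);

            // Inner loop: compare against the entries already kept.
            boolean duplicate = false;
            for (int m = 0; m < newSize; m++) {
                int keptHash = (Integer) store[m * ELEMENTS_PER_ENTRY];
                Object keptKey = store[m * ELEMENTS_PER_ENTRY + 1];
                if (keptHash == hash && Objects.equals(keptKey, key)) {
                    duplicate = true;
                    break;
                }
            }

            if (!duplicate) {
                int base = newSize * ELEMENTS_PER_ENTRY;
                store[base] = hash;
                store[base + 1] = key;
                store[base + 2] = value;
                newSize++;
            }
        }
        // Clear the slots freed by removed duplicates.
        for (int i = newSize * ELEMENTS_PER_ENTRY; i < size * ELEMENTS_PER_ENTRY; i++) {
            store[i] = null;
        }
        return newSize;
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>(Arrays.asList(1, 2));
        List<Integer> b = new ArrayList<>(Arrays.asList(1));
        Object[] store = { a.hashCode(), a, "first", b.hashCode(), b, "second" };
        b.add(2); // b now equals a, so the stored hash for b is stale
        System.out.println("size after rehash: " + rehash(store, 2));
    }
}
```

Because the survivors are compacted in place, the result stays in the same small array and never needs to be promoted to buckets.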

eregon · 2021-02-22T13:06:23Z

src/main/java/org/truffleruby/core/hash/HashNodes.java

+                            entry.getKey(),
+                            entry.getHashed(),
+                            bucketEntry.getKey(),
+                            bucketEntry.getHashed())) { // If the bucket contains a single entry, we never set the flag


Could you avoid this duplicated call by changing the loop above to be a do/while loop?
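As a rough illustration of the do/while suggestion, the sketch below walks a singly linked bucket chain and performs the equality check in exactly one place, including for the last (or only) entry. `BucketChainSketch`, `Node`, and `appendIfAbsent` are hypothetical names, not the code under review.

```java
import java.util.Objects;

// Hypothetical bucket chain; the names here are illustrative only.
final class BucketChainSketch {
    static final class Node {
        final Object key;
        final int hash;
        Node nextInLookup;

        Node(Object key) {
            this.key = key;
            this.hash = Objects.hashCode(key);
        }
    }

    /** Appends entry to the chain starting at head unless an equal key is already present.
     *  Returns true if the entry was appended, false if it was a duplicate. */
    static boolean appendIfAbsent(Node head, Node entry) {
        Node current = head;
        Node last = head;
        // The body runs at least once, so a single-entry bucket is checked by the
        // same comparison as every other entry; no second copy of the call is needed.
        do {
            if (current.hash == entry.hash && Objects.equals(current.key, entry.key)) {
                return false;
            }
            last = current;
            current = current.nextInLookup;
        } while (current != null);

        last.nextInLookup = entry;
        return true;
    }

    public static void main(String[] args) {
        Node head = new Node("a");
        System.out.println(appendIfAbsent(head, new Node("b")));
        System.out.println(appendIfAbsent(head, new Node("a")));
    }
}
```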

eregon · 2021-02-22T13:08:31Z

src/main/java/org/truffleruby/core/hash/HashNodes.java

@@ -890,6 +885,7 @@ protected RubyHash rehashBuckets(RubyHash hash,
            Arrays.fill(entries, null);

            Entry entry = hash.firstInSequence;
+            Entry previousEntry = null;


previousEntry typically means the previous Entry in lookup, not in sequence. Could you clarify by renaming it to previousInSequence?

eregon · 2021-02-22T13:10:02Z

src/main/java/org/truffleruby/core/hash/HashNodes.java

+                            bucketEntry.getKey(),
+                            bucketEntry.getHashed())) { // If the bucket contains a single entry, we never set the flag
+                        if (previousEntry != null) {
+                            previousEntry.setNextInSequence(entry.getNextInSequence());


What about entry.getNextInSequence().getPreviousInSequence(), won't that still refer to entry?

I think we should use BucketsStrategy.removeFromSequenceChain here, then it should be fine.
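To make the concern concrete, here is a minimal sketch of unlinking an entry from a doubly linked insertion-order chain. It is written in the spirit of a helper like `removeFromSequenceChain`, but `SequenceChainSketch` and its fields are hypothetical and the real TruffleRuby method may differ. The point is that both neighbours, plus the first/last pointers, have to be updated, whereas the diff above only fixes the forward link.

```java
// Hypothetical insertion-order chain; not TruffleRuby's BucketsStrategy.
final class SequenceChainSketch {
    static final class Entry {
        final String key;
        Entry previousInSequence;
        Entry nextInSequence;

        Entry(String key) {
            this.key = key;
        }
    }

    Entry firstInSequence;
    Entry lastInSequence;

    Entry append(String key) {
        Entry entry = new Entry(key);
        if (lastInSequence == null) {
            firstInSequence = entry;
        } else {
            lastInSequence.nextInSequence = entry;
            entry.previousInSequence = lastInSequence;
        }
        lastInSequence = entry;
        return entry;
    }

    /** Unlinks entry from the chain. The removed entry's own links are left untouched
     *  (a choice of this sketch) so a caller iterating the chain can still follow
     *  entry.nextInSequence afterwards. */
    void remove(Entry entry) {
        Entry previous = entry.previousInSequence;
        Entry next = entry.nextInSequence;

        if (previous == null) {
            firstInSequence = next;
        } else {
            previous.nextInSequence = next;
        }

        if (next == null) {
            lastInSequence = previous;
        } else {
            // Without this line, next.previousInSequence would still refer to entry.
            next.previousInSequence = previous;
        }
    }

    public static void main(String[] args) {
        SequenceChainSketch chain = new SequenceChainSketch();
        Entry a = chain.append("a");
        Entry b = chain.append("b");
        Entry c = chain.append("c");
        chain.remove(b);
        System.out.println(c.previousInSequence == a);
    }
}
```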

chrisseaton · 2021-02-22T13:23:30Z

CHANGELOG.md

@@ -13,7 +13,7 @@ Bug fixes:
 * Fix `Thread.handle_interrupt` to defer non-pure interrupts until the end of the `handle_interrupt` block (#2219).
 * Clear and restore errinfo on entry and normal return from methods in C extensions (#2227).
 * Fix extra whitespace in squiggly heredoc with escaped newline (#2238, @wildmaples and @norswap).
-
+* Fix `Hash#rehash` to remove duplicate keys after modifications (#2266, @MattAlp)
 Compatibility:


Broken empty line here.

bjfish · 2021-02-22T14:59:26Z

> Could you untag the spec and commit that change to show which spec now passes?
> jt untag spec/ruby/core/hash/rehash_spec.rb

I think it would be nice if the spec was updated to also cover the bucket hash rehash if it's not currently covering it.

chrisseaton · 2021-03-14T20:05:09Z

We made progress on this today.

chrisseaton · 2021-03-28T22:04:37Z

@MattAlp do you have time to work on this soon? Not a problem if you don't! Don't feel pressured! But we'd probably have to move it on a bit ourselves if you don't soon.

chrisseaton · 2021-04-04T01:16:07Z

> I think it would be nice if the spec was updated to also cover the bucket hash rehash if it's not currently covering it.

I worry about baking in TruffleRuby implementation details to the spec.

Two better ideas:

- put it in spec/truffle
- rely on running the specs in a loop and using a chaos node to randomise storage to fix this

eregon · 2021-04-06T12:43:28Z

> I worry about baking in TruffleRuby implementation details to the spec.

I wouldn't worry about that. Consider ruby/spec a test suite, not a specification.
From a test suite point of view, most Ruby implementations have different storage strategies for small and large hashes, so one could just use e.g. a Hash with 10 elements to test that, and mention in the spec description or in a comment that it's specifically to test large hashes, which might have a different representation.

MattAlp added 2 commits February 21, 2021 13:28

Fix rehash functionality

daf0076

Updated changelog

b6d6876

eregon added the shopify label Feb 22, 2021

graalvmbot added the oca-signed label Feb 22, 2021

eregon reviewed Feb 22, 2021

chrisseaton reviewed Feb 22, 2021

eregon linked an issue Feb 22, 2021 that may be closed by this pull request

Fix Hash#rehash #2263

Closed

chrisseaton mentioned this pull request Apr 4, 2021

Fix rehash functionality #2310

Merged

chrisseaton closed this Apr 4, 2021
