Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix rehash functionality #2266

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ Bug fixes:
* Fix `Thread.handle_interrupt` to defer non-pure interrupts until the end of the `handle_interrupt` block (#2219).
* Clear and restore errinfo on entry and normal return from methods in C extensions (#2227).
* Fix extra whitespace in squiggly heredoc with escaped newline (#2238, @wildmaples and @norswap).

* Fix `Hash#rehash` to remove duplicate keys after modifications (#2266, @MattAlp)
Compatibility:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broken empty line here.


* Implemented `$LOAD_PATH.resolve_feature_path`.
Expand Down
39 changes: 29 additions & 10 deletions src/main/java/org/truffleruby/core/hash/HashNodes.java
Original file line number Diff line number Diff line change
Expand Up @@ -845,6 +845,7 @@ protected int sizePackedArray(RubyHash hash) {
public abstract static class InternalRehashNode extends RubyContextNode {

@Child private HashingNodes.ToHash hashNode = HashingNodes.ToHash.create();
@Child private CompareHashKeysNode compareHashKeysNode = new CompareHashKeysNode();

public static InternalRehashNode create() {
return InternalRehashNodeGen.create();
Expand All @@ -866,14 +867,8 @@ protected RubyHash rehashPackedArray(RubyHash hash,
final Object[] store = (Object[]) hash.store;
final int size = hash.size;

for (int n = 0; n < getLanguage().options.HASH_PACKED_ARRAY_MAX; n++) {
if (n < size) {
PackedArrayStrategy.setHashed(
store,
n,
hashNode.execute(PackedArrayStrategy.getKey(store, n), compareByIdentity));
}
}
PackedArrayStrategy.promoteToBuckets(getContext(), hash, store, size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fairly inefficient, it means rehash will represent a Hash with #entries<=3 less efficiently than before. It would be best to keep it as a packed array representation if possible.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We weren't sure the best way to solve this - multiple nested loops? Could get pretty verbose when exploded? Or a temporary set to see if a value has already been included? Seemed like that was basically the bucket strategy anyway?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nested loop seems OK, there is already no @ExplodeLoop on this method anyway.
I think what's important is not so much the speed of rehash but that the resulting Hash still uses the more efficient packed array representation for other calls later on.

rehashBuckets(hash, byIdentityProfile);

assert HashOperations.verifyStore(getContext(), hash);

Expand All @@ -890,6 +885,7 @@ protected RubyHash rehashBuckets(RubyHash hash,
Arrays.fill(entries, null);

Entry entry = hash.firstInSequence;
Entry previousEntry = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previousEntry typically means the previous Entry in lookup, not in sequence, could you clarify by renaming it to previousInSequence?


while (entry != null) {
final int newHash = hashNode.execute(entry.getKey(), compareByIdentity);
Expand All @@ -900,14 +896,37 @@ protected RubyHash rehashBuckets(RubyHash hash,

if (bucketEntry == null) {
entries[index] = entry;
previousEntry = entry;
} else {
boolean encounteredDuplicateKey = false;
while (bucketEntry.getNextInLookup() != null) {
if (compareHashKeysNode.equalKeys(
compareByIdentity,
entry.getKey(),
entry.getHashed(),
bucketEntry.getKey(),
bucketEntry.getHashed())) {
encounteredDuplicateKey = true;
break;
}
bucketEntry = bucketEntry.getNextInLookup();
}
if (encounteredDuplicateKey || compareHashKeysNode.equalKeys(
compareByIdentity,
entry.getKey(),
entry.getHashed(),
bucketEntry.getKey(),
bucketEntry.getHashed())) { // If the bucket contains a single entry, we never set the flag
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you avoid this duplicated call by changing the loop above to be a do/while loop?

if (previousEntry != null) {
previousEntry.setNextInSequence(entry.getNextInSequence());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about entry.getNextInSequence().getPreviousInSequence(), won't that still refer to entry?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use BucketsStrategy.removeFromSequenceChain here, then it should be fine.

}
hash.size--;
} else {
bucketEntry.setNextInLookup(entry);
previousEntry = entry;
}

bucketEntry.setNextInLookup(entry);
}

entry = entry.getNextInSequence();
}

Expand Down