Simplify HashMap Bucket interface #40561
(rust_highfive has picked a reviewer for you, use r? to override)
src/libstd/collections/hash/table.rs
Outdated
@@ -126,9 +126,10 @@ unsafe impl<K: Send, V: Send> Send for RawTable<K, V> {}
unsafe impl<K: Sync, V: Sync> Sync for RawTable<K, V> {}

pub struct RawBucket<K, V> {
hash: *mut HashUint,
_hash: *mut HashUint,
Why was this renamed? If it was to disambiguate with the method, I think `hash_` would be more appropriate than `_hash`, since the standard practice is to use a `_` prefix to denote "unused."
It's for disambiguation. I think we can come up with a better name instead; moving the `_` to the end hurts the eye. Something like `hash_start`, `hash_base`, or `first_hash`?
src/libstd/collections/hash/table.rs
Outdated
fn first_bucket_raw(&self) -> RawBucket<K, V> {
let hashes_size = self.capacity * size_of::<HashUint>();
let pairs_size = self.capacity * size_of::<(K, V)>();
fn raw_bucket(&self) -> RawBucket<K, V> {
This seems like a worse name, since it isn't clear which raw bucket is returned. Maybe `first_raw_bucket` or `raw_bucket_first`, or stick with the original `first_bucket_raw` (presumably paired with `bucket_at_raw`).
I'll switch callers to `raw_bucket_at(0)`.
pair: self.pair.offset(count),
_marker: marker::PhantomData,
}
unsafe fn hash(&self) -> *mut HashUint {
Could you add a comment noting that these functions are unsafe because you can safely create out-of-bounds buckets?
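One way to read this request: constructing a bucket handle involves no bounds check, so the accessors that produce a pointer from it are the ones that must carry the `unsafe` contract. The following is a simplified sketch over a single array rather than the real hash/pair layout; `RawSlot` and its methods are hypothetical names for illustration, not the code in this PR.

```rust
/// Hypothetical simplified bucket handle: a base pointer plus an index.
pub struct RawSlot<T> {
    start: *const T,
    idx: usize,
}

impl<T> RawSlot<T> {
    /// Safe to call with any `idx`: nothing is dereferenced here, so an
    /// out-of-bounds handle can be created without `unsafe`.
    pub fn at(slice: &[T], idx: usize) -> RawSlot<T> {
        RawSlot { start: slice.as_ptr(), idx }
    }

    pub fn index(&self) -> usize {
        self.idx
    }

    /// Unsafe because the handle may be out of bounds: the caller must
    /// guarantee that `idx` lies within the allocation `start` points
    /// into and that the allocation is still alive.
    pub unsafe fn get(&self) -> *const T {
        unsafe { self.start.add(self.idx) }
    }
}
```

With this shape, `RawSlot::at(&data, 99)` on a three-element array compiles and runs without `unsafe`; only calling `get()` on such a handle would be undefined behavior, which is what the requested comment would document.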
src/libstd/collections/hash/table.rs
Outdated
// We use *const to ensure covariance with respect to K and V
pair: *const (K, V),
_pair: *const (K, V),
idx: usize,
If this is always within `[0..table_capacity)`, could you add a comment mentioning that?
src/libstd/collections/hash/table.rs
Outdated
RevMoveBuckets {
raw: raw_bucket.offset(self.capacity as isize),
hashes_end: raw_bucket.hash,
raw: self.raw_bucket_at(self.capacity()),
You said in your description that bucket indices are in `[0..table_capacity)`, but this is one past that. Did you mean `[0..table_capacity]`?
This is intentionally set to one past the end for the reverse style iterator. I will add a comment line on that.
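A reverse walk that starts one past the end never reads that position as long as it steps back before each read. Here is a minimal sketch of that pattern over a plain slice; `RevSlots` is a hypothetical name and is not the actual `RevMoveBuckets` implementation.

```rust
/// Hypothetical reverse iterator whose cursor starts at `len`,
/// one past the last valid index.
pub struct RevSlots<'a, T> {
    slots: &'a [T],
    // Always in [0..=len]; only used for a read after being decremented.
    idx: usize,
}

impl<'a, T> RevSlots<'a, T> {
    pub fn new(slots: &'a [T]) -> Self {
        RevSlots { slots, idx: slots.len() }
    }
}

impl<'a, T> Iterator for RevSlots<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        if self.idx == 0 {
            return None;
        }
        // Step back first, so the one-past-the-end position is never read.
        self.idx -= 1;
        let slots = self.slots;
        Some(&slots[self.idx])
    }
}
```

Under this scheme the cursor range is `[0..=len]` while the range of indices actually read stays `[0..len)`, which is consistent with both statements in the exchange above.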
src/libstd/collections/hash/table.rs
Outdated
marker: marker::PhantomData,
}
}
}

impl<'a, K, V> Iterator for RawBuckets<'a, K, V> {
type Item = RawBucket<K, V>;
type Item = (*mut HashUint, *mut (K, V));
It seems odd to have an iterator named `RawBuckets` that doesn't return `RawBucket`s.
Good point, hopefully that won't affect performance negatively.
src/libstd/collections/hash/table.rs
Outdated
if self.elems_left == 0 {
return None;
}
loop {
Nit: could there be a blank line before this loop?
src/libstd/collections/hash/table.rs
Outdated
(self.elems_left, Some(self.elems_left))
}
}
impl<'a, K, V> ExactSizeIterator for RawBuckets<'a, K, V> {
Nit: could there be a blank line before this `impl`?
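The `elems_left` counter visible in the hunks above is what makes an exact `size_hint` possible: the iterator knows how many occupied buckets remain, so it can stop early and report its length. A rough sketch of the pattern follows. It assumes, for illustration only, that a stored hash of 0 marks an empty slot; `FullSlots` is a hypothetical type that yields indices rather than `RawBucket`s.

```rust
/// Hypothetical iterator over the occupied slots of a hash array.
/// Assumption for this sketch: a stored hash of 0 marks an empty slot.
pub struct FullSlots<'a> {
    hashes: &'a [u64],
    idx: usize,
    elems_left: usize,
}

impl<'a> FullSlots<'a> {
    /// `len` must equal the number of nonzero entries in `hashes`;
    /// a larger value makes `next` panic on an out-of-range index.
    pub fn new(hashes: &'a [u64], len: usize) -> Self {
        FullSlots { hashes, idx: 0, elems_left: len }
    }
}

impl<'a> Iterator for FullSlots<'a> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        // Stop as soon as every occupied slot has been yielded,
        // without scanning the trailing empty slots.
        if self.elems_left == 0 {
            return None;
        }

        loop {
            let i = self.idx;
            self.idx += 1;
            if self.hashes[i] != 0 {
                self.elems_left -= 1;
                return Some(i);
            }
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.elems_left, Some(self.elems_left))
    }
}

// The exact size_hint above is what justifies this impl.
impl<'a> ExactSizeIterator for FullSlots<'a> {}
```

For a hash array `[0, 7, 0, 9, 0]` with `len = 2`, the iterator yields indices 1 and 3 and then returns `None` without visiting the final empty slot.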
@bors: delegate=gereeter
✌️ @gereeter can now approve this pull request
Thanks for the review @gereeter. I'll update the PR later.
☔ The latest upstream changes (presumably #40538) made this pull request unmergeable. Please resolve the merge conflicts.
Rebased.
☔ The latest upstream changes (presumably #40806) made this pull request unmergeable. Please resolve the merge conflicts.
e190879 to 39802b2
(gentle ping)
src/libstd/collections/hash/map.rs
Outdated
@@ -503,7 +498,7 @@ fn robin_hood<'a, K: 'a, V: 'a>(bucket: FullBucketMut<'a, K, V>,
loop {
displacement += 1;
let probe = bucket.next();
debug_assert!(probe.index() != idx_end);
debug_assert!(probe.index() != start_index);
Is this invariant not as strict as the old one? Either way it's ok. You could add a comment that this assertion is simplified.
It's not. I'll update the code to do the same check as before.
Excellent. Please rebase.
3451af6 to fdb88fe
Updated with @pczarn's review.
* Store capacity_mask instead of capacity
* Move bucket index into RawBucket
* Bucket index is now always within [0..table_capacity)
* Clone RawTable using RawBucket
* Simplify iterators by moving logic into RawBuckets
* Make retain aware of the number of elements
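The `capacity_mask` and index-range items above fit together: when the capacity is a power of two, storing `capacity - 1` as a mask lets an index wrap with a single bitwise AND, which keeps it within [0..table_capacity). A minimal sketch, assuming a nonzero power-of-two capacity; `Table`, `next_index`, and `prev_index` are hypothetical names and not the actual libstd code.

```rust
/// Hypothetical stand-in for a table that stores `capacity - 1`
/// instead of `capacity`. Assumes a nonzero power-of-two capacity.
pub struct Table {
    capacity_mask: usize,
}

impl Table {
    pub fn with_capacity(capacity: usize) -> Table {
        assert!(capacity.is_power_of_two());
        Table { capacity_mask: capacity - 1 }
    }

    /// The capacity is recovered from the mask when needed.
    pub fn capacity(&self) -> usize {
        self.capacity_mask + 1
    }

    /// Step forward, wrapping to 0 after the last bucket.
    pub fn next_index(&self, idx: usize) -> usize {
        idx.wrapping_add(1) & self.capacity_mask
    }

    /// Step backward, wrapping from 0 to the last bucket.
    pub fn prev_index(&self, idx: usize) -> usize {
        idx.wrapping_sub(1) & self.capacity_mask
    }
}
```

With a capacity of 8, `next_index(7)` gives 0 and `prev_index(0)` gives 7, so neither direction needs a comparison against the capacity.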
@bors: r=pczarn
📌 Commit f07ebd6 has been approved by
@bors: delegate=pczarn
✌️ @pczarn can now approve this pull request
⌛ Testing commit f07ebd6 with merge b181396...
💔 Test failed - status-appveyor
… On Wed, Apr 5, 2017 at 2:08 AM, bors wrote:
> 💔 Test failed - status-appveyor <https://ci.appveyor.com/project/rust-lang/rust/build/1.0.2769>
Simplify HashMap Bucket interface

> Simplify HashMap Bucket interface
>
> * Store capacity_mask instead of capacity
> * Move bucket index into RawBucket
> * Valid bucket index is now always within [0..table_capacity)
> * Simplify iterators by moving logic into RawBuckets
> * Clone RawTable using RawBucket
> * Make retain aware of the number of elements

The idea was to put idx in RawBucket instead of the other Bucket types and simplify next() and prev() as much as possible. The rest was a side-effect of that change, except maybe the last 2.

This change makes iteration and other next/prev() heavy operations noticeably faster. Clone is way faster.

```
➜  hashmap2 git:(adapt) ✗ cargo benchcmp pre:: adp:: bench.txt
name                        pre:: ns/iter  adp:: ns/iter  diff ns/iter   diff %
clone_10_000                74,364         39,736              -34,628  -46.57%
grow_100_000                8,343,553      8,233,785          -109,768   -1.32%
grow_10_000                 817,825        723,958             -93,867  -11.48%
grow_big_value_100_000      18,418,979     17,906,186         -512,793   -2.78%
grow_big_value_10_000       1,219,242      1,103,334          -115,908   -9.51%
insert_1000                 74,546         58,343              -16,203  -21.74%
insert_100_000              6,743,770      6,238,017          -505,753   -7.50%
insert_10_000               798,079        719,123             -78,956   -9.89%
insert_1_000_000            275,215,605    266,975,875      -8,239,730   -2.99%
insert_int_bigvalue_10_000  1,517,387      1,419,838           -97,549   -6.43%
insert_str_10_000           316,179        278,896             -37,283  -11.79%
insert_string_10_000        770,927        747,449             -23,478   -3.05%
iter_keys_100_000           386,099        333,104             -52,995  -13.73%
iterate_100_000             387,320        355,707             -31,613   -8.16%
lookup_100_000              206,757        193,063             -13,694   -6.62%
lookup_100_000_unif         219,366        193,180             -26,186  -11.94%
lookup_1_000_000            206,456        205,716                -740   -0.36%
lookup_1_000_000_unif       659,934        629,659             -30,275   -4.59%
lru_sim                     20,194,334     18,442,149       -1,752,185   -8.68%
merge_shuffle               1,168,044      1,063,055          -104,989   -8.99%
```

Note 2: I may have messed up porting the diff, let's see what CI says.
⌛ Testing commit f07ebd6 with merge 1e10204...
@bors retry - prioritizing rollup