Skip to content

Commit

Permalink
Auto merge of #92103 - Kobzol:stable-hash-skip-zero-bytes, r=the8472
Browse files Browse the repository at this point in the history
Do not hash leading zero bytes of i64 numbers in Sip128 hasher

I was poking into the stable hasher, trying to improve its performance by compressing the number of hashed bytes. First I was experimenting with LEB128, but it was painful to implement because of the many assumptions that the SipHasher makes, so I tried something simpler - just ignoring leading zero bytes. For example, if an 8-byte integer can fit into a 4-byte integer, I will just hash the four bytes.

I wonder if this could produce any hashing ambiguity. Originally I thought so, but then I struggled to find any counter-example where this could cause different values to have the same hash. I'd be glad for any examples that could be broken by this (there are some ways of mitigating it if that would be the case). It could happen if you had e.g. 2x `u8` vs 1x `u16` hashed after one another in two different runs, but that can also happen now, without this "trick". And with collections, it should be fine, because the length is included in their hash.

I gathered some statistics for common values used in the `clap` benchmark. I observed that especially `i64` often had very low values, so I started with that type, let's see what perf does on CI.

There are some tradeoffs that we can try:
1) What types to use this optimization for? `u64`, `u32`, `u16`? Locally it was a slight loss for `u64`, I noticed that its values are often quite large.
2) What byte sizes to check? E.g. we can only distinguish between `u64`/`u32` or `u64`/`u8` instead of `u64`/`u32`/`u16`/`u8` to reduce branching (with `i64` it seemed to be better to go all the way down to `u8` locally though).

(The macro was introduced because I expect that I will be trying out this "trick" for different types).

Can you please schedule a perf. run? Thanks.

r? `@the8472`
  • Loading branch information
bors committed Jan 5, 2022
2 parents b03c504 + 65a3279 commit 936ce3d
Show file tree
Hide file tree
Showing 2 changed files with 18 additions and 4 deletions.
18 changes: 16 additions & 2 deletions compiler/rustc_data_structures/src/sip128.rs
Original file line number Diff line number Diff line change
Expand Up @@ -409,6 +409,20 @@ impl SipHasher128 {
}
}

macro_rules! dispatch_value {
($target: expr, $value:expr) => {
let value = $value;
#[allow(unreachable_patterns)]
#[allow(overflowing_literals)]
match value {
0..=0xFF => $target.short_write(value as u8),
0x100..=0xFFFF => $target.short_write(value as u16),
0x10000..=0xFFFFFFFF => $target.short_write(value as u32),
_ => $target.short_write(value as u64),
}
};
}

impl Hasher for SipHasher128 {
#[inline]
fn write_u8(&mut self, i: u8) {
Expand All @@ -422,7 +436,7 @@ impl Hasher for SipHasher128 {

#[inline]
fn write_u32(&mut self, i: u32) {
self.short_write(i);
dispatch_value!(self, i);
}

#[inline]
Expand Down Expand Up @@ -452,7 +466,7 @@ impl Hasher for SipHasher128 {

#[inline]
fn write_i64(&mut self, i: i64) {
self.short_write(i as u64);
dispatch_value!(self, i as u64);
}

#[inline]
Expand Down
4 changes: 2 additions & 2 deletions src/test/debuginfo/function-names.rs
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@
// Const generic parameter
// gdb-command:info functions -q function_names::const_generic_fn.*
// gdb-check:[...]static fn function_names::const_generic_fn_bool<false>();
// gdb-check:[...]static fn function_names::const_generic_fn_non_int<{CONST#fe3cfa0214ac55c7}>();
// gdb-check:[...]static fn function_names::const_generic_fn_non_int<{CONST#3fcd7c34c1555be6}>();
// gdb-check:[...]static fn function_names::const_generic_fn_signed_int<-7>();
// gdb-check:[...]static fn function_names::const_generic_fn_unsigned_int<14>();

Expand Down Expand Up @@ -76,7 +76,7 @@
// Const generic parameter
// cdb-command:x a!function_names::const_generic_fn*
// cdb-check:[...] a!function_names::const_generic_fn_bool<false> (void)
// cdb-check:[...] a!function_names::const_generic_fn_non_int<CONST$fe3cfa0214ac55c7> (void)
// cdb-check:[...] a!function_names::const_generic_fn_non_int<CONST$3fcd7c34c1555be6> (void)
// cdb-check:[...] a!function_names::const_generic_fn_unsigned_int<14> (void)
// cdb-check:[...] a!function_names::const_generic_fn_signed_int<-7> (void)

Expand Down

0 comments on commit 936ce3d

Please sign in to comment.