-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Small improvement to SipHasher #37312
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
Please don't merge it yet. |
if length < needed { | ||
self.tail |= u8to64_le!(msg, 0, length) << 8 * self.ntail; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tail isn't being updated in this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, too much copy pasting around I guess.
377cebb
to
04912a0
Compare
Should be good to review now. |
self.state.v3 ^= self.tail; | ||
S::c_rounds(&mut self.state); | ||
self.state.v0 ^= self.tail; | ||
self.ntail = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch previously didn't update self.tail
, but now it is. To confirm, that's ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ok, as it's marking it has 0 valid bytes at the end.
out |= (load_int_le!($buf, t+$i, u16) as u64) << t*8; | ||
t += 2; | ||
} | ||
if t < $len { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This math here is a little hard to follow ($len
first, then + 1
, then +0
), I wonder if this could be refactored slightly? Also, shouldn't this handle the case that $len == 8
or perhaps debug assert t == $len
at the end?
One way to structure this might be:
let mut t = $len;
if t >= 4 {
out = ...;
t -= 4;
}
if t >= 2 {
out = ...;
t -= 2;
}
if t >= 1 {
out = ...;
t -= 1;
}
debug_assert!(t == 0);
Also, this is getting to the point where it probably doesn't need to be a macro, so perhaps this could become an #[inline]
function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points. I think a variable renaming will help here, and also making it inline. I just have to check the code gen.
Counting down from $len isn't really desirable because then we need some additional arithmetic to calculate the shifts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the len = 8 case was intentionally left out for optimization. I added comments and asserts as requested.
The next step is to get the compiler to in-line the calls write_u8(0xff) and write_usize(xx.len()), this will unlock an even bigger speed bump. Unfortunately the compiler is very reluctant to do so. So I'm trying to work around it. It's not pretty but it's the best we can do until we remove these delimiters completely (when not necessary). |
04912a0
to
dd6f8fe
Compare
Ok, I got the part mentioned in the previous comment in the PR as well. This should make stdlib sip13 sort-of competitive for strings/slices. I'll run some hash-rs benchmarks tomorrow. |
Scratch the |
I wonder if we can use an specialization trait just for collecitons::HashMap, to avoid writing the delimiters for &str and &[u/iN] when not needed. Then we can heap the benefits without waiting for the RFC, that's probably nowhere near acceptance. Edit: I made this over lunch, works great: https://godbolt.org/g/iaaplk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice wins! Can you be sure to add a few tests here as well for the various corner cases uncovered? Basically alternate between write + write_usize
vs just write
to ensure the output is the same.
if fill == 8 { | ||
self.tail = unsafe { load_int_le!(msg, 0, u64) }; | ||
} else { | ||
self.tail |= unsafe { u8to64_le(msg, 0, fill) } << 8 * self.ntail; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you put parens around 8 * self.ntail
just to clarify that order of operations?
dd6f8fe
to
a319d13
Compare
Should be good now, I amended the commit with your comments. |
@bors: r+ |
📌 Commit a319d13 has been approved by |
Small improvement to SipHasher Very small but constant improvement, the objective is to lower latency for u16, u32 and small strings. CC #35735 ``` ➜ siphash-bench git:(master) ✗ sudo nice -n -20 target/release/foo-648738a54f390643 --bench | tee benches.txt [sudo] password for arthurprs: running 62 tests test _same ... bench: 0 ns/iter (+/- 0) test _warmup ... bench: 0 ns/iter (+/- 0) test rust_siphash13::int_u16 ... bench: 12 ns/iter (+/- 1) test rust_siphash13::int_u32 ... bench: 14 ns/iter (+/- 0) test rust_siphash13::int_u64 ... bench: 11 ns/iter (+/- 1) test rust_siphash13::int_u8 ... bench: 11 ns/iter (+/- 1) test rust_siphash13::slice::_10 ... bench: 18 ns/iter (+/- 1) test rust_siphash13::slice::_100 ... bench: 42 ns/iter (+/- 2) test rust_siphash13::slice::_11 ... bench: 19 ns/iter (+/- 1) test rust_siphash13::slice::_12 ... bench: 21 ns/iter (+/- 3) test rust_siphash13::slice::_2 ... bench: 16 ns/iter (+/- 2) test rust_siphash13::slice::_200 ... bench: 68 ns/iter (+/- 3) test rust_siphash13::slice::_3 ... bench: 17 ns/iter (+/- 3) test rust_siphash13::slice::_4 ... bench: 18 ns/iter (+/- 1) test rust_siphash13::slice::_5 ... bench: 19 ns/iter (+/- 4) test rust_siphash13::slice::_6 ... bench: 19 ns/iter (+/- 1) test rust_siphash13::slice::_7 ... bench: 20 ns/iter (+/- 1) test rust_siphash13::slice::_8 ... bench: 16 ns/iter (+/- 1) test rust_siphash13::slice::_9 ... bench: 18 ns/iter (+/- 2) test rust_siphash13::str_::_10 ... bench: 18 ns/iter (+/- 1) test rust_siphash13::str_::_100 ... bench: 41 ns/iter (+/- 2) test rust_siphash13::str_::_11 ... bench: 19 ns/iter (+/- 1) test rust_siphash13::str_::_12 ... bench: 20 ns/iter (+/- 2) test rust_siphash13::str_::_2 ... bench: 16 ns/iter (+/- 1) test rust_siphash13::str_::_200 ... bench: 68 ns/iter (+/- 3) test rust_siphash13::str_::_3 ... bench: 17 ns/iter (+/- 1) test rust_siphash13::str_::_4 ... bench: 18 ns/iter (+/- 2) test rust_siphash13::str_::_5 ... bench: 19 ns/iter (+/- 6) test rust_siphash13::str_::_6 ... bench: 20 ns/iter (+/- 5) test rust_siphash13::str_::_7 ... bench: 23 ns/iter (+/- 1) test rust_siphash13::str_::_8 ... bench: 15 ns/iter (+/- 1) test rust_siphash13::str_::_9 ... bench: 17 ns/iter (+/- 1) test sip1b::int_u16 ... bench: 10 ns/iter (+/- 1) test sip1b::int_u32 ... bench: 9 ns/iter (+/- 1) test sip1b::int_u64 ... bench: 12 ns/iter (+/- 1) test sip1b::int_u8 ... bench: 7 ns/iter (+/- 0) test sip1b::slice::_10 ... bench: 12 ns/iter (+/- 1) test sip1b::slice::_100 ... bench: 33 ns/iter (+/- 2) test sip1b::slice::_11 ... bench: 13 ns/iter (+/- 0) test sip1b::slice::_12 ... bench: 12 ns/iter (+/- 1) test sip1b::slice::_2 ... bench: 10 ns/iter (+/- 0) test sip1b::slice::_200 ... bench: 62 ns/iter (+/- 2) test sip1b::slice::_3 ... bench: 10 ns/iter (+/- 1) test sip1b::slice::_4 ... bench: 9 ns/iter (+/- 0) test sip1b::slice::_5 ... bench: 10 ns/iter (+/- 1) test sip1b::slice::_6 ... bench: 10 ns/iter (+/- 0) test sip1b::slice::_7 ... bench: 11 ns/iter (+/- 0) test sip1b::slice::_8 ... bench: 11 ns/iter (+/- 1) test sip1b::slice::_9 ... bench: 12 ns/iter (+/- 1) test sip1b::str_::_10 ... bench: 15 ns/iter (+/- 1) test sip1b::str_::_100 ... bench: 37 ns/iter (+/- 3) test sip1b::str_::_11 ... bench: 16 ns/iter (+/- 1) test sip1b::str_::_12 ... bench: 14 ns/iter (+/- 1) test sip1b::str_::_2 ... bench: 13 ns/iter (+/- 1) test sip1b::str_::_200 ... bench: 67 ns/iter (+/- 5) test sip1b::str_::_3 ... bench: 14 ns/iter (+/- 2) test sip1b::str_::_4 ... bench: 12 ns/iter (+/- 1) test sip1b::str_::_5 ... bench: 13 ns/iter (+/- 1) test sip1b::str_::_6 ... bench: 13 ns/iter (+/- 0) test sip1b::str_::_7 ... bench: 16 ns/iter (+/- 1) test sip1b::str_::_8 ... bench: 14 ns/iter (+/- 1) test sip1b::str_::_9 ... bench: 15 ns/iter (+/- 1) test result: ok. 0 passed; 0 failed; 0 ignored; 62 measured ➜ siphash-bench git:(master) ✗ cargo benchcmp rust_siphash13:: sip1b:: benches.txt name rust_siphash13:: ns/iter sip1b:: ns/iter diff ns/iter diff % int_u16 12 10 -2 -16.67% int_u32 14 9 -5 -35.71% int_u64 11 12 1 9.09% int_u8 11 7 -4 -36.36% slice::_10 18 12 -6 -33.33% slice::_100 42 33 -9 -21.43% slice::_11 19 13 -6 -31.58% slice::_12 21 12 -9 -42.86% slice::_2 16 10 -6 -37.50% slice::_200 68 62 -6 -8.82% slice::_3 17 10 -7 -41.18% slice::_4 18 9 -9 -50.00% slice::_5 19 10 -9 -47.37% slice::_6 19 10 -9 -47.37% slice::_7 20 11 -9 -45.00% slice::_8 16 11 -5 -31.25% slice::_9 18 12 -6 -33.33% str_::_10 18 15 -3 -16.67% str_::_100 41 37 -4 -9.76% str_::_11 19 16 -3 -15.79% str_::_12 20 14 -6 -30.00% str_::_2 16 13 -3 -18.75% str_::_200 68 67 -1 -1.47% str_::_3 17 14 -3 -17.65% str_::_4 18 12 -6 -33.33% str_::_5 19 13 -6 -31.58% str_::_6 20 13 -7 -35.00% str_::_7 23 16 -7 -30.43% str_::_8 15 14 -1 -6.67% str_::_9 17 15 -2 -11.76% ``` from a modified hash-rs suite (preallocating maps and adding having slice/str variants) graph version: http://imgur.com/a/DuoI4 ``` ➜ hash-rs git:(rfc-extend-hasher) ✗ cargo benchcmp sip13:: sip13opt:: benches.txt name sip13:: ns/iter sip13opt:: ns/iter diff ns/iter diff % slice::mapcountdense_000000001 27,343 (36 MB/s) 26,401 (37 MB/s) -942 -3.45% slice::mapcountdense_000000002 28,982 (69 MB/s) 26,807 (74 MB/s) -2,175 -7.50% slice::mapcountdense_000000003 29,304 (102 MB/s) 27,360 (109 MB/s) -1,944 -6.63% slice::mapcountdense_000000004 30,411 (131 MB/s) 25,888 (154 MB/s) -4,523 -14.87% slice::mapcountdense_000000005 32,625 (153 MB/s) 27,486 (181 MB/s) -5,139 -15.75% slice::mapcountdense_000000006 34,920 (171 MB/s) 27,204 (220 MB/s) -7,716 -22.10% slice::mapcountdense_000000007 33,497 (208 MB/s) 28,330 (247 MB/s) -5,167 -15.43% slice::mapcountdense_000000008 31,153 (256 MB/s) 28,617 (279 MB/s) -2,536 -8.14% slice::mapcountdense_000000009 30,745 (292 MB/s) 29,666 (303 MB/s) -1,079 -3.51% slice::mapcountdense_000000010 31,509 (317 MB/s) 29,804 (335 MB/s) -1,705 -5.41% slice::mapcountdense_000000011 32,526 (338 MB/s) 30,520 (360 MB/s) -2,006 -6.17% slice::mapcountdense_000000012 32,981 (363 MB/s) 28,739 (417 MB/s) -4,242 -12.86% slice::mapcountdense_000000013 34,713 (374 MB/s) 30,348 (428 MB/s) -4,365 -12.57% slice::mapcountdense_000000014 34,635 (404 MB/s) 29,974 (467 MB/s) -4,661 -13.46% slice::mapcountdense_000000015 35,924 (417 MB/s) 30,584 (490 MB/s) -5,340 -14.86% slice::mapcountdense_000000016 31,939 (500 MB/s) 30,564 (523 MB/s) -1,375 -4.31% slice::mapcountdense_000000032 36,545 (875 MB/s) 34,833 (918 MB/s) -1,712 -4.68% slice::mapcountdense_000000064 44,691 (1432 MB/s) 43,912 (1457 MB/s) -779 -1.74% slice::mapcountdense_000000128 67,210 (1904 MB/s) 64,630 (1980 MB/s) -2,580 -3.84% slice::mapcountdense_000000256 110,320 (2320 MB/s) 108,713 (2354 MB/s) -1,607 -1.46% slice::mapcountsparse_000000001 29,686 (33 MB/s) 28,673 (34 MB/s) -1,013 -3.41% slice::mapcountsparse_000000002 32,073 (62 MB/s) 30,519 (65 MB/s) -1,554 -4.85% slice::mapcountsparse_000000003 33,184 (90 MB/s) 31,208 (96 MB/s) -1,976 -5.95% slice::mapcountsparse_000000004 34,344 (116 MB/s) 30,242 (132 MB/s) -4,102 -11.94% slice::mapcountsparse_000000005 34,536 (144 MB/s) 30,552 (163 MB/s) -3,984 -11.54% slice::mapcountsparse_000000006 35,791 (167 MB/s) 30,813 (194 MB/s) -4,978 -13.91% slice::mapcountsparse_000000007 36,773 (190 MB/s) 31,362 (223 MB/s) -5,411 -14.71% slice::mapcountsparse_000000008 33,101 (241 MB/s) 32,399 (246 MB/s) -702 -2.12% slice::mapcountsparse_000000009 34,025 (264 MB/s) 33,065 (272 MB/s) -960 -2.82% slice::mapcountsparse_000000010 34,755 (287 MB/s) 33,152 (301 MB/s) -1,603 -4.61% slice::mapcountsparse_000000011 35,682 (308 MB/s) 33,631 (327 MB/s) -2,051 -5.75% slice::mapcountsparse_000000012 36,422 (329 MB/s) 32,604 (368 MB/s) -3,818 -10.48% slice::mapcountsparse_000000013 37,561 (346 MB/s) 32,978 (394 MB/s) -4,583 -12.20% slice::mapcountsparse_000000014 38,476 (363 MB/s) 33,376 (419 MB/s) -5,100 -13.26% slice::mapcountsparse_000000015 39,202 (382 MB/s) 33,750 (444 MB/s) -5,452 -13.91% slice::mapcountsparse_000000016 34,898 (458 MB/s) 33,621 (475 MB/s) -1,277 -3.66% slice::mapcountsparse_000000032 39,767 (804 MB/s) 38,013 (841 MB/s) -1,754 -4.41% slice::mapcountsparse_000000064 47,810 (1338 MB/s) 46,332 (1381 MB/s) -1,478 -3.09% slice::mapcountsparse_000000128 64,519 (1983 MB/s) 63,322 (2021 MB/s) -1,197 -1.86% slice::mapcountsparse_000000256 101,042 (2533 MB/s) 99,754 (2566 MB/s) -1,288 -1.27% str_::mapcountdense_000000001 27,183 (36 MB/s) 24,007 (41 MB/s) -3,176 -11.68% str_::mapcountdense_000000002 28,940 (69 MB/s) 24,574 (81 MB/s) -4,366 -15.09% str_::mapcountdense_000000003 29,000 (103 MB/s) 24,687 (121 MB/s) -4,313 -14.87% str_::mapcountdense_000000004 29,822 (134 MB/s) 24,377 (164 MB/s) -5,445 -18.26% str_::mapcountdense_000000005 31,962 (156 MB/s) 25,184 (198 MB/s) -6,778 -21.21% str_::mapcountdense_000000006 32,218 (186 MB/s) 25,020 (239 MB/s) -7,198 -22.34% str_::mapcountdense_000000007 35,482 (197 MB/s) 27,705 (252 MB/s) -7,777 -21.92% str_::mapcountdense_000000008 28,643 (279 MB/s) 25,563 (312 MB/s) -3,080 -10.75% str_::mapcountdense_000000009 30,112 (298 MB/s) 26,773 (336 MB/s) -3,339 -11.09% str_::mapcountdense_000000010 31,554 (316 MB/s) 27,607 (362 MB/s) -3,947 -12.51% str_::mapcountdense_000000011 32,062 (343 MB/s) 27,770 (396 MB/s) -4,292 -13.39% str_::mapcountdense_000000012 32,258 (372 MB/s) 25,612 (468 MB/s) -6,646 -20.60% str_::mapcountdense_000000013 33,544 (387 MB/s) 26,908 (483 MB/s) -6,636 -19.78% str_::mapcountdense_000000014 34,681 (403 MB/s) 27,267 (513 MB/s) -7,414 -21.38% str_::mapcountdense_000000015 37,883 (395 MB/s) 30,226 (496 MB/s) -7,657 -20.21% str_::mapcountdense_000000016 30,299 (528 MB/s) 27,960 (572 MB/s) -2,339 -7.72% str_::mapcountdense_000000032 34,372 (930 MB/s) 32,736 (977 MB/s) -1,636 -4.76% str_::mapcountdense_000000048 38,610 (1243 MB/s) 36,437 (1317 MB/s) -2,173 -5.63% str_::mapcountdense_000000064 43,052 (1486 MB/s) 41,269 (1550 MB/s) -1,783 -4.14% str_::mapcountdense_000000128 64,059 (1998 MB/s) 62,007 (2064 MB/s) -2,052 -3.20% str_::mapcountdense_000000256 109,608 (2335 MB/s) 107,184 (2388 MB/s) -2,424 -2.21% str_::mapcountsparse_000000001 29,155 (34 MB/s) 26,151 (38 MB/s) -3,004 -10.30% str_::mapcountsparse_000000002 31,536 (63 MB/s) 27,787 (71 MB/s) -3,749 -11.89% str_::mapcountsparse_000000003 32,524 (92 MB/s) 27,861 (107 MB/s) -4,663 -14.34% str_::mapcountsparse_000000004 33,535 (119 MB/s) 27,585 (145 MB/s) -5,950 -17.74% str_::mapcountsparse_000000005 34,239 (146 MB/s) 27,520 (181 MB/s) -6,719 -19.62% str_::mapcountsparse_000000006 35,485 (169 MB/s) 27,437 (218 MB/s) -8,048 -22.68% str_::mapcountsparse_000000007 39,098 (179 MB/s) 30,465 (229 MB/s) -8,633 -22.08% str_::mapcountsparse_000000008 30,882 (259 MB/s) 29,215 (273 MB/s) -1,667 -5.40% str_::mapcountsparse_000000009 33,375 (269 MB/s) 29,301 (307 MB/s) -4,074 -12.21% str_::mapcountsparse_000000010 33,531 (298 MB/s) 29,008 (344 MB/s) -4,523 -13.49% str_::mapcountsparse_000000011 34,607 (317 MB/s) 29,800 (369 MB/s) -4,807 -13.89% str_::mapcountsparse_000000012 35,700 (336 MB/s) 28,380 (422 MB/s) -7,320 -20.50% str_::mapcountsparse_000000013 36,692 (354 MB/s) 29,350 (442 MB/s) -7,342 -20.01% str_::mapcountsparse_000000014 37,326 (375 MB/s) 29,285 (478 MB/s) -8,041 -21.54% str_::mapcountsparse_000000015 41,098 (364 MB/s) 33,073 (453 MB/s) -8,025 -19.53% str_::mapcountsparse_000000016 33,046 (484 MB/s) 30,717 (520 MB/s) -2,329 -7.05% str_::mapcountsparse_000000032 37,471 (853 MB/s) 35,542 (900 MB/s) -1,929 -5.15% str_::mapcountsparse_000000048 41,324 (1161 MB/s) 39,332 (1220 MB/s) -1,992 -4.82% str_::mapcountsparse_000000064 45,858 (1395 MB/s) 43,802 (1461 MB/s) -2,056 -4.48% str_::mapcountsparse_000000128 62,471 (2048 MB/s) 60,683 (2109 MB/s) -1,788 -2.86% str_::mapcountsparse_000000256 101,283 (2527 MB/s) 97,655 (2621 MB/s) -3,628 -3.58% ```
Very small but constant improvement, the objective is to lower latency for u16, u32 and small strings.
CC #35735
from a modified hash-rs suite (preallocating maps and adding having slice/str variants)
graph version: http://imgur.com/a/DuoI4