Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Auto merge of #40454 - djzin:fast-swap, r=sfackler
speed up mem::swap I would have thought that the mem::swap code didn't need an intermediate variable precisely because the pointers are guaranteed never to alias. And.. it doesn't! It seems that llvm will also auto-vectorize this case for large structs, but alas it doesn't seem to have all the aliasing info it needs and so will add redundant checks (and even not bother with autovectorizing for small types). Looks like a lot of performance could still be gained here, so this might be a good test case for future optimizer improvements. Here are the current benchmarks for the simd version of mem::swap; the timings are in cycles (code below) measured with 10 iterations. The timings for sizes > 32 which are not a multiple of 8 tend to be ever so slightly faster in the old code, but not always. For large struct sizes (> 1024) the new code shows a marked improvement. \* = latest commit † = subtracted from other measurements | arr_length | noop<sup>†</sup> | rust_stdlib | simd_u64x4\* | simd_u64x8 |------------------|------------|-------------------|-------------------|------------------- 8|80|90|90|90 16|72|177|177|177 24|32|76|76|76 32|68|188|112|188 40|32|80|60|80 48|32|84|56|84 56|32|108|72|108 64|32|108|72|76 72|80|350|220|230 80|80|350|220|230 88|80|420|270|270 96|80|420|270|270 104|80|500|320|320 112|80|490|320|320 120|72|528|342|342 128|48|360|234|234 136|72|987|387|387 144|80|1070|420|420 152|64|856|376|376 160|68|804|400|400 168|80|1060|520|520 176|80|1070|520|520 184|32|464|228|228 192|32|504|228|228 200|32|440|248|248 208|72|987|573|573 216|80|1464|220|220 224|48|852|450|450 232|72|1182|666|666 240|32|428|288|288 248|32|428|308|308 256|80|860|770|770 264|80|1130|820|820 272|80|1340|820|820 280|80|1220|870|870 288|72|1227|804|804 296|72|1356|849|849
- Loading branch information