Speed up md5 benchmark by ~15% by removing "slow LEA"
Remove a special case from the assembler that generates "Slow LEA" instructions that can execute poorly on Skylake and Haswell CPUs.

The "slow LEA" is one that uses base+index+offset operands. These instructions "have increased latency and reduced dispatch port choices compared to other LEAs."

Links:
- https://software.intel.com/en-us/node/544484
- http://stackoverflow.com/questions/21288214/what-are-fast-lea-and-slow-lea-unit-in-the-microarchitecture-of-intes-cpu

Resolves raptorjit/raptorjit#54.

Here is an example of a "slow LEA" instruction that was emitted before:

    lea eax, [rbx+rdx+1234]

The new replacement avoids the bad case:

    lea eax, [rbx+1234]
    add rax, rdx

On Haswell and Skylake CPUs this improves the md5 benchmark performance by ~15%. The difference in cycles (time) correlates closely with the difference in slow LEA instructions executed (as reported by the CPU performance monitoring unit.)

Before:

    Performance counter stats for './luajit ../../luajit-test-cleanup/bench/md5.lua 20000':

        8,166,721,155      instructions      #    2.02  insn per cycle
        4,039,743,481      cycles
          633,604,974      uops_issued_slow_lea

          1.683641631 seconds time elapsed

After:

        8,463,581,471      instructions      #    2.45  insn per cycle
        3,454,061,396      cycles
          340,049,934      uops_issued_slow_lea
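The correlation between cycles and slow LEAs can be checked by hand from the counters quoted above. The following is a small sketch in Python (not part of the change itself; the variable and function names are illustrative only) that recomputes the deltas. With these figures the cycle count drops by roughly 14.5%, each removed slow LEA accounts for about two saved cycles, and the instruction count grows by about as much as the slow LEA count shrinks, which is what one would expect if each slow LEA were replaced by a LEA plus an ADD.

```python
# Counter values copied from the perf output in the commit message.
before = {
    "instructions": 8_166_721_155,
    "cycles": 4_039_743_481,
    "slow_lea": 633_604_974,
}
after = {
    "instructions": 8_463_581_471,
    "cycles": 3_454_061_396,
    "slow_lea": 340_049_934,
}


def saved(counter):
    """Return how much a counter dropped (before minus after)."""
    return before[counter] - after[counter]


cycles_saved = saved("cycles")
slow_lea_removed = saved("slow_lea")
# The instruction count went up, so negate the "saved" value.
extra_instructions = -saved("instructions")

# Fraction of the original cycles that was eliminated.
cycle_reduction = cycles_saved / before["cycles"]
# Average number of cycles saved per slow LEA no longer issued.
cycles_per_slow_lea = cycles_saved / slow_lea_removed

print(f"cycles saved:            {cycles_saved:,}")
print(f"slow LEAs removed:       {slow_lea_removed:,}")
print(f"extra instructions:      {extra_instructions:,}")
print(f"cycle reduction:         {cycle_reduction:.1%}")
print(f"cycles saved / slow LEA: {cycles_per_slow_lea:.2f}")
```

Note that the figures come from a single run before and after, so they indicate the trend rather than a statistically robust measurement.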