-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
riscv64: Improve codegen for icmp
#7203
Conversation
Skip generating extra sign-extension instructions in this case because the materialization of a constant will implicitly sign-extend into the entire register.
Try to reserve `lower_*` as taking a thing and producing a `*Reg`. Rename this helper to `is_nonzero_cmp` to represent how it's testing for nonzero and producing a comparison.
* `FCmp` -> `FloatCompare` * `emit_fcmp` -> `fcmp_to_float_compare` * `lower_fcmp` -> `lower_float_compare` Make some room for upcoming integer comparison functions.
This is only used by one lowering so inline its definition directly.
This commit is the first in a few steps to reimplement bytecodealliance#6112 and bytecodealliance#7113. The `Icmp` pseudo-instruction is removed here and necessary functionality is all pushed into ISLE lowerings. This enables deleting the `lower_br_icmp` helper in `emit.rs` as it's no longer necessary, meaning that all conditional branches should ideally be generated in lowering rather than pseudo-instructions to benefit from various optimizations. Currently the lowering is the bare-bones minimum to get things working. This involved creating a helper to lower an `IntegerCompare` into an `XReg` via negation/swapping args/etc. In generated code this removes branches and makes `icmp` a straight-line instruction for non-128-bit arguments.
Use the `x0` register which is always zero where possible and avoid unnecessary `xor`s against this register.
Subscribe to Label Action
This issue or pull request has been labeled: "cranelift", "isle"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
Matches the sign-extension that happens at the hardware layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very nice! Thank you! ❤️ Having the commits separated neatly really helped review this.
It all looks correct to me. But I'm not entirely sure I didn't miss anything, so a second pair of eyes would be helpful!
I'm also going to run the fuzzer on this for a few days to check if anything comes up. (Although I don't think that should prevent merging this)
@@ -2136,6 +2145,9 @@ | |||
(decl pure partial val_already_extended (ExtendOp Value) bool) | |||
(rule 0 (val_already_extended _ v @ (value_type $I64)) $true) | |||
|
|||
;; Immediates are always sign extended to their full 64-bits | |||
(rule 1 (val_already_extended (ExtendOp.Signed) (iconst _)) $true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is entirely correct, but it probably doesn't matter.
Due to #6922 we don't generate all 32bit constants from lui+addi
so there are some that fallback to the general load from const pool path.
That path uses the ld
instruction to load the full 8 bytes, but we never sign extend that value while lowering. There is a chance that in some of the 32bit constants that we miss we may not sign extend them.
I've built a small test to check which constants fail to generate, and the lowest one is 0x7FFF_F801
so all the negatives are covered. So I don't think there's anything wrong here.
It might be worth ensuring that the value is sign extended before emitting it to the constant pool anyway. Or switching to lw
that only loads 4 bytes and sign extends the load result. Just to make sure that this doesn't happen in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the good eye here, I'll dig into this a bit more and possibly omit this now that there's more constant-related optimizations for comparisons anyway. I'll probably leave this out for an initial attempt.
Fuzzing has turned up the following testcase:
|
Do you have native RISC-V hardware that you're fuzzing on? Or are you running fuzzing through QEMU emulation? If the latter I think I'll try to throw some compute at this as well to help weed out more too. |
I've been running this natively, but honestly it's not much faster than running it on QEMU on my desktop, so it might help. This example took like 8 hours to show up. For comparison, Natively I'm getting about 50 execs/s/core * 4 cores. (with Edit: Also, if you're going to run the fuzzer, you might want to pull #7208 in, that triggered fairly quickly for me. |
These aren't correct and will need updating
Ok so I tried a bit of fuzzing locally and it found a number of crashes but I was unfortunately unable to reproduce outside of the fuzzer to diagnose further. The test cases all reproduce on In any case I remove the Additionally I removed the rules that modify an immediate with |
Wow, that's interesting, I had noticed the In any case, I'm going to start fuzzing these latest changes. |
crashes.tar.gz are the crashes that fuzzing found locally for me |
This one showed up today:
|
Ooh nice catch, more signed-vs-unsigned handling that needed fixing (fuzzing is great!) |
This has been fuzzing continuously for the past 36 hours without any crashes 🎉 I'm going to consider that good enough. |
This PR is another stab towards improving
icmp
along the lines of #6112 and #7113. This tries to draw on both of those PRs a bit to end up with something that's usable by the majority of the backend for use with branches.I've attempted to break things up piecemeal here so the commits in theory are understandable commit-by-commit. This additionally doesn't implement every single optimization I could think of, mostly just a few that seemed to be the low-hanging fruit ones.