-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add splat simplify opt for various ops #6851
Conversation
Subscribe to Label Action
This issue or pull request has been labeled: "cranelift", "isle"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
namely, popcnt, smin, umin, smax, umax, sadd_sat, uadd_sat, ssub_sat & usub_sat
and avg_round as well
@afonso360 The following two wasmtime spec tests are failing for the PR:
They are failing for the following operators: Can you please help me find the root cause of this failure? Do I need to tweak the opts for the above mentioned operators somehow to make the tests pass? |
This ended up being quite a long reply, sorry! I just wanted to note down the steps that I use to debug these sort of stuff, it might be useful in the future! (There's a TLDR at the bottom) So, running the tests individually shows us what went wrong. These test suites contain a bunch of individual tests so there is more than one failure. Here's what I got:
Starting with the easy ones: This sometimes happens, we allow an operation for vectors but not for scalars. Usually because they don't make sense for scalars, I think in this case it's just that no one has felt the need to implement it yet. I think we should be able to remove the optimizations for both of these opcodes, and the rest of the Now onto the hard one. I ran the testsuite individually with I got this function out (I've slightly cleaned it up): function u0:22(i64 vmctx, i64, i32, i32) -> i8x16 fast {
block0(v0: i64, v1: i64, v2: i32, v3: i32):
v5 = ireduce.i8 v2
v6 = splat.i8x16 v5
v7 = sshr v6, v3
return v7
} It's really neat that it matches our optimization pretty much exactly! After optimizations it looks like this: block0(v0: i64, v1: i64, v2: i32, v3: i32):
v5 = ireduce.i8 v2
v8 = sshr v5, v3
v9 = splat.i8x16 v8
return v9 So, now that we have a small reproducible example of what is going wrong, lets debug it. My first step was compiling both the pre-optimization function and the post optimization function. And something immediately went wrong! The optimized version compiles cleanly! Which is really weird, since I got that out of the optimizer from the first version. I haven't seen this before but I do have an idea of what might be going wrong. We don't print the type of the Changing the optimization source to explicitly request the
This is a really weird issue, and it probably should have been caught earlier in the pipeline instead of letting it propagate all the way to the backend which really just confuses things. This relied a lot on intuition and also knowledge of how cranelift works, so don't worry if it's hard to figure it out! We really should have had better error messages to help people out on these issues. TLDR:
|
because they don't support scalars.
Some of the wasmtime spec tests were failing without this
Thanks @afonso360 for the detailed explanation and showing how to debug such failures in the future 👍. I've made the suggested changes now. Let's hope the checks pass. |
Checks passed. Please review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thanks for working through this.
I've had a look at the rest of our opcodes and I don't think we are missing any of them (at least for integers)!
I think it's possible to do a transform from splat+icmp
into icmp+bmask+splat
, but leaving that for a future PR is also OK! I'm also not sure if this is a good transform to do. Since I think it would be the only one that actually adds instructions rather than reducing them.
I've also run our fuzzer on this PR for a couple of hours and nothing has come up.
I assume you'll merge this, @afonso360? This PR looks great to me, thank you @gurry! I feel like all these bugs should have been caught by the CLIF verifier, which I guess means that the verifier did not run after the egraph pass. Would that have helped more quickly pin down the causes of these bugs? Actually I'm confused about when the verifier runs and whether we already have a way to turn it on. Maybe we should enable it for all filetests? |
And the very next thing in my queue of GitHub notifications was #6855 which says exactly that, thank you @afonso360 😆 |
because it doesn't support scalars.
* Add splat simplify opt for iadd, isub, imul, ineg & iabs * Add splat simplify opt for smulhi & hmulhi * Add splat simplify opt for band, bor & bxor * Add splat simplify opt for a few more ops namely, popcnt, smin, umin, smax, umax, sadd_sat, uadd_sat, ssub_sat & usub_sat * Add splat simplify opt for bnot * Add splat simplify opt for shift and rotate ops and avg_round as well * Remove splat simplify opt for certain ops because they don't support scalars. * Add lane_type to splat opts for shift/rotate Some of the wasmtime spec tests were failing without this * Remove splat simplify opt for avg_round because it doesn't support scalars.
Fixes #6828