[wasm] Optimize constant i2/i4 shuffles in jiterpreter #86470
Tagging subscribers to 'arch-wasm': @lewing Issue Details(draft because blocked by #86469)
@BrzVlad @kotlarmilos How do you feel about optimizing the constant shuffles this way? It seems like it might be best to do this in interp, but it's not clear to me how much work it would be to do the analysis there. The jiterpreter would still need to do the lowering, since on the C side and on non-wasm platforms we still want to generate the opcodes for I2/I4 shuffle. So the jiterp would need some way to know that the indices are constant and know what the indices are. The relevant part of reverse chars looks like this:
Maybe we could define new SHUFFLE_CONSTANT opcodes that the jiterp consumes, and generate them by doing a peephole optimization?
Rebasing onto #86506 since there would be a merge conflict. Timings with both applied:
EDIT: I'll note that, from reading v8's source code, it has optimizations that kick in when it can detect a constant indices vector, so it makes sense that we see a speedup here.
I strongly suggest that any kind of constant tracking be done within the interpreter.
OK. I'll leave this marked no-merge until we figure out how we want to handle it, and we can remove the other constant tracking when we do that.
My recollection of the conversations I've had about this is as follows:
Introduce builder v128_const method
v8 doesn't optimize splats so use the enormous encoding for v128 zero
Fix fast memset for nonzero values
Detect constant shuffle vectors for i2/i4 shuffles and expand them to byte shuffle vectors at JIT time
Also optimize i1 shuffles
(draft because blocked by #86469)
For known-constant shuffle vectors, the jiterpreter can transform the i2/i4 indices into a byte shuffle vector at JIT time and encode it directly into the trace. In my testing this speeds up span Reverse on chars a bit. Not sure if it's a good idea to do this, so feedback is appreciated.
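To make the index expansion concrete, here is a minimal sketch of how constant i2/i4 lane indices could be widened into a 16-byte shuffle vector. It is not the code from this PR: the function name `expandShuffleIndices` and its parameters are hypothetical, and it assumes every index is already in range for the lane count (the real lowering may mask or otherwise handle out-of-range indices differently).

```typescript
// Hypothetical helper, not the jiterpreter's actual implementation.
// Expands per-lane shuffle indices (for 2-byte or 4-byte lanes) into the
// equivalent per-byte indices for a 128-bit byte shuffle.
function expandShuffleIndices(indices: number[], elementSize: number): Uint8Array {
    const vectorSize = 16; // bytes in a v128
    const laneCount = vectorSize / elementSize;
    if (!Number.isInteger(laneCount) || indices.length !== laneCount)
        throw new Error(`expected ${laneCount} indices for element size ${elementSize}`);

    const result = new Uint8Array(vectorSize);
    for (let lane = 0; lane < laneCount; lane++) {
        const source = indices[lane];
        // Assumption: out-of-range indices are rejected rather than emulated.
        if (!Number.isInteger(source) || source < 0 || source >= laneCount)
            throw new Error(`index ${source} out of range for lane ${lane}`);
        // Each destination lane copies all bytes of its source lane, in order.
        for (let b = 0; b < elementSize; b++)
            result[lane * elementSize + b] = source * elementSize + b;
    }
    return result;
}

// Reversing eight 16-bit lanes (the pattern a char-span reverse would use).
const reversedI2 = expandShuffleIndices([7, 6, 5, 4, 3, 2, 1, 0], 2);
console.log(Array.from(reversedI2).join(","));
```

Because the result depends only on the constant indices, a table like this can be computed once when the trace is compiled and emitted as a constant, instead of being rebuilt from the i2/i4 indices every time the trace runs.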