Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wasm] Optimize constant i2/i4 shuffles in jiterpreter #86470

Merged
merged 1 commit into from
Jun 1, 2023

Conversation

kg
Copy link
Member

@kg kg commented May 18, 2023

(draft because blocked by #86469)
For known-constant shuffle vectors, the jiterpreter can transform the i2/i4 indices into a byte shuffle vector at JIT time and encode it directly into the trace. In my testing this speeds up span Reverse on chars a bit. Not sure if it's a good idea to do this, so feedback is appreciated.

@kg kg added arch-wasm WebAssembly architecture area-Codegen-Jiterpreter-mono labels May 18, 2023
@ghost
Copy link

ghost commented May 18, 2023

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

(draft because blocked by #86469)
For known-constant shuffle vectors, the jiterpreter can transform the i2/i4 indices into a byte shuffle vector at JIT time and encode it directly into the trace. In my testing this speeds up span Reverse on chars a bit. Not sure if it's a good idea to do this, so feedback is appreciated.

Author: kg
Assignees: -
Labels:

arch-wasm, area-Codegen-Jiterpreter-mono

Milestone: -

@ghost ghost assigned kg May 18, 2023
@kg kg force-pushed the wasm-jiterp-constant-shuffles branch from 2c855bb to 5e5f0de Compare May 19, 2023 00:31
@kg kg marked this pull request as ready for review May 19, 2023 14:55
@kg kg requested review from lewing and pavelsavara as code owners May 19, 2023 14:55
@kg kg requested review from BrzVlad and kotlarmilos May 19, 2023 14:55
@kg
Copy link
Member Author

kg commented May 19, 2023

@BrzVlad @kotlarmilos How do you feel about optimizing the constant shuffles this way? It seems like it might be best to do this in interp, but it's not clear to me how much work it would be to do the analysis there. The jiterpreter would still need to do the lowering, since on the C side and on non-wasm platforms we still want to generate the opcodes for I2/I4 shuffle. So the jiterp would need some way to know that the indices are constant and know what the indices are.

The relevant part of reverse chars looks like this:

dotnet.runtime.js:3 MONO_WASM: 326a0fc ldobj.vt 96 -> 128
dotnet.runtime.js:3 MONO_WASM: 326a104 simd_v128_ldc  -> 160
dotnet.runtime.js:3 MONO_WASM: 326a118 V128_I2_SHUFFLE 112, 160 -> 112
dotnet.runtime.js:3 MONO_WASM: 326a122 simd_v128_ldc  -> 160
dotnet.runtime.js:3 MONO_WASM: 326a136 V128_I2_SHUFFLE 128, 160 -> 128
dotnet.runtime.js:3 MONO_WASM: 326a140 stobj.vt.noref 88, 128
dotnet.runtime.js:3 MONO_WASM: 326a148 stobj.vt.noref 96, 112

Maybe we could define new SHUFFLE_CONSTANT opcodes that the jiterp consumes, and generate them by doing a peephole optimization?

@kg kg added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label May 19, 2023
@kg
Copy link
Member Author

kg commented May 19, 2023

Rebasing onto #86506 since there would be a merge conflict. Timings with both applied:

measurement main #86506 both PRs
Span, Reverse bytes 0.0213ms 0.0139ms 0.0126ms
Span, Reverse chars 0.0418ms 0.0256ms 0.0228ms

EDIT: I'll note that from reading v8's source code, they have optimizations that kick in when they can detect a constant indices vector, so it makes sense that we see a speedup here.

@kg kg force-pushed the wasm-jiterp-constant-shuffles branch from 5e5f0de to b5977a6 Compare May 19, 2023 16:40
@kg kg requested a review from vargaz as a code owner May 19, 2023 16:52
@BrzVlad
Copy link
Member

BrzVlad commented May 19, 2023

I strongly suggest any kind of constant tracking to be done within the interpreter.

@kg
Copy link
Member Author

kg commented May 19, 2023

I strongly suggest any kind of constant tracking to be done within the interpreter.

OK. I'll leave this no merge until we figure out how we want to handle it, and we can remove the other constant tracking when we do that.

@kg
Copy link
Member Author

kg commented Jun 1, 2023

My recollection of the conversations I've had about this is as follows:

  • We want to heavily limit how much the jiterpreter does this category of optimization, but narrowly scoped cases like this (the immediate is from a preceding opcode) are OK
  • In the future we want to introduce a variant of the SIMD opcodes that accepts an immediate - we need it for ExtractLane/ReplaceLane anyway, so it would be natural to apply it to this case as well
  • Once we have 'immediate' SIMD opcodes, we can introduce immediate versions of shuffle and then the jiterpreter can be updated to consume those opcodes instead of detecting the constant shuffle itself
    I'll probably rebase this soon and land it since I'm not aware of any reason to block it at present.

Introduce builder v128_const method
v8 doesn't optimize splats so use the enormous encoding for v128 zero
Fix fast memset for nonzero values
Detect constant shuffle vectors for i2/i4 shuffles and expand them to byte shuffle vectors at JIT time
Also optimize i1 shuffles
@kg kg force-pushed the wasm-jiterp-constant-shuffles branch from acd933e to 07e8544 Compare June 1, 2023 08:42
@kg kg merged commit d675add into dotnet:main Jun 1, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jul 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-wasm WebAssembly architecture NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants