Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wasm] Add limited constant propagation to the jiterpreter for ldc.i4 and ldloca #99706

Merged
merged 1 commit into from
Mar 21, 2024

Conversation

kg
Copy link
Member

@kg kg commented Mar 13, 2024

Right now if an interpreter opcode stores a constant (ldc.i4) or effectively-constant (ldloca) expression into an interpreter local, we have to read it back from memory before using it later in a trace. However, there are many scenarios where it would be profitable to not do this, and instead embed the constant into the trace where the load would otherwise happen. This furthermore enables optimizing out null checks in some cases, since if the address being null-checked is constant, we can determine statically whether it is null and omit the runtime check entirely.

The System.Runtime.Intrinsics.Tests.Perf_Vector128Of(UInt16).LessThanOrEqualAnyBenchmark perf case is a good example for this, because both types of cprop occur for it in my test harness.

First, ldloca constant propagation eliminates a memory load and a null-check of the load (because the load was a pointer), for the following pair of interp opcodes:

 IR_003e: ldloca.s       [128 <- nil], 32
 IR_0041: ldind_off_add_mul_imm.u2 [128 <- 128 40], 0,2

image
In the above example we perform a ldloca, computing pLocals + 32 and store it into pLocals[128].
Then we load pLocals[128] to use it as the base address for an indirect-load-with-offset, which requires null checking the loaded pointer, and jiterp stores the null-checked pointer into what I call cknull_ptr. Then the null-checked pointer can finally be used to compute the address for the indirect load, and perform the load.
With constant propagation, the null check disappears and instead of reading pLocals[128] back, we just compute the address again from scratch. From outside the trace this is unobservable, since we still wrote the result of the ldloca opcode into the locals like we were supposed to, we're just not reading it.

Second, the constant propagation makes the benchmark loop termination condition faster. The loop count is too big to fit into the interpreter's compare-with-immediate superinstructions, but jiterp is able to propagate the constant into the comparison, so it still gets a little faster:

 IR_00a4: ldc.i4         [120 <- nil], 50000000
 IR_00a8: blt.i4.s       [nil <- 16 120], IR_0019

image

This is still a somewhat fragile optimization (if the jiterpreter's constant analysis breaks, this will break stuff), so there's a new runtime option we can use to turn it off. However, the jiterpreter already uses constant analysis for various things (SIMD requires it), so if it's broken, we want to know about it... and this should surface any brokenness. The constant analysis is conservative by design (we discard all of our analysis info any time we cross a branch target).

Fixing VectorNN.GetElementUnsafe<T> to not generate extremely-hard-to-optimize code is left as an exercise for the reader.

Copy link
Member

@BrzVlad BrzVlad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of having optimizations like these in the jiterpreter, but it is fine for now.

@kg
Copy link
Member Author

kg commented Mar 21, 2024

Not a fan of having optimizations like these in the jiterpreter, but it is fine for now.

Will add an item to the todo list to undo this once we can do most/all of it in interp instead.

@kg kg merged commit 309185f into dotnet:main Mar 21, 2024
111 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Apr 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants