[wasm] Add limited constant propagation to the jiterpreter for ldc.i4 and ldloca #99706
Right now, if an interpreter opcode stores a constant (`ldc.i4`) or effectively-constant (`ldloca`) expression into an interpreter local, we have to read it back from memory before using it later in a trace. However, there are many scenarios where it would be profitable not to do this, and instead to embed the constant into the trace where the load would otherwise happen. This also enables optimizing out null checks in some cases: if the address being null-checked is constant, we can determine statically whether it is null and omit the runtime check entirely.

The `System.Runtime.Intrinsics.Tests.Perf_Vector128Of(UInt16).LessThanOrEqualAnyBenchmark` perf case is a good example, because both types of constant propagation (cprop) occur for it in my test harness.

First, `ldloca` constant propagation eliminates a memory load and a null check of the loaded value (because the loaded value was a pointer), for the following pair of interp opcodes:
In the above example we perform a `ldloca`, computing `pLocals + 32` and storing it into `pLocals[128]`. Then we load `pLocals[128]` to use it as the base address for an indirect-load-with-offset, which requires null-checking the loaded pointer, and jiterp stores the null-checked pointer into what I call `cknull_ptr`. Only then can the null-checked pointer be used to compute the address for the indirect load and perform the load.

With constant propagation, the null check disappears and, instead of reading `pLocals[128]` back, we just compute the address again from scratch. This is unobservable from outside the trace: we still write the result of the `ldloca` opcode into the locals like we're supposed to; we just don't read it back.

Second, constant propagation makes the benchmark's loop termination condition faster. The loop count is too big to fit into the interpreter's compare-with-immediate superinstructions, but jiterp is able to propagate the constant into the comparison, so it still gets a little faster:
This is still a somewhat fragile optimization (if the jiterpreter's constant analysis is broken, this will break things), so there's a new runtime option we can use to turn it off. However, the jiterpreter already relies on constant analysis for various things (SIMD requires it), so if the analysis is broken we want to know about it, and this optimization should surface any brokenness. The constant analysis is conservative by design: we discard all of our analysis info any time we cross a branch target.
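To make the mechanism concrete, here is a minimal sketch in TypeScript of per-trace constant tracking for `ldc.i4` and `ldloca`: known values are embedded or recomputed instead of loaded, null checks are resolved statically where possible, and everything is forgotten at branch targets. All names are hypothetical: `ConstantTracker`, `KnownConstant`, the `enabled` flag (a stand-in for the new runtime option, whose name isn't given here), the `$pLocals` WAT local, and the string-based WAT output are illustrative assumptions, not jiterp's actual code. It also assumes local offsets are byte offsets and that `pLocals` is never null inside a running frame.

```typescript
// Hypothetical sketch of trace-local constant propagation; not jiterp's real API.

// What we know about an interpreter local at the current point in a trace.
type KnownConstant =
  | { kind: "i32"; value: number }             // written by ldc.i4
  | { kind: "localAddress"; offset: number };  // written by ldloca: pLocals + offset

type NullCheckResult = "runtime" | "never-null" | "always-null";

class ConstantTracker {
  // Keyed by the (assumed) byte offset of the destination local within pLocals.
  private readonly known = new Map<number, KnownConstant>();
  // Hypothetical stand-in for the runtime option that disables the optimization.
  private readonly enabled: boolean;

  constructor(enabled: boolean) {
    this.enabled = enabled;
  }

  // ldc.i4 stored a known 32-bit integer into the local at destOffset.
  recordLdcI4(destOffset: number, value: number): void {
    this.known.set(destOffset, { kind: "i32", value: value | 0 });
  }

  // ldloca stored the address pLocals + sourceOffset into the local at destOffset.
  recordLdloca(destOffset: number, sourceOffset: number): void {
    this.known.set(destOffset, { kind: "localAddress", offset: sourceOffset });
  }

  // Any other opcode writing this local makes its value unknown again.
  invalidate(destOffset: number): void {
    this.known.delete(destOffset);
  }

  // Conservative rule: control can enter at a branch target from code we did
  // not analyze, so forget everything there.
  atBranchTarget(): void {
    this.known.clear();
  }

  private lookup(offset: number): KnownConstant | undefined {
    return this.enabled ? this.known.get(offset) : undefined;
  }

  // WAT instructions that push the value of the local at `offset`.
  emitRead(offset: number): string[] {
    const c = this.lookup(offset);
    if (c === undefined) {
      // Unknown value: read it back from memory.
      return ["local.get $pLocals", `i32.load offset=${offset}`];
    }
    if (c.kind === "i32") {
      // Embed the constant directly, e.g. as a loop bound in a comparison.
      return [`i32.const ${c.value}`];
    }
    // Recompute the address from scratch instead of loading it.
    return ["local.get $pLocals", `i32.const ${c.offset}`, "i32.add"];
  }

  // Decide statically, when possible, whether the pointer in `offset` is null.
  nullCheck(offset: number): NullCheckResult {
    const c = this.lookup(offset);
    if (c === undefined) return "runtime";
    if (c.kind === "localAddress") return "never-null"; // assumes pLocals != 0
    return c.value === 0 ? "always-null" : "never-null";
  }
}

// The ldloca case described above: pLocals + 32 is stored into pLocals[128]
// and then read back as a base pointer.
const tracker = new ConstantTracker(true);
tracker.recordLdloca(128, 32);
console.log(tracker.emitRead(128).join("\n"));
console.log(tracker.nullCheck(128)); // never-null

// A hypothetical loop bound stored by ldc.i4 into the local at offset 16.
tracker.recordLdcI4(16, 100000);
console.log(tracker.emitRead(16).join("\n")); // i32.const 100000

tracker.atBranchTarget();
console.log(tracker.nullCheck(128)); // runtime
```

The sketch only changes how reads are emitted; the stores into the locals still happen, which is what keeps the optimization invisible from outside the trace. Clearing the whole map at every branch target gives up some opportunities in exchange for never trusting facts from a path that wasn't analyzed.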
Fixing `VectorNN.GetElementUnsafe<T>` to not generate extremely-hard-to-optimize code is left as an exercise for the reader.