Add MethodInstance roots to resolve compilation time regression #50204
In issue #50082, @benlorenz discovered that my PR #41099 caused large slowdowns in compilation of some methods.
The cause of the issue is that storing `MethodInstances` into `LineInfoNodes`, rather than just the name of the method, led to many more entries in the `Method`'s roots table. This table is used for serialization/compression and deserialization/decompression of the contents of each of several type-inferred `CodeInfo` objects associated with the `Method`. Each time a `MethodInstance` is encoded for a `Method`, it is compared against all existing roots in the table. Since `MethodInstance`s are less likely to be shared, we often end up with worst-case quadratic performance. This aspect has been a recurring bottleneck going back to 2016: #14556

Barring a switch to a hash table for storing method roots, we need to make the roots list smaller.
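
To make the quadratic behavior concrete, here is a toy model in Python. It is not the actual implementation: `LinearRoots` and `encode_all` are hypothetical names, and the only thing taken from the description above is that each encoded object is compared against the existing roots one by one.

```python
# Toy model only (hypothetical names): a roots table that finds an existing
# root by scanning the list linearly, as described above.

class LinearRoots:
    def __init__(self):
        self.roots = []
        self.comparisons = 0  # counts identity checks performed during lookups

    def index_of(self, obj):
        """Return the index of obj, appending it if it is not yet a root."""
        for i, root in enumerate(self.roots):
            self.comparisons += 1
            if root is obj:
                return i
        self.roots.append(obj)
        return len(self.roots) - 1


def encode_all(table, objects):
    """Encode every object as its index in the roots table."""
    return [table.index_of(o) for o in objects]


n = 1000
unique = [object() for _ in range(n)]  # stand-ins for rarely shared MethodInstances
shared = [unique[0]] * n               # stand-in for one frequently shared name

t_unique = LinearRoots()
encode_all(t_unique, unique)

t_shared = LinearRoots()
encode_all(t_shared, shared)

print(t_unique.comparisons)  # 499500, i.e. n*(n-1)/2
print(t_shared.comparisons)  # 999, i.e. n-1
```

When every root is distinct, each lookup scans the whole table before appending, so the total work grows with the square of the number of roots. A shared root is found on the first comparison, which is why storing only names was cheap by comparison.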
Based on @vtjnash's suggestion that compiler-generated objects like `MethodInstance`s probably shouldn't be associated with the `Method`, I duplicated the roots table interface currently found on `Method`s into `MethodInstance`s. However, there is another issue here: sometimes the `CodeInstance` is not available during decompression, so the associated `MethodInstance` roots table would not be accessible. Thus, we would need to also encode the name as a root of the `Method`, which is always available. I customized the encoding scheme to store both, and to discard the name if the `MethodInstance` is available. An additional wrinkle is that `MethodInstances` present in the `code` statements cannot be substituted with symbols, and so must still be in the `Method` root table. Thus, this encoding is only done for elements of `LineInfoNodes`.

I've also modified the stacktrace lookup to accommodate this, though it may need further refinement. I also realized that the `CodeInstance` wasn't being passed into decompression from `retrieve_ir_for_inlining`, so it is now being passed in.

The sysimage size on master just before this PR is 178,865 KB. With this PR, it is 186,599 KB, a ~4% increase. It remains to be seen whether this solves the bottleneck Ben reported.
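
The dual encoding for `LineInfoNode` entries described above can be pictured with a small Python sketch. This is only an illustration of the idea, not the code in this PR: `FakeMethodInstance`, `root_index`, `encode_lineinfo_method`, and `decode_lineinfo_method` are hypothetical names, and the two lists stand in for the `Method` roots table and the per-`MethodInstance` roots table.

```python
# Illustrative sketch only (hypothetical names): store both a name and an
# instance reference, and fall back to the name when the instance-level
# roots table is not available at decode time.

class FakeMethodInstance:
    """Stand-in for a compiler-generated object that is rarely shared."""
    def __init__(self, name):
        self.name = name


def root_index(roots, obj):
    """Return the index of obj in roots, appending it if missing."""
    for i, root in enumerate(roots):
        if root == obj:  # identity for FakeMethodInstance, equality for names
            return i
    roots.append(obj)
    return len(roots) - 1


def encode_lineinfo_method(mi, method_roots, instance_roots):
    """Encode both the name (always decodable) and the instance itself."""
    name_idx = root_index(method_roots, mi.name)
    mi_idx = root_index(instance_roots, mi)
    return (name_idx, mi_idx)


def decode_lineinfo_method(encoded, method_roots, instance_roots=None):
    """Prefer the instance; use the name if instance roots are unavailable."""
    name_idx, mi_idx = encoded
    if instance_roots is not None:
        return instance_roots[mi_idx]  # the name is discarded in this case
    return method_roots[name_idx]


method_roots, instance_roots = [], []
callee = FakeMethodInstance("foo")
encoded = encode_lineinfo_method(callee, method_roots, instance_roots)

with_instance = decode_lineinfo_method(encoded, method_roots, instance_roots)
without_instance = decode_lineinfo_method(encoded, method_roots)
```

Here `with_instance` is the original object, while `without_instance` is just the name `"foo"`. The trade-off is that the `Method`-level table only grows by names, which are widely shared, while the rarely shared objects live in the smaller per-instance table.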
C/C++ isn't my strong suit, so I'm sure there are issues with how I've approached this; I also don't know if anything I've done disturbs the internals too much. My goal was to make sure we had at least one possible solution ready, so that reverting #41099 wasn't the only choice.