Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MethodInstance roots to resolve compilation time regression #50204

Draft
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

BioTurboNick
Copy link
Contributor

@BioTurboNick BioTurboNick commented Jun 17, 2023

In issue #50082, @benlorenz discovered that my PR #41099 caused large slowdowns in compilation of some methods.

The cause of the issue is that storing MethodInstances into LineInfoNodes, rather than just the name of the method, led to many more entries in the Method's roots table. This table is used for serialization/compression and deserialization/decompression of the contents of each of several type-inferred CodeInfo objects associated with the Method. Each time a MethodInstance is encoded for a Method, it is compared against all existing roots in the table. Since MethodInstances are less likely to be shared, we often end up with worst-case quadratic performance. This aspect has been a recurring bottleneck going back to 2016: #14556

Barring a switch to a hash table for storing methods roots, we need to make the roots list smaller.

Based on @vtjnash's suggestion that compiler-generated objects like MethodInstances probably shouldn't be associated with the Method, I duplicated the roots table interface currently found on Methods into MethodInstances. However, there is another issue here - sometimes the CodeInstance is not available during decompression, so the associated MethodInstance roots table would not be accessible. Thus, we would need to also encode the name as a root of the Method, which is always available. I customized the encoding scheme to store both, and to discard the name if the MethodInstance is available. An additional wrinkle is that MethodInstances present in the code statements cannot be substituted with symbols, and so must still be in the Method root table. Thus, this encoding is only done for elements of LineInfoNodes.

I've also modified the stacktrace lookup to accommodate, though it may need further refinement. I also realized that the CodeInstance wasn't being passed into decompression from retrieve_ir_for_inlining, so it is now being passed in.

The sysimage size in the master just before this PR is 178,865 KB. With this PR, it is 186,599 KB; ~4% increase. Remains to be seen if it solves the bottleneck Ben reported.

C/++ isn't my strong suit so I'm sure there are issues with how I've approached this; I also don't know if anything I've done disturbs the internals too much. My goal was to make sure we had at least 1 possible solution ready, so that reverting #41099 wasn't the only choice.

@oscardssmith
Copy link
Member

is there a reason we don't want a hash table here?

@BioTurboNick
Copy link
Contributor Author

Unfortunately I'm having trouble with the current batch of errors - for some reason it fails when precompiling JSON.jl inside (or after trying once) include("doc/make.jl") from the Documenter run, but not in the global environment. At least Julia runs, even if things are breaking - I'll stop here until/unless there's feedback on direction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants