newBE: value_labels_ranges is very slow #2681
Comments
It would be interesting to know more about this workload: why was the label-location dataflow analysis particularly slow in this case? Was there a higher than usual density of labels? Many basic blocks? Considering alternative approaches (against the baseline "regalloc tells us everything directly" approach which regalloc.rs does not currently support):
I actually kind of favor the latter, all other things being equal. As mentioned here (and in the same spirit as the "simpler GC without stackmaps" proposal #2459), I'd like to bias toward better factorization of complexity and less reliance on complex analyses and maintenance of metadata; the "post-hoc analysis" is partway there (the core compiler pipeline only sees blackbox value-label instructions) but this would be further so. Thoughts? |
This particular workload is compiling simple-raytracer with all of its dependencies. I would prefer not allocating a stack slot for every value, for three reasons. First, I don't think it is acceptable to regress the already poor debug-mode performance of Rust even more. Second, I don't want the choice to generate debuginfo or not to influence the generated code. GCC also doesn't let it influence the generated code. This has the advantage that enabling debuginfo doesn't change the behaviour of a program in case of UB or miscompilations, thus making them easier to debug. Finally, value debuginfo may be useful for on-stack replacement in case of a tiered JIT. Regressing performance in this case is unacceptable. |
Yeah, perhaps not, though I'd be interested to measure how large the regression would be. The "right" answer here, I think, is to rely on regalloc.rs to provide us location info per vreg per program-point. Unfortunately without that we're forced to do an analysis of some sort to recover the info. It's possible the analysis data structures could be improved: I notice that a lot of time is spent in cloning |
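For readers unfamiliar with this kind of analysis, here is a minimal sketch of a forward label-location dataflow pass, showing where per-block state cloning can add up. It is not the Cranelift implementation: `Label`, `Loc`, `State`, `Block`, `meet` and `analyze` are all hypothetical names invented for illustration, and the real analysis works on machine instructions rather than a pre-digested list of definitions.

```rust
use std::collections::{BTreeMap, VecDeque};

// Hypothetical types for illustration only; not Cranelift's data structures.
type Label = u32;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Loc {
    Reg(u8),
    Stack(u32),
}

/// Analysis state: the location each label's value currently lives in.
type State = BTreeMap<Label, Loc>;

struct Block {
    /// (label, new location) events in program order.
    defs: Vec<(Label, Loc)>,
    /// Indices of successor blocks.
    succs: Vec<usize>,
}

/// Meet operator: keep only the labels on which both states agree.
/// Returns true if `into` changed.
fn meet(into: &mut State, other: &State) -> bool {
    let before = into.len();
    into.retain(|label, loc| other.get(label) == Some(&*loc));
    into.len() != before
}

/// Computes the state at the entry of each block (block 0 is the entry).
/// Unreachable blocks keep `None`.
fn analyze(blocks: &[Block]) -> Vec<Option<State>> {
    let mut block_in: Vec<Option<State>> = vec![None; blocks.len()];
    if blocks.is_empty() {
        return block_in;
    }
    block_in[0] = Some(State::new());
    let mut worklist = VecDeque::from([0usize]);

    while let Some(b) = worklist.pop_front() {
        // The whole map is cloned on every visit of every block; with many
        // labels and many blocks this is where the time can go.
        let mut state = block_in[b].clone().expect("queued blocks have an in-state");
        for &(label, loc) in &blocks[b].defs {
            state.insert(label, loc);
        }
        for &s in &blocks[b].succs {
            let changed = match block_in[s].as_mut() {
                Some(existing) => meet(existing, &state),
                None => {
                    // First time we reach this successor: another full clone.
                    block_in[s] = Some(state.clone());
                    true
                }
            };
            if changed {
                worklist.push_back(s);
            }
        }
    }
    block_in
}
```

In this sketch the cost per block visit is proportional to the number of live labels, so a function with many labels and many basic blocks pays for it repeatedly. Whether the same pattern explains the profile in this issue is an assumption, not something the thread confirms.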
FWIW, GCC does (or at least used to) go with the second approach suggested above of giving everything a fixed stack slot. |
This implementation has been deleted now that regalloc2 is in use! RA2 natively supports passing through and translating debug info, so there's no need to reverse-engineer it from the allocated instructions anymore. |
It literally took more time than the actual compilation on one profile.