newBE: value_labels_ranges is very slow #2681

bjorn3 · 2021-02-23T21:05:14Z

It literally took more time than the actual compilation on one profile.

cfallin · 2021-02-23T21:24:38Z

It would be interesting to know more about this workload: why was the label-location dataflow analysis particularly slow in this case? Was there a higher than usual density of labels? Many basic blocks?

Considering alternative approaches (against the baseline "regalloc tells us everything directly" approach which regalloc.rs does not currently support):

The current approach, adding label ghost-instructions and doing a post-hoc analysis, is general but potentially involves slow analysis. It imposes no additional overhead on generated code, however (i.e. it can describe any generated code). This might be called a "passive" debug-info design.
There is an alternative where we explicitly reserve a stackslot for every label and store to this slot whenever the labeled value changes. Then the DWARF info is simple and constant (value label N always lives at BP-offset) but the runtime overhead is potentially considerable. However maybe this isn't so bad if we're already executing unoptimized debug-mode code. Let's call this an "active" debug-info design: rather than describing locations, we set a location and generate code to move data we care about observing.

I actually kind of favor the latter, all other things being equal. As mentioned here (and in the same spirit as the "simpler GC without stackmaps" proposal #2459), I'd like to bias toward better factorization of complexity and less reliance on complex analyses and maintenance of metadata; the "post-hoc analysis" is partway there (the core compiler pipeline only sees blackbox value-label instructions) but this would be further so. Thoughts?

bjorn3 · 2021-02-23T21:36:00Z

This particular workload is compiling simple-raytracer with all of of its dependencies.

I would prefer not allocting a stackslot for every value for three reasons: I don't think it is acceptible to regress the already poor debugmode performance of rust even more. I don't want the choice to generate debuginfo or jot to influence the generated code. Gcc also doesn't let it influence the generated code. This has the advantage that enabling debuginfo doesn't change the behaviour of a program in case of UB or miscompilations, thus making them easier to debug. Finally value debuginfo may be useful for on stack replacement in case of a tiered JIT. Regressing performance in this case is unacceptable.

cfallin · 2021-02-23T21:43:53Z

Yeah, perhaps not, though I'd be interested to measure how large the regression would be.

The "right" answer here, I think, is to rely on regalloc.rs to provide us location info per vreg per program-point. Unfortunately without that we're forced to do an analysis of some sort to recover the info.

It's possible the analysis data structures could be improved: I notice that a lot of time is spent in cloning HashMaps for example; perhaps a delta-based scheme with some structure sharing could be designed. If someone has the time to experiment with this (I unfortunately don't at the moment, first priority is completeness for the switchover) I'd be happy to discuss or review...

froydnj · 2021-02-23T22:39:24Z

FWIW, GCC does (or at least used to) go with the second approach suggested above of giving everything a fixed stack slot.

cfallin · 2022-05-04T22:47:22Z

This implementation has been deleted now that regalloc2 is in use! RA2 natively supports passing through and translating debug info, so there's no need to reverse-engineer it from the allocated instructions anymore.

akirilov-arm added cranelift Issues related to the Cranelift code generator cranelift:goal:compile-time Focus area: how fast Cranelift can compile or how much memory it uses. labels Oct 4, 2021

cfallin closed this as completed May 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

newBE: value_labels_ranges is very slow #2681

newBE: value_labels_ranges is very slow #2681

bjorn3 commented Feb 23, 2021

cfallin commented Feb 23, 2021 •

edited

Loading

bjorn3 commented Feb 23, 2021

cfallin commented Feb 23, 2021

froydnj commented Feb 23, 2021

cfallin commented May 4, 2022

newBE: value_labels_ranges is very slow #2681

newBE: value_labels_ranges is very slow #2681

Comments

bjorn3 commented Feb 23, 2021

cfallin commented Feb 23, 2021 • edited Loading

bjorn3 commented Feb 23, 2021

cfallin commented Feb 23, 2021

froydnj commented Feb 23, 2021

cfallin commented May 4, 2022

cfallin commented Feb 23, 2021 •

edited

Loading