The AllocOptPass is potentially slow (and runs too often) in some cases #54524
Labels: broadcast (Applying a function over a collection), compiler:codegen (Generation of LLVM IR and native code), performance (Must go faster)
I looked a bit into why #54520 caused such a big latency improvement. The issue, or at least one issue, is that with the broadcast code a large majority of the time is spent in our own alloc optimization pass (`AllocOptPass`).
A profile shows this pass running four times, with each run taking quite a long time. The time spent in this pass almost completely disappears when the broadcasting code is replaced with a loop (as was done in #54520).
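To make the broadcast-versus-loop distinction concrete, here is a hypothetical sketch in Julia. It is not the code changed in #54520, and the function names `update_broadcast!` and `update_loop!` are invented for illustration. It shows a fused broadcast assignment next to an equivalent explicit loop. Timing the first call of each in a fresh session gives a rough view of compile latency; whether a snippet this small shows any measurable difference in `AllocOptPass` time is not established here.

```julia
# Hypothetical illustration only: not the code touched by #54520.

# Broadcast form: a single fused, in-place elementwise update of y.
function update_broadcast!(y, a, x, b)
    y .= a .* x .+ b
    return y
end

# Equivalent explicit-loop form of the same computation.
function update_loop!(y, a, x, b)
    # eachindex(y, x) also checks that y and x have matching indices
    @inbounds for i in eachindex(y, x)
        y[i] = a * x[i] + b
    end
    return y
end

x = rand(1000)
y1 = similar(x)
y2 = similar(x)

# The first call of each function includes compilation time;
# compare them in a fresh session to get a rough latency signal.
@time update_broadcast!(y1, 2.0, x, 1.0)
@time update_loop!(y2, 2.0, x, 1.0)

@assert y1 ≈ y2  # both forms compute the same result
```

Both functions produce identical results; the difference of interest in this issue is how much work the compiler does to generate code for each form, not the runtime behavior.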
There may be some latency gains to be had here if one can: