Excessive loop unrolling #42332

Open
llvmbot opened this issue Aug 13, 2019 · 13 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla loopoptim

Comments

@llvmbot
Collaborator

llvmbot commented Aug 13, 2019

Bugzilla Link 42987
Version 6.0
OS All
Reporter LLVM Bugzilla Contributor
CC @chandlerc,@topperc,@DougGregor,@fhahn,@hfinkel,@jdoerfert,@RKSimon,@Meinersbur,@zygoloid,@rotateright

Extended Description

I think we need to re-evaluate the advantages and disadvantages of loop unrolling.

Clang is often unrolling loops excessively in cases where there is no advantage in unrolling.

Loop unrolling is advantageous when the loop overhead is costly or when expressions or branches that depend on the loop counter can be simplified. But loop unrolling gives no advantage when the bottleneck lies elsewhere.

The limiting factor is likely to be the floating point/vector unit in the CPU if a loop contains floating point or vector code. The loop overhead is often reduced to an integer addition and a fused compare/branch instruction.
The integer unit has plenty of resources to run the loop overhead simultaneously with the floating point or vector code at zero extra cost.

The situation is no better if the instruction decoder is the bottleneck, which is quite often the case. A tiny loop will fit into the micro-op cache or loopback buffer of modern CPUs so that the loop will run on decoded instructions only. A large unrolled loop is unlikely to fit into these buffers, which means that the unrolled loop is slower.

Even if the unrolled loop is not slower when measured in isolation, it can slow down other parts of the program because it consumes excessive amounts of code cache.

A simple example:

const int size = 58;
double a[size], b[size], c[size];

void test () {
    for (int i = 0; i < size; i++) {
        a[i] = b[i] + c[i];    
    }
}

clang -O2 -m64 will unroll this loop completely up to size = 59

clang -O3 -m64 will unroll this loop completely up to size = 119

Clang is vectorizing the loop, which is a good thing, but there is no advantage in unrolling further.
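Until the heuristics change, unrolling can be suppressed for an individual loop in the source. The following is a minimal sketch, assuming Clang's `#pragma clang loop unroll(disable)` loop hint applied to the loop above. The names `kSize` (renamed from `size`) and `test_no_unroll` are made up for this illustration, and the vectorizer's interleaving is a separate decision with its own hints, so the generated code should still be inspected:

```cpp
const int kSize = 58;
double a[kSize], b[kSize], c[kSize];

// Same loop as in the report, with a per-loop hint asking Clang's
// unroller to leave it alone. Compilers that do not know the pragma
// ignore it (possibly with a warning).
void test_no_unroll() {
#pragma clang loop unroll(disable)
    for (int i = 0; i < kSize; i++) {
        a[i] = b[i] + c[i];
    }
}
```

For a whole translation unit, the `-fno-unroll-loops` command-line option is the coarser alternative.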

Literature: I have described the loopback buffer, micro-op cache, and other details of different CPUs in the manual "The microarchitecture of Intel, AMD and VIA CPUs" https://www.agner.org/optimize/microarchitecture.pdf

@hfinkel
Collaborator

hfinkel commented Aug 13, 2019

We have, indeed, always considered full loop unrolling part of the early canonicalization process. We do, essentially, limit the partial unrolling factor based on an estimate of the number of uops and the target-specific size of the uop cache (the thresholds are now set by the LoopMicroOpBufferSize variable in the various lib/Target/X86/X86Sched*.td files). Full unrolling, however, we don't limit in the same way. I believe the rationale was that full unrolling tends to enable other optimizations, and so we do this limited only by some heuristic practicality threshold.

It might be interesting to conduct the following experiment. In include/llvm/CodeGen/BasicTTIImpl.h, in getUnrollingPreferences, where we have this:

UP.PartialThreshold = MaxOps;

add:

UP.Threshold = MaxOps;

and see how that affects things.
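To make the difference between the two limits concrete, here is a toy model of the decision, not LLVM's actual code. Only the field names `Threshold` and `PartialThreshold` come from the comment above; the struct `ToyUnrollPrefs`, the helper functions, and the cost rule (unrolled size = body size times trip count) are simplifying assumptions:

```cpp
#include <cstdint>

// Toy stand-in for the unrolling preferences (hypothetical type).
struct ToyUnrollPrefs {
    uint32_t Threshold;         // size limit applied to full unrolling
    uint32_t PartialThreshold;  // size limit applied to partial unrolling
};

// Full unrolling is accepted when the completely unrolled body
// stays within Threshold.
bool fullyUnrolls(const ToyUnrollPrefs &UP, uint32_t bodySize,
                  uint32_t tripCount) {
    return static_cast<uint64_t>(bodySize) * tripCount <= UP.Threshold;
}

// Largest unroll factor whose unrolled body stays within
// PartialThreshold (never less than 1, meaning "do not unroll").
uint32_t partialFactor(const ToyUnrollPrefs &UP, uint32_t bodySize) {
    if (bodySize == 0) return 1;
    uint32_t factor = UP.PartialThreshold / bodySize;
    return factor == 0 ? 1 : factor;
}
```

In this model, setting `Threshold` to the same value as `PartialThreshold` makes a loop with body size 5 and trip count 58 stop being fully unrolled once the limit drops below 290. The numbers are illustrative only.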

@Meinersbur
Member

As Hal mentioned, (full) unrolling may enable other optimizations. I am thinking of folding an index expression such as i*8+1 into a constant, removing a constant table lookup, or letting the SLP vectorizer vectorize a stream of instructions that the LoopVectorizer cannot.

However, this requires different heuristics from the ones we currently have, which estimate the code size in instructions. The inliner heuristic, for example, also takes into account parameters that become constant.
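As an illustration of the first two cases, the sketch below shows a small loop next to the form that full unrolling followed by constant folding could reduce it to. The table `kTable` and both function names are invented for this example; it shows the kind of simplification meant, not the output of any particular compiler:

```cpp
constexpr int kTable[4] = {3, 1, 4, 1};

// Rolled form: both the index i*8+1 and the table lookup depend on
// the loop counter, so they must be computed on every iteration.
int weighted_sum(const int *data) {
    int sum = 0;
    for (int i = 0; i < 4; i++) {
        sum += kTable[i] * data[i * 8 + 1];
    }
    return sum;
}

// After full unrolling, every index is a constant and the table
// lookups are replaced by their values.
int weighted_sum_unrolled(const int *data) {
    return 3 * data[1] + 1 * data[9] + 4 * data[17] + 1 * data[25];
}
```

A size-only cost model sees four copies of the body; a benefit-aware one would also see that most of each copy folds away.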

@fhahn
Contributor

fhahn commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#44593

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@AgnerF

AgnerF commented Jul 27, 2022

What is the status of this issue? There still appears to be excessive loop unrolling even when it does not enable other optimizations.

@AgnerF

AgnerF commented Aug 8, 2022

What can I do to draw attention to this important issue? Nothing has happened for three years.

I have made a comparison of different x86 compilers and tested how well they optimize different expressions. The results are reported in my C++ optimization manual. Here, Clang is better than all other compilers, except for one serious problem: excessive loop unrolling. Fixing this problem will make Clang a clear winner.

Let me explain why excessive loop unrolling is a problem. Modern x86 processors have a micro-op cache or a loop buffer, or both. These are critical resources with limited sizes. See my microarchitecture manual for details. Loop unrolling may decrease performance if the unrolled loop is too big for the loop buffer or if it pushes other code out of the micro-op cache.

There can be good reasons to unroll a loop:

  • vectorization. Unroll to fit the largest available vector register size
  • secondary optimization of expressions and branches that depend on the loop counter
  • improve caching
  • reducing loop overhead

The last one, reducing loop overhead, is rarely relevant today. State-of-the-art microprocessors can execute 4 or 5 independent instructions per clock cycle. The loop overhead is typically just a few integer instructions that can easily execute simultaneously with, or ahead of, the loop workload. Reducing loop overhead has no effect at all if the loop workload includes floating point or vector instructions or anything else that consumes more time than the loop overhead. The loop branch is predicted well in most cases.

Loop unrolling to improve caching also has less relevance today, when microprocessors can perform memory reads ahead of an earlier write to a different address.

The conclusion is that we should unroll only to the extent necessary for vectorization unless there are secondary optimizations to gain.

My tests show that Clang is unrolling far in excess of this. This may be a remnant from old times where priorities were different.

Please revise the loop unroll heuristics. In particular for x86 targets.

@RKSimon
Collaborator

RKSimon commented Aug 8, 2022

I'd like to take a look at this - I have a number of loop vectorization bugs that I have on my backlog, including this and Issue #50452 (where the znver3 LoopMicroOpBufferSize model value seems to be causing some really weird issues).

Also Issue #37628, which is more about value tracking not realising that it's only handling the scalar epilogue of a manually vectorized loop.

@RKSimon RKSimon self-assigned this Aug 8, 2022
@nikic
Contributor

nikic commented Aug 8, 2022

For znver3 see also #50802. Would love to see that fixed. Clang is currently actively sabotaging the znver3 architecture by using a ridiculously large unrolling limit.

@davidbolvansky
Collaborator

What do you propose? Lower LoopMicroOpBufferSize?

@AgnerF

AgnerF commented Aug 16, 2022

@davidbolvansky: No, a heuristic that calculates the advantage of unrolling. If there is no significant advantage in unrolling beyond the max vector size, then don't.
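A rough sketch of what such a rule might look like is given below. It is a hypothetical illustration of the proposal, not an existing LLVM interface: the struct `LoopSketch`, the function `chooseUnrollFactor`, and the parameter `extraFactor` are all assumed names, and a real cost model would have to estimate the benefit rather than take it as a boolean:

```cpp
// Hypothetical summary of a loop as seen by the heuristic.
struct LoopSketch {
    unsigned elementBytes;    // bytes processed per scalar iteration
    bool hasSecondaryBenefit; // counter-dependent expressions or
                              // branches that simplify when unrolled
};

// Number of scalar iterations to merge into one loop iteration:
// enough to fill the widest vector register, and more only when
// unrolling further is expected to pay for itself.
unsigned chooseUnrollFactor(const LoopSketch &loop,
                            unsigned maxVectorBytes,
                            unsigned extraFactor) {
    unsigned lanes = 1;
    if (loop.elementBytes != 0 && maxVectorBytes >= loop.elementBytes) {
        lanes = maxVectorBytes / loop.elementBytes;
    }
    if (!loop.hasSecondaryBenefit || extraFactor == 0) {
        return lanes;  // stop at the maximum vector size
    }
    return lanes * extraFactor;
}
```

For a loop over `double` with 256-bit vectors and no secondary benefit, this returns 4, which corresponds to one vector operation per iteration and nothing beyond it.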

@davidbolvansky
Collaborator

Right, but I am worried that there would be a strong pushback as some (sometimes toy) benchmarks would regress.

See discussion (a bit different issue but similar story)
https://reviews.llvm.org/D102748

@davidbolvansky
Collaborator

cc @sjoerdmeijer

@AgnerF

AgnerF commented Aug 16, 2022

@davidbolvansky. Many test cases and benchmarks tend to focus on a single loop without considering how cache use affects other parts of a code. If you test one loop in isolation you may unroll up to the size of the micro-op cache without any adverse effects. But in the real world you will see bad effects of pushing other code out of the code cache, micro-op cache, data cache, or branch target buffer.

Are you also considering the loop buffer? It is very small, 20 - 70 micro-ops, and runs even faster than the micro-op cache.

fhahn added a commit that referenced this issue Aug 16, 2022
Test cases based on #42332 showing excessive unrolling with both known
and runtime trip counts.
@fhahn
Contributor

fhahn commented Aug 16, 2022

I think the specific problem here (excessive unrolling of vector loops) can be addressed relatively easily compared to excessive unrolling in LLVM in general.

LLVM's loop vectorizer already tries to determine whether interleaving multiple vector iterations is beneficial to increase throughput. Doing unrolling later based on the unroller's very limited cost-model is probably rarely the right decision. We have at least the following 2 options:

  1. Let the vectorizer tell the unroller not to unroll vector loops it generated: https://reviews.llvm.org/D115261
  2. Let TTI's unroll preferences decide to avoid unrolling vector loops: https://reviews.llvm.org/D131972 (WIP!)

Either of those should probably be predicated on whether the uarch is out-of-order or not.

Note that besides unnecessarily increasing code size, aggressive unrolling of vector loops can be actively harmful, e.g. because the vector loop is no longer entered. One such example is #40306.
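The arithmetic behind that last point can be sketched as follows, under the simplifying assumption that any iterations not covered by a whole step of the unrolled vector loop fall through to a scalar remainder loop. The type `Split` and the function `splitTripCount` are invented for this illustration, and the numbers in the note below are not taken from #40306:

```cpp
// How a trip count divides between the vector loop and the scalar
// remainder in this simplified model.
struct Split {
    unsigned vectorIterations;  // iterations of the unrolled vector loop
    unsigned scalarIterations;  // elements left to the scalar remainder
};

// vf: elements per vector operation; unroll: copies of the vector
// body per loop iteration. One loop iteration covers vf * unroll
// elements.
Split splitTripCount(unsigned tripCount, unsigned vf, unsigned unroll) {
    unsigned step = vf * unroll;
    if (step == 0) {
        return {0, tripCount};
    }
    return {tripCount / step, tripCount % step};
}
```

With 100 elements and a vector factor of 4, an unroll count of 1 gives 25 vector iterations and no scalar work. An unroll count of 32 raises the step to 128 elements, so the vector loop runs zero times and all 100 elements are processed by scalar code.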

More generally, the unroller has a way to estimate simplification benefits, but it only triggers after the static size based thresholds. We should probably move towards relying more on analyzing the expected benefits, but to do that the analyzer needs to be quicker (first step WIP https://reviews.llvm.org/D131973)

tycho added a commit to IntroversionSoftware/gamelibs-miniaudio that referenced this issue Jun 28, 2024
Clang has a tendency to *heavily* unroll loops all over the place:
    llvm/llvm-project#42332

Disable loop unrolling wherever it goes too nuts, enable vectorization
where it doesn't do so automatically, etc.

Signed-off-by: Steven Noonan <steven@uplinklabs.net>