# Extremely slow optimizer performance when including large array of strings #39352
Comments
Great writeup. Thanks!
Parts of timings for non-optimised build:
So the guess w.r.t. parsing is wrong. Timings for the optimised build are pretty much the same, with the exception of LLVM passes:
Main function has 250k instances of
generated to produce the on-stack array, whereas clang produces a constant and
Potentially fixed by having rvalue static promotion implemented.
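For context, rvalue static promotion (since implemented in Rust) lets a reference to a constant-evaluable rvalue point at static data in the binary instead of a freshly built stack copy. A minimal sketch:

```rust
// With rvalue static promotion, a reference to a constant-evaluable
// rvalue is promoted to 'static data baked into the binary, rather
// than being rebuilt on the stack at runtime.
fn sum_static() -> u32 {
    let a: &'static [u32; 3] = &[1, 2, 3]; // promoted: no stack copy
    a.iter().sum()
}

fn main() {
    println!("{}", sum_static()); // prints 6
}
```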
So, even if we fixed this particular case of completely static array, there’s still the same issue when the array is not exactly fully static:
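The shape of that case could be sketched like this (hypothetical Rust, not the reporter's actual code):

```rust
// Hypothetical sketch of the "not exactly fully static" case: one
// runtime-constructed element keeps the whole array from being a
// compile-time constant, so it has to be assembled on the stack.
fn total_len() -> usize {
    let runtime = String::from("runtime"); // heap value, not promotable
    let array = [
        "one", "two", "three", // imagine 250_000 literals here
        runtime.as_str(),      // the single non-static element
    ];
    array.iter().map(|s| s.len()).sum()
}

fn main() {
    println!("{}", total_len()); // prints 18
}
```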
I constructed a program like the above in C and made clang compile it without optimisations. It still hasn't finished. EDIT: it has finished (keep in mind, this is not an optimised build):
Which sounds about right for where I'd expect the time to be spent.
@nagisa Interesting. I can reproduce that result with clang with or without optimization, but gcc has no problem with such a program when not optimizing. When optimizing, it too shows no signs of finishing. The test code I used, for reference:

```c
#include <string.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    char *array[] = {
#include "num_array.h"
        argv[0],
    };
    size_t sum = 0;
    for (int i = 0; i < sizeof(array)/sizeof(*array); i++) {
        sum += strlen(array[i]);
    }
    printf("%zu\n", sum);
    return 0;
}
```

To generate num_array.h:
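The actual generation command was not captured in this thread; one hypothetical way to produce such a header is a small Rust program that emits n quoted numeric string literals, one per line, ready for `#include` inside a C array initializer:

```rust
// Hypothetical generator for num_array.h (the commenter's actual
// command is unknown): writes n quoted numeric string literals,
// one per line, e.g. "0",\n"1",\n...
use std::fs::File;
use std::io::Write;

fn header_lines(n: usize) -> String {
    (0..n).map(|i| format!("\"{}\",\n", i)).collect()
}

fn main() -> std::io::Result<()> {
    // 250k entries to match the report; lower n for a quick try.
    File::create("num_array.h")?.write_all(header_lines(250_000).as_bytes())?;
    Ok(())
}
```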
I reported this test case as a GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79266
I would just point out that the embedded llvm version is missing two commits (present in 3.9.1) that could be related to this bug:
Cc @rust-lang/wg-compiler-performance
For people coming here with a similar problem, looking for a workaround: I've noticed that include_bytes does not slow down compiles as much. I had a case where I needed to include 100,000 strings in a program. Initially I made a static array of them, but this was very slow to compile. I changed it to use two include_bytes instead, one with an array of binary u32 (offset, length) pairs, and one with the actual (UTF-8-encoded) texts. This compiled very quickly.

Edit: include_bytes is apparently not super-efficient either, but not as bad as a large array (see #65818).
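A sketch of the lookup side of that workaround (file names and layout assumed; in a real build the two blobs would come from `include_bytes!("strings.idx")` and `include_bytes!("strings.txt")`, while here a tiny in-memory stand-in keeps the sketch self-contained):

```rust
// Look up string i using an index of little-endian u32
// (offset, length) pairs and a blob of concatenated UTF-8 texts.
fn get_str<'a>(index: &[u8], texts: &'a [u8], i: usize) -> &'a str {
    let p = i * 8;
    let off = u32::from_le_bytes(index[p..p + 4].try_into().unwrap()) as usize;
    let len = u32::from_le_bytes(index[p + 4..p + 8].try_into().unwrap()) as usize;
    std::str::from_utf8(&texts[off..off + len]).unwrap()
}

fn main() {
    // Stand-in for the two include_bytes! blobs.
    let texts = b"helloworld";
    let mut index = Vec::new();
    for (off, len) in [(0u32, 5u32), (5, 5)] {
        index.extend_from_slice(&off.to_le_bytes());
        index.extend_from_slice(&len.to_le_bytes());
    }
    println!("{} {}", get_str(&index, texts, 0), get_str(&index, texts, 1));
    // prints "hello world"
}
```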
Consider the following test program:
num_array.rs contains an array of 250k strings.

Compiling this in debug mode took about 45 seconds; long, and potentially a good test case for compiler profiling, but not completely ridiculous.
Compiling this in release mode showed no signs of finishing after 45 minutes. stracing rustc showed two threads, one blocked in a futex and the other repeatedly allocating and freeing a memory buffer:
By way of comparison, an analogous C program compiled with GCC takes 4.6s to compile without optimization, or 5.6s with optimization. Python parses and runs an analogous program in 1.2s. So, 45s seems excessive for an unoptimized compile, and 45m+ seems wildly excessive for an optimized compile.
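For orientation, a tiny runnable sketch in the shape described (assumed shape, not the attached code; the 250k-entry num_array.rs is stood in for by a three-element array here):

```rust
// Minimal stand-in for the reported test case. In the real test,
// num_array.rs supplies a 250k-element array of string literals;
// three elements are used here so the sketch compiles instantly.
static ARRAY: [&str; 3] = ["0", "1", "2"];

fn total_len() -> usize {
    ARRAY.iter().map(|s| s.len()).sum()
}

fn main() {
    println!("{}", total_len()); // prints 3
}
```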
Complete test case (ready to `cargo run` or `cargo run --release`): testcase.tar.gz