[BUG] Performance regressions in NDK r23b : arm64 vectorization bug (needs cherrypick) #1619
Comments
you're using libc.a? (which might be true for benchmarks but presumably not for your real code. do you see this regression with a dynamic build of your benchmarks?)
@stephenhines --- is this something Arm already knows about, or should we bring this up with them?
beta 1 is out now, beta 2 is almost ready, and the real release should be early next year. (normally i'd say "see https://github.com/android/ndk/wiki for our latest dates at all times", but we appear to have dropped the ball with r24 :-( )
I used libc.a just to dump the assembly, the benchmark was happening on a "normal" dynamic build. (FWIW, I dumped libc.so from an S21 with the same assembly)
The regression has been fixed later in LLVM, and I'm in touch with Arm on these issues too. r24 seems to contain the fix.
Thanks! A new NDK is always a risk for us because it always brings new issues; this time, we're switching to LLD, plus backtraces are broken. In terms of LTS, is r23 getting an update because of perf issues?
yeah, i'm just trying to work out whether you're really reporting an OS bug (where the OS versions are the active ingredient) or an NDK bug (where i'd expect the strstr() caller would be more relevant, and the good/bad NDK should have the same results regardless of which OS version the resulting binary was run on?).
i'll let danalbert/srhines answer that later... do you have a list of specific cherrypicks? (that would make it more likely that we could patch r23 than "r24 is better, but no-one really knows where/why" :-) )
That's actually a good point; the benchmarks on our CI run on a quite old Android so I expect it to be NDK-related... But then it should be me... :catthink:
https://reviews.llvm.org/rG467b1f1cd2f2774714ce59919702c3963914b6a8
Thanks! @stephenhines will have to say for sure, but that looks like a fairly easily cherry-picked patch, so triaging to r23c.
https://android-review.googlesource.com/c/toolchain/llvm_android/+/1915778 cherry-picks the fix to the r23 toolchain branch.
Thanks folks! @enh-google on
yeah, probably best for clarity to start again with a fresh bug for whatever you've found there :-) i'll rename this bug to make it clear it's about the arm64 vectorization cherrypick...
Should be fixed in r23 build 8486889.
Bug: android/ndk#1619
467b1f1c [SimplifyCFG] Allow hoisting terminators only with HoistCommonInsts=false.
NB: only library changes are applied and changes to the tests are skipped.
Test: N/A
Change-Id: Ic31b3f7bb93c32f922766826c3138673130a1da6
Changelog updates are in a separate commit to make cherry-picking to master easier.
Bug: android/ndk#1590
Bug: android/ndk#1608
Bug: android/ndk#1619
Bug: android/ndk#1645
Bug: android/ndk#1672
Test: ./checkbuild.py && ./run_tests.py
Change-Id: Ie5571ed436cb0a3fe9ad675ed15f62fff4e978d6
(cherry picked from commit 59e8e507c2a2147c2bc806087c953dd36f6b1c41)
Merged-In: Ie5571ed436cb0a3fe9ad675ed15f62fff4e978d6
Separate from the toolchain update to avoid merge conflicts.
Bug: android/ndk#1590
Bug: android/ndk#1608
Bug: android/ndk#1619
Bug: android/ndk#1645
Bug: android/ndk#1672
Test: None
Change-Id: I6e24e582dc0c300db173083009da9a1494360137
(cherry picked from commit 25ab62f84177b8f57782048a01a755c5730d6e6b)
Merged-In: I6e24e582dc0c300db173083009da9a1494360137
I'm working on upgrading our custom build system from NDK r21d to r23b, and I'm facing a number of performance regressions.
A weird one is a regression in strstr() that happens only on AArch64:
More or less, it's 2x slower (orange bars are runs on my branch). Our code that is being benchmarked is a simple wrapper around libc's strstr().
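For anyone who wants to try reproducing this, here is a minimal sketch of what such a wrapper benchmark might look like. It is not the reporter's code: `find_substring`, `time_find_seconds`, and `strstr_fn` are hypothetical names, and the timing uses plain `clock()` rather than a real benchmarking harness.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Call strstr() through a volatile function pointer so the compiler cannot
 * fold the call or hoist it out of the timing loop. (Hypothetical helper.) */
static char *(*volatile strstr_fn)(const char *, const char *) = strstr;

/* Hypothetical thin wrapper, standing in for the code being benchmarked. */
static const char *find_substring(const char *haystack, const char *needle) {
    return strstr_fn(haystack, needle);
}

/* Returns the CPU time, in seconds, spent on `iterations` searches. */
static double time_find_seconds(const char *haystack, const char *needle,
                                long iterations) {
    const char *volatile sink = NULL; /* keeps each result observable */
    clock_t start = clock();
    for (long i = 0; i < iterations; ++i) {
        sink = find_substring(haystack, needle);
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

In a dynamically linked build the strstr() implementation comes from the device's libc.so, so running the same binary on two OS versions, and two NDK builds on the same OS version, helps separate an OS difference from a toolchain difference.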
Assembly in libc.a for api30, NDK r21d:
Looked at the assembly of libc.a for api31, NDK r23b:
The only expected changes I'm seeing are the PAC/BTI instructions; I'm having a hard time following the rest, to be honest.
Is it known? Expected?
Another (much smaller, ~10%) regression happens in this simple code:
which becomes
with r21d, and
with r23b. There's definitely a redundant cmp w10, 95. Godbolt suggests trunk LLVM is better at codegen.
Overall, due to perf issues introduced by https://reviews.llvm.org/D84108, we are seeing a number of cases where code that used to be vectorized no longer is. I can try to put together a few cases if you're interested, though they are a bit more complicated than the ones here.
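As a rough illustration of how a vectorization case could be checked across NDK versions, here is a small loop of the kind clang usually auto-vectorizes. It is an assumed example, not one of the affected cases from our code base, and `saturating_add_u8` is a hypothetical name. Compiling it with each NDK's clang using `-O2 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize` prints a remark saying whether the loop was vectorized.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test case: element-wise saturating add of two byte arrays.
 * The conditional in the loop body is what makes it a useful probe for
 * whether the optimizer still turns the branch into vector code. */
void saturating_add_u8(uint8_t *restrict dst, const uint8_t *restrict a,
                       const uint8_t *restrict b, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        unsigned sum = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum); /* clamp to 255 */
    }
}
```

Comparing the remarks (or the generated assembly) between r21d, r23b, and r24 for a loop like this is one way to narrow down which cases regress.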
Are there plans to update LLVM in NDK r23? It's LTS after all.
If not, what's the ETA for r24 then?
Thanks!