addr2line taking an exorbitant amount of time #74
This is perf trying to compute which functions are inlined at every stack frame for every sample. If you don't want this, you need to pass Lines 80 to 86 in 0b8d12d
|
Hi, I wrote a patch for perf which uses a long-running addr2line process instead of one subprocess per address-to-look-up. It dramatically improves performance of perf, making flamegraphs on large samples with inlining a possibility again. See: https://eighty-twenty.org/2021/09/09/perf-addr2line-speed-improvement |
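To illustrate the idea behind the patch (a rough sketch only; the actual patch is C code inside perf and is not reproduced here): the saving comes from starting the resolver once and streaming addresses to it over a pipe, rather than paying process start-up cost for every address. The snippet uses a small Python child process as a hypothetical stand-in for addr2line so that it runs anywhere; `RESOLVER`, `LineResolver`, and `lookup_one_shot` are names invented for this example, and the `resolved:` reply format is not what the real tool prints.

```python
import subprocess
import sys

# Hypothetical stand-in for addr2line: reads one address per line on stdin
# and answers with one line on stdout. The real tool's output differs.
RESOLVER = [
    sys.executable, "-u", "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print('resolved:' + line.strip(), flush=True)\n",
]


class LineResolver:
    """Keeps one child process alive and reuses it for every lookup."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered so each query is sent immediately
        )

    def lookup(self, address):
        # Send one query and block until the child answers with one line.
        self.proc.stdin.write(address + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        # Closing stdin signals end-of-input, so the child exits.
        self.proc.stdin.close()
        self.proc.wait()


def lookup_one_shot(argv, address):
    """The slow pattern: spawn a fresh process for a single address."""
    result = subprocess.run(argv, input=address + "\n",
                            capture_output=True, text=True, check=True)
    return result.stdout.rstrip("\n")
```

Both functions return the same answers; the difference is that `LineResolver` pays the process start-up cost once, while `lookup_one_shot` pays it on every call, which is the pattern that becomes expensive over thousands of samples.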
@tonyg very cool! Did you submit your patch to perf upstream? It seems like the kind of thing they might consider merging. |
The blog post links to https://lore.kernel.org/linux-perf-users/20210909112202.1947499-1-tonyg@leastfixedpoint.com/ which was submitted about an hour ago. |
Thanks @djc! As @bjorn3 said, I've sent it upstream, but it's too soon to say if it'll be acceptable or not. Another possible audience is the Debian maintainers, if the kernel folks won't take it. I'm not sure if it's just Debian that has the slowdown; certainly, other distributions are linking |
I can confirm that Ubuntu (admittedly based on Debian) also suffers from the perf problem with addr2line, so it is not just (vanilla) Debian. |
I am trying @tonyg's solution and it is indeed a lot faster, but it generates flamegraphs that are missing a lot of information: I can get a graph that has only kernel-level traces, or one that has the application's call stack but does not show function names (debug info is properly generated). I'm on Ubuntu, Linux 5.13. Any advice on how I could debug this? |
That's interesting! Does the same binary, run with the unpatched |
I was not adding |
ok, so that was a stupid mistake on my part: libdw and others were not installed, so perf was built without the ability to read the symbols. That was written plainly right at the beginning of make's output 😑 |
Could anyone update us about the Currently on my application a recording of 30 seconds takes about an hour to render. |
It does appear in the current Linux tree, which is easy to check; here is the commit on GitHub. It looks like it's available as of 5.16. |
I spent a lot of time today trying to get this to work. I couldn't upgrade my kernel (for other reasons), so I finally cloned perf, added the patch, and built it from source. It worked: the flamegraph is now created significantly quicker. Adding instructions for anyone who is also stuck (and definitely also for me when I inevitably have to do this on another server/machine):
Uninstall existing perf, if any:
Install perf dev dependencies:
sudo apt-get update
sudo apt-get install flex bison glibc-source libelf-dev libdw-dev libunwind-dev libnewt-dev libgtk2.0-dev binutils-dev libnuma-dev libbabeltrace-ctf-dev libperl-dev python2-dev libiberty-dev zlib1g-dev libzstd-dev libbabeltrace-dev
Now download the Linux kernel source (replace 5.15 with your kernel version can be found by
Now replace srcline.c in the util folder (
Now build perf and install:
Finally, copy the perf binary to
and now flamegraph should run significantly faster! 🥳 |
Fixed by using https://github.com/gimli-rs/addr2line. |
https://github.com/gimli-rs/addr2line is awesome! My run time dropped from 3 min to 10 s after switching to it.
|
Driving by from a more general gimli's addr2line build is a bit different today:
cargo build --release --bin addr2line --features=bin
Rather than replace the system install, I just update my PATH when invoking, e.g., perf, so that it finds this binary.
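One way to apply the PATH approach without touching the system install, as a minimal Python sketch. The helper names `env_with_path_prefix` and `run_with_path_prefix` are invented for this example, and the directory you pass in is an assumption: it should be wherever your build of gimli's addr2line binary ended up.

```python
import os
import subprocess


def env_with_path_prefix(directory, base_env=None):
    """Return a copy of the environment with `directory` first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    old = env.get("PATH", "")
    # Prepend so that binaries in `directory` shadow the system ones.
    env["PATH"] = directory + (os.pathsep + old if old else "")
    return env


def run_with_path_prefix(argv, directory):
    """Run argv so that child processes find binaries in `directory` first."""
    return subprocess.run(argv, env=env_with_path_prefix(directory), check=True)
```

For example, `run_with_path_prefix(["perf", "script"], "/path/to/addr2line/target/release")` would run perf with that (hypothetical) directory searched first, so any addr2line that perf spawns would be the freshly built one. The change applies only to that invocation; the system install and your shell's PATH stay as they were.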
|
Hi,
I've recently been using flamegraph on a Linux system, and once the recording is done, generating the flamegraph takes an extremely long time. It seems to be invoking addr2line over and over again, making the whole process quite slow.
The output is:
21k samples doesn't sound like that much, but if it's invoking a program for every sample, that seems to become very expensive.