addr2line taking an exorbitant amount of time #74

Closed
Licenser opened this issue Feb 20, 2020 · 16 comments · Fixed by #127

Comments

@Licenser
Contributor

Hi,
I've recently been using flamegraph on a Linux system, and once the recording is done, generating the flamegraph takes an extremely long time. It seems to be invoking addr2line over and over again, making the whole process quite slow.

The output is:

[ perf record: Woken up 682 times to write data ]
[ perf record: Captured and wrote 170,633 MB perf.data (21198 samples) ]

21k samples doesn't sound like that much, but if it's invoking a program for every sample, that becomes very expensive.

@bjorn3

bjorn3 commented Feb 20, 2020

This is perf trying to compute which functions are inlined at every stack frame for every sample. If you don't want this, you need to pass --no-inline to perf. The invocation can be found at

flamegraph/src/lib.rs

Lines 80 to 86 in 0b8d12d

use std::process::Command;

pub fn output() -> Vec<u8> {
    // Runs `perf script` on ./perf.data and captures its stdout.
    Command::new("perf")
        .arg("script")
        .output()
        .expect("unable to call perf script")
        .stdout
}
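Following bjorn3's suggestion, here is a minimal sketch of how the perf script call above could opt out of inline resolution. It relies on perf script's own --no-inline option; the function names perf_script_command and output(inline) are hypothetical and not flamegraph's actual API, and errors are returned instead of panicking via expect.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Build (without running) the `perf script` call. With `inline == false`
/// we pass `--no-inline`, so perf skips resolving inlined frames for every
/// sample, which is the step that keeps spawning addr2line.
fn perf_script_command(inline: bool) -> Command {
    let mut cmd = Command::new("perf");
    cmd.arg("script");
    if !inline {
        cmd.arg("--no-inline");
    }
    cmd
}

/// Run `perf script` and return its stdout, reporting failures as errors.
#[allow(dead_code)]
fn output(inline: bool) -> Result<Vec<u8>, String> {
    let out = perf_script_command(inline)
        .output()
        .map_err(|e| format!("unable to call perf script: {e}"))?;
    if !out.status.success() {
        return Err(format!("perf script failed: {}", out.status));
    }
    Ok(out.stdout)
}

fn main() {
    // Only show the command line; actually running it needs perf and a perf.data file.
    let cmd = perf_script_command(false);
    let args: Vec<&OsStr> = cmd.get_args().collect();
    println!("{:?} {:?}", cmd.get_program(), args);
}
```

The trade-off is that frames for inlined functions no longer appear as separate entries in the flamegraph.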

bors bot closed this as completed in ee462f2 on Mar 23, 2021
@tonyg

tonyg commented Sep 9, 2021

Hi, I wrote a patch for perf that uses a single long-running addr2line process instead of one subprocess per address to look up. It dramatically improves perf's performance, making flamegraphs of large recordings with inlining feasible again. See: https://eighty-twenty.org/2021/09/09/perf-addr2line-speed-improvement
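The idea behind the patch (one long-lived helper process answering many lookups over pipes, instead of a fresh process per address) can be sketched in Rust. This is not tonyg's patch, which is C code inside perf. The Coprocess type is hypothetical, and the sketch assumes the helper writes a known number of reply lines per request line. For binutils addr2line -f -e <binary>, which reads addresses from stdin when none are given on the command line, that would be two lines (function name, then file:line) per address; with -i the count varies, so a real implementation needs some way to detect the end of each reply.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::process::{Child, ChildStdin, ChildStdout, Command, Stdio};

/// Hypothetical helper: one child process kept alive for many queries.
struct Coprocess {
    child: Child,
    stdin: ChildStdin,
    stdout: BufReader<ChildStdout>,
}

impl Coprocess {
    /// Spawn `program args...` once, with piped stdin/stdout.
    fn spawn(program: &str, args: &[&str]) -> io::Result<Self> {
        let mut child = Command::new(program)
            .args(args)
            .stdin(Stdio::piped())
            .stdout(Stdio::piped())
            .spawn()?;
        let stdin = child.stdin.take().expect("stdin is piped");
        let stdout = BufReader::new(child.stdout.take().expect("stdout is piped"));
        Ok(Coprocess { child, stdin, stdout })
    }

    /// Send one request line, then read exactly `reply_lines` lines back.
    fn query(&mut self, request: &str, reply_lines: usize) -> io::Result<Vec<String>> {
        writeln!(self.stdin, "{request}")?;
        self.stdin.flush()?;
        let mut replies = Vec::with_capacity(reply_lines);
        for _ in 0..reply_lines {
            let mut line = String::new();
            if self.stdout.read_line(&mut line)? == 0 {
                return Err(io::Error::new(
                    io::ErrorKind::UnexpectedEof,
                    "coprocess closed its output",
                ));
            }
            replies.push(line.trim_end().to_string());
        }
        Ok(replies)
    }
}

impl Drop for Coprocess {
    fn drop(&mut self) {
        // Stop and reap the child so it does not linger as a zombie.
        let _ = self.child.kill();
        let _ = self.child.wait();
    }
}

fn main() -> io::Result<()> {
    // `cat` echoes each line and stands in for something like
    // Coprocess::spawn("addr2line", &["-f", "-e", "path/to/binary"]) with 2 reply lines.
    let mut helper = Coprocess::spawn("cat", &[])?;
    for addr in ["0x1000", "0x2000", "0x3000"] {
        println!("{addr} -> {:?}", helper.query(addr, 1)?);
    }
    Ok(())
}
```

The cost of process startup is paid once, so each additional lookup costs only a pipe round trip.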

@djc
Contributor

djc commented Sep 9, 2021

@tonyg very cool! Did you submit your patch to perf upstream? It seems like the kind of thing they might consider merging.

@bjorn3

bjorn3 commented Sep 9, 2021

The blog post links to https://lore.kernel.org/linux-perf-users/20210909112202.1947499-1-tonyg@leastfixedpoint.com/ which was submitted about an hour ago.

@tonyg

tonyg commented Sep 9, 2021

Thanks @djc! As @bjorn3 said, I've sent it upstream, but it's too soon to say whether it'll be accepted. Another possible audience is the Debian maintainers, if the kernel folks won't take it. I'm not sure if it's just Debian that has the slowdown; certainly, other distributions link perf against libbfd, which doesn't suffer from the problem.

@Licenser
Contributor Author

Licenser commented Sep 9, 2021

I can confirm that Ubuntu (admittedly based on Debian) also suffers from the perf addr2line problem, so it is not just (vanilla) Debian.

@Geal

Geal commented Nov 3, 2021

I am trying @tonyg's solution and it's indeed a lot faster, but it generates flamegraphs that are missing a lot of information: I can get a graph that only has kernel-level traces, or one that has the application's call stack but does not show function names (debug info is properly generated). I'm on Ubuntu, Linux 5.13.

Any advice on how I could debug this?

@tonyg

tonyg commented Nov 4, 2021

> I am trying the solution of @tonyg and it's indeed a lot faster, but it generates flamegraphs that are missing a lot of information, like I can get a graph that only has kernel level traces, or has the application's call stack but does not show function names (debug info is properly generated). I'm on ubuntu, linux 5.13

That's interesting! Does the same binary, run with the unpatched perf, yield better flamegraphs? (My advice, though I'm sure you've already done this, would be to double-check your Cargo.toml settings for profile.bench and profile.release, ensuring that debug = true and strip = false...)

@Geal

Geal commented Nov 4, 2021

I had not set strip = false, but debug = true was set for the release profile, and I verified with strings that the generated binary had symbols.
Yes, the unpatched perf yielded better flamegraphs.

@Geal

Geal commented Nov 8, 2021

ok, so that was a stupid mistake on my part: libdw and others were not installed, so perf was built without the ability to read the symbols. That was written plainly right at the beginning of make's output 😑

@osa1

osa1 commented Jan 13, 2022

Could anyone give us an update on the perf patch mentioned above, please? Has it been merged upstream? In the linked thread I don't see an email announcing that it was merged, so I assume it wasn't.

Currently, for my application, a 30-second recording takes about an hour to render.

@djc
Contributor

djc commented Jan 13, 2022

It does appear in the current Linux tree, which is easy to check; here is the commit on GitHub.

Looks like it's available as of 5.16.
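Going by djc's note that the change shipped in 5.16, a rough way to tell whether your perf predates it is to compare the leading major.minor of a version string such as uname -r output. The helper names below are hypothetical, and the check assumes your distro's perf tracks the kernel release, which may not hold for backported or custom builds. Later comments also report slowness on 6.6, so this only tells you whether the patch should be present, not that perf will be fast.

```rust
use std::process::Command;

/// Extract (major, minor) from the first whitespace-separated token that
/// starts with a digit, e.g. "5.15.0-91-generic" -> (5, 15).
fn major_minor(version: &str) -> Option<(u32, u32)> {
    let token = version
        .split_whitespace()
        .find(|t| t.starts_with(|c: char| c.is_ascii_digit()))?;
    let mut parts = token.split(|c: char| !c.is_ascii_digit());
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Per this thread, the long-running addr2line change is in 5.16 and later.
fn has_addr2line_patch(version: &str) -> Option<bool> {
    major_minor(version).map(|v| v >= (5, 16))
}

fn main() {
    let release = Command::new("uname")
        .arg("-r")
        .output()
        .map(|o| String::from_utf8_lossy(&o.stdout).into_owned())
        .unwrap_or_default();
    match has_addr2line_patch(&release) {
        Some(true) => println!("{} should include the patch", release.trim()),
        Some(false) => println!("{} predates the patch", release.trim()),
        None => println!("could not parse version {:?}", release.trim()),
    }
}
```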

@kalradivyanshu

kalradivyanshu commented Apr 30, 2023

I spent a lot of time today trying to get this to work. I couldn't upgrade my kernel (for other reasons), so I finally cloned perf, added the patch, and built it from source. It worked: the flamegraph is now created significantly faster.

Adding instructions for anyone else who is stuck (and definitely also for me when I inevitably have to do this on another server/machine):
(instructions are for Ubuntu 22.04, Linux kernel 5.15, on arm64)

Uninstall the existing perf, if any:

sudo apt-get remove linux-tools-generic

Install perf's build dependencies:

sudo apt-get update
sudo apt-get install flex bison glibc-source libelf-dev libdw-dev libunwind-dev libnewt-dev libgtk2.0-dev binutils-dev libnuma-dev libbabeltrace-ctf-dev libperl-dev python2-dev libiberty-dev zlib1g-dev libzstd-dev libbabeltrace-dev

Now download the Linux kernel source (replace 5.15 with your kernel version, which can be found with uname -r):

wget -c https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/linux-5.15.tar.gz
tar -xzf linux-5.15.tar.gz
cd linux-5.15/tools/perf
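The download URL above follows kernel.org's mirror layout: the major series in the directory (v5.x) and major.minor in the file name. A small sketch that derives the URL and the perf source directory from a uname -r style string, assuming that layout holds for your release. The helper names are hypothetical, and a distro release such as 5.15.0-91-generic maps to the upstream 5.15 tree, which does not include the distro's own patches.

```rust
use std::process::Command;

/// Parse (major, minor) from a `uname -r` style string such as "5.15.0-91-generic".
fn series(release: &str) -> Option<(u32, u32)> {
    let mut nums = release
        .trim()
        .split(|c: char| !c.is_ascii_digit())
        .map(|s| s.parse::<u32>().ok());
    let major = nums.next()??;
    let minor = nums.next()??;
    Some((major, minor))
}

/// Mirror URL for the major.minor tarball, in the same form as the wget line above.
fn tarball_url(release: &str) -> Option<String> {
    let (major, minor) = series(release)?;
    Some(format!(
        "https://mirrors.edge.kernel.org/pub/linux/kernel/v{major}.x/linux-{major}.{minor}.tar.gz"
    ))
}

/// Directory to cd into after unpacking, e.g. "linux-5.15/tools/perf".
fn perf_source_dir(release: &str) -> Option<String> {
    let (major, minor) = series(release)?;
    Some(format!("linux-{major}.{minor}/tools/perf"))
}

fn main() {
    let release = Command::new("uname")
        .arg("-r")
        .output()
        .map(|o| String::from_utf8_lossy(&o.stdout).into_owned())
        .unwrap_or_default();
    match (tarball_url(&release), perf_source_dir(&release)) {
        (Some(url), Some(dir)) => {
            println!("wget -c {url}");
            println!("# unpack, then: cd {dir}");
        }
        _ => eprintln!("could not parse kernel release {:?}", release.trim()),
    }
}
```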

Now replace srcline.c in the util folder (perf/util) with the srcline.c from Linux kernel 5.16, which has the patch mentioned above: link

Now build and install perf:

make clean; make ARCH=arm64
sudo make install

Finally, copy the perf binary to /usr/bin:

cp /usr/bin/

and now flamegraph should run significantly faster! 🥳

@ilya-zlobintsev

ilya-zlobintsev commented Dec 29, 2023

I am still experiencing this issue despite being on an up-to-date Arch Linux system (binutils 2.41.0, kernel 6.6). Flamegraphs take 10+ minutes to generate due to slow addr2line calls.

Fixed by using https://github.com/gimli-rs/addr2line.

@lixin-wei

https://github.com/gimli-rs/addr2line is awesome! My run time dropped from 3 min to 10 s after switching to it.

git clone https://github.com/gimli-rs/addr2line
cd addr2line
cargo build --release --examples
sudo cp /usr/bin/addr2line /usr/bin/addr2line-bak
sudo cp target/release/examples/addr2line /usr/bin/addr2line 
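Whether you replace /usr/bin/addr2line or put another directory first on PATH, it can help to confirm which addr2line a child process such as perf will resolve. Below is a simplified sketch of a PATH lookup; find_in_path is a hypothetical helper, and for brevity it returns the first regular file with the given name without checking the executable bit the way a real shell would.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Return the first `dir/name` on the given PATH value that is a regular file.
/// Simplification: the executable permission bit is not checked.
fn find_in_path(name: &str, path: &OsStr) -> Option<PathBuf> {
    env::split_paths(path)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

fn main() {
    let path = env::var_os("PATH").unwrap_or_default();
    match find_in_path("addr2line", &path) {
        Some(p) => println!("addr2line resolves to {}", p.display()),
        None => println!("addr2line not found on PATH"),
    }
}
```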

@wez

wez commented May 1, 2024

Driving by from a more general perf + Rust problem and found this thread super helpful!

gimli's addr2line build is a bit different today:

cargo build --release --bin addr2line --features=bin

Rather than replacing the system install, I just prepend this binary's directory to PATH when invoking perf, e.g.:

PATH=/home/wez/Downloads/addr2line/target/release:$PATH perf report -g --stdio -G
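The same approach can be applied from Rust code that launches perf (as flamegraph's lib.rs does): instead of touching the system binary, prepend the directory holding the gimli addr2line build to the child's PATH only. The helpers path_with_prefix and perf_with_addr2line_dir are hypothetical, and the directory in main is a placeholder for wherever your build lives; the perf arguments mirror wez's command.

```rust
use std::env;
use std::ffi::OsString;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Return a PATH value with `dir` placed in front of the `existing` entries.
/// Empty entries are dropped, since an empty PATH entry means "current directory".
fn path_with_prefix(dir: &Path, existing: Option<OsString>) -> Result<OsString, env::JoinPathsError> {
    let mut dirs: Vec<PathBuf> = vec![dir.to_path_buf()];
    if let Some(existing) = existing {
        dirs.extend(env::split_paths(&existing).filter(|p| !p.as_os_str().is_empty()));
    }
    env::join_paths(dirs)
}

/// Build a `perf report` command whose PATH prefers `addr2line_dir`,
/// leaving this process's own PATH unchanged.
fn perf_with_addr2line_dir(addr2line_dir: &Path) -> Result<Command, env::JoinPathsError> {
    let path = path_with_prefix(addr2line_dir, env::var_os("PATH"))?;
    let mut cmd = Command::new("perf");
    cmd.args(["report", "-g", "--stdio", "-G"]).env("PATH", path);
    Ok(cmd)
}

fn main() {
    // Placeholder location of a gimli addr2line build.
    let dir = Path::new("/home/you/addr2line/target/release");
    match perf_with_addr2line_dir(dir) {
        Ok(cmd) => println!("{cmd:?}"),
        Err(e) => eprintln!("invalid PATH entry: {e}"),
    }
}
```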

10 participants