Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

missing arguments with DWARF from dlopen-ed library #842

Closed
honggyukim opened this issue Jul 31, 2019 · 17 comments
Closed

missing arguments with DWARF from dlopen-ed library #842

honggyukim opened this issue Jul 31, 2019 · 17 comments

Comments

@honggyukim
Copy link
Collaborator

It doesn't properly record arguments using DWARF info when the library is opened by dlopen. Please see the example inside tests directory as follows:

$ gcc -pg -g -shared -fPIC s-lib.c -o libabc_test_lib.so
$ g++ -pg -g -shared -fPIC s-libfoo.cpp -o libfoo.so
$ gcc -pg -g s-dlopen.c -o t-dlopen -ldl

After building the binaries as above, it can be traced as follows:

$ uftrace -a t-dlopen
# DURATION     TID     FUNCTION
            [   361] | main(1, 0x7ffec9e5b748) {
 643.248 us [   361] |   dlopen("./libabc_test_lib.so", RTLD_LAZY) = 0x5632371fe7c0;
   3.484 us [   361] |   dlsym(0x5632371fe7c0, "lib_a") = &lib_a;
            [   361] |   lib_a() {
            [   361] |     lib_b() {
   2.691 us [   361] |       lib_c();
   3.891 us [   361] |     } /* lib_b */
   4.678 us [   361] |   } /* lib_a */
  33.023 us [   361] |   dlclose(0x5632371fe7c0) = 0;
 334.513 us [   361] |   dlopen("./libfoo.so", RTLD_LAZY) = 0x5632371fe7c0;
   3.000 us [   361] |   dlsym(0x5632371fe7c0, "foo") = &foo;
            [   361] |   foo() {
   0.210 us [   361] |     AAA::bar();
   3.188 us [   361] |   } /* foo */
  29.892 us [   361] |   dlclose(0x5632371fe7c0) = 0;
   1.072 ms [   361] | } = 0; /* main */

But it doesn't record arguments and return values for lib_a, lib_b, and lib_c.

@DanielTimLee
Copy link
Contributor

Is this issue related to #539 ?

@honggyukim
Copy link
Collaborator Author

@DanielTimLee You're right. It's same issue, but I will keep this as well because it's easier to reproduce the problem.

@namhyung namhyung added this to the v0.9.5 milestone Dec 28, 2019
@namhyung
Copy link
Owner

It turns out to be a non trivial change since it needs to modify the filter/trigger in libmcount at runtime. Currently it initializes them at load time and accesses them in a lock-less way assuming it's not changing. But with dlopen, we need to add or delete some.

@namhyung
Copy link
Owner

Now I think we can do this with #1678.

@namhyung
Copy link
Owner

I've finally got some time to work on it. Please try review/dlopen-args-v1.

$ uftrace -a t-dlopen
# DURATION     TID      FUNCTION
   0.817 us [1259485] | __monstartup();
   0.880 us [1259485] | __cxa_atexit();
            [1259485] | main(1, 0x7ffe278a4028) {
            [1259485] |   dlopen("./libabc_test_lib.so", RTLD_LAZY) {
   1.000  s [1259485] |     /* linux:schedule */
   1.004  s [1259485] |   } = 0x56134422d4c0; /* dlopen */
  11.946 us [1259485] |   dlsym(0x56134422d4c0, "lib_a") = &lib_a;
            [1259485] |   lib_a(1) {
            [1259485] |     lib_b(2) {
   5.932 us [1259485] |       lib_c(1) = 0;
   7.427 us [1259485] |     } = 1; /* lib_b */
   8.738 us [1259485] |   } = 0; /* lib_a */
  69.223 us [1259485] |   dlclose(0x56134422d4c0) = 0;
            [1259485] |   dlopen("./libfoo.so", RTLD_LAZY) {
   1.000  s [1259485] |     /* linux:schedule */
   1.012  s [1259485] |   } = 0x56134422d4c0; /* dlopen */
  10.177 us [1259485] |   dlsym(0x56134422d4c0, "foo") = &foo;
            [1259485] |   foo(1) {
   0.420 us [1259485] |     AAA::bar(1);
   5.701 us [1259485] |   } /* foo */
  85.863 us [1259485] |   dlclose(0x56134422d4c0) = 0;
   2.018  s [1259485] | } = 0; /* main */

@namhyung
Copy link
Owner

Uploaded review/dlopen-args-v2.

  • fixed a compiler error in the unit test
  • add new test cases

@honggyukim
Copy link
Collaborator Author

Thanks very much for the work for this! It works great but I have some more requests.

As you can see below, it doesn't show MODULE NAME for the dlopen-ed lib. In addition, it doesn't show the source location for the functions in the same library.

$ uftrace -a --srcline -f +module t-dlopen 
# DURATION     TID         MODULE NAME   FUNCTION [SOURCE]
   0.548 us [  10331]         t-dlopen | __monstartup();
   0.138 us [  10331]         t-dlopen | __cxa_atexit();
            [  10331]         t-dlopen | main(1, 0x7fff9af927f8) { /* s-dlopen.c:5 */
   1.010  s [  10331]         t-dlopen |   dlopen("./libabc_test_lib.so", RTLD_LAZY) = 0x60a39b663cd0;
   3.550 us [  10331]         t-dlopen |   dlsym(0x60a39b663cd0, "lib_a") = &lib_a;
            [  10331]        [unknown] |   lib_a(1) {
            [  10331]        [unknown] |     lib_b(2) {
   3.107 us [  10331]        [unknown] |       lib_c(1) = 0;
   4.000 us [  10331]        [unknown] |     } = 1; /* lib_b */
   4.391 us [  10331]        [unknown] |   } = 0; /* lib_a */
  34.751 us [  10331]         t-dlopen |   dlclose(0x60a39b663cd0) = 0;
   1.011  s [  10331]         t-dlopen |   dlopen("./libfoo.so", RTLD_LAZY) = 0x60a39b2e1cb0;
   4.984 us [  10331]         t-dlopen |   dlsym(0x60a39b2e1cb0, "foo") = &foo;
            [  10331]        [unknown] |   foo(1) {
   0.230 us [  10331]        [unknown] |     AAA::bar(1);
   4.093 us [  10331]        [unknown] |   } /* foo */
  44.295 us [  10331]         t-dlopen |   dlclose(0x60a39b2e1cb0) = 0;
   2.021  s [  10331]         t-dlopen | } = 0; /* main */

@gichoel
Copy link
Contributor

gichoel commented Dec 1, 2024

I have verified that it works on RISC-V architecture boards.

Before review/dlopen-args-v2:

$ uftrace -a --srcline -f +module t-dlopen
# DURATION     TID         MODULE NAME   FUNCTION [SOURCE]
            [   6829]         t-dlopen | main(0x3f7fce91d4, 0x3fd8ec61d8) { /* ./tests/s-dlopen.c:5 */
 994.514 us [   6829]         t-dlopen |   dlopen("./libabc_test_lib.so", RTLD_LAZY) = 0x2ad5e95070;
   6.000 us [   6829]         t-dlopen |   dlsym(0x2ad5e95070, "lib_a") = &lib_a;
            [   6829]        [unknown] |   lib_a() {
            [   6829]        [unknown] |     lib_b() {
   2.250 us [   6829]        [unknown] |       lib_c();
   4.250 us [   6829]        [unknown] |     } /* lib_b */
   5.500 us [   6829]        [unknown] |   } /* lib_a */
  51.001 us [   6829]         t-dlopen |   dlclose(0x2ad5e95070) = 0;
 180.252 us [   6829]         t-dlopen |   dlopen("./libfoo.so", RTLD_LAZY) = 0x2ad5e95070;
   3.000 us [   6829]         t-dlopen |   dlsym(0x2ad5e95070, "foo") = &foo;
            [   6829]        [unknown] |   foo() {
   0.500 us [   6829]        [unknown] |     AAA::bar();
   2.250 us [   6829]        [unknown] |   } /* foo */
  27.751 us [   6829]         t-dlopen |   dlclose(0x2ad5e95070) = 0;
   1.302 ms [   6829]         t-dlopen | } = 0; /* main */

After review/dlopen-args-v2:

$ ./uftrace -a --libmcount-path=. --srcline -f +module t-dlopen
# DURATION     TID         MODULE NAME   FUNCTION [SOURCE]
            [   5995]         t-dlopen | main(0x3f8d0191d4, 0x3ff20ce1d8) { /* ./tests/s-dlopen.c:5 */
            [   5995]         t-dlopen |   dlopen("./libabc_test_lib.so", RTLD_LAZY) {
   1.000  s [   5995]          [event] |     /* linux:schedule */
  63.251 us [   5995]          [event] |     /* linux:schedule (pre-empted) */
   1.041  s [   5995]         t-dlopen |   } = 0x2ac75340b0; /* dlopen */
  34.250 us [   5995]         t-dlopen |   dlsym(0x2ac75340b0, "lib_a") = &lib_a;
            [   5995]        [unknown] |   lib_a(0x2ac6fb49c0) {
            [   5995]        [unknown] |     lib_b(0x3f8d13655e) {
  14.250 us [   5995]        [unknown] |       lib_c(0x3f8d136592) = 0;
  27.500 us [   5995]        [unknown] |     } = 1; /* lib_b */
  39.501 us [   5995]        [unknown] |   } = 0; /* lib_a */
 269.254 us [   5995]         t-dlopen |   dlclose(0x2ac75340b0) = 0;
            [   5995]         t-dlopen |   dlopen("./libfoo.so", RTLD_LAZY) {
 164.502 us [   5995]          [event] |     /* linux:schedule (pre-empted) */
  77.251 us [   5995]          [event] |     /* linux:schedule (pre-empted) */
  75.501 us [   5995]          [event] |     /* linux:schedule (pre-empted) */
  76.501 us [   5995]          [event] |     /* linux:schedule (pre-empted) */
  74.501 us [   5995]          [event] |     /* linux:schedule (pre-empted) */
  76.251 us [   5995]          [event] |     /* linux:schedule (pre-empted) */
  89.502 us [   5995]          [event] |     /* linux:schedule (pre-empted) */
  29.500 us [   5995]          [event] |     /* linux:schedule (pre-empted) */
   1.000  s [   5995]          [event] |     /* linux:schedule */
   1.057  s [   5995]         t-dlopen |   } = 0x2ac7551250; /* dlopen */
  29.251 us [   5995]         t-dlopen |   dlsym(0x2ac7551250, "foo") = &foo;
            [   5995]        [unknown] |   foo(0x2ac6fb4a06) {
   4.250 us [   5995]        [unknown] |     AAA::bar(0x3f8d1365ae);
  16.000 us [   5995]        [unknown] |   } /* foo */
 293.755 us [   5995]         t-dlopen |   dlclose(0x2ac7551250) = 0;
   2.100  s [   5995]         t-dlopen | } = 0; /* main */

However, it does not output values on the RISC-V architecture as issue #1876 is not yet resolved.

@honggyukim
Copy link
Collaborator Author

Hi @gichoel, did you compile it with gcc? Or clang?

@gichoel
Copy link
Contributor

gichoel commented Dec 1, 2024

@honggyukim uftrace was compiled with gcc, and the library and test code build commands used in the example are shown below.

$ gcc -pg -g -shared -fPIC ./tests/s-lib.c -o libabc_test_lib.so
$ g++ -pg -g -shared -fPIC ./tests/s-libfoo.cpp -o libfoo.so
$ gcc -pg -g ./tests/s-dlopen.c -o t-dlopen -ldl

As per the intent of the question, I changed the CC in the .config to clang and tried to build with the clang compiler using the command below.

$ cat .config
# this file is generated automatically
override prefix := /usr/local
override bindir := /usr/local/bin
override libdir := /usr/local/lib/uftrace
override mandir := /usr/local/share/man
override etcdir := /usr/local/etc

override ARCH   := riscv64
override CC     := clang
override LD     := ld
override CFLAGS  =
override LDFLAGS =
override ANDROID =

override srcdir := /home/user/workdir/uftrace
override objdir := /home/user/workdir/uftrace

$ make clean
$ make -j$(nproc)

$ clang -pg -g -shared -fPIC ./tests/s-lib.c -o libabc_test_lib.so
$ clang -pg -g -shared -fPIC ./tests/s-libfoo.cpp -o libfoo.so
$ clang -pg -g ./tests/s-dlopen.c -o t-dlopen -ldl

After building with clang, I ran it and saw that the argument values were output like below:

$ ./uftrace -a --libmcount-path=. --srcline -f +module t-dlopen
# DURATION     TID         MODULE NAME   FUNCTION [SOURCE]
            [   1854]         t-dlopen | main(1, 0x3fdd9221e8) { /* ./tests/s-dlopen.c:5 */
            [   1854]         t-dlopen |   dlopen("./libabc_test_lib.so", RTLD_LAZY) {
   1.000  s [   1854]          [event] |     /* linux:schedule */
   1.042  s [   1854]         t-dlopen |   } = 0x2aab36b2a0; /* dlopen */
  37.000 us [   1854]         t-dlopen |   dlsym(0x2aab36b2a0, "lib_a") = &lib_a;
            [   1854]        [unknown] |   lib_a(1) {
            [   1854]        [unknown] |     lib_b(2) {
  16.500 us [   1854]        [unknown] |       lib_c(1) = 0;
  31.000 us [   1854]        [unknown] |     } = 1; /* lib_b */
  43.501 us [   1854]        [unknown] |   } = 0; /* lib_a */
 278.505 us [   1854]         t-dlopen |   dlclose(0x2aab36b2a0) = 0;
            [   1854]         t-dlopen |   dlopen("./libfoo.so", RTLD_LAZY) {
  32.751 us [   1854]          [event] |     /* linux:schedule (pre-empted) */
   1.000  s [   1854]          [event] |     /* linux:schedule */
   1.056  s [   1854]         t-dlopen |   } = 0x2aaca12180; /* dlopen */
  32.001 us [   1854]         t-dlopen |   dlsym(0x2aaca12180, "foo") = &foo;
            [   1854]        [unknown] |   foo(1) {
   4.000 us [   1854]        [unknown] |     AAA::bar(1);
  48.501 us [   1854]        [unknown] |   } /* foo */
 278.504 us [   1854]         t-dlopen |   dlclose(0x2aaca12180) = 0;
   2.099  s [   1854]         t-dlopen | } = 0; /* main */


However, I found that the argument value is not output under the following conditions:

$ cat .config
# this file is generated automatically
override prefix := /usr/local
override bindir := /usr/local/bin
override libdir := /usr/local/lib/uftrace
override mandir := /usr/local/share/man
override etcdir := /usr/local/etc

override ARCH   := riscv64
override CC     := clang
override LD     := ld
override CFLAGS  =
override LDFLAGS =
override ANDROID =

override srcdir := /home/user/workdir/uftrace
override objdir := /home/user/workdir/uftrace

$ gcc -pg -g -shared -fPIC ./tests/s-lib.c -o libabc_test_lib.so
$ g++ -pg -g -shared -fPIC ./tests/s-libfoo.cpp -o libfoo.so
$ gcc -pg -g ./tests/s-dlopen.c -o t-dlopen -ldl

$ ./uftrace -a --libmcount-path=. --srcline -f +module t-dlopen
# DURATION     TID         MODULE NAME   FUNCTION [SOURCE]
            [   2440]         t-dlopen | main(0x3fb48071d4, 0x3fed9581e8) { /* ./tests/s-dlopen.c:5 */
            [   2440]         t-dlopen |   dlopen("./libabc_test_lib.so", RTLD_LAZY) {
  31.250 us [   2440]          [event] |     /* linux:schedule (pre-empted) */
   1.000  s [   2440]          [event] |     /* linux:schedule */
   1.042  s [   2440]         t-dlopen |   } = 0x2ae5e42990; /* dlopen */
  37.001 us [   2440]         t-dlopen |   dlsym(0x2ae5e42990, "lib_a") = &lib_a;
            [   2440]        [unknown] |   lib_a(0x2ae5af49c0) {
            [   2440]        [unknown] |     lib_b(0x3fb492455e) {
  14.750 us [   2440]        [unknown] |       lib_c(0x3fb4924592) = 0;
  30.000 us [   2440]        [unknown] |     } = 1; /* lib_b */
  42.501 us [   2440]        [unknown] |   } = 0; /* lib_a */
 279.005 us [   2440]         t-dlopen |   dlclose(0x2ae5e42990) = 0;
            [   2440]         t-dlopen |   dlopen("./libfoo.so", RTLD_LAZY) {
 
   ......

   1.055  s [   2440]         t-dlopen |   } = 0x2ae7404710; /* dlopen */
  31.000 us [   2440]         t-dlopen |   dlsym(0x2ae7404710, "foo") = &foo;
            [   2440]        [unknown] |   foo(0x2ae5af4a06) {
   4.250 us [   2440]        [unknown] |     AAA::bar(0x3fb49245ae);
  16.250 us [   2440]        [unknown] |   } /* foo */
 281.505 us [   2440]         t-dlopen |   dlclose(0x2ae7404710) = 0;
   2.098  s [   2440]         t-dlopen | } = 0; /* main */

@honggyukim
Copy link
Collaborator Author

honggyukim commented Dec 1, 2024

Compiling uftrace is not an issue here. The matter is which compiler is used for example code compilation among gcc or clang.

It's because the first argument cannot be properly read for binaries compiled with gcc. This is a known issue as I reported in AArch64 at #778 and #781 but RISC-V has the same problem.

This problem is related to the combination with AArch64 or RISC-V with -pg flag only in gcc so you need to test it only with clang compiled binaries when testing it with RISC-V.

@namhyung
Copy link
Owner

namhyung commented Dec 2, 2024

Pushed review/dlopen-args-v3 for module and srcline support.

$ uftrace -a --srcline -f +module t-dlopen
# DURATION     TID         MODULE NAME   FUNCTION [SOURCE]
   0.471 us [1501839]         t-dlopen | __monstartup();
   0.309 us [1501839]         t-dlopen | __cxa_atexit();
            [1501839]         t-dlopen | main(1, 0x7fffd4ba5c58) { /* s-dlopen.c:5 */
            [1501839]         t-dlopen |   dlopen("./libabc_test_lib.so", RTLD_LAZY) {
   1.000  s [1501839]          [event] |     /* linux:schedule */
   1.004  s [1501839]         t-dlopen |   } = 0x564cebf934c0; /* dlopen */
  11.150 us [1501839]         t-dlopen |   dlsym(0x564cebf934c0, "lib_a") = &lib_a;
            [1501839] libabc_test_lib. |   lib_a(1) { /* s-lib.c:10 */
            [1501839] libabc_test_lib. |     lib_b(2) { /* s-lib.c:15 */
   7.404 us [1501839] libabc_test_lib. |       lib_c(1) = 0; /* s-lib.c:20 */
   9.115 us [1501839] libabc_test_lib. |     } = 1; /* lib_b */
  10.346 us [1501839] libabc_test_lib. |   } = 0; /* lib_a */
 100.142 us [1501839]         t-dlopen |   dlclose(0x564cebf934c0) = 0;
            [1501839]         t-dlopen |   dlopen("./libfoo.so", RTLD_LAZY) {
   1.000  s [1501839]          [event] |     /* linux:schedule */
   1.006  s [1501839]         t-dlopen |   } = 0x564cebf934c0; /* dlopen */
   9.641 us [1501839]         t-dlopen |   dlsym(0x564cebf934c0, "foo") = &foo;
            [1501839]        libfoo.so |   foo(1) { /* s-libfoo.cpp:12 */
   0.446 us [1501839]        libfoo.so |     AAA::bar(1);
   5.328 us [1501839]        libfoo.so |   } /* foo */
  91.842 us [1501839]         t-dlopen |   dlclose(0x564cebf934c0) = 0;
   2.011  s [1501839]         t-dlopen | } = 0; /* main */

@honggyukim
Copy link
Collaborator Author

Thanks. It works great! But I'm wondering if we can support location filter for dlopen-ed functions as well.

$ uftrace -a --srcline -f +module -L s-dlopen.c t-dlopen 
# DURATION     TID         MODULE NAME   FUNCTION [SOURCE]
   2.023  s [   9711]         t-dlopen | main(1, 0x7ffd68536958) = 0; /* s-dlopen.c:5 */

$ uftrace -a --srcline -f +module -L s-lib.c t-dlopen 
Usage: failed to set filter or trigger: s-lib.c
# DURATION     TID         MODULE NAME   FUNCTION [SOURCE]
   0.439 us [   9603]         t-dlopen | __monstartup();
   0.144 us [   9603]         t-dlopen | __cxa_atexit();
            [   9603]         t-dlopen | main(1, 0x7ffe6d0e6b48) { /* s-dlopen.c:5 */
   1.010  s [   9603]         t-dlopen |   dlopen("./libabc_test_lib.so", RTLD_LAZY) = 0x5df2abfb2ac0;
            [   9603] libabc_test_lib. |   lib_a(1) { /* s-lib.c:10 */
            [   9603] libabc_test_lib. |     lib_b(2) { /* s-lib.c:15 */
   7.897 us [   9603] libabc_test_lib. |       lib_c(1) = 0; /* s-lib.c:20 */
   9.788 us [   9603] libabc_test_lib. |     } = 1; /* lib_b */
  12.109 us [   9603] libabc_test_lib. |   } = 0; /* lib_a */
  11.925 us [   9603]         t-dlopen |   dlsym(0x5df2aaab8c40, "foo") = &foo;
            [   9603]        libfoo.so |   foo(1) { /* s-libfoo.cpp:12 */
   0.613 us [   9603]        libfoo.so |     AAA::bar(1);
   9.698 us [   9603]        libfoo.so |   } /* foo */
  99.084 us [   9603]         t-dlopen |   dlclose(0x5df2aaab8c40) = 0;
   2.025  s [   9603]         t-dlopen | } = 0; /* main */

@namhyung
Copy link
Owner

namhyung commented Dec 3, 2024

Well.. filters with dlopen can be tricky. And I think it works already.

When it finds no matching entries for the filter, it ignores the filter and shows the whole functions. So in the beginning, the filters won't work since it has no match. But after the first dlopen(), it updates the filter tree and finds one. And the filter is working and it dropped dlsym() before lib_a() as well as dlclose() and dlopen() after lib_a().

But after the second dlopen() it again updated the filters, and it has no entries now. So the filters are effectively doing nothing and it shows the every functions after that.

@honggyukim
Copy link
Collaborator Author

If that is complicated then can we apply it at least for analysis commands such as replay? It's not urgent requirement but I'm just asking.

@namhyung
Copy link
Owner

namhyung commented Dec 5, 2024

Let's handle filters separately. For now, I'd like to land the argument support first.

@honggyukim
Copy link
Collaborator Author

OK. Let' do it later. Thanks for the work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants