gmm-init-mono: double free or corruption #2709

Closed
LeLiu opened this issue Sep 15, 2018 · 6 comments

@LeLiu
Contributor

LeLiu commented Sep 15, 2018

gmm-init-mono crashed when I ran the thchs30 scripts.

[Screenshot taken 2018-09-15 at 15:11:25]

Am I doing something wrong, or is this a bug? How can I fix it?

log file:
exp/mono/log/init.log

@danpovey
Contributor

It's unlikely to be a bug because that code is so old. It could be a compilation issue or a problem with your machine, maybe. Try "make depend" and "make" in src/, with -j options, to see if it's properly compiled. Also test whether that command fails on all your types of machine.
Also, run it in valgrind and see if it finds anything. The command is the line after #, starting with gmm-init-mono. Run that, preceded by valgrind. If that doesn't work you can try in gdb (gdb --args [program] [args], type "r" at the prompt, and "bt" when it crashes).
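
A minimal sketch of the steps above, in Python: pull the logged command out of a log file and prefix it with valgrind or gdb. It assumes the command sits on a line that starts with "#" followed by the program name; the helper names (command_from_log, under_valgrind, under_gdb) are made up for illustration and are not part of Kaldi.

```python
import shlex


def command_from_log(log_text, program="gmm-init-mono"):
    """Return the logged command as an argv list, or None if it is not found.

    Assumption: the command is on a line that starts with '#', and the
    first word after the '#' is the program name.
    """
    for line in log_text.splitlines():
        if not line.startswith("#"):
            continue
        rest = line[1:].strip()
        if rest.startswith(program):
            # shlex.split keeps quoted arguments (e.g. "ark:... |") together.
            return shlex.split(rest)
    return None


def under_valgrind(argv):
    """Prefix the command with valgrind."""
    return ["valgrind"] + argv


def under_gdb(argv):
    """Build 'gdb --args [program] [args]'; type 'r' and then 'bt' inside gdb."""
    return ["gdb", "--args"] + argv
```

To use it, read exp/mono/log/init.log into a string, pass it to command_from_log, and hand the resulting list to subprocess.run (or print it with shlex.join and paste it into a shell on the machine where the job failed).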

@LeLiu
Contributor Author

LeLiu commented Sep 17, 2018

When I use "run.pl" instead of "queue.pl" for execution, no error occurs. I'm trying to find out why.

@danpovey
Contributor

Maybe some weird libc version mismatch issue, although normally it would refuse to execute at all. But gdb or valgrind might show more, if you log into the machine where it failed.

@LeLiu
Contributor Author

LeLiu commented Sep 18, 2018

This happens because I used
./configure --shared --static-fst --static-math --mathlib=OPENBLAS --openblas-root=../tools/OpenBLAS/install/ to generate "kaldi.mk". When I removed the "--shared" option from the command (./configure --static-fst --static-math --mathlib=OPENBLAS --openblas-root=../tools/OpenBLAS/install/) and recompiled, the new non-shared version worked well.

I googled this, and people have been troubled by the same problem in other projects. It seems to be an adverse side effect of the "-fPIC" option. This option routes references to a class's static members and to global variables through the library's 'GOT' (global offset table). Symbols in the 'GOT' are allocated when the library is loaded and freed when it is unloaded. But if symbols with the same name exist in the 'GOT' of two or more libraries, the loader always frees the one in the first loaded library (this is why it is a 'double free').
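
The explanation above can be illustrated with a toy model in Python. This is only a sketch of the reasoning, not how the real dynamic loader is implemented: ToyLoader, liba.so, libb.so and g_name are all hypothetical. Each library runs its own setup and teardown for a global, but both end up operating on the single copy that the first loaded library defined.

```python
class ToyLoader:
    """Toy model of a loader that binds same-name globals to one copy."""

    def __init__(self):
        self.slots = {}       # symbol name -> storage; first definition wins
        self.exit_hooks = []  # (library, symbol) teardown calls to run at exit

    def load(self, library, global_names):
        for name in global_names:
            # A later library with the same symbol reuses the existing slot.
            slot = self.slots.setdefault(name, {"buffer": None})
            # Each library still runs its own initializer on that slot...
            slot["buffer"] = "buffer allocated by " + library
            # ...and registers its own teardown for it.
            self.exit_hooks.append((library, name))

    def run_exit_hooks(self):
        """Run teardowns in reverse order; return those that free twice."""
        double_frees = []
        for library, name in reversed(self.exit_hooks):
            slot = self.slots[name]
            if slot["buffer"] is None:
                double_frees.append((library, name))  # already freed
            else:
                slot["buffer"] = None  # first free is fine
        return double_frees


loader = ToyLoader()
loader.load("liba.so", ["g_name"])
loader.load("libb.so", ["g_name"])
print(loader.run_exit_hooks())
```

In this model the second teardown of g_name finds the buffer already released, which corresponds to the point where glibc aborts with "double free or corruption".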

If I use gdb to analyze the core dump file of "gmm-init-mono" (gdb gmm-init-mono core.114234), the stack backtrace looks like this:

#0  0x00002b9d15fcc1f7 in raise () from /lib64/libc.so.6
#1  0x00002b9d15fcd8e8 in abort () from /lib64/libc.so.6
#2  0x00002b9d1600bf47 in __libc_message () from /lib64/libc.so.6
#3  0x00002b9d16013619 in _int_free () from /lib64/libc.so.6
#4  0x00002b9d15834e43 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() () from /lib64/libstdc++.so.6
#5  0x00002b9d15fcfdda in __cxa_finalize () from /lib64/libc.so.6
#6  0x00002b9d129a7393 in __do_global_dtors_aux () from /nfs/data/disk05/kaldi-base/0.0.4-180914/debug/src/lib/libkaldi-hmm.so
#7  0x00007ffe3bb07930 in ?? ()
#8  0x00002b9d10ffab7a in _dl_fini () from /lib64/ld-linux-x86-64.so.2

This means the same symbols exist in libkaldi-hmm.so and in another .so file simultaneously.

Then I ran objdump -x -R libkaldi-hmm.so | grep R_X86_64_GLOB_DAT (and did the same with the other .so libs), and found that there are many same-name variables in different .so files. So I think maybe this issue can be triggered every time the '--shared' option is used to compile Kaldi.
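
A rough way to automate that comparison is sketched below in Python. Note that it uses nm -D --defined-only rather than the objdump command quoted above, and it assumes GNU nm's default three-column output (address, type letter, name). The function names are hypothetical, and treating the type letters B, D and V as "global data" is a simplification.

```python
import subprocess
from collections import defaultdict

# GNU nm type letters kept here: B (.bss), D (.data), V (weak object).
DATA_TYPES = set("BDV")


def parse_nm(output):
    """Return the names of defined global data symbols in nm output."""
    names = set()
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1] in DATA_TYPES:
            names.add(parts[2])
    return names


def nm_defined(path):
    """Run GNU nm on one shared library and return its text output."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_duplicates(symbols_by_lib):
    """Map each symbol defined in more than one library to those libraries."""
    owners = defaultdict(list)
    for lib, names in symbols_by_lib.items():
        for name in names:
            owners[name].append(lib)
    return {name: sorted(libs) for name, libs in owners.items() if len(libs) > 1}
```

For example, build a dictionary {path: parse_nm(nm_defined(path))} over the .so files in src/lib and pass it to find_duplicates. A duplicate is only a hint that two libraries carry their own copy of the same object, not proof of a crash.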


The following is a simple example of this '--shared' (or '-fPIC') option issue.

test.zip

Make and run './test': it crashes with the same error as in init.log, and if you run it under gdb you get the same stack backtrace.

For me, the Kaldi code is too complex to find the exact lines, but I think this may be the reason.

@danpovey
Contributor

danpovey commented Sep 18, 2018 via email

@LeLiu
Contributor Author

LeLiu commented Sep 18, 2018

Thanks for your reply :).

As you said, it's a compilation problem. When I remove "--static-fst" from the configure command, it also succeeds. I actually made the mistake of using "--shared" together with "--static-fst". By default, this causes compile errors (like this: /usr/bin/ld: kaldi-base.a(kaldi-math.o): relocation R_X86_64_PC32 against symbol `_ZN5kaldi4RandEPNS_11RandomStateE' can not be used when making a shared object; recompile with -fPIC /usr/bin/ld: final link failed: Bad value), but I had forced a change in the compilation options (I had forgotten about this earlier).

Also, I didn't find anything particularly different about the machine.

Thank you again.

LeLiu closed this as completed Sep 21, 2018