dlclose() doesn't always unload the library #887

Closed
afalkenhahn opened this issue Jan 12, 2019 · 9 comments

Comments

@afalkenhahn

This is giving me headaches: after hours of debugging I've found that dlclose() doesn't always unload the library. As a consequence, static global variables inside the library keep their old values, which I don't want.

Unfortunately, the error isn't easy to reproduce because it only happens sometimes. Often the library has all static globals initialized to 0 after dlopen() but sometimes they keep their state, although I called dlclose() on it.

Of course, dlclose() only unloads the library completely when its reference count drops to zero, but I'm not aware of any other references, so it really should be unloaded immediately. I've also made sure there are no imports such as libstdc++ that could use STB_GNU_UNIQUE, which can prevent unloading.

Here are the imports of the library I'm dlopen()ing:

0x0000000000000001 (NEEDED) Shared library: [liblog.so]
0x0000000000000001 (NEEDED) Shared library: [libandroid.so]
0x0000000000000001 (NEEDED) Shared library: [libOpenSLES.so]
0x0000000000000001 (NEEDED) Shared library: [libGLESv2.so]
0x0000000000000001 (NEEDED) Shared library: [libz.so]
0x0000000000000001 (NEEDED) Shared library: [libdl.so]
0x0000000000000001 (NEEDED) Shared library: [libc.so]
0x0000000000000001 (NEEDED) Shared library: [libm.so]

It's really pure C. There shouldn't be anything preventing the library from being completely unloaded on dlclose(). But it happens. Unfortunately, only sporadically, so I'm unable to come up with a test case but at least I was able to reproduce the problem on Android systems up to the latest version of P.

After 12 hours of debugging I'm really desperate and I don't know how to solve this. Of course, I could manually set all static globals to 0 after calling dlopen() on my library, but since it is a huge project, this can only be a last resort.

So that's why I'd like to hear some feedback here first on this issue. Is this known? Does anybody have an idea how this could happen?

My setup is this: in onCreate(), the Java activity first calls System.loadLibrary() on another shared library, which then calls dlopen() and dlclose() on my library. So there are actually two shared libraries involved: one is loaded by the Java code, and the other, i.e. the one that sometimes isn't unloaded correctly, is loaded by the first.

@alexcohn

Do you check the return value from dlclose() ?
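
For reference, a minimal sketch of such a check in C. The helper name close_checked is hypothetical and not from this thread; it only wraps the standard dlclose() and dlerror() calls. Note that a return value of 0 only means the call succeeded, not that the library was actually unmapped.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: closes a handle and reports any failure.
 * Returns 0 on success and nonzero on failure, like dlclose() itself. */
static int close_checked(void *handle) {
    dlerror();                       /* clear any stale error string */
    int rc = dlclose(handle);
    if (rc != 0) {
        const char *msg = dlerror(); /* may be NULL if no message is set */
        fprintf(stderr, "dlclose failed: %s\n", msg ? msg : "(no message)");
    }
    return rc;
}
```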

@dimitry-
Contributor

Does it happen on all versions of android?

Your library unload may be postponed when there are registered thread_local destructors (for example when there is a thread_local variable of a class type). In that case the library gets unloaded once the last thread has called the destructor. This behavior was introduced in P if I am not mistaken.

@afalkenhahn
Author

> Do you check the return value from dlclose() ?

Yes, I did, and it returned 0, i.e. success.

> Does it happen on all versions of android?

I've tested several devices using Android 7, 8, and 9 and it happened on all of them. Unfortunately, it's very difficult to reproduce and I really spent like 12 hours on it, but it DOES happen :(

> Your library unload may be postponed when there are registered thread_local destructors (for example when there is a thread_local variable of a class). In which case the library gets unloaded once last thread called the destructor. This behavior was introduced in P if I am not mistaken.

I don't use thread local variables so this can't be the reason either.

@enh
Contributor

enh commented Jan 14, 2019

does following the instructions (https://android.googlesource.com/platform/bionic/+/master/android-changes-for-ndk-developers.md#enable-logging-of-dlopen_dlsym-and-library-loading-errors-for-apps-available-in-android-o) to turn on logging for dlopen/dlclose tell you anything interesting in the failure cases?

@DanAlbert
Member

DanAlbert commented Jan 14, 2019

While we do expect dlclose to actually unload when possible on P and newer, note that it is not specified to do anything: http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlclose.html. If your library is intended to reset its state on close/reopen, aiui you should do so explicitly.
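
One way to do that with little churn in a large code base is to collect the mutable state in a single struct and zero it from an init function that the caller invokes right after dlopen(). This is only a sketch; struct lib_state, mylib_init and the other names are made up for illustration and do not come from the project discussed here.

```c
#include <string.h>

/* All mutable library state lives in one struct so it can be reset in one place. */
struct lib_state {
    int initialized;
    int frame_count;
    char last_error[64];
};

static struct lib_state g_state;

/* Hypothetical entry point: the host calls this right after dlopen(),
 * so the state is clean even if the library was never actually unloaded. */
void mylib_init(void) {
    memset(&g_state, 0, sizeof g_state);
    g_state.initialized = 1;
}

/* Example of code that mutates the state. */
void mylib_tick(void) {
    g_state.frame_count++;
}

int mylib_frame_count(void) {
    return g_state.frame_count;
}
```

With this layout the host does not depend on the loader re-running static initialization: calling mylib_init() a second time gives the same starting state as a fresh load.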

@afalkenhahn
Author

@enh: Thanks for the suggestion, now it's getting interesting. Enabling dlopen/dlclose logging through adb indeed tells me something interesting on dlclose, namely:

...dlclose(...............) ... not unloading - decrementing ref_count to 1

So there it is. But there is only one call to dlopen() for this library. I can verify this by looking at the logs where each dlopen() call is listed and, as expected, there is only one call to dlopen() on my library.

So how on earth can it be that the internal ref_count is 2 so that dlclose() doesn't unload it but decrements the ref_count instead? This doesn't make sense to me at all... is there some autoloading going on when an app is first started that increases the ref_count or how does this come to be? In the logs there is definitely only one dlopen() to my library.

@DanAlbert
Member

Was some other library that depends on this one loaded? A dependency will also increase the ref count.

System.loadLibrary will also increase the ref count.

Do you have any thread local globals with non-trivial destructors? Those are the other case (mentioned above) and they will also cause the ref count to increase.
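
To check from code whether a library is still resident after dlclose(), one option is to probe with RTLD_NOLOAD, which returns a handle only if the library is already loaded. A rough sketch, assuming RTLD_NOLOAD is available (it is on bionic and glibc); is_still_loaded is a hypothetical helper name:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Returns 1 if the library is still loaded in this process, 0 otherwise.
 * RTLD_NOLOAD never loads anything new. A successful probe takes its own
 * reference, so that reference is released again before returning. */
static int is_still_loaded(const char *path) {
    void *probe = dlopen(path, RTLD_NOLOAD | RTLD_LAZY);
    if (probe == NULL) {
        return 0;
    }
    dlclose(probe);
    return 1;
}
```

If this returns 1 right after the only dlclose() you issued, something else (such as a dependent library) is still holding a reference.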

@DanAlbert
Member

> In the logs there is definitely only one dlopen() to my library.

(never mind the System.loadLibrary option, this would have been logged as dlopen)

@afalkenhahn
Author

> Was some other library that depends on this one loaded? A dependency will also increase the ref count.

Right, that was indeed the case. The library that calls dlopen() on the other library also had a dependency on it, which of course explains the ref_count being one too high.

Now I'm feeling a little embarrassed. I'm almost sure I tested removing this unneeded dependency when I debugged the issue on Saturday, but maybe something went wrong with the build so that the issue didn't go away. This really must have been the cause of all the problems, and I can clearly reproduce it: with the dependency the ref_count is one too high, without it the library is unloaded correctly. Issue solved, I guess.

Thanks for your help and sorry for not seeing the obvious!


5 participants