ASSERT and CRASH with two clients: regression in 8.0 #4501
Can you elaborate on this method please? AIUI the multiple clients feature is the best (only?) method for the following use-case: a proprietary/closed-source client running in conjunction with an open-source client.
Mixing closed-source and open-source code should work just as well with libraries: the library interface can easily be a binary interface with the implementation a black box. If the two clients are truly self-contained and separate, and the output from one won't be used by the other, running them as separate top-level clients can make sense. But when trying to combine features into a single tool, libraries seem the way to go, allowing clean programmatic control over the internals.
Running completely separate clients/tools does seem to happen quite a bit for things like drcachesim address trace analysis, where different analyses run at once. But there we generally do offline analysis, and the analysis tools are not DR clients and are quite different: they have none of the concerns, such as scratch registers or resource isolation, that a DR client has and that can make DR clients difficult to combine or run simultaneously. At least in our usage, it seems rare to run two separate DR clients at once: do you end up doing that?
For combining features, a library interface with explicit API calls generally works better than separate top-level clients. Separating out concerns, isolating functionality, and creating modular units makes these libraries more reusable. E.g., where possible it seems better to have a library function that takes in a scratch register, rather than a function that obtains its own scratch registers: this pushes all responsibility for which scratches to use up to the top-level client, avoiding conflicts and confusion over the scratch strategy. Similarly, logging and error handling are best pushed up to a single top-level client. Thus, instead of just having a drcov code coverage client that runs at the top level alongside others, we have a drcovlib code coverage library that can be added to any client (and is used by drmemory for fuzzing features). Instead of just a drsyscall or drstrace client, we have a drsyscall extension library.
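To make the scratch-register point concrete, here is a minimal C sketch of the library style described above. Everything in it is hypothetical: `reg_id_t`, the toy instruction list, `client_reserve_scratch()` and `mylib_insert_counter_inc()` are illustrative stand-ins, not DynamoRIO or drreg APIs. The only point it shows is that the library routine receives its scratch register from the caller instead of reserving one itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the DynamoRIO/drreg API. */
typedef int reg_id_t;
#define NUM_REGS 4
#define REG_NULL (-1)

typedef enum { OP_LOAD_COUNTER, OP_INC, OP_STORE_COUNTER } op_t;
typedef struct { op_t op; reg_id_t reg; } toy_instr_t;
typedef struct { toy_instr_t instrs[16]; size_t count; } toy_ilist_t;

/* Register pool owned by the single top-level client. */
static bool reg_in_use[NUM_REGS];

static reg_id_t client_reserve_scratch(void) {
    for (reg_id_t r = 0; r < NUM_REGS; r++) {
        if (!reg_in_use[r]) {
            reg_in_use[r] = true;
            return r;
        }
    }
    return REG_NULL; /* Pool exhausted. */
}

static void client_unreserve_scratch(reg_id_t r) {
    assert(r >= 0 && r < NUM_REGS && reg_in_use[r]);
    reg_in_use[r] = false;
}

static void append(toy_ilist_t *il, op_t op, reg_id_t reg) {
    assert(il->count < sizeof(il->instrs) / sizeof(il->instrs[0]));
    il->instrs[il->count].op = op;
    il->instrs[il->count].reg = reg;
    il->count++;
}

/* Library routine: the caller chooses which register may be clobbered.
 * The library never reserves registers on its own, so the scratch
 * strategy stays in one place (the top-level client). */
static void mylib_insert_counter_inc(toy_ilist_t *il, reg_id_t scratch) {
    assert(scratch >= 0 && scratch < NUM_REGS && reg_in_use[scratch]);
    append(il, OP_LOAD_COUNTER, scratch);
    append(il, OP_INC, scratch);
    append(il, OP_STORE_COUNTER, scratch);
}
```

In a real client the top-level tool would do the reservation (for example through the drreg extension) and pass the chosen register down to each library it uses, so exactly one component decides the scratch strategy.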
Thanks for the detailed answer @derekbruening.
Yes, frequently. A typical case is a user running their instrumentation client with our closed-source emulation client.
I assume your emulation client has a simple interaction with other clients: it only has an app2app transformation and nothing else. I also assume it could alternatively be a library that a user's instrumentation client would link with, instead of a separate client.
I would like to add another use case for several top-level clients: one is a compulsory emulation client, which is necessary for an application to run; another is an optional client that a user can add for additional features (e.g. memory tracing, instruction counting, performance analysis, etc.). It might be easier to develop separate optional clients to enable different types of analysis (and also to allow users to develop their own clients following a set of restrictions built on top of the restrictions for all DR clients). Yes, they could all be linked to the emulation library, but one may still want to use several such clients together, in which case having each of them linked to the emulation library creates a problem for the library (it would have to handle multiple requests). I think supporting several clients is a very useful feature. We can still use the library approach, but where more convenient we can let users select which clients they load.
@derekbruening, could you please explain how to obtain this log? It looks very much like a call trace. I have always struggled with debug assert reports because they point to a place in the code but don't produce a Java-like stack trace. Thank you.
Yes, it is just a backtrace in gdb. Likely you are hitting one or both of two problems in gdb: 1) gdb does not do a good job of walking the callstack when the current $pc is not in a known function, such as when it's in generated code, even when a simple frame pointer walk would work; 2) DR and its libraries are not loaded by ld.so, so gdb does not find their symbols. For 1) see https://github.com/DynamoRIO/dynamorio/wiki/Debugging#call-stacks; for 2) see https://github.com/DynamoRIO/dynamorio/wiki/Debugging#loading-client-symbols. I also have a Python gdb helper that automates symbol loading for DR itself; I should check it in.
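The "simple frame pointer walk" mentioned above can be sketched in a few lines of C. This is a model, not code from DR or gdb: it assumes frames built with frame pointers and an x86-64-style frame record in which the saved frame pointer is followed by the return address (`[rbp]` and `[rbp+8]`), and it walks an in-memory chain rather than reading live registers. `frame_record_t` and `fp_walk()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified frame record for code compiled with frame pointers:
 * on x86-64, [rbp] holds the caller's rbp and [rbp+8] the return address. */
typedef struct frame_record {
    const struct frame_record *caller_fp; /* saved frame pointer */
    uintptr_t ret_addr;                   /* return address into the caller */
} frame_record_t;

/* Walk the frame-pointer chain starting at fp, storing up to max return
 * addresses in out. Returns the number of frames recorded. Stops on NULL,
 * or on a saved frame pointer that does not move toward older
 * (higher-addressed) frames, which guards against loops and garbage. */
static size_t fp_walk(const frame_record_t *fp, uintptr_t *out, size_t max) {
    assert(out != NULL || max == 0);
    size_t n = 0;
    while (fp != NULL && n < max) {
        out[n++] = fp->ret_addr;
        const frame_record_t *next = fp->caller_fp;
        if (next != NULL && (uintptr_t)next <= (uintptr_t)fp)
            break; /* Stack grows down: callers live at higher addresses. */
        fp = next;
    }
    return n;
}
```

Because the walk only needs the frame pointer chain, it does not depend on gdb being able to map the current pc to a function. In gdb, the same records can be inspected by hand with `x/2gx $rbp` and then following the saved frame pointer value.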
There's also libdynamorio.so-gdb.py but it has been broken for a long time: #2100. |
Sure, SGTM. Let's get a test added to avoid future regressions.
The main interaction between clients is via
The distinct separation at runtime of a collection of instrumentation clients from a collection of emulation clients is convenient not least because we can have more than one emulation client running with an instrumentation client, with the selection of which emulation client to load based on h/w capability.
I'm trying to get to the bottom of it (still in progress). It looks like
One approach would be a
Found: 9293e7a is the first bad commit.
results in
@derekbruening, I would be grateful if you could look at it, because the diff is quite complex. Maybe you could point me to some place in the diff? Thank you.
When It happens because during
but of course this first module has not been fully initialised yet. Similarly, only one of all the modules has been initialised by this point. Somehow we should call
I just tried 9.0.19097 and it seems this issue is still present and coming from the same place at
In debug build:
This is related to #3850: when looking up symbols, DR tries to obtain
@derekbruening, would it be correct to skip uninitialised modules when
I have tried this fix on x86 and Arm64 and it seems to be working.
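The fix being asked about, skipping modules that are not yet initialised during symbol lookup, can be illustrated with a small C model. The types and `lookup_symbol()` are hypothetical and much simpler than DR's private-loader structures; the `initialized` flag stands in for per-module data that has not been set up yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified records; not DR's private-loader types. */
typedef struct {
    const char *name;
    uintptr_t addr;
} toy_symbol_t;

typedef struct {
    const char *name;
    bool initialized; /* true once this module's per-module data exists */
    const toy_symbol_t *syms;
    size_t num_syms;
} toy_module_t;

/* Search modules in load order, skipping any that are not initialised
 * yet, since their per-module data cannot be used.
 * Returns 0 if the symbol is not found in any initialised module. */
static uintptr_t lookup_symbol(const toy_module_t *mods, size_t num_mods,
                               const char *sym) {
    assert(sym != NULL);
    for (size_t i = 0; i < num_mods; i++) {
        if (!mods[i].initialized)
            continue;
        for (size_t j = 0; j < mods[i].num_syms; j++) {
            if (strcmp(mods[i].syms[j].name, sym) == 0)
                return mods[i].syms[j].addr;
        }
    }
    return 0;
}
```

Per the commit message quoted further down, the fix that eventually landed for #4501 skips client modules during lookup instead, on the grounds that clients should be leaves of the dependency tree and not provide symbols for other modules.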
Thank you for this explanation: this makes it clear. It looks like privload_load_process() calls privload_add_areas() before processing imports or relocs, and privload_add_areas() is what calls privload_create_os_privmod_data(). So would a solution be to have privload_process_early_mods() first do a pass through the early libs and call privload_add_areas() for each, and then a separate pass to call privload_load_process(), and have privload_load_process() skip the call to privload_add_areas() if
Please also add a regression test to the test suite with two clients to ensure this use case doesn't break in the future.
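The two-pass ordering proposed above can be modelled in C as follows. `toy_add_areas()` and `toy_load_process()` are hypothetical stand-ins for `privload_add_areas()` and `privload_load_process()`, reduced to flags, and an import is treated as failing if it targets a module whose areas have not been added yet. This is a sketch of the ordering idea only, not DR's loader code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified early-module record. */
typedef struct {
    const char *name;
    bool areas_added; /* models the effect of privload_add_areas() */
    bool processed;   /* models the effect of privload_load_process() */
    int import_from;  /* index of a module this one imports from, or -1 */
} early_mod_t;

static void toy_add_areas(early_mod_t *m) {
    m->areas_added = true;
}

/* Resolving imports requires the exporting module to be registered
 * already. Returns false if an import targets an unregistered module. */
static bool toy_load_process(early_mod_t *mods, size_t i, bool skip_add_areas) {
    if (!skip_add_areas)
        toy_add_areas(&mods[i]);
    int dep = mods[i].import_from;
    if (dep >= 0 && !mods[dep].areas_added)
        return false;
    mods[i].processed = true;
    return true;
}

/* Single pass: each module is registered just before its own imports. */
static bool process_early_mods_one_pass(early_mod_t *mods, size_t n) {
    assert(mods != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!toy_load_process(mods, i, /*skip_add_areas=*/false))
            return false;
    }
    return true;
}

/* Two passes: register every early module first, then process imports. */
static bool process_early_mods_two_pass(early_mod_t *mods, size_t n) {
    assert(mods != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        toy_add_areas(&mods[i]);
    for (size_t i = 0; i < n; i++) {
        if (!toy_load_process(mods, i, /*skip_add_areas=*/true))
            return false;
    }
    return true;
}
```

In the single-pass version, a first module that refers to a second one is processed before the second has been registered; with two passes, every early module is registered before any imports are processed. As reported later in this thread, this approach turned out to cause a memory leak in practice.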
@derekbruening, yes, this solution works too. But there is another issue: when in the end the libraries are unloaded and when DR calls
Note: this behaviour can only be reproduced in a release build, because in debug mode the part of the code that is responsible for calling the destructors is not used. It appears that by the time this second destructor is called the PLT entry
The above segfault is happening in glibc's
The call stack looks like this (the top frame is shown incorrectly due to some glitch in GDB -- it in fact refers to
Correction: It seems like the proposed solution results in a memory leak. The solution I originally used (skipping clients in the
I didn't notice it at first because the fini_array segfault only appears in a non-debug build, while only a debug build enables checks for memory leaks.
How did this use to work, before 9293e7a?
Some more specific questions: Apparently the main purpose of 9293e7a was to add TLS handling to the Windows loader. Was the patch intended to have no effect (in some sense) on the way the loader works on Linux? Multiple clients used to work on Linux. Did they use to work on Windows? Do multiple clients now/still work on Windows? If 9293e7a did not add any new feature to Linux, then presumably the previous code demonstrates that it is possible to have multiple clients on Linux (though perhaps something has been added since then that is incompatible with the way the loader used to work). Is there a good (fundamental) reason why multiple clients on Windows would be hard to implement, something to do with the way TLS and the calling of initialisers work on Windows, for example? Would it be acceptable to have multiple clients working only on Linux? Would it be acceptable to separate out the Linux and Windows loaders a bit more, at the cost of increasing the total quantity of code, if that makes it easier for developers who may not have access to both systems to improve the implementation on one without breaking it on the other? Thanks for any answers. I'm still not able to see the wood for the trees in the private loader.
I would vote for not adding code divergence for different platforms in the shared private loader code, as it will just increase the complexity and maintenance burden of what is already complex. It sounded like @yury-khrustalev had a solution by modifying
Some other client modules because some will not be initialised and clients should be leaves of the dependency tree and not provide symbols for other modules. Also add a test that two clients can be loaded without a crash. Fixes #4501 Change-Id: I9eb4838b349e06094653491c669ab133e92c048c
But:
Debug build reports this assert:
Along with fixing this, we need to add a test of multiple clients.
We have tests of registering multiple clients but apparently not of running multiple
clients which is a little surprising. OTOH this is a rarely used feature these days:
instead we refactor clients into libraries for explicit coordination.