-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test4_4 seems to be in deadlock on amd64_ubu14 #274
Comments
Is the failure consistent? It seems to be fine on Fedora 25. A lot of suspicious lines in that log, like these stand out to me:
|
It's quite consistent (deterministic, even) and only showing up on Ubuntu 14 so far as we've been able to tell. We do seem to have something go sideways in parsing such that it takes quite a long time to make it through libc, but that also seems to be triggering some bad proccontrol-level behavior. |
FWIW, the infinite loop is not the new one I added, as that would say "Waiting for neonatal threads" in the log. It's the testsuite's Do you have an older commit that works on the same machine? If so, can you |
The lines about failing to read user thread info and getTIDs are red herrings. Dyninst isn't careful about not reading user-thread info before that info is initialized, but it's not harmful. I think the problem starts on line 5436 of the DYNINST_DEBUG_PROCCONTROL trace. After an exec, ProcControlAPI goes to re-create the breakpoint that monitors library loading, but fails to insert it. In the new process that's inserted into 7faba38e3940. In the previous process that was 7f45619a7940. Since only the upper bits changed, I'm guessing we have the symbol offset correct and the ld.so library's load address incorrect. The ld.so load address should have come from the interpreter base read via /proc/PID/auxv in AuxvParser::readAuxvInfo in common/src/linuxKludges.C. Unfortunately, there's no good debug printing there, so you may have to add some to see if something's going wrong. |
I can reproduce this running as a normal user on Ubuntu 14.04, but as root it's fine. Root indeed gets the same "Error: Attempt to read user thread info...", but does not get the "Error: plat_readMem failed!" I suspected Ubuntu's "yama" security model, and I'm surprised that |
Maybe |
I'd normally expect that if it were ptrace_scope issues then we wouldn't be able to operate on child process 19593 at all, but we handled its fork just fine. It's after the exec that we lose access to the process. In the above log, the parent process 19591 exits soon before we start having problems with 19593. Could 19591's exit cause a re-parenting of 19593, and trigger the yama/ptrace_scope restrictions that prevent you from debugging processes you're not the parent of? We might be able to test that by throwing a sleep before the exit of the test4_4 parent. I'm curious, does test pc_fork_exec also fail on these systems? |
OK, that's interesting about re-parenting though. The actual call that's failing is Now, ptrace only checks for security access when you're first attaching, whereas process_vm makes the check every time you call it. So There is an escape hatch in I'll send a PR for the process_vm I'm also tempted to send a kernel patch for yama to allow active ptracers free reign. I'll think on it. |
Having sysctl kernel.yama.ptrace_scope=1, one may only call ptrace attach on direct descendants. The same restriction is also checked for `process_vm_readv`/`writev` and certain procfs files. However, if an intermediate parent process already exited, we could end up with a grandchild that we're still ptracing but isn't our descendant, so we can't use the `process_vm` functions anymore -> `EPERM`. We already had a fallback here for `EFAULT`, to just use `ptrace` memory access, so use the same fallback after `EPERM` too. Fixes dyninst#274.
|
@cuviper We observed that current master branch will lead to a deadlock when running test4_4 on Ubuntu 14. Preliminary debugging suggests that the mutator is forever looping for checking whether two threads have finished. Merging your two pull requests about proccontrol does not seem solve this issue.
I attached the debugging log by enabling DYNINST_DEBUG_PROCCONTROL.
Do you have any insights on this issue?
https://gist.github.com/mxz297/23be1aa22fdecee056aa9fbfbe41464a
The text was updated successfully, but these errors were encountered: