Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-deterministic deadlock #85

Open
alexkamp opened this issue Mar 20, 2018 · 0 comments
Open

Non-deterministic deadlock #85

alexkamp opened this issue Mar 20, 2018 · 0 comments

Comments

@alexkamp
Copy link

hey,

I am using NuProcess and I have an issue with one of my unit tests. In the test, I start 'sleep 10s' as external process and use a java.util.Timer to call destroy() on the process after 1 second. The test does not always terminate. This is surprising, because even if destroy() does not work, it should terminate after 10 seconds, because sleep terminates after 10 seconds.

It is incredibly hard to debug this, because this is non-deterministic. Sometimes everything works fine. It always does when I attach a debugger.

Here is the output of 'jstack' on a deadlocked instance:

2018-03-20 11:03:48
Full thread dump OpenJDK 64-Bit Server VM (25.111-b14 mixed mode):

"Attach Listener" #17 daemon prio=9 os_prio=0 tid=0x00007f0748001000 nid=0x654 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"process reaper" #15 daemon prio=10 os_prio=0 tid=0x00007f0730268800 nid=0x62c waiting on condition [0x00007f075c0e1000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e39b98f0> (a java.util.concurrent.SynchronousQueue$TransferStack)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
	at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
	at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

"Timer-0" #13 prio=5 os_prio=0 tid=0x00007f073020b800 nid=0x62a in Object.wait() [0x00007f07341fd000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000e35d7010> (a java.util.TaskQueue)
	at java.lang.Object.wait(Object.java:502)
	at java.util.TimerThread.mainLoop(Timer.java:526)
	- locked <0x00000000e35d7010> (a java.util.TaskQueue)
	at java.util.TimerThread.run(Timer.java:505)

"/127.0.0.1:60474 to /127.0.0.1:33567 workers Thread 3" #12 prio=5 os_prio=0 tid=0x00007f0784545800 nid=0x629 runnable [0x00007f075da1a000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	- locked <0x00000000836c78a8> (a sun.nio.ch.Util$3)
	- locked <0x00000000836c7898> (a java.util.Collections$UnmodifiableSet)
	- locked <0x00000000836c7690> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
	at org.gradle.internal.remote.internal.inet.SocketConnection$SocketInputStream.read(SocketConnection.java:178)
	at com.esotericsoftware.kryo.io.Input.fill(Input.java:139)
	at com.esotericsoftware.kryo.io.Input.require(Input.java:159)
	at com.esotericsoftware.kryo.io.Input.readByte(Input.java:255)
	at org.gradle.internal.serialize.kryo.KryoBackedDecoder.readByte(KryoBackedDecoder.java:80)
	at org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:63)
	at org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:52)
	at org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:79)
	at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionReceive.run(MessageHub.java:263)
	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
	at java.lang.Thread.run(Thread.java:745)

"/127.0.0.1:60474 to /127.0.0.1:33567 workers Thread 2" #11 prio=5 os_prio=0 tid=0x00007f0784581000 nid=0x628 waiting on condition [0x00007f075db1b000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000836df028> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at org.gradle.internal.remote.internal.hub.queue.EndPointQueue.take(EndPointQueue.java:48)
	at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionDispatch.run(MessageHub.java:314)
	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
	at java.lang.Thread.run(Thread.java:745)

"Test worker" #10 prio=5 os_prio=0 tid=0x00007f078453d000 nid=0x627 waiting on condition [0x00007f075dc1a000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e3b5d950> (a java.util.concurrent.FutureTask)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
	at java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at com.zaxxer.nuprocess.linux.LinuxProcess.spawnWithCwd(LinuxProcess.java:127)
	at com.zaxxer.nuprocess.internal.BasePosixProcess.start(BasePosixProcess.java:183)
	at com.zaxxer.nuprocess.linux.LinProcessFactory.createProcess(LinProcessFactory.java:40)
	- locked <0x00000000e378db20> (a java.lang.Class for com.zaxxer.nuprocess.linux.LinProcessFactory)
	at com.zaxxer.nuprocess.NuProcessBuilder.start(NuProcessBuilder.java:266)
	at de.alexkamp.acov.execution.ConcreteProcessRunner.startProcess(ConcreteProcessRunner.kt:42)
	at de.alexkamp.acov.execution.ConcreteProcessRunner.startProcess(ConcreteProcessRunner.kt:30)
	at de.alexkamp.acov.execution.ConcreteProcessRunner.startProcess(ConcreteProcessRunner.kt:21)
	at de.alexkamp.acov.execution.ProcessRunner$DefaultImpls.runProcess(ProcessRunner.kt:105)
	at de.alexkamp.acov.execution.ConcreteProcessRunner.runProcess(ConcreteProcessRunner.kt:21)
	at de.alexkamp.acov.execution.ProcessExecutorTest.testTimeout(ProcessExecutorTest.kt:72)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
	at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
	at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
	at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
	at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
	at com.sun.proxy.$Proxy1.processTestClass(Unknown Source)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:108)
	at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:146)
	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:128)
	at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
	at java.lang.Thread.run(Thread.java:745)

"Service Thread" #8 daemon prio=9 os_prio=0 tid=0x00007f0784146000 nid=0x621 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00007f0784143800 nid=0x620 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f0784141000 nid=0x61f waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f078413f800 nid=0x61e waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f078413e000 nid=0x61d runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f0784084000 nid=0x61c in Object.wait() [0x00007f075e9f0000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x0000000083601928> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
	- locked <0x0000000083601928> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f078407f000 nid=0x61b in Object.wait() [0x00007f075eaf1000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000836019c0> (a java.lang.ref.Reference$Lock)
	at java.lang.Object.wait(Object.java:502)
	at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
	- locked <0x00000000836019c0> (a java.lang.ref.Reference$Lock)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"main" #1 prio=5 os_prio=0 tid=0x00007f078400a000 nid=0x611 waiting on condition [0x00007f078bdad000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000836f4028> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:69)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:44)
	at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:83)
	at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:35)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:119)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:64)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:62)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:67)

"VM Thread" os_prio=0 tid=0x00007f0784077800 nid=0x61a runnable 

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f078401f000 nid=0x616 runnable 

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f0784021000 nid=0x617 runnable 

"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007f0784022800 nid=0x618 runnable 

"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007f0784024800 nid=0x619 runnable 

"VM Periodic Task Thread" os_prio=0 tid=0x00007f0784148800 nid=0x622 waiting on condition 

JNI global references: 517

What I can tell from the stack trace:
(1) the main thread is locked on a CountDownLatch. This is gradle (my build system), waiting for the unit test to terminate.
(2) The Reference Handler, Signal Dispatcher and Finalizer threads are part of the JVM. Same goes for the Compiler threads, the Service thread, the GC threads, the VM periodic task thread and the process reaper.
(3) The "/127.0.0.1:60474 to /127.0.0.1:33567 workers Thread x" threads are part of gradle. They are used by the communication between gradle worker process and gradle cli process.
(4) The "Attach Listener" is part of the JVM/jstack.
(5) The "Timer-0" is what I intend to use for the timeout.
Now for the interesting stuff:
(6) The "Test worker" executes my unit test, it is blocked on the Future in LinuxProcess.java:127. Looking at the code, this should be executed in a ThreadPoolExecutor. However, I can not see any thread that belongs to this thread pool. This indicates that either the future has been executed, or that the executor never start a thread, despite the fact that there is a task.

Running jstack again, about 20 minutes later, shows the same output, except that the process reaper thread is gone. Looking at a heap dump, there is an instance of java.lang.Thread with the name "NuProcessLinuxCwdChangeable-1". The firstTask field is set to what seems to be the future I am looking for. The threadStatus is set to 2, which seems to indicate terminated (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/sun/misc/VM.java#VM.toThreadState%28int%29 shows the code which interprets this integer).

Any insights into what is going on?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant