Leader node is lost and a new leader cannot be automatically re-elected #111

Closed
panshiming opened this issue Jun 18, 2016 · 9 comments

@panshiming

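When a scheduled job starts, the instance that is the leader creates the leader/election/host path under the corresponding server directory in zk. For some unknown reason this host node was lost, and the framework did not automatically elect a new leader, so the job hangs and keeps waiting for an election.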

@terrymanu
Member

Please identify the cause and provide a dump file.

@panshiming
Author

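The triggering condition: a single instance loses its connection to ZooKeeper, and the network connection is restored some time later. It is probably caused by the RECONNECT event not being received. I will provide a dump file once I have reproduced it.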
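
The suspected failure mode can be illustrated with a minimal in-memory sketch. It assumes that the leader/election/host node behaves like a ZooKeeper ephemeral node (removed when the owning session ends) and that the client has to re-run the election itself once the connection comes back. The class `ReconnectElectionSketch`, its `State` enum (named after Curator's `ConnectionState` values) and all method names are hypothetical. This is not elastic-job's code and it does not talk to a real ZooKeeper.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory sketch only; no real ZooKeeper or Curator is involved. */
class ReconnectElectionSketch {
    /** Hypothetical states, named after Curator's ConnectionState values. */
    enum State { CONNECTED, SUSPENDED, LOST, RECONNECTED }

    static final String LEADER_PATH = "leader/election/host";

    // Stand-in for the registry: node path -> owning instance.
    private final Map<String, String> nodes = new ConcurrentHashMap<>();

    /** Tries to become leader; succeeds only if no host node exists yet. */
    boolean electLeader(String instance) {
        return nodes.putIfAbsent(LEADER_PATH, instance) == null;
    }

    /** Assumption: ending the session removes the instance's ephemeral host node. */
    void expireSession(String instance) {
        nodes.remove(LEADER_PATH, instance);
    }

    boolean hasLeader() {
        return nodes.containsKey(LEADER_PATH);
    }

    /** Listener callback: drop the node on LOST, re-run the election on RECONNECTED. */
    void onStateChanged(String instance, State newState) {
        if (newState == State.LOST) {
            expireSession(instance);
        } else if (newState == State.RECONNECTED && !hasLeader()) {
            electLeader(instance);
        }
    }

    public static void main(String[] args) {
        ReconnectElectionSketch registry = new ReconnectElectionSketch();
        registry.electLeader("instance-1");
        registry.onStateChanged("instance-1", State.LOST);
        System.out.println("leader after loss: " + registry.hasLeader());
        registry.onStateChanged("instance-1", State.RECONNECTED);
        System.out.println("leader after reconnect: " + registry.hasLeader());
    }
}
```

In this sketch, if the RECONNECTED callback never fires, `hasLeader()` stays false and anything waiting for a leader keeps waiting, which is consistent with the hang described here.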

@youngerzjj

We also hit this problem in production. In elastic-job-console-1.0.2 the job looks normal, but in reality this shard has not been running at all. The dump log is as follows:

2016-07-12 14:05:36
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.45-b02 mixed mode):

"Attach Listener" #12237 daemon prio=9 os_prio=0 tid=0x00007f5d6c001000 nid=0x2135 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"Curator-LeaderLatch-0" #12236 daemon prio=5 os_prio=0 tid=0x00007f5d3400b800 nid=0x41a7 in Object.wait() [0x00007f5d60b80000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.curator.framework.state.ConnectionStateManager.blockUntilConnected(ConnectionStateManager.java:215)
- locked <0x00000000e0c77de8> (a org.apache.curator.framework.state.ConnectionStateManager)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.blockUntilConnected(CuratorFrameworkImpl.java:212)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.blockUntilConnected(CuratorFrameworkImpl.java:218)
at org.apache.curator.framework.recipes.AfterConnectionEstablished$1.run(AfterConnectionEstablished.java:55)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"Curator-LeaderLatch-0" #12235 daemon prio=5 os_prio=0 tid=0x00007f5d2c2d5800 nid=0x4129 in Object.wait() [0x00007f5d60a7f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.curator.framework.state.ConnectionStateManager.blockUntilConnected(ConnectionStateManager.java:215)
- locked <0x00000000e0c77de8> (a org.apache.curator.framework.state.ConnectionStateManager)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.blockUntilConnected(CuratorFrameworkImpl.java:212)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.blockUntilConnected(CuratorFrameworkImpl.java:218)
at org.apache.curator.framework.recipes.AfterConnectionEstablished$1.run(AfterConnectionEstablished.java:55)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"ThreadPoolTaskExecutor-10" #40 prio=5 os_prio=0 tid=0x00007f5d2c312000 nid=0x6df1 waiting on condition [0x00007f5d60c81000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e1037078> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"ThreadPoolTaskExecutor-9" #38 prio=5 os_prio=0 tid=0x00007f5d2c21c000 nid=0x6a43 waiting on condition [0x00007f5d60d82000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e1037078> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"ThreadPoolTaskExecutor-8" #37 prio=5 os_prio=0 tid=0x00007f5d2c29f000 nid=0x68ae waiting on condition [0x00007f5d60e83000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e1037078> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"ThreadPoolTaskExecutor-7" #35 prio=5 os_prio=0 tid=0x00007f5d2c29e000 nid=0x6861 waiting on condition [0x00007f5d60f84000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e1037078> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"ThreadPoolTaskExecutor-6" #34 prio=5 os_prio=0 tid=0x00007f5d2c01a000 nid=0x67eb waiting on condition [0x00007f5d61085000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e1037078> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"ThreadPoolTaskExecutor-5" #33 prio=5 os_prio=0 tid=0x00007f5d2c2a1800 nid=0x5fa4 waiting on condition [0x00007f5d61186000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e1037078> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"ThreadPoolTaskExecutor-4" #32 prio=5 os_prio=0 tid=0x00007f5d2c2a0800 nid=0x5f3f waiting on condition [0x00007f5d61287000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e1037078> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"ThreadPoolTaskExecutor-3" #29 prio=5 os_prio=0 tid=0x00007f5d2c15c800 nid=0x59de waiting on condition [0x00007f5d61388000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e1037078> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"ThreadPoolTaskExecutor-2" #27 prio=5 os_prio=0 tid=0x00007f5d2c309000 nid=0x58d6 waiting on condition [0x00007f5d6178a000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e1037078> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"commons-pool-EvictionTimer" #26 daemon prio=5 os_prio=0 tid=0x00007f5d20138800 nid=0x4deb in Object.wait() [0x00007f5d61689000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.util.TimerThread.mainLoop(Timer.java:552)
- locked <0x00000000e1037978> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:505)

"ThreadPoolTaskExecutor-1" #24 prio=5 os_prio=0 tid=0x00007f5d2c274800 nid=0x4de6 waiting on condition [0x00007f5d6216d000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e1037078> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"Timer-1" #22 daemon prio=5 os_prio=0 tid=0x00007f5d2c04a000 nid=0x4d39 in Object.wait() [0x00007f5d7c313000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.util.TimerThread.mainLoop(Timer.java:552)
- locked <0x00000000e0c16618> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:505)

"Abandoned connection cleanup thread" #21 daemon prio=5 os_prio=0 tid=0x00007f5d2c05c800 nid=0x4d38 in Object.wait() [0x00007f5d7c414000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
- locked <0x00000000e0c1aa70> (a java.lang.ref.ReferenceQueue$Lock)
at com.mysql.jdbc.AbandonedConnectionCleanupThread.run(AbandonedConnectionCleanupThread.java:40)

"DestroyJavaVM" #20 prio=5 os_prio=0 tid=0x00007f5d9c009000 nid=0x4d1b waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"Timer-0" #19 daemon prio=5 os_prio=0 tid=0x00007f5d9c7ad000 nid=0x4d33 in Object.wait() [0x00007f5d7c715000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.util.TimerThread.mainLoop(Timer.java:552)
- locked <0x00000000e0c1adb0> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:505)

"DEFAULT.processFeedDataJob_Scheduler_QuartzSchedulerThread" #18 prio=5 os_prio=0 tid=0x00007f5d9c792800 nid=0x4d32 in Object.wait() [0x00007f5d7c816000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at org.quartz.simpl.SimpleThreadPool.blockForAvailableThreads(SimpleThreadPool.java:452)
- locked <0x00000000e0c22250> (a java.lang.Object)
at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:263)

"DEFAULT.processFeedDataJob_Scheduler_Worker-1" #17 prio=5 os_prio=0 tid=0x00007f5d9c783800 nid=0x4d31 in Object.wait() [0x00007f5d7c917000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.curator.framework.recipes.leader.LeaderLatch.await(LeaderLatch.java:327)
- locked <0x00000000e124efb0> (a org.apache.curator.framework.recipes.leader.LeaderLatch)
at com.dangdang.ddframe.job.internal.storage.JobNodeStorage.executeInLeader(JobNodeStorage.java:171)
at com.dangdang.ddframe.job.internal.failover.FailoverService.failoverIfNecessary(FailoverService.java:84)
at com.dangdang.ddframe.job.internal.job.AbstractElasticJob.execute(AbstractElasticJob.java:89)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
- locked <0x00000000e0c798f0> (a java.lang.Object)

"pool-7-thread-1" #16 prio=5 os_prio=0 tid=0x00007f5d9c76c800 nid=0x4d30 waiting on condition [0x00007f5d7ca1d000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e0c1ca20> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"Curator-TreeCache-0" #14 daemon prio=5 os_prio=0 tid=0x00007f5d3c006800 nid=0x4d2e in Object.wait() [0x00007f5d7cd1e000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.curator.framework.recipes.leader.LeaderLatch.await(LeaderLatch.java:327)
- locked <0x00000000e124b488> (a org.apache.curator.framework.recipes.leader.LeaderLatch)
at com.dangdang.ddframe.job.internal.storage.JobNodeStorage.executeInLeader(JobNodeStorage.java:171)
at com.dangdang.ddframe.job.internal.failover.FailoverService.failoverIfNecessary(FailoverService.java:84)
at com.dangdang.ddframe.job.internal.failover.FailoverListenerManager.failover(FailoverListenerManager.java:99)
at com.dangdang.ddframe.job.internal.failover.FailoverListenerManager.access$100(FailoverListenerManager.java:39)
at com.dangdang.ddframe.job.internal.failover.FailoverListenerManager$1.dataChanged(FailoverListenerManager.java:78)
at com.dangdang.ddframe.job.internal.listener.AbstractJobListener.childEvent(AbstractJobListener.java:37)
at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:685)
at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:679)
at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299)
at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:84)
at org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:678)
at org.apache.curator.framework.recipes.cache.TreeCache.access$1400(TreeCache.java:69)
at org.apache.curator.framework.recipes.cache.TreeCache$4.run(TreeCache.java:790)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"Curator-Framework-0" #13 daemon prio=5 os_prio=0 tid=0x00007f5d9c6f7000 nid=0x4d2d waiting on condition [0x00007f5d7ce1f000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e0c22680> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:211)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:70)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:780)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:62)
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:257)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"main-EventThread" #12 daemon prio=5 os_prio=0 tid=0x00007f5d9c737000 nid=0x4d2c waiting on condition [0x00007f5d7d120000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e0c24898> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:494)

"main-SendThread(10.144.156.103:2181)" #11 daemon prio=5 os_prio=0 tid=0x00007f5d9c736000 nid=0x4d2b runnable [0x00007f5d7d221000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000000e0c775f8> (a sun.nio.ch.Util$2)
- locked <0x00000000e0c775e8> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000e0c77370> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

"Curator-ConnectionStateManager-0" #10 daemon prio=5 os_prio=0 tid=0x00007f5d9c6df000 nid=0x4d2a waiting on condition [0x00007f5d7d533000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e0c77d90> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
at org.apache.curator.framework.state.ConnectionStateManager.processEvents(ConnectionStateManager.java:245)
at org.apache.curator.framework.state.ConnectionStateManager.access$000(ConnectionStateManager.java:43)
at org.apache.curator.framework.state.ConnectionStateManager$1.call(ConnectionStateManager.java:111)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"Service Thread" #8 daemon prio=9 os_prio=0 tid=0x00007f5d9c0c1000 nid=0x4d28 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"C1 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00007f5d9c0bb800 nid=0x4d27 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f5d9c0ba000 nid=0x4d26 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f5d9c0b7000 nid=0x4d25 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f5d9c0b5800 nid=0x4d24 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f5d9c07e000 nid=0x4d22 in Object.wait() [0x00007f5d7e648000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
- locked <0x00000000e0c79678> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f5d9c07b800 nid=0x4d21 in Object.wait() [0x00007f5d7e749000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:157)
- locked <0x00000000e0c78d48> (a java.lang.ref.Reference$Lock)

"VM Thread" os_prio=0 tid=0x00007f5d9c076800 nid=0x4d20 runnable

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f5d9c01e000 nid=0x4d1c runnable

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f5d9c020000 nid=0x4d1d runnable

"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007f5d9c021800 nid=0x4d1e runnable

"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007f5d9c023800 nid=0x4d1f runnable

"VM Periodic Task Thread" os_prio=0 tid=0x00007f5d9c0c3800 nid=0x4d29 waiting on condition

JNI global references: 292
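
In the dump above, the Quartz worker and the Curator-TreeCache thread are both parked in `LeaderLatch.await` with no timeout, so they cannot notice that the election has stalled. A minimal sketch of a bounded wait follows, using `java.util.concurrent.CountDownLatch` as a stand-in for the leadership signal. The class and method names are hypothetical, and this is not the fix applied in elastic-job. Curator's `LeaderLatch` offers a comparable `await(long timeout, TimeUnit unit)` overload that returns a boolean.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Stand-in sketch: a CountDownLatch plays the role of the leadership signal. */
class BoundedAwaitSketch {
    /**
     * Waits for the leadership signal for at most timeoutMillis.
     * Returns true if the signal arrived, false on timeout or interruption.
     */
    static boolean awaitLeadership(CountDownLatch elected, long timeoutMillis) {
        try {
            return elected.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Election never completes: the caller gets control back after the timeout.
        CountDownLatch stalled = new CountDownLatch(1);
        System.out.println("stalled election: " + awaitLeadership(stalled, 100));

        // Election already completed: the call returns immediately.
        CountDownLatch done = new CountDownLatch(1);
        done.countDown();
        System.out.println("completed election: " + awaitLeadership(done, 100));
    }
}
```

A bounded wait does not repair the election by itself, but it lets the caller log the stall or retry instead of blocking a scheduler thread forever.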

@terrymanu
Member

Please provide the esjob dump and state the version number.

@youngerzjj

The version is elastic-job-core-1.0.2, used with curator-framework-2.8.0.jar and zookeeper-3.4.6.jar.
job.txt

@ZhangShufan15

Could it be that your job's next execution time has not arrived yet? As far as I remember, sharding is triggered when the job executes.

@youngerzjj

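The job is triggered every 10 seconds, so the execution time has definitely arrived. The job is currently hung and not making any progress.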

@terrymanu
Member

Please upgrade to version 1.0.6 or later. Versions up to 1.0.2 do have this problem.

@youngerzjj

OK, thanks.
