
AbstractRedisClient#shutdown() never ends when Redis is unstable #1768

Closed
ikeyat opened this issue Jun 8, 2021 · 2 comments
Labels
type: bug A general bug
Milestone
6.0.6

Comments


ikeyat commented Jun 8, 2021

Bug Report

Our batch application accesses Redis Cluster with Lettuce 5.1.8.RELEASE.
In our batch application, AbstractRedisClient#shutdown() never ends when Redis is unstable, so the Java process never stops.
From my understanding, shutdown has a timeout parameter (default: 2 sec) to prevent infinite waiting.
It looks like the timeout parameter does not take effect in some cases.
I'm not sure of the exact conditions or how to reproduce them, but the hang tends to happen after the Redis server fails over and a RedisCommandTimeoutException is thrown.
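For what it's worth, a minimal sketch of bounding the wait on the caller's side, assuming Lettuce 5.1+ where shutdownAsync() is available (the timeout value is illustrative; this only limits how long the caller blocks, it does not fix the underlying hang):

import java.util.concurrent.TimeUnit;

import io.lettuce.core.cluster.RedisClusterClient;

public class ShutdownWithBoundedWait {

    public static void main(String[] args) throws Exception {

        RedisClusterClient client = RedisClusterClient.create("redis://localhost");

        // ... use the client ...

        // Wait for the asynchronous shutdown, but give up after a bound so the
        // main thread cannot hang forever; a TimeoutException is thrown instead.
        client.shutdownAsync().get(15, TimeUnit.SECONDS);
    }
}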

Current Behavior

The following is a thread dump of the shutdown() call. Because of this wait, the batch application kept running for a few weeks.

"main" #1 prio=5 os_prio=0 cpu=17398.97ms elapsed=481829.80s tid=0x00007ff96c015800 nid=0x24c8 waiting on condition  [0x00007ff973179000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@11.0.5/Native Method)
        - parking to wait for  <0x00000000deffa2d0> (a java.util.concurrent.CompletableFuture$Signaller)
        at java.util.concurrent.locks.LockSupport.park(java.base@11.0.5/LockSupport.java:194)
        at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.5/CompletableFuture.java:1796)
        at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.5/ForkJoinPool.java:3128)
        at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.5/CompletableFuture.java:1823)
        at java.util.concurrent.CompletableFuture.get(java.base@11.0.5/CompletableFuture.java:1998)
        at io.lettuce.core.AbstractRedisClient.shutdown(AbstractRedisClient.java:393)
        at io.lettuce.core.cluster.RedisClusterClient.shutdown(RedisClusterClient.java:958)
        at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.destroy(LettuceConnectionFactory.java:306)
        at org.springframework.beans.factory.support.DisposableBeanAdapter.destroy(DisposableBeanAdapter.java:256)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroyBean(DefaultSingletonBeanRegistry.java:571)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingleton(DefaultSingletonBeanRegistry.java:543)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.destroySingleton(DefaultListableBeanFactory.java:1052)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingletons(DefaultSingletonBeanRegistry.java:504)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.destroySingletons(DefaultListableBeanFactory.java:1059)
        at org.springframework.context.support.AbstractApplicationContext.destroyBeans(AbstractApplicationContext.java:1035)
        at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1011)
        at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:961)
        - locked <0x00000000c08931a0> (a java.lang.Object)
        at org.springframework.batch.core.launch.support.CommandLineJobRunner.start(CommandLineJobRunner.java:373)
        at org.springframework.batch.core.launch.support.CommandLineJobRunner.main(CommandLineJobRunner.java:564)

   Locked ownable synchronizers:
        - None

I also found another thread waiting in a similar way.

"lettuce-epollEventLoop-4-2" #15 daemon prio=5 os_prio=0 cpu=590.61ms elapsed=481728.55s tid=0x00007ff96da4e000 nid=0x2649 waiting on condition  [0x00007ff9254d3000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@11.0.5/Native Method)
        - parking to wait for  <0x00000000df0ce878> (a java.util.concurrent.CompletableFuture$Signaller)
        at java.util.concurrent.locks.LockSupport.park(java.base@11.0.5/LockSupport.java:194)
        at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.5/CompletableFuture.java:1796)
        at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.5/ForkJoinPool.java:3128)
        at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.5/CompletableFuture.java:1823)
        at java.util.concurrent.CompletableFuture.join(java.base@11.0.5/CompletableFuture.java:2043)
        at io.lettuce.core.RedisChannelHandler.close(RedisChannelHandler.java:126)
        at io.lettuce.core.cluster.RedisClusterClient.lambda$connectToNodeAsync$1(RedisClusterClient.java:504)
        at io.lettuce.core.cluster.RedisClusterClient$$Lambda$226/0x0000000100519040.accept(Unknown Source)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@11.0.5/CompletableFuture.java:859)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@11.0.5/CompletableFuture.java:837)
        at java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.5/CompletableFuture.java:506)
        at java.util.concurrent.CompletableFuture.complete(java.base@11.0.5/CompletableFuture.java:2073)
        at io.lettuce.core.AbstractRedisClient.lambda$null$3(AbstractRedisClient.java:341)
        at io.lettuce.core.AbstractRedisClient$$Lambda$249/0x0000000100578c40.accept(Unknown Source)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@11.0.5/CompletableFuture.java:859)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@11.0.5/CompletableFuture.java:837)
        at java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.5/CompletableFuture.java:506)
        at java.util.concurrent.CompletableFuture.complete(java.base@11.0.5/CompletableFuture.java:2073)
        at io.lettuce.core.SslConnectionBuilder$SslChannelInitializer$2.userEventTriggered(SslConnectionBuilder.java:211)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
        at io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:307)
        at io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:108)
        at io.netty.handler.codec.ByteToMessageDecoder.userEventTriggered(ByteToMessageDecoder.java:366)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
        at io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:307)
        at io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:108)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
        at io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:307)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.userEventTriggered(DefaultChannelPipeline.java:1452)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
        at io.netty.channel.DefaultChannelPipeline.fireUserEventTriggered(DefaultChannelPipeline.java:959)
        at io.lettuce.core.protocol.CommandHandler.lambda$channelActive$0(CommandHandler.java:273)
        at io.lettuce.core.protocol.CommandHandler$$Lambda$261/0x000000010057c040.run(Unknown Source)
        at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
        at io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:73)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:335)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(java.base@11.0.5/Thread.java:834)

   Locked ownable synchronizers:
        - None

Environment

  • Lettuce version(s): 5.1.8.RELEASE
  • Redis version: 4 (Azure Cache for Redis)
mp911de (Collaborator) commented Jun 8, 2021

Thanks for the report. This is a bug in Lettuce where a blocking call happens on a thread that must not be blocked:

        at io.lettuce.core.RedisChannelHandler.close(RedisChannelHandler.java:126)
        at io.lettuce.core.cluster.RedisClusterClient.lambda$connectToNodeAsync$1(RedisClusterClient.java:504)
        at io.lettuce.core.cluster.RedisClusterClient$$Lambda$226/0x0000000100519040.accept(Unknown Source)

@mp911de mp911de added the type: bug A general bug label Jun 8, 2021
@mp911de mp911de added this to the 6.0.6 milestone Jun 8, 2021
mp911de added commits that referenced this issue Jun 9, 2021
When a cluster node connection fails, we now call the asynchronous closeAsync method instead of the blocking close method.
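In essence, the fix follows the pattern below (a simplified sketch of the before/after described in the commit message, not the actual Lettuce source; the class, method, and parameter names are illustrative):

import java.util.concurrent.CompletableFuture;

import io.lettuce.core.api.StatefulRedisConnection;

class NodeConnectionCleanup {

    // When initializing a cluster node connection fails, the already-opened
    // connection has to be closed again. This callback may run on the Netty
    // event loop thread that owns the connection.
    static void cleanupOnError(StatefulRedisConnection<String, String> connection,
                               CompletableFuture<?> initialization) {

        initialization.whenComplete((result, throwable) -> {
            if (throwable != null) {
                // Before: connection.close() blocks the callback thread. If that
                // thread is the event loop that has to drive the close, the wait
                // never finishes and shutdown() hangs along with it.

                // After: closeAsync() only schedules the close and returns a
                // future, so the event loop thread is never blocked.
                connection.closeAsync();
            }
        });
    }
}

Deferring the close keeps the event loop free to process the very I/O events that complete the close and, in turn, the client shutdown.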
mp911de (Collaborator) commented Jun 9, 2021

That's fixed now.
