
AbstractRedisClient#shutdown() never ends when Redis is unstable #1768

Closed
ikeyat opened this issue Jun 8, 2021 · 2 comments
Labels
type: bug A general bug
Milestone
6.0.6

Comments


ikeyat commented Jun 8, 2021

Bug Report

Our batch application accesses Redis Cluster with Lettuce 5.1.8.RELEASE.
In our batch application, AbstractRedisClient#shutdown() never ends when Redis is unstable, so the Java process never stops.
From my understanding, shutdown has a timeout parameter (default: 2 sec) to prevent infinite waiting.
It looks like the timeout parameter does not take effect in some cases.
I'm not sure of the exact conditions or how to reproduce them, but the hang tends to happen after the Redis server fails over and a RedisCommandTimeoutException is thrown.
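For what it's worth, a minimal sketch of bounding the wait on the caller's side, assuming Lettuce 5.1+ where shutdownAsync() is available (the timeout value is illustrative; this only limits how long the caller blocks, it does not fix the underlying hang):

import java.util.concurrent.TimeUnit;

import io.lettuce.core.cluster.RedisClusterClient;

public class ShutdownWithBoundedWait {

    public static void main(String[] args) throws Exception {

        RedisClusterClient client = RedisClusterClient.create("redis://localhost");

        // ... use the client ...

        // Wait for the asynchronous shutdown, but give up after a bound so the
        // main thread cannot hang forever; a TimeoutException is thrown instead.
        client.shutdownAsync().get(15, TimeUnit.SECONDS);
    }
}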

Current Behavior

The following is a thread dump of the shutdown() call. Because of this wait, the batch application kept running for a few weeks.

"main" #1 prio=5 os_prio=0 cpu=17398.97ms elapsed=481829.80s tid=0x00007ff96c015800 nid=0x24c8 waiting on condition  [0x00007ff973179000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@11.0.5/Native Method)
        - parking to wait for  <0x00000000deffa2d0> (a java.util.concurrent.CompletableFuture$Signaller)
        at java.util.concurrent.locks.LockSupport.park(java.base@11.0.5/LockSupport.java:194)
        at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.5/CompletableFuture.java:1796)
        at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.5/ForkJoinPool.java:3128)
        at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.5/CompletableFuture.java:1823)
        at java.util.concurrent.CompletableFuture.get(java.base@11.0.5/CompletableFuture.java:1998)
        at io.lettuce.core.AbstractRedisClient.shutdown(AbstractRedisClient.java:393)
        at io.lettuce.core.cluster.RedisClusterClient.shutdown(RedisClusterClient.java:958)
        at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.destroy(LettuceConnectionFactory.java:306)
        at org.springframework.beans.factory.support.DisposableBeanAdapter.destroy(DisposableBeanAdapter.java:256)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroyBean(DefaultSingletonBeanRegistry.java:571)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingleton(DefaultSingletonBeanRegistry.java:543)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.destroySingleton(DefaultListableBeanFactory.java:1052)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingletons(DefaultSingletonBeanRegistry.java:504)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.destroySingletons(DefaultListableBeanFactory.java:1059)
        at org.springframework.context.support.AbstractApplicationContext.destroyBeans(AbstractApplicationContext.java:1035)
        at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1011)
        at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:961)
        - locked <0x00000000c08931a0> (a java.lang.Object)
        at org.springframework.batch.core.launch.support.CommandLineJobRunner.start(CommandLineJobRunner.java:373)
        at org.springframework.batch.core.launch.support.CommandLineJobRunner.main(CommandLineJobRunner.java:564)

   Locked ownable synchronizers:
        - None

I also found another thread waiting in a similar way.

"lettuce-epollEventLoop-4-2" #15 daemon prio=5 os_prio=0 cpu=590.61ms elapsed=481728.55s tid=0x00007ff96da4e000 nid=0x2649 waiting on condition  [0x00007ff9254d3000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@11.0.5/Native Method)
        - parking to wait for  <0x00000000df0ce878> (a java.util.concurrent.CompletableFuture$Signaller)
        at java.util.concurrent.locks.LockSupport.park(java.base@11.0.5/LockSupport.java:194)
        at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.5/CompletableFuture.java:1796)
        at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.5/ForkJoinPool.java:3128)
        at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.5/CompletableFuture.java:1823)
        at java.util.concurrent.CompletableFuture.join(java.base@11.0.5/CompletableFuture.java:2043)
        at io.lettuce.core.RedisChannelHandler.close(RedisChannelHandler.java:126)
        at io.lettuce.core.cluster.RedisClusterClient.lambda$connectToNodeAsync$1(RedisClusterClient.java:504)
        at io.lettuce.core.cluster.RedisClusterClient$$Lambda$226/0x0000000100519040.accept(Unknown Source)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@11.0.5/CompletableFuture.java:859)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@11.0.5/CompletableFuture.java:837)
        at java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.5/CompletableFuture.java:506)
        at java.util.concurrent.CompletableFuture.complete(java.base@11.0.5/CompletableFuture.java:2073)
        at io.lettuce.core.AbstractRedisClient.lambda$null$3(AbstractRedisClient.java:341)
        at io.lettuce.core.AbstractRedisClient$$Lambda$249/0x0000000100578c40.accept(Unknown Source)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@11.0.5/CompletableFuture.java:859)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@11.0.5/CompletableFuture.java:837)
        at java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.5/CompletableFuture.java:506)
        at java.util.concurrent.CompletableFuture.complete(java.base@11.0.5/CompletableFuture.java:2073)
        at io.lettuce.core.SslConnectionBuilder$SslChannelInitializer$2.userEventTriggered(SslConnectionBuilder.java:211)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
        at io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:307)
        at io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:108)
        at io.netty.handler.codec.ByteToMessageDecoder.userEventTriggered(ByteToMessageDecoder.java:366)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
        at io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:307)
        at io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:108)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
        at io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:307)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.userEventTriggered(DefaultChannelPipeline.java:1452)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
        at io.netty.channel.DefaultChannelPipeline.fireUserEventTriggered(DefaultChannelPipeline.java:959)
        at io.lettuce.core.protocol.CommandHandler.lambda$channelActive$0(CommandHandler.java:273)
        at io.lettuce.core.protocol.CommandHandler$$Lambda$261/0x000000010057c040.run(Unknown Source)
        at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
        at io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:73)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:335)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(java.base@11.0.5/Thread.java:834)

   Locked ownable synchronizers:
        - None

Environment

  • Lettuce version(s): 5.1.8.RELEASE
  • Redis version: 4 (Azure Cache for Redis)
mp911de (Collaborator) commented Jun 8, 2021

Thanks for the report. This is a bug in Lettuce where a blocking call happens on a thread that must not be blocked:

        at io.lettuce.core.RedisChannelHandler.close(RedisChannelHandler.java:126)
        at io.lettuce.core.cluster.RedisClusterClient.lambda$connectToNodeAsync$1(RedisClusterClient.java:504)
        at io.lettuce.core.cluster.RedisClusterClient$$Lambda$226/0x0000000100519040.accept(Unknown Source)

@mp911de mp911de added the type: bug A general bug label Jun 8, 2021
@mp911de mp911de added this to the 6.0.6 milestone Jun 8, 2021
mp911de added commits that referenced this issue Jun 9, 2021
When a cluster node connection fails, we now call the asynchronous closeAsync method instead of the blocking close method.
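In essence, the fix follows the pattern below (a simplified sketch of the before/after described in the commit message, not the actual Lettuce source; the class, method, and parameter names are illustrative):

import java.util.concurrent.CompletableFuture;

import io.lettuce.core.api.StatefulRedisConnection;

class NodeConnectionCleanup {

    // When initializing a cluster node connection fails, the already-opened
    // connection has to be closed again. This callback may run on the Netty
    // event loop thread that owns the connection.
    static void cleanupOnError(StatefulRedisConnection<String, String> connection,
                               CompletableFuture<?> initialization) {

        initialization.whenComplete((result, throwable) -> {
            if (throwable != null) {
                // Before: connection.close() blocks the callback thread. If that
                // thread is the event loop that has to drive the close, the wait
                // never finishes and shutdown() hangs along with it.

                // After: closeAsync() only schedules the close and returns a
                // future, so the event loop thread is never blocked.
                connection.closeAsync();
            }
        });
    }
}

Deferring the close keeps the event loop free to process the very I/O events that complete the close and, in turn, the client shutdown.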
mp911de (Collaborator) commented Jun 9, 2021

That's fixed now.
