Deadlock observed during network issues accessing Redis #2376
Comments
Oof. Well this looks a gnarly one. Excellent stack dumps, thanks. I'll try to look at this on Monday.
PR: #2378
What is intriguing to me: I'm assuming that this error is rare, but that could be entirely wrong - it could be happening way more than we have ever realized; until this ticket we were ignorant and dismissing it as noise. Anyhow, the PR above addresses this in 3 different ways - it fixes both of the paths above so that they never require both locks simultaneously, and it fixes a more general exception-data scenario to use a lock-with-timeout, in case there are some other as-yet-unidentified routes. Great report, thanks; I propose we deploy this one without too much delay.
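For readers curious what the lock-with-timeout part of that looks like in general, here is a rough sketch of the pattern using Monitor.TryEnter - illustrative only, with hypothetical names, not the actual code from the PR:

```csharp
using System;
using System.Threading;

static class ExceptionDetailHelper
{
    // Illustrative pattern only: when building optional exception detail,
    // try to take the lock briefly; if it can't be acquired in time, skip
    // the extra detail rather than risk blocking against a thread that
    // already holds this lock and is waiting on one we hold.
    public static void TryAppendHeadMessages(object queueLock, Action appendDetail)
    {
        bool taken = false;
        try
        {
            Monitor.TryEnter(queueLock, TimeSpan.FromMilliseconds(10), ref taken);
            if (taken)
            {
                appendDetail(); // safe: we hold the lock
            }
            // else: give up on the optional diagnostic detail instead of deadlocking
        }
        finally
        {
            if (taken) Monitor.Exit(queueLock);
        }
    }
}
```

The trade-off is occasionally losing a bit of diagnostic detail in exchange for never blocking indefinitely on an error path.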
2.6.96 is now available on MyGet: https://github.com/StackExchange/StackExchange.Redis/#package-status
2.6.96 is now available on NuGet; I don't know how often this problem was occurring for you, but it's probably worth updating!
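For a .NET Framework 4.8 project, picking up the fix is just a package bump; for example, from the Visual Studio Package Manager Console (version number taken from the comments above):

```
Install-Package StackExchange.Redis -Version 2.6.96
```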
Thanks for looking into this so quickly! We'll go ahead and update soon.
Hi,
We recently observed a deadlock in one of our applications that reaches out to Redis. In both cases, there was an issue on one of the Redis replicas for several minutes that took the replica offline. When it came back, the services did not recover gracefully and, after getting a crash dump, DebugDiag highlighted the following deadlock (I added some notes on the lock locations in bold):
Thread 1 -
[[GCFrame]]
[[GCFrame]]
[[HelperMethodFrame] (System.Threading.Monitor.Enter)] System.Threading.Monitor.Enter(System.Object) **Attempts to lock on _writtenAwaitingResponse, deadlocks**
StackExchange.Redis.PhysicalConnection.GetHeadMessages(StackExchange.Redis.Message ByRef, StackExchange.Redis.Message ByRef)+47
StackExchange.Redis.ExceptionFactory.AddCommonDetail(System.Collections.Generic.List`1, System.Text.StringBuilder, StackExchange.Redis.Message, StackExchange.Redis.ConnectionMultiplexer, StackExchange.Redis.ServerEndPoint)+78
StackExchange.Redis.ExceptionFactory.Timeout(StackExchange.Redis.ConnectionMultiplexer, System.String, StackExchange.Redis.Message, StackExchange.Redis.ServerEndPoint, System.Nullable`1)+4b3
StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[[StackExchange.Redis.RedisValue, StackExchange.Redis]](StackExchange.Redis.Message, StackExchange.Redis.ResultProcessor`1, StackExchange.Redis.ServerEndPoint, StackExchange.Redis.RedisValue)+199 **Grabs lock on result box**
StackExchange.Redis.RedisBase.ExecuteSync[[StackExchange.Redis.RedisValue, StackExchange.Redis]](StackExchange.Redis.Message, StackExchange.Redis.ResultProcessor`1, StackExchange.Redis.ServerEndPoint, StackExchange.Redis.RedisValue)+af
StackExchange.Redis.RedisDatabase.StringGet(StackExchange.Redis.RedisKey, StackExchange.Redis.CommandFlags)+e5
Thread 2:
[[HelperMethodFrame] (System.Threading.Monitor.Enter)] System.Threading.Monitor.Enter(System.Object) **Trying to lock the result box**
StackExchange.Redis.SimpleResultBox.StackExchange.Redis.IResultBox.ActivateContinuations()+2f
StackExchange.Redis.Message.Complete()+4c
StackExchange.Redis.PhysicalConnection.RecordConnectionFailed(StackExchange.Redis.ConnectionFailureType, System.Exception, System.String, Boolean, System.IO.Pipelines.IDuplexPipe)+e57 - **This line grabs a lock on _writtenAwaitingResponse**
StackExchange.Redis.PhysicalConnection+d__115.MoveNext()+8c9
[[HelperMethodFrame]]
mscorlib_ni!System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()+20
System.IO.Pipelines.PipeCompletion.ThrowLatchedException()+e
System.IO.Pipelines.Pipe.GetReadResult(System.IO.Pipelines.ReadResult ByRef)+46
System.IO.Pipelines.Pipe.GetReadAsyncResult()+e0
StackExchange.Redis.PhysicalConnection+d__115.MoveNext()+558
mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+172
mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+15
mscorlib_ni!System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner.Run()+6f
Pipelines.Sockets.Unofficial.DedicatedThreadPoolPipeScheduler.Execute(System.Action`1, System.Object)+1e
Pipelines.Sockets.Unofficial.DedicatedThreadPoolPipeScheduler.RunWorkLoop()+11a
mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+172
mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+15
mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+55
mscorlib_ni!System.Threading.ThreadHelper.ThreadStart(System.Object)+60
In the second thread, it looks like that thread grabs a lock on _writtenAwaitingResponse in PhysicalConnection.cs, then gets stuck on the lock in SimpleResultBox.ActivateContinuations. The first thread, meanwhile, has already grabbed the continuations lock and is trying to grab the _writtenAwaitingResponse lock in GetHeadMessages.
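To make the failure mode concrete: this is a classic lock-order inversion, where two threads each hold one lock and wait for the other. A minimal, self-contained sketch of the pattern (the lock objects here are hypothetical stand-ins for _writtenAwaitingResponse and the result-box lock, not the library's actual code):

```csharp
using System;
using System.Threading;

class LockOrderInversionDemo
{
    // Stand-ins for the two locks involved in the report.
    private static readonly object QueueLock = new object();     // plays the role of _writtenAwaitingResponse
    private static readonly object ResultBoxLock = new object(); // plays the role of the result box

    static void Main()
    {
        // Thread A (like the sync caller building a timeout exception):
        // holds the result-box lock, then wants the queue lock.
        var a = new Thread(() =>
        {
            lock (ResultBoxLock)
            {
                Thread.Sleep(100); // widen the race window
                lock (QueueLock) { /* e.g. read head messages for the error text */ }
            }
        });

        // Thread B (like the connection-failure path): holds the queue lock,
        // then wants the result-box lock to complete the pending message.
        var b = new Thread(() =>
        {
            lock (QueueLock)
            {
                Thread.Sleep(100);
                lock (ResultBoxLock) { /* e.g. activate continuations */ }
            }
        });

        a.Start();
        b.Start();

        // Bounded joins so the demo terminates; with the opposite acquisition
        // orders above, neither thread normally finishes.
        bool finished = a.Join(2000) && b.Join(2000);
        Console.WriteLine(finished ? "completed" : "deadlocked, as in the crash dump");
    }
}
```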
We are currently on library version 2.6.66, and the application is a .NET Framework 4.8 Windows Service.
Thank you!