The issue tracker is used for reporting bugs and discussing new features. Please use Stack Overflow for support questions.
Expected Behavior
I would expect the client to retry commands related to executing the ClusterClient.Watch() method.
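For illustration, a typical call of this kind (rdb, ctx, and key are placeholder names, not taken from the report):

```go
// Sketch of the affected call: the expectation is that a transient
// connection failure anywhere inside Watch is retried per MaxRetries
// rather than failing the whole transaction.
err := rdb.Watch(ctx, func(tx *redis.Tx) error {
	n, err := tx.Get(ctx, key).Int64()
	if err != nil && err != redis.Nil {
		return err
	}
	_, err = tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
		pipe.Set(ctx, key, n+1, 0)
		return nil
	})
	return err
}, key)
```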
Current Behavior
Currently, if there's a broken connection, the TCP call will time out after the configured or default (3 second) timeout, causing the whole method call to fail. This can happen as early as when trying to execute the initial WATCH command against the server. The Watch() method call returns with an error like: redis: Conn is in a bad state: read tcp 123.123.123.123:37106->42.42.42.42:6379: i/o timeout
Possible Solution
Detect and unwrap a BadConnErr in shouldRetry before evaluating retry conditions.
Steps to Reproduce
N/A, this is an intermittent issue that occurs in a cloud environment.
Context (Environment)
Version: github.com/redis/go-redis/v9 v9.5.1
Detailed Description
Reading through the code, this happens because the Watch() method uses a Tx object, which uses a StickyConnPool (go-redis/tx.go, lines 26 to 32 in 0f0a284). This StickyConnPool captures any errors that occur while executing commands, wraps them in the internal BadConnErr, and stores them internally (go-redis/internal/pool/pool_sticky.go, line 122 in 0f0a284).
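The wrapper looks approximately like this (paraphrased from internal/pool/pool_sticky.go; shown only to make the unwrapping discussion below concrete):

```go
// Paraphrased: errors captured while the sticky conn is in use are
// wrapped so that every later Get fails with the same error. This is
// why the i/o timeout surfaces as "Conn is in a bad state: ...".
type BadConnError struct {
	wrapped error
}

func (e BadConnError) Error() string {
	s := "redis: Conn is in a bad state"
	if e.wrapped != nil {
		s += ": " + e.wrapped.Error()
	}
	return s
}

// Unwrap exposes the underlying cause to errors.Is/errors.As.
func (e BadConnError) Unwrap() error { return e.wrapped }
```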
When the initial command fails, it may get retried (depending on options) by baseClient._process, but a repeated call to StickyConnPool.Get will now return this BadConnErr, which will bubble out.
When the ClusterClient.Watch retry machinery attempts a retry, it evaluates the error using shouldRetry, which returns false even though the wrapped error is a timeoutError.
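A self-contained demonstration of the mechanism (badConn is a stand-in for the internal wrapper, whose fields are unexported):

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// badConn mimics the internal BadConnErr: it wraps a cause and
// exposes it via Unwrap, but does not itself implement Timeout().
type badConn struct{ wrapped error }

func (e badConn) Error() string { return "redis: Conn is in a bad state: " + e.wrapped.Error() }
func (e badConn) Unwrap() error { return e.wrapped }

type timeoutError interface{ Timeout() bool }

func main() {
	// os.ErrDeadlineExceeded stands in for the read i/o timeout.
	var err error = badConn{wrapped: os.ErrDeadlineExceeded}

	// A direct type assertion, as the retry check effectively performs
	// today, sees only the wrapper and misses the timeout underneath.
	_, direct := err.(timeoutError)
	fmt.Println(direct) // false

	// errors.As walks the Unwrap chain and finds the timeout.
	var te timeoutError
	fmt.Println(errors.As(err, &te) && te.Timeout()) // true
}
```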
Possible Implementation
Change shouldRetry to detect whether the passed-in err is a BadConnErr and unwrap it before invoking the existing logic.
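A minimal sketch of that change (shouldRetryBare is a hypothetical name standing in for the existing, unchanged retry conditions):

```go
// Sketch only: peel off the sticky pool's wrapper so the existing
// conditions operate on the underlying cause (e.g. the i/o timeout).
func shouldRetry(err error, retryTimeout bool) bool {
	var badConn pool.BadConnError
	if errors.As(err, &badConn) {
		return shouldRetryBare(badConn.Unwrap(), retryTimeout)
	}
	return shouldRetryBare(err, retryTimeout) // hypothetical factoring
}
```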
However, note that there's a wrinkle: if baseClient._process (which also uses shouldRetry) is executing against a Tx object backed by the sticky pool and attempts a retry, it will keep trying to get a conn that is in a bad state. The StickyConnPool.Reset() method appears intended to restore the sticky pool to an uninitialized state, so that calling Get on it can fetch a fresh conn from the underlying pool; however, this method is not called anywhere in the code at the moment. It appears that extra logic is needed to reset the sticky pool before retries are attempted, as sketched below.
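A sketch of what that extra logic could look like in the retry path (the placement inside baseClient._process, the type assertion, and the Reset signature are assumptions, not current library behavior):

```go
// Hypothetical: before backing off and retrying, restore a sticky pool
// to its uninitialized state so the next Get draws a fresh conn from
// the underlying pool instead of returning the stored BadConnErr.
if attempt > 0 {
	if sticky, ok := c.connPool.(*pool.StickyConnPool); ok {
		// Reset exists today but is never called anywhere.
		if err := sticky.Reset(ctx); err != nil {
			return false, err
		}
	}
	if err := internal.Sleep(ctx, c.retryBackoff(attempt)); err != nil {
		return false, err
	}
}
```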