ClusterAllFailedError on version 4.24.1 #1330
Comments
Hi @sundeqvist, thanks for raising this issue! Before 4.24.1, ioredis asked cluster nodes for cluster slot information when connecting and periodically after being connected. If all cluster nodes failed to provide the information (e.g. all nodes were down), ioredis would raise the "Failed to refresh slots cache" error, print a debug log, and reconnect to the cluster. Since 4.24.1, however, ioredis raises the error and reconnects to the cluster even when the cluster is already connected. This change was introduced to make failover detection faster. For your case, I'd suggest listening to the "node error" event to see which nodes are failing and why.
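For anyone looking for a concrete starting point, here is a minimal sketch of such a listener. The host/port are placeholders, and the assumption that the handler receives the error plus the failing node's address is based on my reading of ioredis' cluster events:

```js
const Redis = require('ioredis');

// Placeholder startup node; use your real cluster endpoint(s).
const cluster = new Redis.Cluster([{ host: '127.0.0.1', port: 6379 }]);

// Fired when an individual cluster node reports an error, which helps
// explain why the slots-cache refresh keeps failing.
cluster.on('node error', (error, nodeKey) => {
  console.error(`Node ${nodeKey} errored:`, error.message);
});
```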
Hello,
Hi @lvkmahesh, I just had a talk with @leibale about the issue. Could you add a listener for the "node error" event (as in the reply I posted above) and post the errors here? We'd like to better understand what caused the issue.
Thanks for looking at this so quickly. The trace that is emitted is: Error: timeout
Same here after upgrading from 4.19.2 to 4.27.1; we see the same errors.
I believe the commit might have failed to build and deploy. @luin, could you have another look?
## [4.27.2](v4.27.1...v4.27.2) (2021-05-04)

### Bug Fixes

* **cluster:** avoid ClusterAllFailedError in certain cases ([aa9c5b1](aa9c5b1)), closes [#1330](#1330)
🎉 This issue has been resolved in version 4.27.2 🎉 The release is available on npm. Your semantic-release bot 📦🚀
Hi all, I'm on 4.27.6 and I'm having a similar issue where I see this error only when under heavy load:
And then the Lambda ends with this error message:

```json
{
  "errorType": "Error",
  "errorMessage": "None of startup nodes is available",
  "stack": [
    "Error: None of startup nodes is available",
    "    at Cluster.closeListener (/var/task/node_modules/ioredis/built/cluster/index.js:184:35)",
    "    at Object.onceWrapper (events.js:482:28)",
    "    at Cluster.emit (events.js:388:22)",
    "    at /var/task/node_modules/ioredis/built/cluster/index.js:367:18",
    "    at processTicksAndRejections (internal/process/task_queues.js:77:11)"
  ]
}
```

Here's my cluster configuration:

```js
cluster = new Redis.Cluster(
  [
    { host: redisSecret.host }
  ],
  {
    // Pass hostnames through unchanged (commonly needed for ElastiCache with TLS).
    dnsLookup: (address, callback) => callback(null, address),
    redisOptions: {
      tls: true,
    },
    clusterRetryStrategy: (times) => {
      const ms = Math.min(100 * times, 2000);
      console.log(`Cluster retry #${times}: Will wait ${ms} ms`);
      return ms;
    }
  }
);
```

Questions:
Sorry to comment on a closed issue, but this is the only place I've found anyone talking about seeing this error only under heavy load, as opposed to just incorrect network configuration or similar.
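One thing that may help in the meantime (a sketch, not an official recommendation): since the stack trace above shows the error being emitted by the Cluster instance itself, registering an 'error' handler keeps it from surfacing as an unhandled 'error' event and lets you log or count it yourself:

```js
// Assuming `cluster` is the instance configured above.
cluster.on('error', (error) => {
  // Errors like "None of startup nodes is available" are emitted here
  // when the cluster connection closes without recovering.
  console.error('Redis cluster error:', error.message);
});
```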
I need help drilling down into the following issue.
I'm facing the same issue: a new Lambda version release started throwing a massive number of these errors on our ioredis version. The AWS Redis cluster metrics look good and healthy, which was confirmed by AWS tech support too. Redis cluster initialization:
@bkvaiude I looked into this a bit. The "ClusterAllFailedError: Failed to refresh slots cache" error is thrown prior to ioredis attempting to reconnect to the cluster. In other words, it's a recoverable error. On my team we're tracking a metric for these but not treating them as errors that need to be addressed. Ideally I would like to get a clear signal for when it can't manage to reconnect at all.

@trademark18 Again, I think the "Failed to refresh slots cache" errors can be treated as warnings. It looks like something is closing the connection (you can see it originates from `Cluster.closeListener` in your stack trace).
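A rough sketch of that warning-vs-error split, assuming `cluster` is your Redis.Cluster instance and `recordMetric` stands in for whatever metrics client you use; matching on the error message is my own shortcut rather than an official ioredis API:

```js
// Hypothetical metrics helper -- substitute your own (CloudWatch, StatsD, etc.).
const recordMetric = (name) => console.log(`metric: ${name}`);

cluster.on('error', (error) => {
  // These are raised before ioredis retries the slots refresh, so count
  // them as warnings rather than alerting on each one.
  if (error.message && error.message.includes('Failed to refresh slots cache')) {
    recordMetric('redis.slots_refresh_failed');
    return;
  }
  // Anything else (e.g. "None of startup nodes is available") is escalated.
  console.error('Unrecoverable Redis cluster error:', error);
});
```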
Thanks @vaughandroid for sharing your experience, it is really insightful and helpful. The problem I faced was pretty much a silly one: a misconfigured connection to an AWS ElastiCache cluster with respect to TLS and AUTH.

The problematic situation for the developer is that ioredis doesn't provide the right information in the generated error. When we tried to replicate the issue locally with a trial-and-error approach, we received the same error for several different cases.

Now you see it is difficult to process the error details for the above cases and act on them. In our case, we doubted Redis itself and started looking into the performance metrics. @trademark18 I hope ioredis can consider this feedback and make the needed changes to error handling and its documentation. Thank you very much!
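For anyone else chasing this misleading error, here is a sketch of a connection for a TLS- and AUTH-enabled clustered ElastiCache setup. The endpoint and the AUTH token environment variable are placeholders, and the no-op `dnsLookup` mirrors the configuration posted earlier in this thread:

```js
const Redis = require('ioredis');

const cluster = new Redis.Cluster(
  // Placeholder configuration endpoint for the ElastiCache cluster.
  [{ host: 'clustercfg.my-cache.abc123.use1.cache.amazonaws.com', port: 6379 }],
  {
    // Pass hostnames through unchanged so TLS certificate checks succeed.
    dnsLookup: (address, callback) => callback(null, address),
    redisOptions: {
      tls: true,                               // needed when in-transit encryption is enabled
      password: process.env.REDIS_AUTH_TOKEN,  // needed when AUTH is enabled
    },
  }
);
```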
Hey! We're using ioredis with our AWS ElastiCache cluster running 3 shards on version 5.0.5. We're making the Redis calls through a Lambda with quite high traffic, meaning multiple concurrent Lambdas running.

I've done some version bumping and I'm seeing the error

ClusterAllFailedError: Failed to refresh slots cache

intermittently. Through some debugging I've narrowed version 4.24.1 down as the culprit; any version before that works fine. When setting DEBUG=ioredis:* in the Lambda env, the ClusterAllFailedError: Failed to refresh slots cache is in most cases followed by these logs:

When looking at the 4.24.1 commit 8524eea I can tell that code related to this error has been touched. Could this fix have introduced unintended issues? Any pointers would be appreciated 👍
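(Side note for anyone reproducing this: a sketch of enabling the same debug output from code, assuming it runs before ioredis is first required, since the underlying debug module reads DEBUG at load time. In Lambda, setting the DEBUG environment variable to ioredis:* achieves the same thing.)

```js
// Must run before the first require('ioredis'); equivalent to setting
// DEBUG=ioredis:* in the Lambda environment.
process.env.DEBUG = 'ioredis:*';

const Redis = require('ioredis');
```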