Cluster: Failed to refresh slots cache. when options.redisOptions.enableOfflineQueue = false #581
Comments
Same here; the Redis logs are also printing `Client closed connection` errors. Any solutions?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs, but feel free to re-open a closed issue if needed.
This is clearly a defect. I'd try to root-cause it and submit a PR, but I'm not familiar enough with the code and unfortunately don't have the time to figure it out. While the obvious workaround is to not set that option, it is not at all clear what the implications are, as the option exists at both the cluster level and the individual server level. At a minimum, the documentation needs to be updated to clarify the behavior of these options for a Redis Cluster.
Commenting to keep the issue open.
@luin Any chance this will be addressed?
unstale
Hey @ccs018, I took a look at your issue. The problem is that currently, ioredis tries to refresh the slots map before actually connecting to the cluster, which causes it to fail and disconnect from the node before it manages to connect (because you don't have the offline queue enabled, the internal `CLUSTER SLOTS` command is rejected instead of being queued until the connection is ready). From my understanding, one possible fix for this is to move the slots refresh to after the connection has been established.
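To illustrate the mechanism, here is a minimal sketch (not from the thread; it assumes a Redis server on the default port, and the exact error text varies by ioredis version): with `enableOfflineQueue: false`, any command issued before the connection is ready is rejected immediately instead of being queued, which is exactly what happens to the internal slots-refresh command.

```js
const Redis = require('ioredis');

// Offline queue disabled: commands sent while the socket is still
// connecting are rejected instead of being buffered until ready.
const redis = new Redis({ port: 6379, enableOfflineQueue: false });

// This runs before the connection is established, so it fails fast.
redis.ping().catch((err) => {
  console.error('Rejected while connecting:', err.message);
});

redis.on('ready', () => {
  // Once connected, the same command succeeds.
  redis.ping().then(console.log); // "PONG"
});
```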
@shaharmor, thanks for taking a look at this.
Thanks for clarifying. The documentation is not clear about which Redis node options are applicable in cluster mode. In my use case, queuing commands while offline would likely cause problems: I can have multiple clients attempting to update the same keys, so if one client queued an update while its connection was down and another client made a later update in the meantime, the first client's queued write would incorrectly clobber the other client's update when it reconnected (the updates would be applied out of order).
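A sketch of the fail-fast pattern this implies (hypothetical key and handler; not code from the thread): with the offline queue disabled, a write attempted during an outage fails immediately, so the application can re-read the current value and recompute before retrying, rather than letting a stale queued write land on top of a newer one.

```js
const Redis = require('ioredis');

const cluster = new Redis.Cluster([{ host: '127.0.0.1', port: 7000 }], {
  enableOfflineQueue: false,
  redisOptions: { enableOfflineQueue: false },
});

async function safeSet(key, value) {
  try {
    await cluster.set(key, value);
  } catch (err) {
    // Fail fast instead of queueing: the caller re-reads the key and
    // recomputes once the connection is back, so a stale write never
    // silently overwrites a newer one.
    console.warn(`write to ${key} failed; retry with fresh state:`, err.message);
  }
}
```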
I believe that this is incorrect behavior. A node going down is typically a temporary event, and ioredis should not unilaterally remove the node from the cluster nodes list; it should re-query the cluster for the topology, because when the node recovers, ioredis does not proactively add it back to the list. Consider a rolling-upgrade scenario: as each node is taken down to be upgraded, ioredis removes it from the cluster nodes list, and eventually there will be none left. Restarting clients is not an acceptable recovery when one is trying to design a zero-outage system, which includes upgrades.
Just saw the PR; looks like it will fix a couple of similar issues. Though I still don't believe that a node should be removed from the cluster nodes list when it goes offline. Just mark it as offline and periodically attempt to reconnect.
The problem with this is that a cluster can sometimes have nodes that will never come up again, in cases such as auto-scaling, where a machine comes up and, once it goes down, never returns; keeping such nodes would cause ioredis to basically leak nodes. I agree that there are fixes that should be done in the way we handle cluster nodes. I have talked with @luin and we will take a look at it next week. Sorry for the delay. A `MOVED` redirection will cause ioredis to re-fetch the nodes list, and now with my PR it will do that periodically as well, but I agree that we should probably trigger it when a node disconnects too.
Again, thanks for digging into this. I have some rather ugly code that periodically runs `CLUSTER NODES` and compares the count with `cluster.nodes`; if the counts differ, I destroy and recreate the ioredis client. Not the cleanest or most efficient. Similarly, to work around the issue captured here where the client never connects, I simply have a timeout to guard against that condition and, again, destroy and recreate the ioredis client. Looking forward to ripping that code out and relying completely on ioredis to maintain and automatically recover connections.
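A hedged reconstruction of that workaround (the original code is not in the thread; the interval, the node-counting logic, and the recreate strategy are all assumptions):

```js
const Redis = require('ioredis');

const startupNodes = [{ host: '127.0.0.1', port: 7000 }];

function createCluster() {
  return new Redis.Cluster(startupNodes, {
    enableOfflineQueue: false,
    redisOptions: { enableOfflineQueue: false },
  });
}

let cluster = createCluster();

// Periodically compare the topology Redis reports with what ioredis is
// tracking; if they disagree, destroy the client and start over.
setInterval(async () => {
  try {
    const raw = await cluster.cluster('nodes'); // CLUSTER NODES, one line per node
    const reported = raw.trim().split('\n').length;
    const tracked = cluster.nodes().length;
    if (reported !== tracked) {
      cluster.disconnect();
      cluster = createCluster();
    }
  } catch (err) {
    console.warn('topology check failed:', err.message);
  }
}, 30000); // interval is an arbitrary choice
```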
Any real workarounds?
@heri16 v4.0.0-2 adds a fix for this that refreshes the slots cache periodically.
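For reference, a minimal sketch of configuring the periodic refresh in ioredis 4.x (`slotsRefreshTimeout` and `slotsRefreshInterval` are Cluster options; the values below are illustrative):

```js
const Redis = require('ioredis');

const cluster = new Redis.Cluster([{ host: '127.0.0.1', port: 7000 }], {
  // Timeout for a single slots-refresh attempt.
  slotsRefreshTimeout: 2000,
  // Re-fetch the slots map on this interval so topology changes
  // (failovers, rolling upgrades) are picked up automatically.
  slotsRefreshInterval: 5000,
});
```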
I use ioredis 4.14.1 and Node 12.13.0.
+1 |
If I pass `options` to `new Cluster` with `options.redisOptions.enableOfflineQueue = false`, ioredis (3.2.2) fails to connect to the cluster. If I remove that specific option, then everything is good (I went through the options one by one). Below is sample code with the output when `options.redisOptions.enableOfflineQueue = false`.

Note that I have also set `options.enableOfflineQueue = false`; it is not entirely clear if both need to be set. In my usage, I do NOT want commands to be queued when the cluster or the intended node is not available. If I comment out the one line, then all is good.
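The original sample code and output are not shown in this copy; a minimal repro along the lines described (the node address and event handlers are assumptions) might look like:

```js
const Redis = require('ioredis');

const options = {
  enableOfflineQueue: false,
  redisOptions: {
    enableOfflineQueue: false, // commenting out this line lets it connect
  },
};

const cluster = new Redis.Cluster([{ host: '127.0.0.1', port: 7000 }], options);

cluster.on('error', (err) => {
  // With the offline queue disabled, the internal slots refresh is
  // rejected before the connection is ready, so this keeps firing
  // with "Failed to refresh slots cache."
  console.error(err.message);
});

cluster.on('ready', () => {
  console.log('connected'); // never reached while the option is set
});
```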