Understanding clusterRetryStrategy (after Failed to refresh slots cache) #1062

Open
jeremytm opened this issue Feb 18, 2020 · 3 comments

@jeremytm

ioredis version: 4.15.1
Running against an ElastiCache cluster. The code runs in AWS Lambda.

Everything works flawlessly most of the time. However, as our project scales up, we have occasionally started seeing "Failed to refresh slots cache" errors, especially in our longer-running scripts.

It's my understanding that clusterRetryStrategy should be called before ioredis throws any errors. From the ioredis README:

When a number is returned, ioredis will try to reconnect to the startup nodes from scratch after the specified delay (in ms). Otherwise, an error of "None of startup nodes is available" will be returned.

However, our logs are showing an error before clusterRetryStrategy is called (we are logging from the retry function).

image

In addition, we are returning a number from clusterRetryStrategy, but it doesn't seem to have any effect. clusterRetryStrategy is only called once with 1 as the argument, and then the error flow begins and our code fails.
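For reference, a minimal sketch of the kind of strategy being described: it logs every call and returns a capped delay in milliseconds, or a non-number to stop retrying. The helper name `makeClusterRetryStrategy` and its option names are hypothetical, and the wiring in the trailing comment uses a placeholder host. This is an illustration under those assumptions, not the code from our project.

```javascript
// Hypothetical helper: builds a clusterRetryStrategy with capped linear backoff.
// Per the README, ioredis retries after the returned delay (ms) when the
// strategy returns a number, and gives up otherwise.
function makeClusterRetryStrategy({ stepMs = 100, maxDelayMs = 2000, maxAttempts = 10 } = {}) {
  return function clusterRetryStrategy(times) {
    // Log each call so the order of retries and errors is visible in the logs.
    console.log(`clusterRetryStrategy called, times=${times}`);
    if (times > maxAttempts) {
      return null; // non-number: stop retrying
    }
    return Math.min(times * stepMs, maxDelayMs);
  };
}

// Wiring (requires the ioredis package; the host below is a placeholder):
// const Redis = require("ioredis");
// const cluster = new Redis.Cluster([{ host: "example-host", port: 6379 }], {
//   clusterRetryStrategy: makeClusterRetryStrategy(),
// });
```

Logging the `times` argument on each call is what shows whether the strategy is invoked once or repeatedly.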

In Summary:

  1. Should we be seeing any errors such as "None of startup nodes is available" before clusterRetryStrategy is ever called? (If not, I think there's a bug.)
  2. How do we get clusterRetryStrategy to actually cause a reconnection? Are we supposed to be catching these errors somewhere so that ioredis actually has time to retry?
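On the question of catching errors: as far as I know, an ioredis cluster client is an event emitter that emits `error` and `reconnecting` events, so one way to see the ordering is to attach listeners and timestamp them. The sketch below assumes only that the client exposes `on(eventName, handler)`. The helper name `attachClusterLogging` and the `log` callback are hypothetical. It does not cause a reconnection by itself; it only makes the sequence of events observable.

```javascript
// Hypothetical helper: records cluster lifecycle events with timestamps.
// `cluster` is assumed to be any object exposing on(eventName, handler),
// such as an ioredis Cluster instance.
function attachClusterLogging(cluster, log = console.log) {
  cluster.on("error", (err) => {
    // Handling the event here keeps it from surfacing as an unhandled error.
    log(`[${new Date().toISOString()}] cluster error: ${err && err.message}`);
  });
  cluster.on("reconnecting", () => {
    log(`[${new Date().toISOString()}] cluster reconnecting`);
  });
  return cluster;
}

// Usage (requires the ioredis package; the host below is a placeholder):
// const Redis = require("ioredis");
// const cluster = attachClusterLogging(
//   new Redis.Cluster([{ host: "example-host", port: 6379 }])
// );
```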
@JordanPawlett

I'm experiencing something similar: a "ClusterAllFailedError: Failed to refresh slots cache" error, after which clusterRetryStrategy is called repeatedly in quick succession, with 1 as the first argument every time.

I will investigate and let you know if I find anything.
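If the argument really is 1 on every call, any backoff computed from it never grows. One possible workaround, sketched here and untested against a real cluster, is to keep a separate attempt counter in a closure and reset it once the client is healthy again. The names `makeCountingRetryStrategy` and `reset` are hypothetical, and resetting on the `ready` event is an assumption about when it is safe to do so.

```javascript
// Hypothetical workaround: count attempts ourselves instead of relying on
// the `times` argument, which is reported above to be 1 on every call.
function makeCountingRetryStrategy({ stepMs = 200, maxDelayMs = 5000, maxAttempts = 20 } = {}) {
  let attempts = 0;
  function strategy(_times) {
    attempts += 1;
    if (attempts > maxAttempts) {
      return null; // non-number: stop retrying
    }
    return Math.min(attempts * stepMs, maxDelayMs);
  }
  // Call this once the connection is healthy so the next outage starts fresh.
  strategy.reset = () => {
    attempts = 0;
  };
  return strategy;
}

// Wiring (requires the ioredis package; the host below is a placeholder):
// const Redis = require("ioredis");
// const strategy = makeCountingRetryStrategy();
// const cluster = new Redis.Cluster([{ host: "example-host", port: 6379 }], {
//   clusterRetryStrategy: strategy,
// });
// cluster.on("ready", () => strategy.reset());
```

This spaces out the repeated calls and bounds them, which avoids retrying in a tight loop, but it does not explain why the argument stays at 1.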

@roim

roim commented Apr 16, 2021

Having a similar issue. Eventually our Node.js apps start hitting that error repeatedly and get stuck in an infinite loop. After restarting the app, the connection to Redis is reestablished.

@trademark18

trademark18 commented Jun 14, 2021

I too see the error being thrown before clusterRetryStrategy is called, in version 4.27.6.

From the Readme:

When none of the startup nodes are reachable, clusterRetryStrategy will be invoked. When a number is returned, ioredis will try to reconnect to the startup nodes from scratch after the specified delay (in ms). Otherwise, an error of "None of startup nodes is available" will be returned.

This seems to say clearly that if clusterRetryStrategy is defined and returns a number, no error will be generated.
