Understanding clusterRetryStrategy (after Failed to refresh slots cache) #1062

jeremytm · 2020-02-18T02:18:38Z

ioredis version: 4.15.1
Running on an ElastiCache cluster; our code runs in AWS Lambda.

Everything works flawlessly most of the time. However, as our project scales up, we have very occasionally started seeing "Failed to refresh slots cache" errors, especially in our longer-running scripts.

It's my understanding that clusterRetryStrategy should be called before ioredis throws any errors. From the ioredis README:

When a number is returned, ioredis will try to reconnect to the startup nodes from scratch after the specified delay (in ms). Otherwise, an error of "None of startup nodes is available" will be returned.
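For reference, here is a minimal sketch of a strategy with the shape the README describes. It is called with the attempt count and returns either a delay in milliseconds (retry from the startup nodes) or null (give up, at which point the "None of startup nodes is available" error is expected). The constants MAX_ATTEMPTS, BASE_DELAY_MS and MAX_DELAY_MS are hypothetical values, not ioredis defaults. The function is written standalone so its behavior can be checked without a cluster.

```typescript
// Hypothetical tuning values, not ioredis defaults.
const MAX_ATTEMPTS = 10;
const BASE_DELAY_MS = 100;
const MAX_DELAY_MS = 2000;

// Same shape as the clusterRetryStrategy option: takes the attempt count,
// returns a delay in ms to retry, or null to stop retrying.
// (This is where you would log each call, as described in this issue.)
function clusterRetryStrategy(times: number): number | null {
  if (times > MAX_ATTEMPTS) {
    // Not a number: per the README, ioredis stops retrying and reports
    // "None of startup nodes is available".
    return null;
  }
  // Exponential backoff (100, 200, 400, ...), capped at MAX_DELAY_MS.
  return Math.min(BASE_DELAY_MS * 2 ** (times - 1), MAX_DELAY_MS);
}

// Usage (not executed here; requires the ioredis package):
// const cluster = new Redis.Cluster(startupNodes, { clusterRetryStrategy });
```

If ioredis kept incrementing `times` as documented, this strategy would give up after ten attempts with a delay of at most two seconds between them.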

However, our logs show an error before clusterRetryStrategy is called (we log from inside the retry function).

In addition, we return a number from clusterRetryStrategy, but it doesn't seem to have any effect. clusterRetryStrategy is called only once, with 1 as the argument, and then the error flow begins and our code fails.

In summary:

Should we be seeing errors such as "None of startup nodes is available" before clusterRetryStrategy is ever called? (If not, I think there's a bug.)
How do we get clusterRetryStrategy to actually trigger a reconnection? Are we supposed to catch these errors somewhere so that ioredis has time to retry?


JordanPawlett · 2021-04-01T11:54:08Z

I'm experiencing something similar: a "ClusterAllFailedError: Failed to refresh slots cache" error, then clusterRetryStrategy is called repeatedly in quick succession, with 1 as the first argument every time.

I will investigate and let you know if I find anything.
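If `times` really stays at 1 on every call, as reported here, any strategy whose delay depends only on `times` will retry at its minimum delay indefinitely and never reach its give-up branch. One hypothetical workaround, sketched below under that assumption, is to count attempts in a closure and ignore the argument. The names `makeCountingRetryStrategy` and `reset` are illustrative only and are not part of ioredis; when to call `reset()` (for example from a "ready" listener) is also an assumption.

```typescript
// Hypothetical factory: keeps its own attempt counter instead of
// trusting the `times` argument passed by ioredis.
function makeCountingRetryStrategy(maxAttempts = 10, maxDelayMs = 2000) {
  let attempts = 0;

  const strategy = (_times: number): number | null => {
    attempts += 1;
    if (attempts > maxAttempts) {
      return null; // give up, as the README describes for non-number returns
    }
    // Exponential backoff based on the local counter, capped at maxDelayMs.
    return Math.min(100 * 2 ** (attempts - 1), maxDelayMs);
  };

  // Reset once the cluster is healthy again, so a later outage
  // starts from a short delay instead of the cap.
  const reset = (): void => {
    attempts = 0;
  };

  return { strategy, reset };
}
```

The trade-off is that the counter is shared across all reconnection cycles until `reset()` is called, so forgetting to reset it would make a later, unrelated outage give up immediately.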

roim · 2021-04-16T02:24:21Z

Having a similar issue. Eventually our Node.js apps start hitting that error repeatedly and get stuck in an infinite loop. After the app is restarted, the connection to Redis is re-established.

trademark18 · 2021-06-14T19:41:15Z

I too see the error being thrown before clusterRetryStrategy is called, in version 4.27.6.

From the Readme:

When none of the startup nodes are reachable, clusterRetryStrategy will be invoked. When a number is returned, ioredis will try to reconnect to the startup nodes from scratch after the specified delay (in ms). Otherwise, an error of "None of startup nodes is available" will be returned.

This seems to state clearly that if clusterRetryStrategy is defined and returns a number, no error will be generated.

vaughandroid mentioned this issue on Jun 17, 2021: ClusterAllFailedError on version 4.24.1 #1330 (Closed)

rarecrumb mentioned this issue on Sep 12, 2023: Improve Cluster mode documentation, especially in AWS ElastiCache with TLS #1816 (Closed)
