Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the reliability of the refresh process due to the loss of slots #52

Merged
merged 2 commits into from
Oct 20, 2023
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 7 additions & 1 deletion cluster.go
Original file line number Diff line number Diff line change
Expand Up @@ -106,7 +106,13 @@ func (c *Cluster) refresh(bg bool) error {
var errMsgs []string
var oldm, newm [HashSlots][]string

addrs, _ := c.getNodeAddrs(false)
// get all node addrs, including replicas
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this does not return all node addresses, it will return the replicas if it knows about any, otherwise the masters. I'm not sure I like switching the refresh to get cluster slots from replicas - it should be ok, but it might need a READONLY call to make the replica accept the command (I'm not sure since this is not serving a key, it's an administration command), we'd have to test that and it might differ based on redis version...

I'll come back to review it when I have a bit more time to get back in context, but I think we should address that in getNodeAddrs instead and reuse the StartupNodes only if both c.masters and c.replicas are empty, and here in refresh fall back to using the replicas only if getNodeAddrs(false) returns an empty list of addresses.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here in refresh fall back to using the replicas only if getNodeAddrs(false) returns an empty list of addresses.

And making sure the replicas do serve the CLUSTER SLOTS command without a READONLY request first.

Copy link
Author

@ljfuyuan ljfuyuan Oct 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I made the mistake of getNodeAddrs(true) will not return all nodes.

And making sure the replicas do serve the CLUSTER SLOTS command without a READONLY request first

I did tests on redis 5.0.8 and 6.2.13 clusters. The replica node can correctly return cluster slots information without READONLY

So, based on the above, can we try to do this? At the same time, the semantics of preferReplicas can be preserved .

        addrs, _ := c.getNodeAddrs(false)
	if len(addrs) == 0 {
		if addrs, _ = c.getNodeAddrs(true); len(addrs) == 0 {
			addrs = c.StartupNodes
		}
	}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that would be good.

addrs, _ := c.getNodeAddrs(true)
// when addrs cannot be obtained, StartupNodes is always used to populate
if len(addrs) == 0 {
addrs = c.StartupNodes
}

for _, addr := range addrs {
m, err := c.getClusterSlots(addr)
if err != nil {
Expand Down