Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resubscribe logic looks ignoring sharded Pub/Sub (shardChannels Set of PubSubEndpoint) #2940

Closed
saiya opened this issue Aug 5, 2024 · 5 comments · Fixed by #3024
Closed
Assignees
Labels
status: good-first-issue An issue that can only be worked on by brand new contributors status: help-wanted An issue that a contributor can help us with type: bug A general bug
Milestone

Comments

@saiya
Copy link

saiya commented Aug 5, 2024

Bug Report

Thank you for adding support for sharded PubSub in the 6.4.0.
Related to that implementation, I would like to confirm about resubscribe (subscription recovery after reconnection) behavior.
It looks like resubscribe handling only non-sharded PubSub subscriptions.

Current Behavior

Sharded PubSub subscriptions looks not recovered after disconnection + re-connection, despite Lettuce automatically recovers non-sharded PubSub subscriptions (the recovery behavior is very helpful for applications, and also difficult / error-prone to implement by ourselves).

In the 1457486#diff-73f0c5ac0f7d6d94f8d47e1a75564aa344cbbbd4c91ce786711b5c3673044956, PubSubEndpoint.shardChannels (private) have been added but it looks that variable is private inside PubSubEndpoint and no any code of PubSubEndpoint reading that variable's data. Also implementations of StatefulRedisPubSubConnectionImpl.resubscribe and StatefulRedisClusterPubSubConnectionImpl.resubscribe does not accessing shardChannels.

Input Code

Just make sharded Pub/Sub connection, and wait until some disconnection + re-connection occurs on that environment...

Input Code
StatefulRedisPubSubConnection<String, String> connection = client.connectPubSub()
connection.addListener(new RedisPubSubAdapter<String, String>() { ... })

RedisPubSubCommands<String, String> sync = connection.sync();
sync.ssubscribe("channel"); // Sharded subscription

Expected behavior/code

As same as non-sharded Pub/Sub, automatically recover sharded Pub/Sub subscriptions when possible to connect to that shard/node.
( May be a bit different logic from non-sharded Pub/Sub needed. Because the SSUBSCRIBE requires living connection to the particular shard's node(s) )

Environment

  • Lettuce version(s): 6.4.0
  • Redis version: 7.1 (I belive this issue happens even with Redis 7.0)

Possible Solution

(Not tested) Watch the connection's state somehow and manually call ssubscribe again by application's logic.

Additional context

Because currently I don't have full access to a target environment / Redis cluster, this issue is based on best guess by reading code. I hope this code reading makes some sense...

@saiya
Copy link
Author

saiya commented Aug 5, 2024

(Just related topic) Also Javadoc of RedisClusterClient#connectPubSub and StatefulRedisClusterPubSubConnection are bit conflicting each other. I assume the StatefulRedisClusterPubSubConnection one is correct:

RedisClusterClient#connectPubSub:

 * <li>Pub/sub commands are sent to the node with the least number of clients</li>

StatefulRedisClusterPubSubConnection:

The connection provides transparent command routing based on the first command key.

↑ By reading code, this one (routing based on first command key) is the truth and that is why ssubscribe (but also subscribe, despite it can be sent to any shard/node) implementation always sends command to the key slot's shard/node

@tishun tishun added this to the Backlog milestone Aug 5, 2024
@tishun tishun added the type: bug A general bug label Aug 5, 2024
@tishun
Copy link
Collaborator

tishun commented Aug 5, 2024

Hey @saiya , thanks for the report, I think you are correct and sharded subscriptions are not resubscribed right now.
I will put this in our backlog so we can address it as soon as time permits.

@tishun
Copy link
Collaborator

tishun commented Aug 5, 2024

By reading code, this one (routing based on first command key) is the truth and that is why ssubscribe (but also subscribe, despite it can be sent to any shard/node) implementation always sends command to the key slot's shard/node

Both, to the best of my own understanding:

  • channel names are keys and the command will be send to a node handling the slot that handles this key
  • in case more than one shard is handling this slot (replica nodes) we subscribe to the one with the least number of clients

Does that make sense?

@saiya
Copy link
Author

saiya commented Aug 5, 2024

Thank you for the agile response! I will await for the fix.

@saiya
Copy link
Author

saiya commented Aug 5, 2024

Both, to the best of my own understanding:

  • channel names are keys and the command will be send to a node handling the slot that handles this key
  • in case more than one shard is handling this slot (replica nodes) we subscribe to the one with the least number of clients
    Does that make sense?

I see; yes indeed that make sense.

Now I understand the RedisClusterClient#connectPubSub document's intention; Another list item saying <li>Keyless commands are send to the default connection</li>, therefore that list item talking about not-keyless command implictly.

@tishun tishun added status: help-wanted An issue that a contributor can help us with status: good-first-issue An issue that can only be worked on by brand new contributors labels Sep 19, 2024
@tishun tishun modified the milestones: Backlog, 6.5.0.RELEASE Oct 23, 2024
@ggivo ggivo self-assigned this Oct 24, 2024
ggivo added a commit to ggivo/lettuce that referenced this issue Oct 24, 2024
Sharded PubSub subscriptions not recovered after disconnection and re-connection.
ggivo added a commit to ggivo/lettuce that referenced this issue Oct 24, 2024
ggivo added a commit to ggivo/lettuce that referenced this issue Oct 24, 2024
…sconnection and re-connection.

Backport to 6.4.x branch
tishun pushed a commit that referenced this issue Oct 24, 2024
tishun pushed a commit that referenced this issue Oct 24, 2024
…ection and re-connection. (#3026)

Backport to 6.4.x branch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: good-first-issue An issue that can only be worked on by brand new contributors status: help-wanted An issue that a contributor can help us with type: bug A general bug
Projects
None yet
3 participants