rabbitmq-streams delete_replica fails when the node of the replica to be deleted is down. #9282

Closed
kjnilsson opened this issue Sep 4, 2023 · 2 comments · Fixed by #9293

@kjnilsson
Contributor

Describe the bug

rabbit-3 is down in this case:

Removing a replica of queue s1 on node rabbit-3@nkarlVMD6R...
Error:
node_not_running

Reproduction steps

  1. Create a stream in a cluster.
  2. Stop a RabbitMQ node.
  3. Try to delete the stream replica on that node.
    ...

Expected behavior

The replica should be deleted irrespective of node status, i.e. even if the node is no longer in the RabbitMQ cluster.

Additional context

Even if the command were let through, the stream coordinator would keep retrying the deletion. We would also need to report success back if the target node is no longer in the RabbitMQ cluster.
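To make the expected behavior concrete, here is a minimal toy model in Python. It is not RabbitMQ code: `ToyCluster`, `Outcome`, and `delete_replica` are hypothetical names invented for illustration, and the real stream coordinator is an Erlang state machine with far more state. The sketch only captures the decision described above: always accept the command, report success straight away when the target node has left the cluster, and leave cleanup pending when the node is still a member but down.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    DELETED = "deleted"      # node is up, replica data can be removed now
    FORGOTTEN = "forgotten"  # node left the cluster, nothing to retry
    PENDING = "pending"      # node is a member but down, clean up later


@dataclass
class ToyCluster:
    """Hypothetical stand-in for cluster and stream membership state."""
    members: set[str]                       # nodes that are cluster members
    running: set[str]                       # members that are currently up
    replicas: dict[str, set[str]] = field(default_factory=dict)


def delete_replica(cluster: ToyCluster, stream: str, node: str) -> Outcome:
    """Drop `node` from the replica set of `stream` regardless of node status."""
    nodes = cluster.replicas.get(stream, set())
    if node not in nodes:
        raise KeyError(f"{node} hosts no replica of {stream}")

    # The command is always let through: membership metadata is updated first.
    nodes.discard(node)

    if node not in cluster.members:
        # The node is gone for good, so report success instead of retrying.
        return Outcome.FORGOTTEN
    if node in cluster.running:
        return Outcome.DELETED
    # Still a member but down: on-disk data is cleaned up once it restarts.
    return Outcome.PENDING
```

With `rabbit-3` stopped but still a cluster member, `delete_replica(cluster, "s1", "rabbit-3")` returns `Outcome.PENDING` in this model instead of failing with `node_not_running`.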

https://groups.google.com/g/rabbitmq-users/c/62eUMwZHvOM/m/dnVou9pLAwAJ

@kjnilsson kjnilsson added the bug label Sep 4, 2023
@mkuratczyk mkuratczyk self-assigned this Sep 4, 2023
@arnaudmorin

Extra note:
I was able to add a replica on some affected queues.
But some of them are failing adding a replica:

Adding a replica for queue q-agent-notifier-l2population-update_fanout on node rabbit@rabbit1...
Error:
{:disallowed, :out_of_sync_replica}

I have no idea if this could also be related to the same bug...

mkuratczyk added a commit that referenced this issue Sep 5, 2023
Otherwise we can't forget replicas on nodes that are no longer
cluster members.

Fixes #9282
mkuratczyk added a commit to rabbitmq/osiris that referenced this issue Sep 6, 2023
Without this change, if a replica was deleted when the node was down
(and therefore it wasn't immediately deleted), orphaned folders would
pile up. With this change, when the node is started again, it will
clean up the unneeded folder.

Part of rabbitmq/rabbitmq-server#9282
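
The startup cleanup described in the osiris commit above can be pictured with a small Python sketch. This is an illustration only, not the osiris implementation (which is written in Erlang): the function names, the directory layout, and the idea of comparing folder names against a set of known streams are all assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def remove_orphaned_dirs(data_dir: Path, known_streams: set) -> list:
    """Delete sub-directories of data_dir that match no known stream.

    Returns the names of the removed directories, sorted alphabetically.
    """
    removed = []
    for entry in sorted(data_dir.iterdir()):
        if entry.is_dir() and entry.name not in known_streams:
            shutil.rmtree(entry)  # leftover of a replica deleted while down
            removed.append(entry.name)
    return removed


def demo() -> tuple:
    """Create three stream folders, then start up knowing only two of them."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("s1", "s2", "s3"):
            (root / name).mkdir()
        removed = remove_orphaned_dirs(root, {"s1", "s3"})
        remaining = sorted(p.name for p in root.iterdir())
        return removed, remaining
```

In this sketch `demo()` removes only the `s2` folder, which stands in for a replica that was deleted while its node was down.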
mkuratczyk added a commit that referenced this issue Sep 6, 2023
Otherwise we can't forget replicas on nodes that are no longer
cluster members.

Fixes #9282
@kjnilsson
Contributor Author

> Extra note: I was able to add a replica on some affected queues. But some of them are failing adding a replica:
>
> Adding a replica for queue q-agent-notifier-l2population-update_fanout on node rabbit@rabbit1...
> Error:
> {:disallowed, :out_of_sync_replica}
>
> I have no idea if this could also be related to the same bug...

This is part of a protection check that disallows adding another replica if there is already one configured that is too far behind on replication.
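
A rough Python sketch of such a check follows. The thread does not say how "too far behind" is measured, so the offset comparison and the `max_lag` threshold are assumptions, and `can_add_replica` is a hypothetical name rather than a RabbitMQ function. Only the shape of the error mirrors the `{:disallowed, :out_of_sync_replica}` result quoted above.

```python
from typing import Dict, Optional, Tuple


def can_add_replica(
    leader_offset: int,
    replica_offsets: Dict[str, int],
    max_lag: int,
) -> Tuple[str, Optional[str]]:
    """Refuse a new replica while any existing replica lags too far behind.

    leader_offset   -- last offset written by the leader (assumed metric)
    replica_offsets -- node name -> last offset replicated to that node
    max_lag         -- hypothetical tolerance, in offsets
    """
    for offset in replica_offsets.values():
        if leader_offset - offset > max_lag:
            return ("disallowed", "out_of_sync_replica")
    return ("ok", None)
```

For example, with a leader at offset 1000, an existing replica at offset 10, and `max_lag=100`, the sketch returns `("disallowed", "out_of_sync_replica")`.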

mkuratczyk added a commit that referenced this issue Sep 11, 2023
Otherwise we can't forget replicas on nodes that are no longer
cluster members.

Fixes #9282
mergify bot pushed a commit that referenced this issue Sep 18, 2023
Otherwise we can't forget replicas on nodes that are no longer
cluster members.

Fixes #9282

(cherry picked from commit 1768694)

# Conflicts:
#	MODULE.bazel
#	deps/rabbit/Makefile
mergify bot pushed a commit that referenced this issue Sep 18, 2023
Otherwise we can't forget replicas on nodes that are no longer
cluster members.

Fixes #9282

(cherry picked from commit 1768694)

# Conflicts:
#	MODULE.bazel
#	deps/rabbit/Makefile
(cherry picked from commit 8ace3d4)

# Conflicts:
#	MODULE.bazel
#	deps/rabbit/Makefile
#	deps/rabbit/src/rabbit_stream_coordinator.erl
#	deps/rabbit/src/rabbit_stream_queue.erl