CI Failure ("can't fetch stable replicas") in PartitionMoveInterruption.test_cancelling_partition_move
#9243
Comments
So this is pretty interesting (dissecting logs for build 30615). Looks like a very subtle bug in force-cancellation. Background:
The first-order symptom is that force-cancellation finished, but leadership info diverged between nodes: node docker-rp-18 thought that the leader was 4:
while everyone else thought that the leader was 2 (and it was true on the raft level). The reason was that an entry in How did it happen? Approximate timeline:
Now, I guess the immediate symptom can be fixed by something like #9300, but the fact that the configuration revision goes backwards doesn't look right. I think the proper fix is to wait with issuing the
@ztlpn coming in hot with the analysis.
When forcibly aborting a reconfiguration, we should wait for a new leader to be elected in the configuration that the partition was forced to. This way we can be certain that the new configuration will eventually be replicated to a majority of nodes, even though a leader may not exist at the time the configuration is replicated. Fixes: redpanda-data#9243 Signed-off-by: Michal Maslanka <michal@redpanda.com>
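To make the ordering in that commit message concrete, here is a rough Python sketch of "force the configuration, then wait for a leader from that configuration before finishing". It is not Redpanda's implementation (which is C++). `wait_for_leader_in`, `force_abort_reconfiguration`, and the `force_update`, `get_leader`, and `finish` callbacks are hypothetical names invented for illustration.

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_leader_in(
    replicas: Iterable[int],
    get_leader: Callable[[], Optional[int]],
    timeout_s: float = 5.0,
    poll_interval_s: float = 0.05,
) -> int:
    """Poll until the reported leader is a member of `replicas`.

    `get_leader` is a hypothetical callback returning the current leader id,
    or None when no leader is known.
    """
    allowed = set(replicas)
    deadline = time.monotonic() + timeout_s
    while True:
        leader = get_leader()
        if leader is not None and leader in allowed:
            return leader
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no leader elected among {sorted(allowed)}")
        time.sleep(poll_interval_s)


def force_abort_reconfiguration(
    forced_replicas: Iterable[int],
    force_update: Callable[[list], None],
    get_leader: Callable[[], Optional[int]],
    finish: Callable[[], None],
) -> int:
    """Force the replica set, then finish only once a leader from it exists."""
    replicas = list(forced_replicas)
    force_update(replicas)  # hypothetical: force the partition to `replicas`
    leader = wait_for_leader_in(replicas, get_leader)
    finish()  # hypothetical: report the operation as finished
    return leader
```

With a stale leader report (for example node 4 while the forced replica set is 1, 2, 3), the helper keeps polling instead of finishing, which is the ordering the commit message asks for.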
When forcibly aborting a reconfiguration, we should wait for a new leader to be elected in the configuration that the partition was forced to. This way we can be certain that the new configuration will eventually be replicated to a majority of nodes, even though a leader may not exist at the time the configuration is replicated. Fixes: redpanda-data#9243 Signed-off-by: Michal Maslanka <michal@redpanda.com> (cherry picked from commit 3948312)
https://buildkite.com/redpanda/redpanda/builds/24275#0186a0e1-3750-40f1-8228-0be893ddf7dc