-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fail weight update when decommission ongoing and fail decommission when attribute not weighed away #4839
Fail weight update when decommission ongoing and fail decommission when attribute not weighed away #4839
Changes from all commits
81f3a69
49c4bf2
abf33ce
0bcbac4
4bfd385
05eb55a
042fa3a
eafe468
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,6 +19,8 @@ | |
import org.opensearch.cluster.ClusterState; | ||
import org.opensearch.cluster.ClusterStateUpdateTask; | ||
import org.opensearch.cluster.ack.ClusterStateUpdateResponse; | ||
import org.opensearch.cluster.decommission.DecommissionAttributeMetadata; | ||
import org.opensearch.cluster.decommission.DecommissionStatus; | ||
import org.opensearch.cluster.metadata.Metadata; | ||
import org.opensearch.cluster.metadata.WeightedRoutingMetadata; | ||
import org.opensearch.cluster.routing.allocation.decider.AwarenessAllocationDecider; | ||
|
@@ -68,6 +70,8 @@ public void registerWeightedRoutingMetadata( | |
clusterService.submitStateUpdateTask("update_weighted_routing", new ClusterStateUpdateTask(Priority.URGENT) { | ||
@Override | ||
public ClusterState execute(ClusterState currentState) { | ||
// verify currently no decommission action is ongoing | ||
ensureNoOngoingDecommissionAction(currentState); | ||
Metadata metadata = currentState.metadata(); | ||
Metadata.Builder mdBuilder = Metadata.builder(currentState.metadata()); | ||
WeightedRoutingMetadata weightedRoutingMetadata = metadata.custom(WeightedRoutingMetadata.TYPE); | ||
|
@@ -154,4 +158,15 @@ public void verifyAwarenessAttribute(String attributeName) { | |
throw validationException; | ||
} | ||
} | ||
|
||
public void ensureNoOngoingDecommissionAction(ClusterState state) { | ||
DecommissionAttributeMetadata decommissionAttributeMetadata = state.metadata().decommissionAttributeMetadata(); | ||
if (decommissionAttributeMetadata != null && decommissionAttributeMetadata.status().equals(DecommissionStatus.FAILED) == false) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If decommission status is FAILED can certain nodes be out of service or decommissioned while others not as in partially failed? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, if the decommission is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But then the zone isn't fully equipped to take traffic right like 90% nodes are out of service and we place weights 10:1:1 when 10 is for the zone only operating at 10% capacity? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No |
||
throw new IllegalStateException( | ||
"a decommission action is ongoing with status [" | ||
+ decommissionAttributeMetadata.status().status() | ||
+ "], cannot update weight during this state" | ||
); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if status is
SUCCESSFUL
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only when Decommission has failed, we can allow the weight update can go through. Not for any other status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't
SUCCESSFUL
a final state? The error message of thisif
condition says:a decommission action is ongoing with status...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A succesful decommission state means the transport execution has completed. The nodes are decommissioned in this state (not able to start pre voting or join). Hence DecommissionAction ongoing means a zone is decommissioned and the cluster rejects pings from decommissioned node