Handle master failure in NodeSeenService #77220
NodeSeenService can miss nodes if the master changes while it is processing the cluster state update that adds those nodes to the cluster. This caused occasional failures in the test that verifies NodeSeenService works as intended. This commit adjusts NodeSeenService's early returns so that, if the master changed, the new master checks for seen nodes even if no nodes were added in that particular cluster state update.
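To make the adjusted early return concrete, here is a minimal standalone sketch of the decision described above. It is not the actual NodeSeenService code: the class `SeenNodesCheckSketch`, the method `shouldCheckForSeenNodes`, and the three boolean parameters are hypothetical stand-ins for values the real code reads from the cluster state event.

```java
// Hypothetical sketch; names are illustrative and not part of Elasticsearch.
final class SeenNodesCheckSketch {

    private SeenNodesCheckSketch() {}

    /**
     * Returns true when the listener should go on to look for nodes it has
     * not yet marked as seen, and false when it can return early.
     *
     * @param nodesAdded      whether this cluster state update added nodes
     * @param wasMasterBefore whether the local node was the elected master in the previous state
     * @param isMasterNow     whether the local node is the elected master in the new state
     */
    static boolean shouldCheckForSeenNodes(boolean nodesAdded, boolean wasMasterBefore, boolean isMasterNow) {
        // A node that was not master before but is master now has just taken over.
        final boolean thisNodeJustBecameMaster = wasMasterBefore == false && isMasterNow;
        // Proceed if nodes joined in this update, or if a newly elected master
        // needs to catch up on nodes the previous master may not have recorded.
        return nodesAdded || thisNodeJustBecameMaster;
    }
}
```

For instance, `shouldCheckForSeenNodes(false, false, true)` models a freshly elected master receiving an update that adds no nodes; under these assumptions the check still runs, which is the case the description says was previously missed.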
Pinging @elastic/es-core-infra (Team:Core/Infra)
Prior to the change to
LGTM, thanks for fixing this
@dakrone I realized that I was missing an
Thanks for clarifying it by pulling it into a variable, after removing the logging it LGTM
final boolean thisNodeJustBecameMaster = event.previousState().nodes().isLocalNodeElectedMaster() == false
    && event.state().nodes().isLocalNodeElectedMaster();
if ((event.nodesAdded() || thisNodeJustBecameMaster) == false) {
    logger.error("GWB> Bailing early");
We should probably remove this line :)
Whoops, I had those in there to help verify my reading that the == false
was missing 🤦
Removed.
@@ -67,6 +70,7 @@ public void clusterChanged(ClusterChangedEvent event) {
    .collect(Collectors.toUnmodifiableSet());

if (nodesNotPreviouslySeen.isEmpty() == false) {
    logger.error("GWB> Submitting update task for nodes [{}]", nodesNotPreviouslySeen);
And this one :)
@@ -86,6 +90,7 @@ public ClusterState execute(ClusterState currentState) throws Exception {

final NodesShutdownMetadata newNodesMetadata = new NodesShutdownMetadata(newShutdownMetadataMap);
if (newNodesMetadata.equals(currentShutdownMetadata)) {
    logger.error("GWB> Bailing update task as it's a no-op");
And this one :)
@elasticmachine update branch
* Handle master failure in NodeSeenService

  NodeSeenService can miss seeing nodes if the master changes while it's processing the cluster state update which adds the nodes to the cluster. This caused occasional test failures in the test intended to check that NodeSeenService is working as intended.

  This commit adjusts NodeSeenService's early returns to ensure that, if the master changed, the new master checks for seen nodes even if nodes were not added in that particular cluster state update.

* Clarify & correct "just-became-master" check
* Remove leftover debug logging (d'oh!)

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Follow-up to #75750
Fixes #76689