-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix NOT_STARTED
statuses appearing inappropirately during node shutdown
#75750
Conversation
NOT_STARTED
statuses appearing when they shouldn't during node shutdownNOT_STARTED
statuses appearing inappropirately during node shutdown
Pinging @elastic/es-core-infra (Team:Core/Infra) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Gordon, I left a few comments but nothing major
import java.util.function.Function; | ||
import java.util.stream.Collectors; | ||
|
||
public class ShutdownService implements ClusterStateListener { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add javadocs to this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I look at it, this class name is a little strange since this service doesn't actually do anything related to shutting down the nodes.
I wonder if it would be better named something like NodeSeenService
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed! Good call, I didn't really think about the naming at the time.
x-pack/plugin/shutdown/src/main/java/org/elasticsearch/xpack/shutdown/ShutdownService.java
Outdated
Show resolved
Hide resolved
.values() | ||
.stream() | ||
.map(singleNodeShutdownMetadata -> { | ||
if (nodesNotPreviouslySeen.contains(singleNodeShutdownMetadata.getNodeId())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should recalculate the nodesNotPreviouslySeen
since there might be other changes since the time this cluster state update task actually runs
return singleNodeShutdownMetadata; | ||
}) | ||
.collect(Collectors.toUnmodifiableMap(SingleNodeShutdownMetadata::getNodeId, Function.identity())); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth it to add if (newShutdownMetadataMap.equals(shutdownMetadata.getAllNodeMetadataMap()) { return currentState; }
to this so we avoid work if not necessary?
Build failures are both timeouts in tests that appear unrelated and don't reproduce locally. @elasticmachine run elasticsearch-ci/bwc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I left one pretty minor comment
import java.util.function.Function; | ||
import java.util.stream.Collectors; | ||
|
||
public class ShutdownService implements ClusterStateListener { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I look at it, this class name is a little strange since this service doesn't actually do anything related to shutting down the nodes.
I wonder if it would be better named something like NodeSeenService
?
Build failures are due to a compilation error on @elasticmachine run elasticsearch-ci/bwc |
…down (elastic#75750) This PR fixes two situations where `NOT_STARTED` can appear as the shard migration status inappropriately: 1. When the node is actually shut down after having all the shards migrate away. 2. When a non-data-node is registered for shutdown. It also adds tests to ensure these cases are handled correctly.
💚 Backport successful
|
…down (#75750) (#76634) * Fix `NOT_STARTED` statuses appearing inappropirately during node shutdown (#75750) This PR fixes two situations where `NOT_STARTED` can appear as the shard migration status inappropriately: 1. When the node is actually shut down after having all the shards migrate away. 2. When a non-data-node is registered for shutdown. It also adds tests to ensure these cases are handled correctly. * Fix compilation for backport * Fix compilation for backport
…down (elastic#75750) (elastic#76634) * Fix `NOT_STARTED` statuses appearing inappropirately during node shutdown (elastic#75750) This PR fixes two situations where `NOT_STARTED` can appear as the shard migration status inappropriately: 1. When the node is actually shut down after having all the shards migrate away. 2. When a non-data-node is registered for shutdown. It also adds tests to ensure these cases are handled correctly. * Fix compilation for backport * Fix compilation for backport
…down (#75750) (#76634) (#76669) * Fix `NOT_STARTED` statuses appearing inappropirately during node shutdown (#75750) This PR fixes two situations where `NOT_STARTED` can appear as the shard migration status inappropriately: 1. When the node is actually shut down after having all the shards migrate away. 2. When a non-data-node is registered for shutdown. It also adds tests to ensure these cases are handled correctly. * Fix compilation for backport * Fix compilation for backport
This PR fixes two situations where
NOT_STARTED
can appear as the shard migration status inappropriately:It also adds tests to ensure these cases are handled correctly.
Follow-up to #75606