Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix NOT_STARTED statuses appearing inappropirately during node shutdown #75750

Merged
merged 23 commits into from
Aug 18, 2021

Conversation

gwbrown
Copy link
Contributor

@gwbrown gwbrown commented Jul 27, 2021

This PR fixes two situations where NOT_STARTED can appear as the shard migration status inappropriately:

  1. When the node is actually shut down after having all the shards migrate away.
  2. When a non-data-node is registered for shutdown.

It also adds tests to ensure these cases are handled correctly.

Follow-up to #75606

@gwbrown gwbrown added >non-issue v8.0.0 :Core/Infra/Node Lifecycle Node startup, bootstrapping, and shutdown v7.15.0 labels Jul 27, 2021
@gwbrown gwbrown changed the title Fix NOT_STARTED statuses appearing when they shouldn't during node shutdown Fix NOT_STARTED statuses appearing inappropirately during node shutdown Jul 27, 2021
@gwbrown gwbrown marked this pull request as ready for review August 12, 2021 22:49
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Aug 12, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@gwbrown gwbrown requested a review from dakrone August 12, 2021 22:49
@gwbrown gwbrown removed the request for review from dakrone August 12, 2021 22:52
@gwbrown gwbrown requested a review from dakrone August 12, 2021 23:01
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Gordon, I left a few comments but nothing major

import java.util.function.Function;
import java.util.stream.Collectors;

public class ShutdownService implements ClusterStateListener {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add javadocs to this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I look at it, this class name is a little strange since this service doesn't actually do anything related to shutting down the nodes.

I wonder if it would be better named something like NodeSeenService?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed! Good call, I didn't really think about the naming at the time.

.values()
.stream()
.map(singleNodeShutdownMetadata -> {
if (nodesNotPreviouslySeen.contains(singleNodeShutdownMetadata.getNodeId())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should recalculate the nodesNotPreviouslySeen since there might be other changes since the time this cluster state update task actually runs

return singleNodeShutdownMetadata;
})
.collect(Collectors.toUnmodifiableMap(SingleNodeShutdownMetadata::getNodeId, Function.identity()));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth it to add if (newShutdownMetadataMap.equals(shutdownMetadata.getAllNodeMetadataMap()) { return currentState; } to this so we avoid work if not necessary?

@gwbrown
Copy link
Contributor Author

gwbrown commented Aug 13, 2021

Build failures are both timeouts in tests that appear unrelated and don't reproduce locally.

@elasticmachine run elasticsearch-ci/bwc
@elasticmachine run elasticsearch-ci/part-2

@gwbrown gwbrown requested a review from dakrone August 17, 2021 21:34
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I left one pretty minor comment

import java.util.function.Function;
import java.util.stream.Collectors;

public class ShutdownService implements ClusterStateListener {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I look at it, this class name is a little strange since this service doesn't actually do anything related to shutting down the nodes.

I wonder if it would be better named something like NodeSeenService?

@gwbrown gwbrown added the auto-backport Automatically create backport pull requests when merged label Aug 17, 2021
@gwbrown
Copy link
Contributor Author

gwbrown commented Aug 18, 2021

Build failures are due to a compilation error on 7.x earlier today, which has now been fixed.

@elasticmachine run elasticsearch-ci/bwc
@elasticmachine run elasticsearch-ci/part-2

@gwbrown gwbrown merged commit bc5295b into elastic:master Aug 18, 2021
gwbrown added a commit to gwbrown/elasticsearch that referenced this pull request Aug 18, 2021
…down (elastic#75750)

This PR fixes two situations where `NOT_STARTED` can appear as the shard migration status inappropriately:
1. When the node is actually shut down after having all the shards migrate away.
2. When a non-data-node is registered for shutdown.

It also adds tests to ensure these cases are handled correctly.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
7.x

elasticsearchmachine pushed a commit that referenced this pull request Aug 18, 2021
…down (#75750) (#76634)

* Fix `NOT_STARTED` statuses appearing inappropirately during node shutdown (#75750)

This PR fixes two situations where `NOT_STARTED` can appear as the shard migration status inappropriately:
1. When the node is actually shut down after having all the shards migrate away.
2. When a non-data-node is registered for shutdown.

It also adds tests to ensure these cases are handled correctly.

* Fix compilation for backport

* Fix compilation for backport
gwbrown added a commit to gwbrown/elasticsearch that referenced this pull request Aug 18, 2021
…down (elastic#75750) (elastic#76634)

* Fix `NOT_STARTED` statuses appearing inappropirately during node shutdown (elastic#75750)

This PR fixes two situations where `NOT_STARTED` can appear as the shard migration status inappropriately:
1. When the node is actually shut down after having all the shards migrate away.
2. When a non-data-node is registered for shutdown.

It also adds tests to ensure these cases are handled correctly.

* Fix compilation for backport

* Fix compilation for backport
elasticsearchmachine pushed a commit that referenced this pull request Aug 18, 2021
…down (#75750) (#76634) (#76669)

* Fix `NOT_STARTED` statuses appearing inappropirately during node shutdown (#75750)

This PR fixes two situations where `NOT_STARTED` can appear as the shard migration status inappropriately:
1. When the node is actually shut down after having all the shards migrate away.
2. When a non-data-node is registered for shutdown.

It also adds tests to ensure these cases are handled correctly.

* Fix compilation for backport

* Fix compilation for backport
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged :Core/Infra/Node Lifecycle Node startup, bootstrapping, and shutdown >non-issue Team:Core/Infra Meta label for core/infra team v7.15.0 v8.0.0-alpha2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants