Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait on readiness service on start #102325

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

rjernst
Copy link
Member

@rjernst rjernst commented Nov 16, 2023

Node.start() currently does not return until the node is serving http. However, other parts of the system may not be quite ready. For example, file settings may not have been applied, and we may not yet have a master node. The readiness service's purpose is to identify when the node is actually ready for serving requests.

This commit adjusts the end of Node start to wait on the readiness service being ready. The ramifications are that the main Elasticsearch thread will not exit until the node is actually ready to serve requests, and the cli will not exit (when in daemon mode) until the node is ready.

Node.start() currently does not return until the node is serving http.
However, other parts of the system may not be quite ready. For example,
file settings may not have been applied, and we may not yet have a
master node. The readiness service's purpose is to identify when the
node is actually ready for serving requests.

This commit adjusts the end of Node start to wait on the readiness
service being ready. The ramifications are that the main Elasticsearch
thread will not exit until the node is actually ready to serve requests,
and the cli will not exit (when in daemon mode) until the node is ready.
@rjernst rjernst added >refactoring :Core/Infra/Node Lifecycle Node startup, bootstrapping, and shutdown labels Nov 16, 2023
@rjernst rjernst requested a review from a team November 16, 2023 23:48
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine elasticsearchmachine added v8.12.0 Team:Core/Infra Meta label for core/infra team labels Nov 16, 2023
Copy link
Contributor

@ldematte ldematte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, only a minor Q

CountDownLatch ready = new CountDownLatch(1);
readinessService.addBoundAddressListener(address -> ready.countDown());
try {
ready.await();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not introducing a timeout here - I suppose there are plenty of other places where we already timeout so there is no need for another one here, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep exactly, there are other timeouts on startup eg systemd has a 75 second default timeout. Right now we will actually report to systemd that the node is ready before the readiness probe responds true, but this change will align them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Node Lifecycle Node startup, bootstrapping, and shutdown >refactoring Team:Core/Infra Meta label for core/infra team v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants