
Heartbeat improvements and handling failures during establishing leadership #3890

Merged
merged 9 commits on Mar 12, 2018

Conversation

dadgar
Contributor

@dadgar dadgar commented Feb 20, 2018

The changes in this PR are the result of going through the logs of a cluster that was quickly transitioning between leaders. As such, there are several semi-related fixes:

  1. When a client detects a change to the node, register the node in status ready after the initial registration (previously it would revert to initializing until the next heartbeat).
  2. If leadership was lost while the deployment watcher was creating an eval, there was no mechanism to no-op the evaluation creation.
  3. Guard the heartbeat system from adding or invalidating timers when it is not the leader.
  4. Revoke leadership if establishing leadership failed. Prior to this, two servers could both be "functioning as a leader"!

Fixes #3035
Fixes #3840
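The leadership-revocation fix (item 4) can be sketched as follows. This is a minimal illustration of the pattern, not Nomad's actual API: the `server` type, its methods, and the simulated error are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// server is a hypothetical, minimal stand-in for a Raft server.
type server struct {
	leader bool
}

// establishLeadership simulates setting up leader-only state (e.g. heartbeat
// timers) and failing partway through.
func (s *server) establishLeadership() error {
	return errors.New("failed to restore heartbeat timers")
}

// revokeLeadership tears down any partially-established leader state so the
// server does not keep acting as a leader.
func (s *server) revokeLeadership() {
	s.leader = false
}

// leaderLoop shows the fix: if establishing leadership fails, revoke it
// instead of continuing to function as a leader.
func (s *server) leaderLoop() {
	s.leader = true
	if err := s.establishLeadership(); err != nil {
		fmt.Println("error establishing leadership:", err)
		s.revokeLeadership()
	}
}

func main() {
	s := &server{}
	s.leaderLoop()
	fmt.Println("acting as leader:", s.leader)
}
```

Without the revocation step, a server that failed mid-setup would retain leader state while another server was elected, which is how two servers could both act as leader.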

Fix a bug in which, if the node attributes/meta changed, we would
re-register the node in status initializing. This would incorrectly
trigger the client to log that it missed its heartbeat.

It would change the status of the Node to initializing until the next
heartbeat occurred.
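The intent of that fix can be sketched as a status-selection rule: only the very first registration should use "initializing"; re-registrations triggered by an attribute/meta change keep the node "ready". The `registerNode` helper and the status constants below are illustrative assumptions, not Nomad's actual code.

```go
package main

import "fmt"

// Hypothetical node statuses mirroring the idea in the fix.
const (
	statusInit  = "initializing"
	statusReady = "ready"
)

// registerNode picks the status for a (re-)registration. A node that has
// already registered stays ready, so a fingerprint change no longer makes
// the client appear to miss its heartbeat.
func registerNode(hasRegistered bool) string {
	if hasRegistered {
		return statusReady
	}
	return statusInit
}

func main() {
	fmt.Println(registerNode(false)) // initial registration
	fmt.Println(registerNode(true))  // re-registration after attribute change
}
```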
@dadgar dadgar requested a review from armon February 20, 2018 21:06
@dadgar
Contributor Author

dadgar commented Feb 21, 2018

Needs hashicorp/consul#3908 to be merged so it can be vendored

// Check that we are still the leader. This guards against
// the race in which leadership is lost but a timer is created on this
// server since it was servicing an RPC during a leadership loss.
if !s.IsLeader() {
	return
}
Member


Should we log a message here, just to indicate that this happened?

@armon
Member

armon commented Feb 21, 2018

LGTM!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 11, 2023