Periodic healthcheck improvements #46

Merged (1 commit) on Sep 7, 2023

@ahreehong (Member)

Issue #, if available:

Description of changes:

  1. Fix a bug where a machine that passes the healthcheck was not removed from the queue of machines to be deleted. Add `delete(currClusterHFConfig.unhealthyMembersToRemove, endpoint)` at line 150 in periodic_healthcheck.
  2. Limit the healthcheck to deleting one machine at a time, and always wait for the replacement replica to be created before deleting another one. This can be done by checking the current number of replicas against the desired number of replicas.
  3. To preserve quorum, check whether it is safe to remove a machine as a member: `currentTotalMembers - unhealthyMembers >= currentTotalMembers/2 + 1`. Here, currentTotalMembers and unhealthyMembers must account for all machines that are part of the etcd cluster, not only the owned ones.
  4. Perform healthchecks by iterating over all machines instead of `status.endpoints`. Endpoints in the status might be stale: with the way the code is structured, the status is not updated until all machines marked for deletion have been successfully deleted. Ignore machines in the bootstrapping phase.
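Item 1 amounts to pruning the removal queue whenever a member recovers. The Go sketch below shows that pattern under stated assumptions: `clusterHealthState` and `recordHealthcheck` are hypothetical names, and the map's value type is assumed, since only the field name `unhealthyMembersToRemove` appears in this PR. For brevity the sketch queues a machine on its first failure and does not model any failure threshold the real controller may apply.

```go
package main

import "fmt"

// clusterHealthState is a hypothetical stand-in for the per-cluster state that
// the PR refers to as currClusterHFConfig. Only unhealthyMembersToRemove is modeled.
type clusterHealthState struct {
	// unhealthyMembersToRemove maps an etcd endpoint to the name of the machine
	// queued for removal (the value type is an assumption).
	unhealthyMembersToRemove map[string]string
}

// recordHealthcheck updates the removal queue after one healthcheck of endpoint.
func (s *clusterHealthState) recordHealthcheck(endpoint, machineName string, healthy bool) {
	if healthy {
		// The fix from item 1: a machine that passes is dequeued.
		// delete on a missing key is a no-op, so this is safe either way.
		delete(s.unhealthyMembersToRemove, endpoint)
		return
	}
	// Hypothetical simplification: queue on the first failure (no threshold).
	s.unhealthyMembersToRemove[endpoint] = machineName
}

func main() {
	s := &clusterHealthState{unhealthyMembersToRemove: map[string]string{}}
	s.recordHealthcheck("https://10.0.0.1:2379", "etcd-0", false) // fails: queued
	s.recordHealthcheck("https://10.0.0.1:2379", "etcd-0", true)  // recovers: dequeued
	fmt.Println("queued for removal:", len(s.unhealthyMembersToRemove))
}
```

Without the `delete` call, a machine that failed once and then recovered would stay queued and could still be removed.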
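A minimal sketch of the one-at-a-time rule in item 2, assuming the controller can read both the desired replica count and the current number of machines. `pickMachineToRemove` is a hypothetical helper, not the controller's actual function: it returns no candidate while a replacement replica is still being created, and otherwise returns exactly one machine.

```go
package main

import (
	"fmt"
	"sort"
)

// pickMachineToRemove is a hypothetical helper. It returns at most one
// unhealthy machine, and none at all while currentReplicas is below
// desiredReplicas, i.e. while the replacement for an earlier deletion
// has not been created yet.
func pickMachineToRemove(desiredReplicas, currentReplicas int, unhealthy []string) (string, bool) {
	if currentReplicas < desiredReplicas {
		return "", false // wait for the replacement replica
	}
	if len(unhealthy) == 0 {
		return "", false // nothing to remove
	}
	// Sort a copy so the same machine is chosen on every reconcile.
	candidates := append([]string(nil), unhealthy...)
	sort.Strings(candidates)
	return candidates[0], true
}

func main() {
	// One machine was just deleted and its replacement is not up yet.
	if _, ok := pickMachineToRemove(3, 2, []string{"etcd-1"}); !ok {
		fmt.Println("waiting for replacement replica")
	}
	// Back at the desired count: remove exactly one of the unhealthy machines.
	if name, ok := pickMachineToRemove(3, 3, []string{"etcd-2", "etcd-1"}); ok {
		fmt.Println("removing", name)
	}
}
```

Sorting the candidates keeps the choice stable across reconcile loops. That ordering is a choice made for this sketch, not something the PR specifies.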
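The quorum condition in item 3 can be written directly. In this sketch, `isSafeToRemoveMember`, `countMembers`, and the `member` type are hypothetical. `countMembers` counts every etcd member, whether or not the controller owns the backing machine, which is the point the PR stresses.

```go
package main

import "fmt"

// member is a hypothetical view of one etcd cluster member.
type member struct {
	name    string
	owned   bool // whether this controller owns the backing machine
	healthy bool
}

// countMembers counts every member of the etcd cluster, owned or not,
// because the quorum check must see the whole cluster.
func countMembers(members []member) (total, unhealthy int) {
	for _, m := range members {
		total++
		if !m.healthy {
			unhealthy++
		}
	}
	return total, unhealthy
}

// isSafeToRemoveMember encodes the check from item 3:
// currentTotalMembers - unhealthyMembers >= currentTotalMembers/2 + 1,
// using Go's truncating integer division for the majority.
func isSafeToRemoveMember(currentTotalMembers, unhealthyMembers int) bool {
	return currentTotalMembers-unhealthyMembers >= currentTotalMembers/2+1
}

func main() {
	members := []member{
		{name: "etcd-0", owned: true, healthy: true},
		{name: "etcd-1", owned: true, healthy: false},
		{name: "external-0", owned: false, healthy: true}, // not owned, still counted
	}
	total, unhealthy := countMembers(members)
	fmt.Printf("total=%d unhealthy=%d safe=%v\n", total, unhealthy, isSafeToRemoveMember(total, unhealthy))
}
```

This prints `total=3 unhealthy=1 safe=true`. With three members the quorum is 3/2 + 1 = 2, so one unhealthy member can be removed but two cannot. With five members the quorum is 3, so up to two unhealthy members pass the check.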
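Item 4 changes what the healthcheck loop iterates over. The following is a sketch under stated assumptions: `machine` is a hypothetical, trimmed-down type, the `bootstrapped` field stands in for however the controller actually detects the bootstrapping phase, and the `check` callback replaces the real etcd health probe.

```go
package main

import "fmt"

// machine is a hypothetical, trimmed-down view of an etcd machine.
type machine struct {
	name         string
	endpoint     string // client URL; empty until the machine has an address
	bootstrapped bool   // false while the machine is still bootstrapping (assumed field)
}

// checkMachines runs check against every machine in the list, rather than
// against status.endpoints, and skips machines that are still bootstrapping.
// It returns the names of machines whose check failed.
func checkMachines(machines []machine, check func(endpoint string) bool) []string {
	var failed []string
	for _, m := range machines {
		if !m.bootstrapped || m.endpoint == "" {
			// Not yet expected to serve traffic; probing it now would
			// report a false "unhealthy" result.
			continue
		}
		if !check(m.endpoint) {
			failed = append(failed, m.name)
		}
	}
	return failed
}

func main() {
	machines := []machine{
		{name: "etcd-0", endpoint: "https://10.0.0.1:2379", bootstrapped: true},
		{name: "etcd-1", endpoint: "https://10.0.0.2:2379", bootstrapped: true},
		{name: "etcd-2", bootstrapped: false}, // still bootstrapping: skipped
	}
	down := map[string]bool{"https://10.0.0.2:2379": true}
	fmt.Println("unhealthy:", checkMachines(machines, func(ep string) bool { return !down[ep] }))
}
```

Because the list of machines is the source of truth here, a machine whose endpoint has not yet reached the status is still checked.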

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@ahreehong changed the title from "Periodic healthcheck fix" to "Periodic healthcheck fixes" on Sep 6, 2023.
@ahreehong force-pushed the periodic-healthcheck-fix branch 2 times, most recently from 19c4d4e to b1c8f79 on Sep 6, 2023 18:57.
@ahreehong changed the title from "Periodic healthcheck fixes" to "Periodic healthcheck improvements" on Sep 6, 2023.
@ahreehong force-pushed the periodic-healthcheck-fix branch 3 times, most recently from d28dfec to 7664c1c on Sep 7, 2023 00:02.
@ahreehong force-pushed the periodic-healthcheck-fix branch 3 times, most recently from 2f7a74f to e5dd4d4 on Sep 7, 2023 01:53.
@ahreehong force-pushed the periodic-healthcheck-fix branch 2 times, most recently from 287dfba to 585f886 on Sep 7, 2023 02:37.
@abhinavmpandey08 left a comment:

/approve
/lgtm
/woof
/goaty-goat-goat

@ahreehong merged commit 8d827d1 into aws:main on Sep 7, 2023.
1 check passed