Control plane healthchecks #1902

Closed
randomvariable opened this issue Dec 16, 2019 · 13 comments
Assignees
Labels
area/control-plane: Issues or PRs related to control-plane lifecycle management
lifecycle/active: Indicates that an issue or PR is actively being worked on by a contributor.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Milestone

Comments

@randomvariable
Member

Implement control plane healthchecks as part of #1756

@randomvariable
Member Author

/assign
/lifecycle active

@k8s-ci-robot added the lifecycle/active label Dec 16, 2019
@ncdc added this to the v0.3.0 milestone Dec 18, 2019
@ncdc added the priority/important-soon and area/control-plane labels Dec 18, 2019
@dlipovetsky
Contributor

@randomvariable Are you creating some etcd utils as part of this? I ask because the control plane controller needs to talk to etcd to remove a member as part of deleting a control plane replica, and I'd like it to talk to etcd using the same mechanism as the healthchecks. The control plane CAEP mentions two options:

Running PodExec etcdctl, or port-forwarding to etcd to get etcd cluster health information

Do you have a preference? I think pod exec'ing makes it easier to access the necessary certificates, but port-forwarding lets us avoid shelling out to etcdctl.

@ncdc
Contributor

ncdc commented Jan 8, 2020

@dlipovetsky I believe @randomvariable is planning on using port-forward, and he's working on a library and the plan is to open up a PR real soon.

@randomvariable
Member Author

randomvariable commented Jan 29, 2020

/remove lifecycle-active

Currently working on kubernetes-sigs/cluster-api-provider-aws#1490
in case someone else wants to finish off consuming #2031 and #2030

@randomvariable
Member Author

/remove lifecycle-active

@chuckha
Contributor

chuckha commented Jan 29, 2020

@randomvariable I can take this! Why was #2031 closed? It seems to be closed with no note

@randomvariable
Member Author

Oh, I thought I left a comment. Mainly, I was going to open a single PR with the consumption included, rather than having it as an abstract package whose API might need to change.

@chuckha
Contributor

chuckha commented Jan 29, 2020

ack

/assign

@randomvariable
Member Author

@dlipovetsky had some additional comments: because none of the alarms actually report connectivity, you can still have a network partition and not get an error.
The suggestion was to do as etcdctl does and issue a get to a known key.

@vincepri
Member

@chuckha @randomvariable Can this be closed in favor of #2243?

@chuckha
Contributor

chuckha commented Feb 12, 2020

whoops, yep, duplicate, replaced by #2243

/closing

@chuckha
Contributor

chuckha commented Feb 12, 2020

/close

😑

@k8s-ci-robot
Contributor

@chuckha: Closing this issue.

In response to this:

/close

😑



6 participants