
Rolling update puts nodes into "not ready" #4946

Closed
recollir opened this issue Apr 9, 2018 · 29 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@recollir
Contributor

recollir commented Apr 9, 2018

  1. Kops version 1.8.0

  2. Kubernetes version 1.8.6

  3. AWS (3 masters and 3 nodes)

  4. kops edit followed by kops update and kops rolling-update (see the command sketch after this list). kops edit was used to add configuration flags for the apiserver (dex related). Also tried kops rolling-update --instance-group <master...> to only update one master at a time.

  5. Nodes become "not ready" in an unpredictable way. Sometimes no node is affected; sometimes one node becomes "not ready" and recovers after a few minutes; sometimes all nodes are "not ready" for a longer period, up to 15 minutes, while the masters report Ready. During this time the workload on the cluster is not accessible.

  6. A non-disruptive rolling update that affects neither the nodes nor the workload.

  7. Starting config: https://gist.github.com/recollir/9e9b4b0b426ef77014083f1839c123d6
    Added via kops edit before the rolling-update: https://gist.github.com/recollir/da9fd8a123b58f555f2e4321093e9d46

  8. https://gist.github.com/recollir/5b19d543adaa50b1889aabafeb77b847

  9. A couple of times I observed that, after the rolling update, the ELB for the API server was missing AZs.
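
For reference, a minimal sketch of the command sequence from item 4. The state store, cluster name, and master instance-group name are placeholders, not taken from the actual cluster:

export KOPS_STATE_STORE=s3://<state-store>       # placeholder state store
export NAME=<cluster-name>                       # placeholder cluster name

kops edit cluster $NAME                          # add the dex-related apiserver flags
kops update cluster $NAME --yes                  # push the new configuration
kops rolling-update cluster $NAME --yes          # roll all instance groups, or
kops rolling-update cluster $NAME --instance-group <master-ig> --yes   # one master at a time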

@johanneswuerbach
Contributor

In our case, manually restarting the kubelet helped. Do you have the logs of an affected node?
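
A quick sketch of that restart, assuming a systemd-managed kubelet (as on the default kops Debian images):

# On an affected node: restart the kubelet so it re-resolves the API server address.
sudo systemctl restart kubelet

# From a workstation: watch the node return to Ready.
kubectl get nodes -w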

@recollir
Contributor Author

recollir commented Apr 9, 2018

Not from the current test runs (the cluster has been deleted and recreated a couple of times). But I can recreate this, keep the logs, and attach them to this issue.

Nevertheless, I would be interested in how to prevent this from happening. We need to make some changes to a production cluster, where restarting the kubelet seems rather inappropriate.

@johanneswuerbach
Contributor

johanneswuerbach commented Apr 9, 2018

Understandable. We also started seeing this after upgrading to Kubernetes 1.8 (1.8.10 currently), and I'm currently debugging what could cause it.

It looks like in our case the kubelet tries to connect to an old API server IP, so either it is somehow caching the DNS resolution for too long (the TTL should only allow 60s) or the record wasn't updated correctly.
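
One way to check the served TTL from a node (a sketch; the record name follows the default kops convention api.internal.<cluster-name>):

# The second column of the answer is the remaining TTL in seconds.
dig +noall +answer api.internal.<cluster-name>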

@recollir
Contributor Author

recollir commented Apr 9, 2018

Thanks for pointing this out. I will check whether this is the same for us first thing in the morning (my TZ is CEST).

@justinsb
Member

justinsb commented Apr 9, 2018

Thanks for reporting & sorry about the problem.

Was this with a gossip DNS (.k8s.local) or a "real" Route53 DNS name?

@johanneswuerbach
Contributor

johanneswuerbach commented Apr 9, 2018

In our case, a "real" Route53 record. Also, restarting the kubelet fixed the issue almost immediately, while just waiting took up to 15 minutes for the node to be marked as Ready again.

We are running kops 1.9.0-beta.2.

@recollir
Contributor Author

recollir commented Apr 9, 2018

Real Route53 DNS name.

@recollir
Contributor Author

@johanneswuerbach it seems to be the same for us: the kubelet trying to connect to an old API server IP. Trying to verify this now.

@johanneswuerbach
Contributor

Could you check whether the internal master DNS record contains the IPs of the new masters, or is still returning an old one?
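
For example (a sketch; assumes the default api.internal.<cluster-name> record and that the zone is resolvable from where you run it; the EC2 tag filter is an assumption based on how kops typically tags masters):

# What the internal API record resolves to right now.
dig +short api.internal.<cluster-name>

# The private IPs of the currently running masters, for comparison.
aws ec2 describe-instances \
  --filters "Name=tag:k8s.io/role/master,Values=1" "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].PrivateIpAddress' --output text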

@johanneswuerbach
Contributor

johanneswuerbach commented Apr 10, 2018

We also hit this on another node again:

kubelet[1270]: E0410 13:01:58.506984    1270 kubelet_node_status.go:390] Error updating node status, will retry: error getting node "ip-xxx.ec2.internal": Get https://api.internal.xxx/api/v1/nodes/ip-xxx.ec2.internal?resourceVersion=0: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
kubelet[1270]: E0410 13:02:08.507374    1270 kubelet_node_status.go:390] Error updating node status, will retry: error getting node "ip-xxx.ec2.internal": Get https://api.internal.xxx/api/v1/nodes/ip-xxx.ec2.internal: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

and eventually

kubelet[1270]: E0410 13:02:38.508388    1270 kubelet_node_status.go:382] Unable to update node status: update node status exceeds retry count

The IP is the IP of the node itself.
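
A quick way to see whether a stuck kubelet is still holding a connection to an old master (a sketch; assumes the API is served on 443, as in a default kops setup):

# On the affected node: established connections held by the kubelet process.
sudo ss -tnp | grep kubelet

# Compare the remote addresses above with what the record resolves to now.
dig +short api.internal.<cluster-name>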

@recollir
Contributor Author

That is exactly the error message I see. It starts to appear when the old IP address is removed from the A record for api.internal.xxx and the new IP address of the new master is added. Sometimes after the first master, sometimes after the second.

@lkysow

lkysow commented Apr 10, 2018

It's probably due to kubernetes/kubernetes#41916 (comment), where the kubelet caches the IPs of the old master nodes. That's why a restart fixes it.

@sstarcher
Contributor

I had the same issue today with 1.9.0-beta.2.

@lkysow

lkysow commented Apr 10, 2018

I think the best practice is to set up an internal ELB that fronts the masters and have the API URL point to that, the same way it's done for the external API. Is that possible with kops right now?

@recollir
Contributor Author

I don’t think so. The load balancer type for the API in the spec refers to the client (kubectl) endpoint, AFAIK. At least in a quick test I still got DNS round-robin entries for the API that the kubelet used.

I also think an ELB for the kubelet to connect to would be the “right” way to go. At least it is the way kubeadm does HA nowadays (even if still set up manually). The ELB would detect that a master is gone through its health check, break the connection, and force the kubelet to reconnect, wouldn’t it?

What would be needed? How much work would it be? Are there any pointers to start from? I wouldn’t mind giving it a try, but I would need instructions.

@sstarcher
Contributor

sstarcher commented Apr 12, 2018

Just did a rolling update from 1.9.0-beta.2 to 1.9.0 and hit the same issue: all of my nodes go from Ready to NotReady.

@chrislovecnm @justinsb have you tried a master rolling update with 1.9.0 on AWS by chance?

The first time I ever noticed this issue was with 1.9.0-beta.2, but all nodes go into NotReady, which takes down every service in the cluster.

@sstarcher
Contributor

I can confirm it lasts for 15 minutes or until the kubelet is restarted.

@lkysow

lkysow commented Apr 12, 2018

15 minutes is the expected duration for the kubelet IP-caching issue: kubernetes/kubernetes#41916 (comment)

@recollir
Contributor Author

Just realised that the same problem affects kube-proxy as well, by the way.

@recollir
Contributor Author

Our current hypothesis for a workaround is to create new temporary nodes and lock them to only one of the masters by overriding the DNS name of the API server in /etc/hosts (sketched below). Then migrate all the pods to these temporary nodes by draining the old nodes. This frees up two of the master nodes for a rolling update without causing interruption due to the old nodes becoming “not ready”. Once the two masters are done, the old nodes can be locked to the two new masters and the pods moved back to them, freeing up the third master for a rolling update. Finally, the temporary nodes can be deleted. Cumbersome and ugly... but it works.
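
A minimal sketch of that /etc/hosts pinning (placeholder names; assumes the node image resolves /etc/hosts before DNS, i.e. the usual "files dns" nsswitch order):

# On each temporary node: pin the API server name to one specific master.
echo "<master-private-ip>  api.internal.<cluster-name>" | sudo tee -a /etc/hosts

# Restart the kubelet (and kube-proxy, if needed) so the pin takes effect immediately.
sudo systemctl restart kubelet

# Afterwards, remove the pin again.
sudo sed -i '/api\.internal\.<cluster-name>/d' /etc/hosts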

Nevertheless, we should consider adding an LB for the node-to-master communication, as that is nowadays also the recommended way of doing HA with kubeadm, for example.

@jaredallard
Contributor

Ran into this today with a beta environment deploy. Thankfully nothing broke in our production env, but it's certainly not a good sign. Are there any detailed fixes for this? A kubelet restart, sure, but on each node?

@recollir
Contributor Author

Just to add another occasion where this can happen: when "updating" from kops 1.8 to kops 1.9 and performing the required rolling-update. Since all masters are restarted/recreated first, the nodes can become NotReady if the kubelet/kube-proxy was talking to the corresponding restarting master.

@zachaller
Contributor

zachaller commented May 9, 2018

We are also being hit by this, and it's causing our own APIs to have downtime when the masters come back up from a termination. I do think that putting an ELB on the internal API endpoint would help in this case as well.

@mattatcha

Looks like a fix has been merged and a PR is open to backport it to 1.9:
fix: kubernetes/kubernetes#63492
1.9 backport: kubernetes/kubernetes#63832

@recollir
Contributor Author

Approved and cherry-picked as well.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 23, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 23, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
