In a highly available apiserver setup, Node briefly becomes Not Ready #257

Closed
nevermore-muyi opened this issue Jul 10, 2018 · 14 comments
Comments

@nevermore-muyi

No description provided.

@nevermore-muyi
Author

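With keepalived and haproxy running on two nodes for high availability, everything works normally.
However, when I stop keepalived on either machine, the Node on that machine briefly becomes unavailable and turns NotReady. The kubelet cannot reach the keepalived address, yet curl from the VM to the same address succeeds. About 5 minutes later, the Node returns to Ready on its own.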

@nevermore-muyi
Author

If anyone has run into the same problem, let's compare notes.

nevermore-muyi changed the title from "In a highly available apiserver setup, Node briefly becomes Not" to "In a highly available apiserver setup, Node briefly becomes Not Ready" on Jul 10, 2018
@gjmzj
Collaborator

gjmzj commented Jul 10, 2018

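A keepalived failover should normally be fast, on the order of seconds. When a Node goes NotReady, the first thing to check is the kubelet log on the affected node.
If you can reproduce it, please add details of your deployment and the relevant logs.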

@nevermore-muyi
Author

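Here is my environment. I use two nodes as masters:
ip1: 5.0.0.191 ip2: 5.0.0.192 vip: 5.0.0.200 vipport: 8443
keepalived on 5.0.0.191 is MASTER and keepalived on 5.0.0.192 is BACKUP.
Everything worked after the initial setup. But when I stop keepalived on 5.0.0.191 (the MASTER), keepalived on 5.0.0.192 takes over immediately and curl 5.0.0.200:8443 also gets through, yet the kubelet log keeps reporting connection errors.
After waiting about 15 minutes, the kubelet recovers on its own.

If I make the kubelet register with the apiserver over http instead, everything works and the Node never goes Not Ready.

I will post the kubelet logs shortly.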

@nevermore-muyi
Author

Here are the logs:

E0710 20:19:31.039064 2739179 kubelet_node_status.go:390] Error updating node status, will retry: error getting node "5.0.0.191": Get https://5.0.0.200:8443/api/v1/nodes/5.0.0.191?resourceVersion=0: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0710 20:19:41.039385 2739179 kubelet_node_status.go:390] Error updating node status, will retry: error getting node "5.0.0.191": Get https://5.0.0.200:8443/api/v1/nodes/5.0.0.191: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0710 20:19:51.039652 2739179 kubelet_node_status.go:390] Error updating node status, will retry: error getting node "5.0.0.191": Get https://5.0.0.200:8443/api/v1/nodes/5.0.0.191: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0710 20:20:01.039934 2739179 kubelet_node_status.go:390] Error updating node status, will retry: error getting node "5.0.0.191": Get https://5.0.0.200:8443/api/v1/nodes/5.0.0.191: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0710 20:20:11.040250 2739179 kubelet_node_status.go:390] Error updating node status, will retry: error getting node "5.0.0.191": Get https://5.0.0.200:8443/api/v1/nodes/5.0.0.191: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0710 20:20:11.040262 2739179 kubelet_node_status.go:382] Unable to update node status: update node status exceeds retry count
E0710 20:20:31.040757 2739179 kubelet_node_status.go:390] Error updating node status, will retry: error getting node "5.0.0.191": Get https://5.0.0.200:8443/api/v1/nodes/5.0.0.191?resourceVersion=0: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0710 20:20:41.040998 2739179 kubelet_node_status.go:390] Error updating node status, will retry: error getting node "5.0.0.191": Get https://5.0.0.200:8443/api/v1/nodes/5.0.0.191: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0710 20:20:51.041251 2739179 kubelet_node_status.go:390] Error updating node status, will retry: error getting node "5.0.0.191": Get https://5.0.0.200:8443/api/v1/nodes/5.0.0.191: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0710 20:21:01.041544 2739179 kubelet_node_status.go:390] Error updating node status, will retry: error getting node "5.0.0.191": Get https://5.0.0.200:8443/api/v1/nodes/5.0.0.191: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0710 20:21:11.041759 2739179 kubelet_node_status.go:390] Error updating node status, will retry: error getting node "5.0.0.191": Get https://5.0.0.200:8443/api/v1/nodes/5.0.0.191: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
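
In the log above, the "will retry" errors arrive roughly 10 seconds apart, and five of them come before the "exceeds retry count" line. A minimal sketch for pulling those numbers out of a longer kubelet log follows. The function names are made up for illustration, and it assumes the usual klog header layout (severity letter, mmdd, time, thread id, file:line). Gaps are computed from the time of day only, so a log that crosses midnight would need extra handling.

```python
import re

# klog header layout: Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg
KLOG_RE = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}):(\d{2}):(\d{2}\.\d+)\s+(\d+) ([^\]]+)\] (.*)$"
)


def parse_klog(line):
    """Split one klog line into its fields, or return None if it does not match."""
    m = KLOG_RE.match(line.strip())
    if m is None:
        return None
    sev, mmdd, hh, mm, ss, _thread_id, source, message = m.groups()
    return {
        "severity": sev,
        "date": mmdd,
        # Seconds since midnight; enough for gaps within a single day.
        "seconds": int(hh) * 3600 + int(mm) * 60 + float(ss),
        "source": source,
        "message": message,
    }


def retry_gaps(lines):
    """Seconds between consecutive 'will retry' errors, rounded to milliseconds."""
    entries = [e for e in map(parse_klog, lines) if e and "will retry" in e["message"]]
    times = [e["seconds"] for e in entries]
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


def attempts_per_cycle(lines):
    """Count 'will retry' errors seen before each 'exceeds retry count' line."""
    counts, current = [], 0
    for e in map(parse_klog, lines):
        if e is None:
            continue
        if "will retry" in e["message"]:
            current += 1
        elif "exceeds retry count" in e["message"]:
            counts.append(current)
            current = 0
    return counts
```

Feeding it the lines of a saved kubelet log (for example `open("kubelet.log").read().splitlines()`, where the file name is hypothetical) gives the spacing of the failed status updates and how many attempts each update cycle made before giving up.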

@nevermore-muyi
Author

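Also, I noticed that the keepalived backup node has no health check for haproxy. If keepalived on the master node goes down, there would be nothing left checking haproxy's health, right?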

@kobehaha

During the failure or the failover, if you run curl or telnet directly from the NOT_READY machine, does it connect?

@nevermore-muyi
Author

@kobehaha Yes, it connects. Also, restarting the kubelet brings the node back to Ready immediately.
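
The curl/telnet check discussed here amounts to opening a brand-new TCP connection to the VIP. A minimal sketch of that check is below; the function name is invented for illustration. Note what it does and does not show: a fresh connection succeeding only proves the VIP is reachable now, and says nothing about a long-lived connection that a process such as the kubelet opened before the failover.

```python
import socket


def can_open_fresh_connection(host, port, timeout=3.0):
    """Return True if a brand-new TCP connection to host:port succeeds."""
    try:
        # create_connection dials a new socket every time it is called,
        # which is what curl or telnet do on each invocation.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the setup in this thread the call would be `can_open_fresh_connection("5.0.0.200", 8443)`, run on the NotReady machine.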

@nevermore-muyi
Author

@gjmzj @kobehaha Here are the corresponding logs.

[root@hz-test1 ansible]# kubectl get node
NAME        STATUS     ROLES   AGE   VERSION
5.0.0.191   NotReady   node    7m    v1.9.1
5.0.0.192   Ready      node    7m    v1.9.1
5.0.0.193   Ready      node    7m    v1.9.1

[root@hz-test1 ansible]# curl -k https://5.0.0.200:8443
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}

E0711 10:58:06.463689 2833600 kubelet_node_status.go:383] Error updating node status, will retry: error getting node "5.0.0.191": Get https://5.0.0.200:8443/api/v1/nodes/5.0.0.191?resourceVersion=0: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0711 10:58:16.464000 2833600 kubelet_node_status.go:383] Error updating node status, will retry: error getting node "5.0.0.191": Get https://5.0.0.200:8443/api/v1/nodes/5.0.0.191: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

@gjmzj
Collaborator

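gjmzj commented Jul 11, 2018

"If I make the kubelet register with the apiserver over http instead, everything works and the Node never goes Not Ready."
Have you actually tested this? Which parameters did you use exactly? @RuanChen
I'm not sure whether this is a version-specific bug; I'll try version 1.10 on my side.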

@nevermore-muyi
Author

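@gjmzj Yes, I tried the http approach: on 1.8.6 I changed the apiserver in kubelet.kubeconfig to http://5.0.0.200:8080, and the node did not go Not Ready.
Also, after all this time, without my doing anything, it has recovered again.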

[root@hz-test1 ansible]# kubectl get node
NAME        STATUS   ROLES   AGE   VERSION
5.0.0.191   Ready    node    35m   v1.9.1
5.0.0.192   Ready    node    35m   v1.9.1
5.0.0.193   Ready    node    35m   v1.9.1
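
The thread gives two different recovery times (about 5 minutes and about 15 minutes), so it may help to measure the outage rather than estimate it. The sketch below is one way to do that, assuming `kubectl` is on the PATH and prints the NAME/STATUS table shown above; the function names and the 10-second polling interval are arbitrary choices made for illustration.

```python
import subprocess
import time


def not_ready_nodes(kubectl_output):
    """Return the names of nodes whose STATUS does not include 'Ready'."""
    rows = [line.split() for line in kubectl_output.splitlines() if line.split()]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    name_i, status_i = header.index("NAME"), header.index("STATUS")
    # STATUS can be compound, e.g. "Ready,SchedulingDisabled" on a cordoned node.
    return [r[name_i] for r in body if "Ready" not in r[status_i].split(",")]


def seconds_until_all_ready(poll_seconds=10):
    """Poll `kubectl get node` until every node reports Ready; return elapsed seconds."""
    start = time.monotonic()
    while True:
        out = subprocess.run(
            ["kubectl", "get", "node"], capture_output=True, text=True, check=True
        ).stdout
        if not not_ready_nodes(out):
            return time.monotonic() - start
        time.sleep(poll_seconds)
```

Starting `seconds_until_all_ready()` right after stopping keepalived on the MASTER would give a rough recovery time, accurate to about one polling interval.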

@gjmzj
Collaborator

gjmzj commented Jul 12, 2018

I have tested v1.10.2 on my side and it shows the same problem, while v1.11.0 does not;
the affected v1.10.2 also recovers by itself after a while.

@nevermore-muyi
Author

So it looks like a problem in k8s itself. I'll go look through its changelog.

@nevermore-muyi
Author

kubelet: fix hangs in updating Node status after network interruptions/changes between the kubelet and API server (#63492, @liggitt)

kubernetes/kubernetes#63492

That should be the issue! Thanks @gjmzj
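
The release note only says that the hang is fixed. As far as I understand the linked change, the idea is that the kubelet closes its existing API server connections when a heartbeat fails, so that the next attempt dials a new one instead of reusing a connection that died during the failover. That would also fit the observations in this thread: a fresh curl works, and restarting the kubelet fixes things immediately. The toy model below only illustrates that idea. It is not the Kubernetes code, and every class and method name in it is made up.

```python
class FakeConn:
    """Stand-in for a long-lived connection to the API server."""

    def __init__(self):
        self.healthy = True
        self.closed = False

    def send_status(self):
        if not self.healthy:
            # Mirrors the symptom in the kubelet log: the request just times out.
            raise TimeoutError("Client.Timeout exceeded while awaiting headers")

    def close(self):
        self.closed = True


class HeartbeatClient:
    """Reuses one connection for heartbeats and drops it when a heartbeat fails."""

    def __init__(self, dial):
        self._dial = dial  # callable that returns a new connection
        self._conn = None

    def heartbeat(self):
        if self._conn is None:
            self._conn = self._dial()
        try:
            self._conn.send_status()
            return True
        except TimeoutError:
            # Close the possibly dead connection so the next heartbeat
            # dials again instead of reusing it.
            self._conn.close()
            self._conn = None
            return False
```

Without the `close()` and reset in the `except` branch, every later heartbeat would keep using the same dead connection, which is roughly the hang described in the release note.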
