DNS query problems on 1.11 clusters #30

Closed
BSWANG opened this issue Jul 16, 2016 · 17 comments

BSWANG (Member) commented Jul 16, 2016

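Our investigation so far points to the DNS server embedded in the docker daemon: after the daemon has been running for a while, the destination ports of its DNS replies get mixed up. For now we suggest the following workarounds to keep services running normally:

  1. Perform DNS queries over TCP.
  2. Have the process exit as soon as this DNS resolution error occurs, and set restart:always so that the container is restarted after it fails.
  3. Bind external domain names to IPs with extra_host. This writes the domain names and IPs into the container's /etc/hosts file, so resolution cannot fail.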
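
Workaround 1 can also be exercised from application code. The following is a minimal sketch, not the maintainers' method: it hand-builds an A-record query and sends it over TCP, where every DNS message is prefixed with a two-byte length (RFC 1035, section 4.2.2). The helper names `build_query`, `frame_for_tcp` and `query_over_tcp`, the default server address 127.0.0.11 and the timeout are assumptions chosen for illustration.

```python
import socket
import struct


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query message for the A record of `hostname`."""
    # Header: ID, flags (only "recursion desired" set), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed; a zero byte terminates the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def frame_for_tcp(message):
    """Prefix a DNS message with its length, as DNS over TCP requires."""
    return struct.pack("!H", len(message)) + message


def _recv_exact(sock, n):
    """Read exactly n bytes from the socket or raise if it closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("DNS server closed the connection early")
        buf += chunk
    return buf


def query_over_tcp(hostname, server="127.0.0.11", port=53, timeout=3.0):
    """Send the query over TCP and return the raw (unparsed) response message."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(frame_for_tcp(build_query(hostname)))
        (length,) = struct.unpack("!H", _recv_exact(sock, 2))
        return _recv_exact(sock, length)
```

Calling `query_over_tcp("example.com")` inside an affected container returns the raw response bytes. The sketch does not parse the answer, so real code would more likely rely on a DNS library or on the resolver configuration.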

let5sne commented Jul 16, 2016

I'll take this one. I plan to apply the suggestions and will report back with results as I go. Thanks.

BSWANG (Member, Author) commented Jul 18, 2016

The upstream docker/libnetwork issue is moby/moby#22185; the stated timeline is a fix in 1.12.
Here is one more workaround:

# cat /etc/resolv.conf
nameserver 127.0.0.11
options use-vc ndots:0

Adding the use-vc option to the container's /etc/resolv.conf forces queries to go over TCP.
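
If the option has to be added by a script rather than by hand, the edit could be sketched as below. This is only an illustration under stated assumptions: the function name `add_use_vc` is hypothetical, it works on the file's text rather than on the file itself, and writing the result back to /etc/resolv.conf is left to the caller.

```python
def add_use_vc(resolv_conf):
    """Return resolv.conf text in which an options line contains use-vc."""
    lines = resolv_conf.splitlines()
    option_rows = [
        i for i, line in enumerate(lines) if line.split()[:1] == ["options"]
    ]
    already_set = any("use-vc" in lines[i].split()[1:] for i in option_rows)
    if not already_set:
        if option_rows:
            # Put use-vc first on the existing options line, as in the example above.
            fields = lines[option_rows[0]].split()
            fields.insert(1, "use-vc")
            lines[option_rows[0]] = " ".join(fields)
        else:
            # No options line yet: append one.
            lines.append("options use-vc")
    return "\n".join(lines) + "\n"
```

The function is idempotent, so running it again on its own output leaves the text unchanged.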

denverdino (Contributor) commented

@BSWANG Let's consider adding this configuration option to our agent.

let5sne commented Jul 20, 2016

Thanks @BSWANG, I'll try the upgrade tonight.

let5sne commented Jul 21, 2016

Following @BSWANG's advice I upgraded docker engine. It has been running for 13 hours so far, and the DNS resolution failures have not recurred.

let5sne commented Jul 21, 2016

@BSWANG Unfortunately the problem does not seem to be fixed; DNS still fails to resolve. It just held out for a dozen or so hours this time.

BSWANG (Member, Author) commented Jul 21, 2016

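@let5sne I ran the upstream reproduction steps against this build and the problem was fixed there. Did you restart the docker daemon after installing? Can you share the docker version output?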

let5sne commented Jul 21, 2016

At first I also thought it was solved. I upgraded with the patched build last night and did restart afterwards. Here is the docker version output:
`[root@c457f0fd13aba4124b2673d720e14d15a-node1 ~]# docker version
Client:
Version: 1.11.2
API version: 1.23
Go version: go1.5.4
Git commit: 82b5050
Built: Wed Jul 20 08:48:53 2016
OS/Arch: linux/amd64

Server:
Version: 1.11.2
API version: 1.23
Go version: go1.5.4
Git commit: 82b5050
Built: Wed Jul 20 08:48:53 2016
OS/Arch: linux/amd64`
The problem reappeared after about 20 hours.

let5sne commented Jul 21, 2016

@BSWANG

BSWANG (Member, Author) commented Jul 21, 2016

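@let5sne After this failure, do queries keep failing from then on, or was it a single failed DNS lookup? It could have been a fluctuation in the external DNS service.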

let5sne commented Jul 21, 2016

@BSWANG It is intermittent, failing roughly one query in three, and only a restart brings it back to normal.
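
A failure rate like this can be measured from inside the affected container, which helps tell a persistent fault apart from a one-off blip in an external DNS service. A minimal sketch follows; the function name `failure_ratio`, the number of attempts and the injectable `resolve` parameter are assumptions for illustration. Any local resolver cache could mask failures.

```python
import socket


def failure_ratio(hostname, attempts=30, resolve=socket.gethostbyname):
    """Resolve `hostname` repeatedly and return the fraction of failed lookups."""
    failures = 0
    for _ in range(attempts):
        try:
            resolve(hostname)
        except OSError:  # socket.gaierror is a subclass of OSError
            failures += 1
    return failures / attempts
```

A ratio that stays well above zero across repeated runs points to a fault on the node itself rather than a transient upstream problem.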

boyd4y commented Jul 22, 2016

How will this be fixed in the end? Can docker modify the resolv.conf file?

denverdino (Contributor) commented

We are still investigating the problem in the Docker Engine code. For now, the methods in comment 1 can be used as a workaround.

BSWANG (Member, Author) commented Jul 25, 2016

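@boyd4y @let5sne
We have rebuilt the docker engine packages from the latest upstream fix PR. That PR removes the caching of the DNS client (a new DNS client is created for every request), which ensures that UDP packets no longer get mixed up. In addition, an earlier PR already switched the iptables setup to run via reexec, so goroutines should no longer end up running in the namespace of a container that has already been released.
Installing the newly built packages below should resolve the DNS problem. The install commands are as follows.
Note: docker engine restarts during installation, so choose the installation time carefully.
CentOS: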

rpm --force -ivh http://acs-public-mirror.oss-cn-hangzhou.aliyuncs.com/docker-engine/daemon-build/centos/docker-engine-selinux-1.11.2-0.0.20160719.060320.gitfda9df0.el7.centos.noarch.rpm http://acs-public-mirror.oss-cn-hangzhou.aliyuncs.com/docker-engine/daemon-build/centos/docker-engine-1.11.2-0.0.20160719.060320.gitfda9df0.el7.centos.x86_64.rpm && service docker restart 

Ubuntu:

curl -L http://acs-public-mirror.oss-cn-hangzhou.aliyuncs.com/docker-engine/daemon-build/ubuntu/docker-engine_1.11.2~git20160719.060320.0.fda9df0-0~trusty_amd64.deb -o docker-engine.deb && dpkg -i docker-engine.deb && rm -f docker-engine.deb

let5sne commented Jul 25, 2016

@BSWANG
I just finished the upgrade and attached it to the load balancer. I'll keep watching and hope this fixes it. Thanks!

BSWANG closed this as completed Jul 26, 2016
BSWANG reopened this Jul 26, 2016

let5sne commented Aug 1, 2016

@BSWANG After a long period of observation, the problem appears to be solved. The logs no longer seem to contain any resolution failures.

BSWANG (Member, Author) commented Aug 4, 2016

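A docker engine update that fixes this DNS problem has now been released. If you hit the problem on version 1.11.2, use the docker upgrade option on the cluster to upgrade the cluster's docker engine to 1.11.2.1, which resolves the DNS problem.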

BSWANG closed this as completed Aug 4, 2016