Fixes issue with etcd DNS resolution via locally provided nameserver. #69

Closed
wants to merge 1 commit into from

shift commented Dec 4, 2017

Issue

When restarting masters, etcd-member.service fails to reverse-resolve the names of the TLS nodes because DNS hasn't been configured yet. As a side effect, this causes some issues with CLUO deployed ;)

Dec 04 07:27:16 node0.cluster.com etcd-wrapper[875]: 2017-12-04 07:27:16.092239 I | etcdmain: rejected connection from "192.168.15.13:44308" (tls: "192.168.15.13" does not match any of DNSNames ["node0.int.cluster.com" "node3.int.cluster.com" "*.kube-etcd.kube-system.svc.cluster.local" "kube-etcd-client.kube-system.svc.cluster.local"])

I've only used this on bare-metal, so I'm not sure if it affects other module types.

Change Details

This change makes wait-for-dns.service use the RequiredBy directive for etcd-member.service, so etcd-member.service requires the DNS wait unit.
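
For illustration, a minimal sketch of what such a shim unit might look like. Only the `RequiredBy=` lines reflect the change described here; the `Description=`, the `Before=` ordering, and the polling loop in `ExecStart=` are assumptions of this sketch, not the exact contents of the commit. Note that `RequiredBy=` on its own only adds a requirement dependency when the unit is enabled; ordering between units comes from `Before=`/`After=`.

```ini
# wait-for-dns.service (illustrative sketch, not the merged unit)
[Unit]
Description=Wait for DNS entries
# Assumption: explicit ordering, since RequiredBy= alone does not order units.
Before=kubelet.service etcd-member.service

[Service]
Type=oneshot
RemainAfterExit=true
# Assumption: poll until /etc/resolv.conf has at least one line that is
# neither blank nor a comment.
ExecStart=/bin/sh -c 'while ! grep -q "^[^#[:space:]]" /etc/resolv.conf; do sleep 1; done'

[Install]
RequiredBy=kubelet.service
RequiredBy=etcd-member.service
```

With a unit along these lines enabled, starting etcd-member.service would pull in wait-for-dns.service and wait until it has completed.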

Testing

I've used this fix since before Typhoon, and it has been running successfully in more than one cluster.


dghubble commented Dec 5, 2017

Thanks! I've seen this race occur on rare occasions, but didn't get around to changing it. If etcd-member.service starts before /etc/resolv.conf is populated, the container is left with an empty /etc/resolv.conf and cannot resolve its peers. The quick fix is to run systemctl restart etcd-member.service.

It's unfortunate, but I think having both kubelet.service and now etcd-member.service require the little wait-for-dns.service shim unit is a suitable fix. It can occur on any platform, so the fix should be applied in all controller Container Linux configs. Want to tweak the others?

shift force-pushed the hotfix/etcd-dns-bug branch from 6d88e4a to 6a40cec on December 7, 2017 17:22

shift commented Dec 7, 2017

Updated PR to include all 4 profiles :)

dghubble commented

Thanks!

dghubble commented

Merged as ce49a93 to normalize the commit message.

dghubble closed this Dec 10, 2017