dns resolution fails on k8s where api server is down, no caching at all? #2629
Comments
Initial thoughts... The cache should be working here, and increasing ttl in the kubernetes plugin should make those entries last longer in the cache. Sometimes it takes a couple of minutes for a ConfigMap change to be pushed to the Pods. Is CoreDNS crashing and getting restarted at the time of the API stoppage? What version of CoreDNS are you using? Possibly related to #2464.
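For reference, a minimal sketch of the relevant Corefile pieces (values are illustrative, not the exact ConfigMap from this cluster): the ttl option on the kubernetes plugin sets the TTL stamped on cluster records, and the cache plugin keeps answers for up to the given number of seconds.

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        ttl 30                      # TTL stamped on records served by the kubernetes plugin
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30                        # answers are kept in the cache plugin for up to 30s
    loop
    reload
    loadbalance
}
```

Note that with ttl 30 and cache 30, a record can only be served from cache for roughly 30 seconds after the API becomes unreachable, so cached entries expire quickly either way.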
Hello @chrisohaver,
I've tried with ttl 30 and deleted the pod afterwards to reload CoreDNS; same behavior.
Indeed. Before stopping the API server:
After stopping the API server:
CoreDNS version: 1.1.3
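(For reference, the reload was forced by deleting the coredns pod, roughly as follows, assuming the stock k8s-app=kube-dns label used by the coredns deployment:)

```
# delete the coredns pod(s) so the Deployment recreates them and re-reads the ConfigMap
kubectl -n kube-system delete pod -l k8s-app=kube-dns
```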
So, CoreDNS is crashing when the API restarts. That would explain the empty cache. The cache does not persist past CoreDNS restarts.
Yes, with the docker CLI:
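(The original output isn't preserved here; the check amounts to looking for restarted coredns containers and their last logs, roughly:)

```
# list coredns containers, including exited ones, to spot restarts
docker ps -a --filter name=coredns

# dump the logs of the restarted/crashed container (the id will vary)
docker logs <coredns-container-id>
```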
OK, I've just built and tested the current master version, and I confirm the issue is gone. I've tested with and without the extra 'ttl 60' config in the kubernetes section; in both cases DNS resolution is still working after 100+ seconds. I don't really get the use of this ttl parameter, to be honest, but I suppose the cache is never invalidated when the API server is down, which is fine for me (and matches the old kubedns behavior). Thanks!
Just FYI, ttl controls the TTL on the record, and thus how long the entry is cached in the CoreDNS cache plugin (and in any other intermediate caching DNS server). kubedns behaves the same way in this regard because it uses the same kubernetes API client.
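One way to see this in practice (illustrative; the name queried and the cluster DNS service IP depend on the cluster) is to query the same record twice and watch the TTL in the answer count down while the cache plugin serves it:

```
# first query: answered via the kubernetes plugin, TTL = configured ttl
dig +noall +answer kubernetes.default.svc.cluster.local @10.152.183.10

# repeated a few seconds later: served from the cache plugin, TTL has counted down
dig +noall +answer kubernetes.default.svc.cluster.local @10.152.183.10
```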
Hi, I just deployed a cluster with Rancher and coredns refuses to start with:
@vitobotta, if it's the same issue, it is fixed in any version after 1.3.1.
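If the rest of the deployment is sane, bumping the image is usually enough; for example (1.4.0 is just a release newer than 1.3.1, and the deployment/container names assume the stock coredns manifests):

```
kubectl -n kube-system set image deployment/coredns coredns=coredns/coredns:1.4.0
kubectl -n kube-system rollout status deployment/coredns
```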
Hi @chrisohaver, I have mounted /tmp and it seems to help, but now in the logs I see:
Is the fact that I am using Weave relevant?
@vitobotta, that error usually means there is something wrong with the CNI, or some other network problem preventing Pods (in this case coredns) from connecting to a Service (in this case the kubernetes API).
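One way to narrow it down (namespaces and labels below assume the stock manifests) is to check that the kubernetes Service has endpoints and then look for connection errors in the coredns logs:

```
# verify the kubernetes API service and its endpoints exist
kubectl get svc kubernetes -n default
kubectl get endpoints kubernetes -n default

# look for "connection refused" / timeout errors towards the API in the coredns logs
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system logs <coredns-pod-name>
```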
I recreated the cluster using Canal instead of Weave and the CoreDNS issue didn't occur |
After an upgrade of the k8s control plane that led to lots of DNS errors, I noticed that when the API server is down, DNS resolution stops working. It seems like there is no cache at all, whereas the configuration suggests otherwise.
The CoreDNS config is the one constructed by the migration script at https://github.com/coredns/deployment/tree/master/kubernetes
You can reproduce this behavior with the following Vagrant box:
N.B. microk8s is still using kubedns; a migration to coredns is then performed...
In SSH session #1 we spam DNS resolution:
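(The exact loop isn't preserved here; it was something along these lines, resolving a cluster-internal name in a loop from a test pod, here assumed to be named busybox:)

```
# resolve a cluster-internal name once per second and log any failure
while true; do
  microk8s.kubectl exec busybox -- nslookup kubernetes.default.svc.cluster.local \
    || echo "$(date) resolution failed"
  sleep 1
done
```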
In SSH session #2 we stop the API server:
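(Also illustrative; on a snap-based microk8s install of that era, stopping the apiserver looked roughly like this:)

```
# stop the kube-apiserver managed by the microk8s snap
sudo systemctl stop snap.microk8s.daemon-apiserver.service
```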
You should instantly see DNS errors in session #1.
I've performed the same test without the migration step, i.e. removing
cat deploy.sh | sed -e 's|kubectl|microk8s.kubectl|g' | bash | microk8s.kubectl apply -f -
from the Vagrantfile, so the cluster keeps kubedns. DNS resolution keeps working (fun fact: the cache seems to never expire!?).
I find this behavior really disturbing from an HA point of view; plus, it doesn't match the behavior of the old kubedns. Am I missing something in the configuration, or anything else?