Describe the bug
In my OKD cluster, the Corefile on node master-1 is faulty: instead of a cluster-external DNS resolver, it has 127.0.0.53 in the forward declaration.
One of my customers sees the very same symptom (wrong Corefile on master-1) in their cluster, along with very high CPU load (~2.3 cores) for this exact pod and frequent "i/o timeout" messages in the coredns container logs.
When I manually correct the Corefile by replacing 127.0.0.53 with an actual DNS resolver IP (in my case 10.1.0.1), these messages disappear and the CPU load normalizes to 0.002 cores.
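For anyone who wants to try the same manual workaround, here is a minimal sketch of the substitution. It runs against a made-up, stripped-down Corefile in a temp file, not the real one from my node, so the file content and the `resolver` variable are assumptions for illustration only.

```shell
#!/bin/sh
# Illustrative only: a made-up, minimal Corefile, not the one from the cluster.
corefile="$(mktemp)"
cat > "$corefile" <<'EOF'
. {
    errors
    forward . 127.0.0.53 {
        policy sequential
    }
    cache 30
}
EOF

# Resolver IP from this report; substitute the one valid for your network.
resolver="10.1.0.1"

# Rewrite only lines that start with the "forward" directive, so any other
# occurrence of the address in the file is left untouched.
fixed="$(sed "/^[[:space:]]*forward[[:space:]]/s/127\.0\.0\.53/${resolver}/" "$corefile")"

printf '%s\n' "$fixed"
rm -f "$corefile"
```

The sketch prints the rewritten file instead of editing in place, so the result can be reviewed before anything on the node is changed.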
Related to /issues/978.
Version
I am running OKD 4.9.0 IPI on vSphere 6.7:
[root@localhost ocp-install]# oc get clusterversion
NAME      VERSION                         AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.okd-2021-11-28-035710   True        False         45d     Cluster version is 4.9.0-0.okd-2021-11-28-035710
How reproducible
The CPU load stays constantly high, even after a pod restart. Only one master node is affected.
Log bundle
master-1 (bad config)
The pod logs of coredns-monitor on master-1 show that its runtimecfg utility is rendering a faulty Corefile with 127.0.0.53 in the forward rule. When I run the coredns-monitor pod's command manually in the running container, it renders a correct configuration:
master-0 (good config) (same for master-2)
For comparison, this is what the logs tell me for coredns-monitor on the other masters. The configuration looks good.
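127.0.0.53 is the address of the systemd-resolved local stub listener, so one possible explanation is that the resolver list was read from a resolv.conf that pointed at the stub when the Corefile was rendered. That is an assumption on my part; the logs do not confirm it. A small helper like the one below (hypothetical name `upstream_nameservers`, sample data made up) lists the non-loopback nameservers in a resolv.conf-style file, which makes it easy to compare files between masters.

```shell
#!/bin/sh
# Print nameserver entries from a resolv.conf-style file, skipping loopback
# addresses such as the systemd-resolved stub (127.0.0.53).
upstream_nameservers() {
    awk '$1 == "nameserver" && $2 !~ /^127\./ && $2 != "::1" { print $2 }' "$1"
}

# Made-up sample content for demonstration; not taken from the cluster.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
# illustrative resolv.conf
nameserver 127.0.0.53
nameserver 10.1.0.1
options edns0 trust-ad
search example.internal
EOF

upstreams="$(upstream_nameservers "$sample")"
printf '%s\n' "$upstreams"
rm -f "$sample"
```

On a node, the same function could be pointed at `/etc/resolv.conf` and at `/run/systemd/resolve/resolv.conf`, where systemd-resolved records the actual upstream servers. An empty result for the first file on master-1 but not on the other masters would support the assumption above.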