
dns-liveness pod restarts failing k8s 1.14 tests #919

Closed
mboersma opened this issue Mar 29, 2019 · 14 comments

@mboersma
Member

Recently—since the introduction of CoreDNS 1.3.1?—there have been many test failures indicating restarts of the dns-liveness pod. We should fix this or skip the test.
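For context, the dns-liveness pod used by the E2E suite is essentially a pod whose liveness probe performs an in-cluster DNS lookup, so any CoreDNS outage surfaces as container restarts. A minimal sketch of that shape, assuming a busybox image and an nslookup probe (not the exact aks-engine manifest):

```yaml
# Illustrative sketch only; the image and probe settings are assumptions,
# not the exact aks-engine test manifest.
apiVersion: v1
kind: Pod
metadata:
  name: dns-liveness
spec:
  containers:
  - name: dns-liveness
    image: busybox:1.30
    command: ["sleep", "3600"]
    livenessProbe:
      exec:
        # The pod stays "alive" only while cluster DNS resolves, so a
        # CoreDNS crash turns into restarts that fail the E2E check.
        command: ["nslookup", "kubernetes.default.svc.cluster.local"]
      initialDelaySeconds: 5
      periodSeconds: 10
      failureThreshold: 3
```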

@CecileRobertMichon
Contributor

Is this specific to 1.14 or is it more general?

@mboersma
Member Author

I think I have seen it with 1.12 and 1.13 as well, but I'm not sure now. I'll collect some data.

@mboersma
Member Author

Based on the E2E tests of the last 48 hours, this appears to happen only with Kubernetes 1.14 on Linux. That would make sense if the new CoreDNS is the problem.

I tried to use CoreDNS 1.4.0 based on the following comment, which may be relevant here, but the Docker image is not published:

There is a known issue coredns/coredns#2629 in CoreDNS 1.3.1, wherein if the Kubernetes API shuts down while CoreDNS is connected, CoreDNS will crash. The issue is fixed in CoreDNS 1.4.0 in coredns/coredns#2529.

@mboersma mboersma changed the title dns-liveness pod restarts failing tests dns-liveness pod restarts failing k8s 1.14 tests Mar 29, 2019
@CecileRobertMichon
Contributor

This is failing more than 50% of 1.14 E2E tests. Should we pause the 1.14 E2E while we wait for the new CoreDNS image? The result is more often red than green and we're not actively troubleshooting failures. @jackfrancis thoughts?

@jackfrancis
Member

Let's just skip the "dns liveness validation tests" for >= 1.14

@mboersma
Member Author

mboersma commented Apr 3, 2019

Closed by #931

@mboersma mboersma closed this as completed Apr 3, 2019
@CecileRobertMichon
Contributor

Should we keep the issue open, since we still need to fix the root cause? My PR just removed the test...

@mboersma
Member Author

mboersma commented Apr 3, 2019

Yes, let's do keep it open—thanks for paying attention. I think there's a fighting chance that k8s.gcr.io/coredns:1.4.0 fixes this behavior, whenever that actually gets published.

@mboersma mboersma reopened this Apr 3, 2019
@gjtempleton

Possibly worth noting that CoreDNS 1.4.0 has been published as coredns/coredns:1.4.0, just not to the k8s.gcr.io registry yet.

That said, the discussion here is worth noting: kubernetes/kubernetes#75414 (comment). v1.4.0 looks unlikely to ever end up in vanilla Kubernetes.

@johnbelamaric

@fturib have you seen this?

@mboersma
Member Author

mboersma commented Apr 4, 2019

FWIW the emptyDir workaround suggested for CoreDNS 1.3.1 does seem to have fixed the AKS Engine test case in #949.
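For reference, a sketch of the emptyDir workaround as applied to the coredns Deployment in kube-system, per the upstream discussion (abbreviated; unrelated fields are omitted and the volume name is an assumption):

```yaml
# Sketch of the emptyDir workaround: mount a writable volume at /tmp so
# klog can create its log files despite readOnlyRootFilesystem: true.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: coredns
        volumeMounts:
        - name: tmp
          mountPath: /tmp
      volumes:
      - name: tmp
        emptyDir: {}
```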

@fturib

fturib commented Apr 4, 2019

@chrisohaver, @rajansandeep: could you help us understand the cause and advise on a solution?

@chrisohaver

chrisohaver commented Apr 5, 2019

Since the emptyDir workaround resolved the test case, the API watches may be trying to log something. (CoreDNS 1.3.1's klog tries to write its log files to /tmp, which fails on a read-only root filesystem unless a writable volume is mounted there.)

Assuming that they are, applying the emptyDir workaround seems like the best fix for now. When 1.5.0 is released (date TBD), it won't require the emptyDir fix, but it contains feature deprecations that make it incompatible with the 1.3.1 CoreDNS config file.

If we want to dig into what is being logged, clues could be in the CoreDNS logs or the API server logs. Knowing why the message is logged would help us understand the scope.
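On the deprecation point: the best-known 1.5.0 incompatibility is the removal of the proxy plugin in favor of forward, which means the coredns ConfigMap has to change along with the image. A hedged sketch of the relevant Corefile line (the surrounding config is abbreviated and may not match the stock 1.3.1 Corefile exactly):

```yaml
# Abbreviated sketch; not the complete stock Corefile.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa
        # 1.3.1-era configs use:   proxy . /etc/resolv.conf
        # CoreDNS 1.5.0 removes the proxy plugin; use forward instead:
        forward . /etc/resolv.conf
        cache 30
    }
```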

@mboersma
Member Author

I'm closing this since the emptyDir fix is the best option for now. We can revisit this for aks-engine when CoreDNS 1.5.0 or later is blessed by a Kubernetes release.
