hostDNS incompatibility #9143

jhogendorn · 2024-08-09T22:46:22Z

Bug Report

Description

I was having issues applying the config: as soon as I applied it, the machine would start losing network connectivity. Justin Garrison helped me in Slack, and we set hostDNS: disabled to resolve it, but I'm filing this bug in case further investigation is needed. The upstream DNS servers are .2 (CoreDNS) and .4 (AdGuard). It seems the cache system has some incompatibility with this setup. The upstream DNS servers' logs show no errors.

Logs

support.zip

Environment

Talos version: 1.7.5
Platform: proxmox 8.2.0


smira · 2024-08-12T12:47:47Z

Unfortunately (it's a bug), the support bundle doesn't contain the log for dns-resolve-cache, which is the one that would hold the clue to the problem.

If you could reproduce the issue and grab talosctl logs dns-resolve-cache, that would be perfect. Thank you!

See siderolabs/talos#9143 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>

jhogendorn · 2024-08-13T04:38:12Z

I went to replicate this using the existing bootstrapped cluster and it worked with hostDNS: true.
I have changed some settings in my coredns instance in the meantime, so that could be a confounding factor.
I made a new talos cluster config and spun up an instance to apply-config to and it also worked with hostDNS: true.

The change I think is relevant in the local DNS servers is that previously CoreDNS (.2), the first DNS server in the DHCP list, was not configured to forward requests it didn't match to any upstream, and now it is. This was OK before because clients would get a miss on .2, then query .4 and get a hit. Perhaps some configuration in the Talos image is unable to do the same?

To test this, I disabled the upstream DNS config in CoreDNS and retried with the new cluster, and got the same failure as before.

I've attached the requested log file.

dns-resolve-cache.log

The CoreDNS configuration difference is just having a block similar to:

. { # this upstream forward rule makes talos dns work
    forward . dns://192.168.0.4
    forward . dns://1.1.1.1
}

Do not return response to the client if we got SERVFAIL or REFUSED, until we run out of upstreams. Fixes siderolabs#9143 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>

Do not return response to the client if we got SERVFAIL or REFUSED, until we run out of upstreams. Fixes siderolabs#9143 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com> (cherry picked from commit a5bd770)

DmitriyMV self-assigned this Aug 11, 2024

smira added a commit to smira/go-talos-support that referenced this issue Aug 12, 2024

fix: add dns-resolve-cache to the list of logs gathered

f9d46fd

See siderolabs/talos#9143 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>

smira mentioned this issue Aug 12, 2024

fix: add dns-resolve-cache to the list of logs gathered siderolabs/go-talos-support#4

Merged

DmitriyMV mentioned this issue Aug 14, 2024

fix: retry with another upstream if the previous failed. #9179

Merged

talos-bot closed this as completed in a5bd770 Aug 15, 2024
