Errors on resolution of long domain names with forwardKubeDNSToHost #8763

Closed
Sad-Soul-Eater opened this issue May 19, 2024 · 4 comments · Fixed by #8768
@Sad-Soul-Eater

Bug Report

Description

I get errors when I try to resolve a long domain name (e.g. video-edge-3e7abd.pdx01.abs.hls.ttvnw.net) with forwardKubeDNSToHost enabled.

Logs

nslookup to kube-dns service

/# nslookup video-edge-3e7abd.pdx01.abs.hls.ttvnw.net 10.10.0.10

Server:         10.10.0.10
Address:        10.10.0.10:53

video-edge-3e7abd.pdx01.abs.hls.ttvnw.net       canonical name = spade.sci.twitch.tv
spade.sci.twitch.tv     canonical name = science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com

*** Can't find video-edge-3e7abd.pdx01.abs.hls.ttvnw.net: No answer

coredns logs:

[INFO] 10.10.129.53:57190 - 40677 "A IN video-edge-3e7abd.pdx01.abs.hls.ttvnw.net. udp 59 false 512" - - 0 5.001672788s 
[ERROR] plugin/errors: 2 video-edge-3e7abd.pdx01.abs.hls.ttvnw.net. A: dns: buffer size too small                       

nslookup to host-dns service

/ # nslookup video-edge-3e7abd.pdx01.abs.hls.ttvnw.net 10.10.0.9
Server:         10.10.0.9
Address:        10.10.0.9:53

video-edge-3e7abd.pdx01.abs.hls.ttvnw.net       canonical name = spade.sci.twitch.tv
spade.sci.twitch.tv     canonical name = science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com

*** Can't find video-edge-3e7abd.pdx01.abs.hls.ttvnw.net: Parse error

nslookup to upstream localhost openwrt

/ # nslookup video-edge-3e7abd.pdx01.abs.hls.ttvnw.net 192.168.1.1
Server:         192.168.1.1
Address:        192.168.1.1:53

Non-authoritative answer:
video-edge-3e7abd.pdx01.abs.hls.ttvnw.net       canonical name = spade.sci.twitch.tv
spade.sci.twitch.tv     canonical name = science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com

Non-authoritative answer:
video-edge-3e7abd.pdx01.abs.hls.ttvnw.net       canonical name = spade.sci.twitch.tv
spade.sci.twitch.tv     canonical name = science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Name:   science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Address: 52.41.185.214
Name:   science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Address: 54.212.246.80
Name:   science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Address: 100.20.138.12
Name:   science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Address: 54.203.132.89
Name:   science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Address: 35.163.181.185
Name:   science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Address: 44.228.227.151
Name:   science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Address: 52.26.189.192
Name:   science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Address: 50.112.234.206

upstream dns in talos

talosctl get dnsupstreams.net.talos.dev
NODE             NAMESPACE   TYPE          ID            VERSION   HEALTHY   ADDRESS
192.168.10.200   network     DNSUpstream   192.168.1.1   1         true      192.168.1.1:53

Environment

  • Talos version: v1.7.2
  • Kubernetes version: v1.30.1
  • Platform: Proxmox

hostDNS settings:

machine:
    hostDNS:
      enabled: true
      forwardKubeDNSToHost: true
      resolveMemberNames: true
@smira
Member

smira commented May 20, 2024

@DmitriyMV this clearly looks like a bug

@DmitriyMV
Member

DmitriyMV commented May 20, 2024

I think I can explain what's happening here: in your last example you are bypassing the CoreDNS server and asking your own DNS server on OpenWrt directly, so it works. The culprit here is CoreDNS itself, which supports neither DNS compression nor setting a custom limit on the UDP packet size for DNS messages. It uses the default limit of 512 bytes. Increasing bufsize to 4096 (maximum allowed limit per RFC 6891) doesn't help here either, since it is applied much later and already uses a reasonable default of 1232.
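
To see why a 512-byte limit matters for this particular name, here is a rough size estimate, assuming the standard RFC 1035 wire encoding and the record counts from the nslookup output above (two CNAMEs and eight A records). The helper names are illustrative, and the "compressed" case is simplified: every repeated owner name becomes a 2-byte pointer, and no suffix sharing between the CNAME targets is assumed.

```python
# Rough DNS wire-size estimate (RFC 1035 encoding).
# Record counts are taken from the nslookup output in this issue.
HEADER = 12    # fixed DNS header
RR_FIXED = 10  # TYPE (2) + CLASS (2) + TTL (4) + RDLENGTH (2)
POINTER = 2    # a compression pointer replacing a repeated name

QNAME = "video-edge-3e7abd.pdx01.abs.hls.ttvnw.net"
CNAME1 = "spade.sci.twitch.tv"
CNAME2 = "science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com"
A_RECORDS = 8


def wire_len(name: str) -> int:
    """Uncompressed name length: one length byte per label plus the root byte."""
    return len(name.rstrip(".")) + 2


def query_size() -> int:
    """Header plus one question (QNAME, QTYPE, QCLASS), without EDNS0."""
    return HEADER + wire_len(QNAME) + 4


def response_size(compressed: bool) -> int:
    """Header, echoed question, two CNAME records and the A records."""
    def owner(name: str) -> int:
        return POINTER if compressed else wire_len(name)

    size = query_size()
    size += owner(QNAME) + RR_FIXED + wire_len(CNAME1)
    size += owner(CNAME1) + RR_FIXED + wire_len(CNAME2)
    size += A_RECORDS * (owner(CNAME2) + RR_FIXED + 4)  # 4-byte IPv4 address
    return size


if __name__ == "__main__":
    print(query_size(), response_size(False), response_size(True))
```

Under these assumptions the query comes to 59 bytes, which matches the "udp 59" in the CoreDNS log above. The response is roughly 861 bytes without compression and 297 bytes with it, so only the compressed form would fit into 512 bytes.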

There is a workaround for this: you can force CoreDNS to use TCP for DNS queries:

    forward . /etc/resolv.conf {
      force_tcp
    }

Or maybe even with max_concurrent set, to prevent resource exhaustion:

    forward . /etc/resolv.conf {
      force_tcp
      max_concurrent 1000
    }
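
The reason TCP sidesteps the size problem is that DNS over TCP prefixes every message with a two-byte length (RFC 1035, section 4.2.2), so a message can be up to 65535 bytes. The following is a minimal sketch of that framing using only the Python standard library; the function names and the fixed transaction ID are illustrative and not part of CoreDNS or Talos.

```python
import struct


def build_query(name: str, qtype: int = 1, txid: int = 0x1234) -> bytes:
    """Encode a minimal DNS query (no EDNS0); qtype 1 is an A record."""
    # Flags 0x0100 = recursion desired; exactly one question.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix the message with its 2-byte big-endian length for DNS over TCP."""
    return struct.pack("!H", len(message)) + message


if __name__ == "__main__":
    query = build_query("video-edge-3e7abd.pdx01.abs.hls.ttvnw.net")
    print(len(query), frame_for_tcp(query)[:2].hex())
```

The same query bytes are sent over either transport; only the length prefix differs.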

I will also add support for bigger DNS messages on our side, but until CoreDNS fixes this, our fix will not help.

With the CoreDNS config above, it starts working:

/ ~ kubectl run alpine --image alpine -it -- ash                                   
If you don't see a command prompt, try pressing enter.
/ #  nslookup video-edge-3e7abd.pdx01.abs.hls.ttvnw.net
Server:		10.96.0.10
Address:	10.96.0.10:53

Non-authoritative answer:
video-edge-3e7abd.pdx01.abs.hls.ttvnw.net	canonical name = spade.sci.twitch.tv
spade.sci.twitch.tv	canonical name = science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com

Non-authoritative answer:
video-edge-3e7abd.pdx01.abs.hls.ttvnw.net	canonical name = spade.sci.twitch.tv
spade.sci.twitch.tv	canonical name = science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Name:	science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Address: 52.40.165.114
Name:	science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Address: 52.26.221.247
Name:	science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Address: 100.21.1.130
Name:	science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Address: 44.232.190.254
Name:	science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Address: 54.185.219.138
Name:	science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Address: 34.216.231.5
Name:	science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Address: 52.41.215.207
Name:	science-edge-external-prod-73889260.us-west-2.elb.amazonaws.com
Address: 54.68.106.123

DmitriyMV added a commit to DmitriyMV/talos that referenced this issue May 20, 2024
By default, github.com/miekg/dns uses `dns.MinMsgSize` for UDP messages, which is 512 bytes. This is too small for some
DNS request/responses, and can cause truncation and errors. This change sets the buffer size to `dns.DefaultMsgSize`
4096 bytes, which is the maximum size of a dns packet payload per RFC 6891.

Also increase CoreDNS dns request payload limit to 4096 from the default 1232 bytes.

For siderolabs#8763

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
@Sad-Soul-Eater
Author

The culprit here is CoreDNS itself which doesn't support dns compression

@DmitriyMV

If I understand correctly, CoreDNS shouldn't be able to resolve long domain names regardless of whether hostDNS and forwardKubeDNSToHost are enabled, because this is purely a CoreDNS limitation.

However, it successfully resolves video-edge-3e7abd.pdx01.abs.hls.ttvnw.net with hostDNS: false and forwardKubeDNSToHost: false, and fails only when these parameters are true.

I may be wrong, but to me it looks like the problem appears with the chain CoreDNS -> forwardKubeDNSToHost -> upstream router, and not with CoreDNS -> upstream router.

DmitriyMV added a commit to DmitriyMV/talos that referenced this issue May 20, 2024
- By default, github.com/miekg/dns uses `dns.MinMsgSize` for UDP messages, which is 512 bytes. This is too small for some
DNS request/responses, and can cause truncation and errors. This change sets the buffer size to `dns.DefaultMsgSize`
4096 bytes, which is the maximum size of a dns packet payload per RFC 6891.
- Another thing we do is increasing CoreDNS dns request payload limit to 4096 from the default 1232 bytes.
- We also retry the request if the response is truncated or previous connection was closed.
- And finally we properly handle the case where the response is larger than the client buffer size,
and we return a truncated correct response.

Closes siderolabs#8763

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
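
The "retry if the response is truncated" step in the commit message above can be sketched as follows. This is not the Talos code (which is written in Go on top of github.com/miekg/dns). It is a small Python illustration in which `resolve`, `send_udp` and `send_tcp` are hypothetical names, with the transports injected so the logic can be exercised without a network.

```python
import struct
from typing import Callable

TC_FLAG = 0x0200  # "truncated" bit in the DNS header flags word


def is_truncated(response: bytes) -> bool:
    """Return True when the response header has the TC bit set."""
    if len(response) < 12:
        raise ValueError("DNS message shorter than its 12-byte header")
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)


def resolve(
    query: bytes,
    send_udp: Callable[[bytes], bytes],
    send_tcp: Callable[[bytes], bytes],
) -> bytes:
    """Try UDP first and repeat the query over TCP if the answer was truncated."""
    response = send_udp(query)
    if is_truncated(response):
        response = send_tcp(query)
    return response
```

A caller would pass two functions that each send the query over the respective transport and return the raw response bytes.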
DmitriyMV added a commit to DmitriyMV/talos that referenced this issue May 23, 2024
- By default, github.com/miekg/dns uses `dns.MinMsgSize` for UDP messages, which is 512 bytes. This is too small for some
DNS request/responses, and can cause truncation and errors. This change sets the buffer size to `dns.DefaultMsgSize`
4096 bytes, which is the maximum size of a dns packet payload per RFC 6891.
- We also retry the request if the response is truncated or previous connection was closed.
- And finally we properly handle the case where the response is larger than the client buffer size,
and we return a truncated correct response.

We test long responses using "video-edge-3e7abd.pdx01.abs.hls.ttvnw.net" which is Twitch subdomain.

Closes siderolabs#8763

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
DmitriyMV added a commit to DmitriyMV/talos that referenced this issue May 23, 2024
- By default, github.com/miekg/dns uses `dns.MinMsgSize` for UDP messages, which is 512 bytes. This is too small for some
DNS request/responses, and can cause truncation and errors. This change sets the buffer size to `dns.DefaultMsgSize`
4096 bytes, which is the maximum size of a dns packet payload per RFC 6891.
- We also retry the request if the response is truncated or previous connection was closed.
- And finally we properly handle the case where the response is larger than the client buffer size,
and we return a truncated correct response.

Closes siderolabs#8763

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
@DmitriyMV
Member

DmitriyMV commented May 23, 2024

@Sad-Soul-Eater after further investigation, we found that our DNS server didn't properly truncate responses, so CoreDNS didn't retry the queries for long, truncated responses over TCP (as it should). #8768 should fix that.
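
For illustration, one simple way a server can signal truncation is to reply with only the header and question and set the TC bit whenever the full answer exceeds what the client can accept over UDP. The sketch below is an assumption-laden Python example, not the change made in #8768: `fit_to_client` is a hypothetical name, the message is assumed to contain exactly one question with an uncompressed name, and all answer, authority and additional records (including any EDNS0 OPT record) are dropped.

```python
import struct

TC_FLAG = 0x0200     # "truncated" bit in the DNS header flags word
MIN_UDP_SIZE = 512   # UDP limit for clients that do not advertise EDNS0


def question_end(message: bytes) -> int:
    """Offset just past the first question (QNAME, QTYPE, QCLASS)."""
    pos = 12  # the question starts right after the fixed header
    while message[pos] != 0:
        pos += 1 + message[pos]  # skip the length byte and the label
    return pos + 1 + 4  # root byte, then QTYPE and QCLASS


def fit_to_client(response: bytes, client_limit: int = MIN_UDP_SIZE) -> bytes:
    """Return the response if it fits, else a header+question reply with TC set."""
    if len(response) <= client_limit:
        return response
    txid, flags, qdcount, _an, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    if qdcount != 1:
        raise ValueError("this sketch handles exactly one question")
    header = struct.pack("!HHHHHH", txid, flags | TC_FLAG, 1, 0, 0, 0)
    return header + response[12:question_end(response)]
```

A client that sees the TC bit in such a reply is expected to repeat the query over TCP, which is the retry behaviour described in the comment above.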

DmitriyMV added a commit to DmitriyMV/talos that referenced this issue May 24, 2024
- By default, github.com/miekg/dns uses `dns.MinMsgSize` for UDP messages, which is 512 bytes. This is too small for some
DNS request/responses, and can cause truncation and errors. This change sets the buffer size to `dns.DefaultMsgSize`
4096 bytes, which is the maximum size of a dns packet payload per RFC 6891.
- We also retry the request if the response is truncated or previous connection was closed.
- And finally we properly handle the case where the response is larger than the client buffer size,
and we return a truncated correct response.

Closes siderolabs#8763

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
@DmitriyMV DmitriyMV reopened this May 24, 2024
mrclrchtr added a commit to hcloud-talos/terraform-hcloud-talos that referenced this issue May 28, 2024
smira pushed a commit to smira/talos that referenced this issue May 28, 2024
- By default, github.com/miekg/dns uses `dns.MinMsgSize` for UDP messages, which is 512 bytes. This is too small for some
DNS request/responses, and can cause truncation and errors. This change sets the buffer size to `dns.DefaultMsgSize`
4096 bytes, which is the maximum size of a dns packet payload per RFC 6891.
- We also retry the request if the response is truncated or previous connection was closed.
- And finally we properly handle the case where the response is larger than the client buffer size,
and we return a truncated correct response.

Closes siderolabs#8763

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
(cherry picked from commit a9cf9b7)
mrclrchtr added a commit to hcloud-talos/terraform-hcloud-talos that referenced this issue May 28, 2024
smira pushed a commit to smira/talos that referenced this issue May 29, 2024
- By default, github.com/miekg/dns uses `dns.MinMsgSize` for UDP messages, which is 512 bytes. This is too small for some
DNS request/responses, and can cause truncation and errors. This change sets the buffer size to `dns.DefaultMsgSize`
4096 bytes, which is the maximum size of a dns packet payload per RFC 6891.
- We also retry the request if the response is truncated or previous connection was closed.
- And finally we properly handle the case where the response is larger than the client buffer size,
and we return a truncated correct response.

Closes siderolabs#8763

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
(cherry picked from commit a9cf9b7)
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 24, 2024