[BUG] Issues with DNS lookups from within the container #3643

Closed
mmurhamm opened this issue May 9, 2023 · 15 comments · Fixed by #3703
Labels: kind/bug, status/need more information

Comments

@mmurhamm

mmurhamm commented May 9, 2023

General information

  • OS: Windows
  • Hypervisor: Hyper-V
  • Did you run crc setup before starting it (Yes/No)? Yes
  • Running CRC on: Laptop

CRC version

CRC version: 2.18.0+4ea3a1
OpenShift version: 4.12.13
Podman version: 4.4.1

CRC status

CRC VM:          Running
OpenShift:       Running (v4.12.13)
RAM Usage:       10.34GB of 16.8GB
Disk Usage:      24.58GB of 32.74GB (Inside the CRC VM)
Cache Usage:     39.85GB
Cache Directory: C:\Users\021731618\.crc\cache

CRC config

- consent-telemetry                     : no
- cpus                                  : 8
- memory                                : 16384

Host Operating System

Host Name:                 <hostname>
OS Name:                   Microsoft Windows 11 Enterprise
OS Version:                10.0.22621 N/A Build 22621
OS Manufacturer:           Microsoft Corporation
OS Configuration:          Standalone Workstation
OS Build Type:             Multiprocessor Free
Registered Owner:          N/A
Registered Organization:   N/A
Product ID:                00330-80000-00000-AA629
Original Install Date:     26/03/2023, 02:46:09
System Boot Time:          09/05/2023, 08:25:45
System Manufacturer:       LENOVO
System Model:              <model>
System Type:               x64-based PC
Processor(s):              1 Processor(s) Installed.
                           [01]: Intel64 Family 6 Model 154 Stepping 3 GenuineIntel ~2400 Mhz
BIOS Version:              LENOVO N3JET32W (1.16 ), 02/03/2023
Windows Directory:         C:\Windows
System Directory:          C:\Windows\system32
Boot Device:               \Device\HarddiskVolume1
System Locale:             en-us;English (United States)
Input Locale:              de;German (Germany)
Time Zone:                 (UTC+01:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
Total Physical Memory:     32.434 MB
Available Physical Memory: 5.742 MB
Virtual Memory: Max Size:  36.547 MB
Virtual Memory: Available: 5.091 MB
Virtual Memory: In Use:    31.456 MB
Page File Location(s):     C:\pagefile.sys
Domain:                    <domain>
Logon Server:              \\<hostname>
Hotfix(s):                 5 Hotfix(s) Installed.
                           [01]: KB5022497
                           [02]: KB5012170
                           [03]: KB5025800
                           [04]: KB5025239
                           [05]: KB5025749
Network Card(s):           7 NIC(s) Installed.
                           [01]: Intel(R) Wi-Fi 6E AX211 160MHz
                                 Connection Name: Wi-Fi
                                 Status:          Media disconnected
                           [02]: Realtek USB 2.5GbE Family Controller
                                 Connection Name: Ethernet 2
                                 DHCP Enabled:    Yes
                                 DHCP Server:     N/A
                                 IP address(es)
                           [03]: Cisco AnyConnect Virtual Miniport Adapter for Windows x64
                                 Connection Name: Ethernet 3
                                 Status:          Hardware not present
                           [04]: Hyper-V Virtual Ethernet Adapter
                                 Connection Name: vEthernet (ext-switch)
                                 DHCP Enabled:    Yes
                                 DHCP Server:     192.168.20.5
                                 IP address(es)
                                 [01]: 192.168.20.70
                                 [02]: fe80::6f39:f86b:d745:b50a
                           [05]: Microsoft Network Adapter Multiplexor Driver
                                 Connection Name: Network Bridge
                                 Status:          Media disconnected
                           [06]: Hyper-V Virtual Ethernet Adapter
                                 Connection Name: vEthernet (wlan-switch)
                                 Status:          Media disconnected
                           [07]: Array Networks SSL VPN Adapter
                                 Connection Name: Ethernet 4
                                 Status:          Hardware not present
Hyper-V Requirements:      A hypervisor has been detected. Features required for Hyper-V will not be displayed.

Steps to reproduce

  1. start a simple container: oc run -it busybox --image=busybox:latest -- sh
    (this (and other) container's /etc/resolv.conf points to dns-default service IP)
  2. run nslookup secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com
  3. run nslookup against other hosts, e.g. cnn.com
  4. run both 2. and 3. inside a pod in the openshift-dns project (which backs the dns-default service), inside the node (both have /etc/resolv.conf pointing at an IP that presumably maps outside the cluster via the tap0 interface), on the VM host, or on another VM on that host:
    both names are resolved

Expected

both names should be resolved from within a container

Actual

resolving the amazon s3 link fails:
Server: 10.217.4.10
Address: 10.217.4.10:53
Non-authoritative answer:
*** Can't find secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com: No answer

resolving other names works:
Server: 10.217.4.10
Address: 10.217.4.10:53
Non-authoritative answer:
Non-authoritative answer:
Name: cnn.com
Address: 151.101.195.5
Name: cnn.com
Address: 151.101.3.5
Name: cnn.com
Address: 151.101.67.5
Name: cnn.com
Address: 151.101.131.5

Logs

I'll follow up with the below later if needed, as I need to keep the cluster alive for another purpose for a while ...

Before gathering the logs, try the following to see if it fixes your issue:

$ crc delete -f
$ crc cleanup
$ crc setup
$ crc start --log-level debug

Please consider posting the output of crc start --log-level debug on http://gist.github.com/ and post the link in the issue.

@mmurhamm mmurhamm added kind/bug Something isn't working status/need triage labels May 9, 2023
@gbraad gbraad changed the title [BUG] [BUG] Issues with DNS lookups from within the container May 9, 2023
@gbraad gbraad added status/need more information Issue needs more information before it will be looked at and removed status/need triage labels May 9, 2023
@gbraad
Contributor

gbraad commented May 9, 2023

Are you connected to a VPN ?

Can you ping or dig from the VM itself to the address secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com ?

@mmurhamm
Author

mmurhamm commented May 9, 2023

yes (VM), no (VPN)

@mmurhamm
Author

I tried the same on a full cluster and did not experience the problem there. Any more ideas what I can try on CRC to find out what the cause might be?

@gbraad
Contributor

gbraad commented May 12, 2023

do you mean this is a nested virtualization setup?

haven't been able to repro. next week we have a very busy schedule, but will try to see if someone can test this.

@praveenkumar
Member

I am able to reproduce it but do not know what is causing it. The only hint I have is from the dns pod, which logs an error when I run nslookup secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com:

 $ oc logs -f dns-default-stpf2 -c dns
.:5353
hostname.bind.:5353
[INFO] plugin/reload: Running configuration SHA512 = a0675954ef061cdf7342cc77803e4fe99c34049afbd7456cdfc471edfc45e6253b34260978853ea42d8a1e58337b195da375e255e77bc7bb0885eeddf134bcf5
CoreDNS-1.10.0
linux/arm64, go1.19.6, 
[INFO] 10.42.0.9:57210 - 19454 "A IN secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com. udp 91 false 512" NXDOMAIN qr,rd,ra 91 3.445948665s
[INFO] 10.42.0.9:57210 - 19454 "A IN secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com. udp 91 false 512" - - 0 5.001306853s
[ERROR] plugin/errors: 2 secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com. A: dns: buffer size too small

I still get the same error even after setting a higher bufsize in the CoreDNS config :( I will need to check with the OpenShift engineering team.
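For anyone reading the CoreDNS log lines above, here is a minimal sketch that splits them into named fields. It assumes the lines follow the default format of the CoreDNS log plugin (remote, port, query id, quoted query details, then rcode, response flags, response size and duration). The helper name `parse_coredns_line` and the dictionary keys are illustrative and not part of CoreDNS.

```python
import re

# Assumed layout (default CoreDNS log plugin format):
# [INFO] remote:port - id "type class name proto size do bufsize" rcode rflags rsize duration
LOG_PATTERN = re.compile(
    r'\[INFO\] (?P<remote>\S+):(?P<port>\d+) - (?P<id>\d+) '
    r'"(?P<qtype>\S+) (?P<qclass>\S+) (?P<name>\S+) (?P<proto>\S+) '
    r'(?P<size>\d+) (?P<do>\S+) (?P<bufsize>\d+)" '
    r'(?P<rcode>\S+) (?P<rflags>\S+) (?P<rsize>\d+) (?P<duration>[\d.]+)s'
)


def parse_coredns_line(line):
    """Return the fields of one query log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared directly.
    for key in ("port", "id", "size", "bufsize", "rsize"):
        fields[key] = int(fields[key])
    fields["do"] = fields["do"] == "true"          # DNSSEC OK bit of the query
    fields["duration"] = float(fields["duration"])  # seconds
    return fields
```

Under that assumption, the first log line reads as a 91-byte UDP query that advertised a 512-byte buffer, which is the classic limit for a client that does not use EDNS. The second line has `-` for the rcode and a response size of 0 after roughly five seconds, which is consistent with no usable upstream reply being received.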

@mmurhamm
Author

Hi Praveen, did you get any updates on this? Thanks.

@praveenkumar
Member

@mmurhamm not yet, most of the team members are busy with RH Summit. I might be able to provide an update next week.

@praveenkumar
Member

@mmurhamm Hi, quick update: it looks like the busybox image uses a different nslookup implementation, so I tried a different image, registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3, which is mentioned in https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/ . Is it a requirement to use busybox?

I did some experiments with podman in the VM

$ podman run --rm  registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 nslookup secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com
Server:		192.168.127.1
Address:	192.168.127.1#53

Non-authoritative answer:
Name:	secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com
Address: 52.216.221.178
Name:	secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com
Address: 52.217.201.90
Name:	secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com
Address: 52.217.68.72
Name:	secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com
Address: 52.217.132.34
Name:	secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com
Address: 52.216.214.42
Name:	secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com
Address: 3.5.7.175
Name:	secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com
Address: 52.217.122.218
Name:	secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com
Address: 52.217.97.176

$ podman run --rm  busybox:latest nslookup secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com
Server:		192.168.127.1
Address:	192.168.127.1:53

Non-authoritative answer:

Non-authoritative answer:
*** Can't find secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com: Parse error

praveenkumar added a commit to praveenkumar/gvisor-tap-vsock that referenced this issue Jun 9, 2023
Recently we observed that DNS messages from our DNS service are not
compressed, which sometimes makes the message size exceed 512B, and some
client tools fail to process it.

In this PR, message compression is used to reduce the length of DNS
messages by removing duplicated information.

- https://spathis.medium.com/how-dns-got-its-messages-on-diet-c49568b234a2

Without this PR, if we query `dig
secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com`
the message size is 803B; with this PR it is now 242B.

- Issue: crc-org/crc#3643

Signed-off-by: Praveen Kumar <kumarpraveen.nitdgp@gmail.com>
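To make the size difference described in the commit message concrete, here is a minimal sketch that builds a DNS response by hand, once with the owner name repeated in every answer and once with a compression pointer back to the question name. It is an illustration only, not the code from the PR: the function names, the TTL of 60 and the query id are assumptions, and the eight addresses are the ones from the podman nslookup output earlier in this thread.

```python
import socket
import struct

NAME = "secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com"
ADDRESSES = [
    "52.216.221.178", "52.217.201.90", "52.217.68.72", "52.217.132.34",
    "52.216.214.42", "3.5.7.175", "52.217.122.218", "52.217.97.176",
]


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending with a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_response(name, addresses, compress):
    """Build a response with one question and one A record per address."""
    qname = encode_name(name)
    # Header: id, flags (QR, RD, RA set), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
    header = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, len(addresses), 0, 0)
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    # With compression, the owner name is a 2-byte pointer to offset 12,
    # which is where the question name starts (right after the header).
    owner = struct.pack("!H", 0xC000 | 12) if compress else qname
    answers = b""
    for address in addresses:
        rdata = socket.inet_aton(address)
        # TYPE=A, CLASS=IN, TTL=60 (arbitrary), RDLENGTH, then the address.
        answers += owner + struct.pack("!HHIH", 1, 1, 60, len(rdata)) + rdata
    return header + question + answers


print(len(build_response(NAME, ADDRESSES, compress=False)))  # 803
print(len(build_response(NAME, ADDRESSES, compress=True)))   # 219
```

The uncompressed total agrees with the 803B quoted in the commit message and is well above 512 bytes. The compressed total in this sketch (219) is smaller than the reported 242B, presumably because a real dig response carries extra data, such as an EDNS OPT record, that the sketch leaves out.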
cfergeau pushed a commit to containers/gvisor-tap-vsock that referenced this issue Jun 9, 2023
praveenkumar added a commit to praveenkumar/crc that referenced this issue Jun 9, 2023
@mmurhamm
Author

mmurhamm commented Jun 9, 2023

@praveenkumar Hi, thanks for the pointer. I ran the dnsutils container but the nslookup command fails again:

Server: 10.217.4.10
Address: 10.217.4.10#53
** server can't find secure-feeds-production-us-east-1-761931097553.s3.us-east-1.amazonaws.com: SERVFAIL

I am running this on crc VM, not podman VM.
Btw, starting the dnsutils container ends up in CrashLoopBackOff due to pod security, but I was able to fish the above from the logs before the crash.

I have upgraded crc to latest version but result is the same for both busybox and dnsutils versions of nslookup - they just won't resolve this name. I've also added an external DNS to crc config but that did not change things either.

crc version:
CRC version: 2.20.0+f3a947
OpenShift version: 4.13.0
Podman version: 4.4.4

crc status:
CRC VM: Running
OpenShift: Running (v4.13.0)
RAM Usage: 8.774GB of 16.77GB
Disk Usage: 21.67GB of 32.68GB (Inside the CRC VM)
Cache Usage: 20.78GB
Cache Directory: C:\Users\<user>\.crc\cache

crc config view:

  • consent-telemetry : no
  • cpus : 8
  • memory : 16384
  • nameserver : 8.8.8.8

@mmurhamm
Author

@praveenkumar, in addition: the use of busybox is not required, this was just a solution to test recommended by the vendor (Sysdig), like you suggested the use of dnsutils. The actual image that needs to resolve that host name is based on Red Hat 8.8. And it is still not resolving.

@gbraad
Contributor

gbraad commented Jun 12, 2023

there is an issue with fragmenting/truncation. the response for that particular name is larger than 512 bytes, which means it won't fit in a single plain UDP reply... we are still looking into why this occurs.
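For context on the 512-byte limit mentioned above: a client that does not use EDNS expects UDP replies of at most 512 bytes, and a larger answer is normally signalled with the TC (truncated) flag so the client can retry over TCP. The sketch below is a rough stdlib-only probe of that behaviour, not a reproduction of what busybox or dnsutils do internally. The function names and the server address in the usage note are assumptions.

```python
import socket
import struct


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending with a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, query_id=0x1234):
    """Build a plain A/IN query with RD set and no EDNS OPT record."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    return header + encode_name(name) + struct.pack("!HH", 1, 1)


def parse_header(message):
    """Extract the fields of the 12-byte DNS header that matter for this probe."""
    ident, flags, _qd, answers, _ns, _ar = struct.unpack("!HHHHHH", message[:12])
    return {
        "id": ident,
        "truncated": bool(flags & 0x0200),  # TC bit
        "rcode": flags & 0x000F,            # 0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN
        "answers": answers,
        "size": len(message),
    }


def recv_exact(sock, count):
    """Read exactly count bytes from a stream socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before the full reply arrived")
        data += chunk
    return data


def lookup(name, server, timeout=5.0):
    """Query over UDP first and retry over TCP if the reply has the TC bit set."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (server, 53))
        reply, _ = udp.recvfrom(4096)
    info = parse_header(reply)
    info["transport"] = "udp"
    if not info["truncated"]:
        return info
    # DNS over TCP prefixes each message with a 2-byte length.
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        tcp.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", recv_exact(tcp, 2))
        reply = recv_exact(tcp, length)
    info = parse_header(reply)
    info["transport"] = "tcp"
    return info
```

Calling `lookup(name, "10.217.4.10")` from inside a pod (the dns-default address seen in the nslookup output in this thread) would show whether the UDP reply is larger than 512 bytes and whether the TC flag is set. A UDP reply reported with a size above 512 and `truncated` false would match the oversized, uncompressed response discussed here.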

@praveenkumar
Member

@mmurhamm #3703 should fix that issue.

@praveenkumar praveenkumar moved this from Scheduled to Ready for review in Project planning: crc Jun 12, 2023
praveenkumar added a commit to praveenkumar/crc that referenced this issue Jun 12, 2023
anjannath pushed a commit that referenced this issue Jun 12, 2023
@github-project-automation github-project-automation bot moved this from Ready for review to Done in Project planning: crc Jun 12, 2023
@cfergeau
Contributor

> The actual image that needs to resolve that host name is based on Red Hat 8.8. And it is still not resolving.

Can you give more details about this image? What are you using for the DNS resolution on this image?

@mmurhamm
Author

I can confirm that this issue has been fixed with v2.22 of crc. The Sysdig Registry Scanner now runs fine.
Thank you!

@mmurhamm
Author

mmurhamm commented Jul 26, 2023 via email
