DNS names that can't be resolved in Colima, possibly only with gvproxy network driver #466

Closed
2 of 3 tasks
rfay opened this issue Nov 7, 2022 · 37 comments

@rfay
Contributor

rfay commented Nov 7, 2022

Description

I'm starting this issue so we can begin tracking down the specific DNS names that fail to resolve in colima/lima, along with the sources of those reports. I get this question all the time and tell people to use --dns 1.1.1.1, which almost always fixes it. But I think we should keep track of the failing names so that maybe we can solve this someday.

Issue              Hostname
ddev/ddev#4372     mavtek-840225427682.d.codeartifact.us-east-1.amazonaws.com
ddev/ddev#4413     www.youtube.com (seems to be youtube-ui.l.google.com)
#466 (comment)     test12345.s3.ap-northeast-1.amazonaws.com

Version

Colima Version: Various
Lima Version:
Qemu Version:

Operating System

  • macOS Intel
  • macOS M1
  • Linux

Workarounds

Many people have reported in the comments that changing to the slirp network driver resolved the issue.

@paihu

paihu commented Nov 26, 2022

I have the same issue.

Reproduction Steps

  1. docker network create test
  2. docker run --rm -it --network test alpine
  3. apk add --no-cache curl && curl test12345.s3.ap-northeast-1.amazonaws.com
  4. docker network rm test

Steps That Do Not Reproduce

  1. docker network create test
  2. docker run --rm -it --network test alpine
  3. apk add --no-cache curl && curl test1234.s3.ap-northeast-1.amazonaws.com
  4. docker network rm test

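The failure reproduces when the FQDN is 41 characters or longer (the failing name above is 41 characters, the working one is 40) and the container is on a non-default Docker network.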
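
As a quick check of the length observation, the two hostnames in the steps above differ by a single character. A minimal shell sketch (the function name fqdn_len is made up for illustration) prints the length of each name:

```shell
#!/bin/sh
# fqdn_len is a hypothetical helper: print "<length> <hostname>" for each argument.
fqdn_len() {
  for host in "$@"; do
    printf '%s %s\n' "${#host}" "$host"
  done
}

fqdn_len test12345.s3.ap-northeast-1.amazonaws.com \
         test1234.s3.ap-northeast-1.amazonaws.com
# prints:
# 41 test12345.s3.ap-northeast-1.amazonaws.com
# 40 test1234.s3.ap-northeast-1.amazonaws.com
```

This only measures the names; it does not explain why the longer one fails.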

Workaround

In my case, opening ~/.colima/default/colima.yaml and setting network.driver to slirp fixed it:

network:
  driver: slirp
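
To confirm which driver a profile is configured with before and after such a change, the value can be read back from the config file. This is a minimal sketch, assuming the file has a top-level network: key with an indented driver: entry as in the fragment above. network_driver is a hypothetical helper name, and the default path is the colima.yaml location mentioned elsewhere in this thread:

```shell
#!/bin/sh
# network_driver is a hypothetical helper: print the value of network.driver
# from a Colima config file (default: ~/.colima/default/colima.yaml).
network_driver() {
  awk '
    /^network:/ { in_net = 1; next }               # entering the network: section
    /^[^ #]/    { in_net = 0 }                     # any other top-level key ends it
    in_net && $1 == "driver:" { print $2; exit }   # indented driver: entry
  ' "${1:-$HOME/.colima/default/colima.yaml}"
}

# Usage:
#   network_driver                      # default profile
#   network_driver /path/to/colima.yaml
```

A YAML-aware tool such as yq would be more robust; the awk version only avoids an extra dependency.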

@rfay
Contributor Author

rfay commented Nov 29, 2022

Added youtube.com to the list in OP

@renatho

renatho commented Nov 29, 2022

> Added youtube.com to the list in OP

Notice that youtube.com worked for me, but www.youtube.com didn't work. 😉

@rfay
Contributor Author

rfay commented Nov 29, 2022

Edited, thanks @renatho

@abiosoft
Owner

If indeed using slirp as the network driver fixes it, this should be resolved by the next release v0.5.0.

@Schrank

Schrank commented Dec 15, 2022

I can add sbp-plugin-binaries.s3.eu-west-1.amazonaws.com

@abiosoft
Owner

I would like to know if this is still the case for v0.5.0.

@paihu

paihu commented Dec 15, 2022

Thanks.

Fixed in my environment.

@adrienthebo

adrienthebo commented Feb 27, 2023

I've observed sporadic failures with golang.org; I'm running on a 2021 Mac M1 Silicon using the vz virtualization driver. This manifests when using the devcontainer cli to build workspace images.

$ yq '.network.driver' "$(colima template --print)"
gvproxy
$ colima version
colima version 0.5.2
git commit: 6b5b6fe0540e708f0c9d6e8919fab292c671fc72

runtime: docker
arch: aarch64
client: v23.0.1
server: v20.10.20

@taylorchu

This is still not fixed in 0.5.4.

rfay changed the title from "DNS names that can't be resolved in Colima" to "DNS names that can't be resolved in Colima, possibly only with gvproxy network driver" on Mar 16, 2023
rfay mentioned this issue on Mar 29, 2023
@abiosoft
Owner

I got bitten by this today as well and I can confirm it only happens with gvproxy network.

It appears some DNS queries fail for whatever reason.

I am still investigating.

@gpsa

gpsa commented Mar 29, 2023

Same here:

When I run:

nslookup test.s3-website-us-east-1.amazonaws.com

Server:		192.168.107.1
Address:	192.168.107.1:53

Non-authoritative answer:

** server can't find test.s3-website-us-east-1.amazonaws.com: NXDOMAIN

But if I use Google's 8.8.8.8:

nslookup  test.s3-website-us-east-1.amazonaws.com 8.8.8.8
Server:		8.8.8.8
Address:	8.8.8.8:53

Non-authoritative answer:
test.s3-website-us-east-1.amazonaws.com	canonical name = s3-website.us-east-1.amazonaws.com

Non-authoritative answer:
test.s3-website-us-east-1.amazonaws.com	canonical name = s3-website.us-east-1.amazonaws.com
Name:	s3-website.us-east-1.amazonaws.com
Address: 52.217.87.195
Name:	s3-website.us-east-1.amazonaws.com
Address: 52.216.27.3
Name:	s3-website.us-east-1.amazonaws.com
Address: 52.216.98.42
Name:	s3-website.us-east-1.amazonaws.com
Address: 52.216.243.67
Name:	s3-website.us-east-1.amazonaws.com
Address: 52.217.140.13
Name:	s3-website.us-east-1.amazonaws.com
Address: 52.216.57.53
Name:	s3-website.us-east-1.amazonaws.com
Address: 52.217.10.139
Name:	s3-website.us-east-1.amazonaws.com
Address: 52.217.137.173
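
The comparison above (same name, two resolvers) can be repeated for any hostname with a small loop. This is a sketch, not a diagnostic tool: compare_dns is a hypothetical helper name, and it assumes the installed nslookup exits non-zero when a lookup fails, which I believe holds for BusyBox and recent BIND builds but may not for older ones:

```shell
#!/bin/sh
# compare_dns is a hypothetical helper: report, per resolver, whether a name resolves.
# Usage: compare_dns HOSTNAME RESOLVER [RESOLVER...]
compare_dns() {
  host=$1
  shift
  for server in "$@"; do
    # Assumption: nslookup returns a non-zero exit status on NXDOMAIN or timeout.
    if nslookup "$host" "$server" >/dev/null 2>&1; then
      echo "$server: ok"
    else
      echo "$server: FAILED"
    fi
  done
}

# Example, using the addresses from the output above:
#   compare_dns test.s3-website-us-east-1.amazonaws.com 192.168.107.1 8.8.8.8
```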

If I change the network driver to slirp, the problem becomes that host.docker.internal is resolved via /etc/hosts, but I need it to be resolved via DNS lookup:

nslookup host.docker.internal
Server:		127.0.0.11
Address:	127.0.0.11:53

** server can't find host.docker.internal: NXDOMAIN

** server can't find host.docker.internal: NXDOMAIN

@abiosoft
Owner

> If I change the network driver to slirp, the problem becomes that host.docker.internal is resolved via /etc/hosts, but I need it to be resolved via DNS lookup

@gpsa can you kindly open another issue for this? This is likely a bug.

@gpsa

gpsa commented Mar 29, 2023

> > If I change the network driver to slirp, the problem becomes that host.docker.internal is resolved via /etc/hosts, but I need it to be resolved via DNS lookup
>
> @gpsa can you kindly open another issue for this? This is likely a bug.

I could, but just to clarify, is the slirp driver expected to resolve host.docker.internal via DNS Lookup?

@abiosoft
Owner

> I could, but just to clarify, is the slirp driver expected to resolve host.docker.internal via DNS Lookup?

@gpsa I suspect your issue was changing the network driver of an existing VM.

This is what I get for slirp, it uses DNS lookup as well.

nslookup host.docker.internal
Server:		192.168.5.3
Address:	192.168.5.3:53

Non-authoritative answer:
Name:	host.docker.internal
Address: 192.168.5.2

Non-authoritative answer:

@gpsa

gpsa commented Mar 29, 2023

> > I could, but just to clarify, is the slirp driver expected to resolve host.docker.internal via DNS Lookup?
>
> @gpsa I suspect your issue was changing the network driver of an existing VM.
>
> This is what I get for slirp, it uses DNS lookup as well.
>
> nslookup host.docker.internal
> Server:		192.168.5.3
> Address:	192.168.5.3:53
>
> Non-authoritative answer:
> Name:	host.docker.internal
> Address: 192.168.5.2
>
> Non-authoritative answer:

@abiosoft Is there a way to recreate it without destroying everything? I could check whether recreating it works.

@abiosoft
Owner

> @abiosoft Is there a way to recreate it without destroying everything? I could check whether recreating it works.

@gpsa yeah. It's a regression actually, used to work before. You can edit the /etc/resolv.conf file in the VM and set the nameserver IP to 192.168.5.3.

In fact, it is the only entry in the file so you can simply replace it

colima ssh -- sudo sh -c 'echo "nameserver 192.168.5.3" > /etc/resolv.conf'
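
To verify that the change took effect, the nameserver entries can be read back from inside the VM. A minimal sketch, reusing the colima ssh -- form from the command above; vm_nameservers is a hypothetical helper name:

```shell
#!/bin/sh
# vm_nameservers is a hypothetical helper: list the nameserver IPs the VM uses.
vm_nameservers() {
  colima ssh -- cat /etc/resolv.conf | awk '$1 == "nameserver" { print $2 }'
}

# Usage:
#   vm_nameservers
# If the edit above was applied, the list should contain only 192.168.5.3.
```

Whether the file survives a VM restart is not covered in this thread, so it may be worth rechecking after colima stop and colima start.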

@gpsa

gpsa commented Mar 30, 2023

> > @abiosoft Is there a way to recreate it without destroying everything? I could check whether recreating it works.
>
> @gpsa yeah. It's a regression actually, used to work before. You can edit the /etc/resolv.conf file in the VM and set the nameserver IP to 192.168.5.3.
>
> In fact, it is the only entry in the file so you can simply replace it
>
> colima ssh -- sudo sh -c 'echo "nameserver 192.168.5.3" > /etc/resolv.conf'

@abiosoft thank you so much, that worked like a breeze. Now both internal Docker DNS and external domains work just fine on SLIRP.

@henrik242

Could the DNS issues somehow be related to Alpine?

From https://martinheinz.dev/blog/92:

Usually, you would not notice this difference, because most of the time a single UDP packet (512 bytes) is enough to resolve hostnames... until it isn't enough and your application (running on Kubernetes) that previously worked completely fine for months suddenly starts throwing "Unknown Host" exceptions for one particular (very critical) hostname. The worst part is that this can manifest randomly, anytime when some external network change causes the resolution of some particular domain to require more than the 512 bytes available in single UDP packet.
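
One way to test whether a particular name runs into the 512-byte limit described in that quote is to query it over plain UDP and look for the truncation (tc) flag in the reply. This is a rough sketch: udp_truncated is a hypothetical helper name, it assumes dig is installed (on Alpine it usually comes from a separate package), and it relies on dig printing its usual ";; flags:" header line:

```shell
#!/bin/sh
# udp_truncated is a hypothetical helper: print "truncated" if a plain 512-byte
# UDP query for HOSTNAME comes back with the tc flag set, "not truncated" otherwise.
# Usage: udp_truncated HOSTNAME [RESOLVER]
udp_truncated() {
  host=$1
  server=${2:+@$2}   # optional resolver, passed to dig as @address
  # +notcp:  send the query over UDP
  # +ignore: do not retry over TCP when the reply is truncated
  # +noedns: disable EDNS so the classic 512-byte limit applies
  # $server is intentionally unquoted so that it vanishes when empty.
  if dig +notcp +ignore +noedns $server "$host" | grep -q '^;; flags:.* tc[ ;]'; then
    echo "truncated"
  else
    echo "not truncated"
  fi
}

# Usage:
#   udp_truncated test.s3-website-us-east-1.amazonaws.com 8.8.8.8
```

A truncated reply would be consistent with the Alpine/musl explanation; an untruncated one would point elsewhere, for example at the forwarder in the network driver.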

@abiosoft
Owner

> Could the DNS issues somehow be related to Alpine?

@henrik242 I have actually read something similar before but I do not think this situation is related to Alpine, considering that slirp works fine.

As for why Alpine is the choice for Colima, you can check this comment #291 (comment).

@gpsa

gpsa commented Mar 30, 2023

> > Could the DNS issues somehow be related to Alpine?
>
> @henrik242 I have actually read something similar before but I do not think this situation is related to Alpine, considering that slirp works fine.
>
> As for why Alpine is the choice for Colima, you can check this comment #291 (comment).

SLIRP mode is now "crashing" the same way Lima alone did. Basically, the mount points stop working, and on the host:

docker ps
Cannot connect to the Docker daemon at unix:///Users/user/.colima/default/docker.sock. Is the docker daemon running?

@rfay
Contributor Author

rfay commented Mar 30, 2023

@gpsa you're making a bit of a mess of this issue. Could you please open one that's on-topic for your issues?

@gpsa

gpsa commented Apr 4, 2023

> @gpsa you're making a bit of a mess of this issue. Could you please open one that's on-topic for your issues?

Sorry about that. I've now created a separate issue for the SLIRP problem.

@AndreasA

AndreasA commented Apr 26, 2023

When starting colima with the VZ VM type and virtiofs and passing --dns 192.168.5.3, AWS hostname resolution seems to fail as well. Without that flag it seems to work, but it results in the pull speed issues from #648, regardless of whether slirp or gvproxy is used. That said, I think the network driver setting is probably ignored for the VZ VM type.

@taylorchu

@abiosoft
https://wiki.musl-libc.org/functional-differences-from-glibc.html

There are multiple reports of musl's DNS behavior being incompatible with glibc's. I think it is safer to use a base image like Debian for this.

@ryancurrah
Contributor

I would like to use Debian as well to see if it resolves this issue for us. Is that possible?

@gchait

gchait commented Aug 9, 2023

After some messing around, this seems to be the fix:

colima delete
colima start --edit

Change gvproxy to slirp.
With such a limitation/bug, I wonder why it's not the default.

@mandrasch

If anyone wants to switch, the following should also be possible:

colima start --edit
# change the value in insert mode ("i"), switching to slirp
# save via ":wq"

Or edit ~/.colima/default/colima.yaml and re-start colima via colima stop and colima start.

No need for colima delete (as far as I know).

@gchait

gchait commented Aug 9, 2023

> If anyone wants to switch, the following should also be possible:
>
> colima start --edit
> # change the value in insert mode ("i"), switching to slirp
> # save via ":wq"
>
> Or edit ~/.colima/default/colima.yaml and re-start colima via colima stop and colima start.
>
> No need for colima delete (as far as I know).

For me, after simply restarting nothing seemed to be working.
To be more specific, a docker build failed right at the beginning, because it could not even resolve registry-1.docker.io. It was an i/o timeout right there, suggesting all/most networking was broken in the VM.
I got the idea for the delete from here.

@mandrasch

mandrasch commented Aug 9, 2023

Hi! I started with colima version 0.5.5 two months ago and changing the config + restart worked fine for me today (without deleting).

@rfay just mentioned in DDEV discord the following:

> If you have had your colima instance through many updates, it's a worthwhile thing to delete it and recreate it. (After saving away databases of course via ddev snapshot -a)

So I guess it depends on how many updates happened in the meantime?

@AndreasA

AndreasA commented Aug 9, 2023

> Change gvproxy to slirp. With such a limitation/bug, I wonder why it's not the default.

Just wondering, but colima start --network-driver slirp should work as well, shouldn't it? It would be easier to use in a command for setup (no need to search/replace in the config file).

The last time I tried it, though, it made no difference with VM type vz. I admit I did not delete the instance, so maybe that helps. I'm not sure the network driver is even relevant for vz, but it is worth a try.

@mandrasch

> Just wondering, but colima start --network-driver slirp should work as well, shouldn't it? It would be easier to use in a command for setup (no need to search/replace in the config file).

Does this replace and save things in the current configuration before starting? Would be cool! (I'll try later, thanks for the hint.)

@skirsdeda

I hit this while running a container which does a lot of AWS service requests. DNS resolution would fail after some time when using vz vm, then subsequent run would fail almost immediately and only colima restart helped to get more time without DNS failures. And with qemu and slirp network driver it was actually even worse. So I resorted to Docker Desktop which runs without problems. Sad.

@admxxi

admxxi commented Oct 25, 2023

Same here. I'm having this issue with vmType: vz; with either the gvproxy or slirp network driver I still get loads of errors when resolving DNS. I would say about 50% of the requests fail.

@AndreasA

AndreasA commented Nov 2, 2023

Hi, just wondering: which lima version are you using? #648 seems to be fixed (at least it looks that way so far) with the latest lima 0.18.x update, and it was related to DNS as well, so it might also fix these issues.

@jdmarshall

jdmarshall commented Dec 12, 2023

I'm still getting connection refused on 127.0.0.11

I don't have a local dns server on dev machines and I can't figure out what the solution is here. How do we avoid this?

The latest version of Colima doesn't even have a driver field in the yaml file and I'm still having this problem.

@xuwhite

xuwhite commented Aug 3, 2024

> I'm still getting connection refused on 127.0.0.11
>
> I don't have a local dns server on dev machines and I can't figure out what the solution is here. How do we avoid this?
>
> The latest version of Colima doesn't even have a driver field in the yaml file and I'm still having this problem.

Same here.
Sadly, the only workaround that works for me is to set a DNS address, either with colima start --dns 8.8.8.8 or in the config file ~/.colima/default/colima.yaml. If the DNS changes, I have to restart Colima (colima restart) to make DNS work again.
See #711.
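
The --dns workaround described above can be wrapped so that the resolver is easy to swap. A minimal sketch, assuming that colima stop followed by colima start --dns has the same effect as the restart mentioned in the comment; restart_with_dns is a hypothetical helper name:

```shell
#!/bin/sh
# restart_with_dns is a hypothetical helper: restart the VM with an explicit resolver.
# Usage: restart_with_dns [RESOLVER]
restart_with_dns() {
  dns=${1:-8.8.8.8}   # 8.8.8.8 is the resolver used in the comment above
  colima stop && colima start --dns "$dns"
}

# Usage:
#   restart_with_dns            # uses 8.8.8.8
#   restart_with_dns 1.1.1.1
```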
