Bug: dns over tls timing out on latest image (TLS handshake) #2533

Open
Dreadwolf91 opened this issue Oct 20, 2024 · 60 comments

@Dreadwolf91

Dreadwolf91 commented Oct 20, 2024

Is this urgent?

No

Host OS

Ubuntu 64-bit

CPU arch

x86_64

VPN service provider

Surfshark

What are you using to run the container

docker-compose

What is the version of Gluetun

v3.39.1

What's the problem 🤔

When using the latest image I get no internet connection. I don't know what the exact problem is, but when I use, for example, v3.39.0, everything works fine.

Share your logs (at least 10 lines)

========================================
========================================
=============== gluetun ================
========================================
=========== Made with ❤️ by ============
======= https://github.com/qdm12 =======
========================================
========================================

Running version latest built on 2024-09-29T18:12:41.313Z (commit 7ebbaf4)

📣 All control server routes will become private by default after the v3.41.0 release

🔧 Need help? ☕ Discussion? https://github.com/qdm12/gluetun/discussions/new/choose
🐛 Bug? ✨ New feature? https://github.com/qdm12/gluetun/issues/new/choose
💻 Email? quentin.mcgaw@gmail.com
💰 Help me? https://www.paypal.me/qmcgaw https://github.com/sponsors/qdm12
2024-10-20T23:58:07+02:00 INFO [routing] default route found: interface eth0, gateway 172.20.0.1, assigned IP 172.20.0.2 and family v4
2024-10-20T23:58:07+02:00 INFO [routing] local ethernet link found: eth0
2024-10-20T23:58:07+02:00 INFO [routing] local ipnet found: 172.20.0.0/16
2024-10-20T23:58:07+02:00 INFO [firewall] enabling...
2024-10-20T23:58:07+02:00 INFO [firewall] enabled successfully
2024-10-20T23:58:07+02:00 INFO [storage] merging by most recent 20553 hardcoded servers and 18299 servers read from /gluetun/servers.json
2024-10-20T23:58:07+02:00 INFO Alpine version: 3.20.3
2024-10-20T23:58:07+02:00 INFO OpenVPN 2.5 version: 2.5.10
2024-10-20T23:58:07+02:00 INFO OpenVPN 2.6 version: 2.6.11
2024-10-20T23:58:07+02:00 INFO IPtables version: v1.8.10
2024-10-20T23:58:07+02:00 INFO Settings summary:
├── VPN settings:
|   ├── VPN provider settings:
|   |   ├── Name: surfshark
|   |   └── Server selection settings:
|   |       ├── VPN type: openvpn
|   |       ├── Countries: Switzerland, Spain, Slovakia, Slovenia
|   |       └── OpenVPN server selection settings:
|   |           └── Protocol: UDP
|   └── OpenVPN settings:
|       ├── OpenVPN version: 2.6
|       ├── User: [set]
|       ├── Password: [set]
|       ├── Network interface: tun0
|       ├── Run OpenVPN as: root
|       └── Verbosity level: 1
├── DNS settings:
|   ├── Keep existing nameserver(s): no
|   ├── DNS server address to use: 127.0.0.1
|   └── DNS over TLS settings:
|       ├── Enabled: yes
|       ├── Update period: every 24h0m0s
|       ├── Upstream resolvers:
|       |   └── cloudflare
|       ├── Caching: yes
|       ├── IPv6: no
|       └── DNS filtering settings:
|           ├── Block malicious: yes
|           ├── Block ads: no
|           ├── Block surveillance: no
|           └── Blocked IP networks:
|               ├── 127.0.0.1/8
|               ├── 10.0.0.0/8
|               ├── 172.16.0.0/12
|               ├── 192.168.0.0/16
|               ├── 169.254.0.0/16
|               ├── ::1/128
|               ├── fc00::/7
|               ├── fe80::/10
|               ├── ::ffff:127.0.0.1/104
|               ├── ::ffff:10.0.0.0/104
|               ├── ::ffff:169.254.0.0/112
|               ├── ::ffff:172.16.0.0/108
|               └── ::ffff:192.168.0.0/112
├── Firewall settings:
|   └── Enabled: yes
├── Log settings:
|   └── Log level: info
├── Health settings:
|   ├── Server listening address: 127.0.0.1:9999
|   ├── Target address: cloudflare.com:443
|   ├── Duration to wait after success: 5s
|   ├── Read header timeout: 100ms
|   ├── Read timeout: 500ms
|   └── VPN wait durations:
|       ├── Initial duration: 6s
|       └── Additional duration: 5s
├── Shadowsocks server settings:
|   └── Enabled: no
├── HTTP proxy settings:
|   └── Enabled: no
├── Control server settings:
|   ├── Listening address: :8000
|   ├── Logging: yes
|   └── Authentication file path: /gluetun/auth/config.toml
├── Storage settings:
|   └── Filepath: /gluetun/servers.json
├── OS Alpine settings:
|   ├── Process UID: 1000
|   ├── Process GID: 1000
|   └── Timezone: redacted
├── Public IP settings:
|   ├── Fetching: every 12h0m0s
|   ├── IP file path: /tmp/gluetun/ip
|   └── Public IP data API: ipinfo
├── Server data updater settings:
|   ├── Update period: 24h0m0s
|   ├── DNS address: 1.1.1.1:53
|   ├── Minimum ratio: 0.8
|   └── Providers to update: surfshark
└── Version settings:
    └── Enabled: yes
2024-10-20T23:58:07+02:00 INFO [routing] default route found: interface eth0, gateway 172.20.0.1, assigned IP 172.20.0.2 and family v4
2024-10-20T23:58:07+02:00 INFO [routing] adding route for 0.0.0.0/0
2024-10-20T23:58:07+02:00 INFO [firewall] setting allowed subnets...
2024-10-20T23:58:07+02:00 INFO [routing] default route found: interface eth0, gateway 172.20.0.1, assigned IP 172.20.0.2 and family v4
2024-10-20T23:58:07+02:00 INFO [dns] using plaintext DNS at address 1.1.1.1
2024-10-20T23:58:07+02:00 INFO [http server] http server listening on [::]:8000
2024-10-20T23:58:07+02:00 INFO [healthcheck] listening on 127.0.0.1:9999
2024-10-20T23:58:07+02:00 INFO [firewall] allowing VPN connection...
2024-10-20T23:58:07+02:00 INFO [openvpn] OpenVPN 2.6.11 x86_64-alpine-linux-musl [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD]
2024-10-20T23:58:07+02:00 INFO [openvpn] library versions: OpenSSL 3.3.2 3 Sep 2024, LZO 2.10
2024-10-20T23:58:07+02:00 INFO [openvpn] TCP/UDP: Preserving recently used remote address: [AF_INET]89.37.95.212:1194
2024-10-20T23:58:07+02:00 INFO [openvpn] UDPv4 link local: (not bound)
2024-10-20T23:58:07+02:00 INFO [openvpn] UDPv4 link remote: [AF_INET]89.37.95.212:1194
2024-10-20T23:58:08+02:00 INFO [openvpn] [es-mad-v055.prod.surfshark.com] Peer Connection Initiated with [AF_INET]89.37.95.212:1194
2024-10-20T23:58:09+02:00 ERROR [openvpn] Unrecognized option or missing or extra parameter(s) in [PUSH-OPTIONS]:7: block-outside-dns (2.6.11)
2024-10-20T23:58:09+02:00 INFO [openvpn] TUN/TAP device tun0 opened
2024-10-20T23:58:09+02:00 INFO [openvpn] /sbin/ip link set dev tun0 up mtu 1500
2024-10-20T23:58:09+02:00 INFO [openvpn] /sbin/ip link set dev tun0 up
2024-10-20T23:58:09+02:00 INFO [openvpn] /sbin/ip addr add dev tun0 10.8.8.6/24
2024-10-20T23:58:09+02:00 INFO [openvpn] UID set to nonrootuser
2024-10-20T23:58:09+02:00 INFO [openvpn] Initialization Sequence Completed
2024-10-20T23:58:09+02:00 INFO [dns] downloading hostnames and IP block lists
2024-10-20T23:58:09+02:00 INFO [healthcheck] healthy!
2024-10-20T23:58:24+02:00 WARN [dns] cannot update filter block lists: Get "https://raw.githubusercontent.com/qdm12/files/master/malicious-hostnames.updated": context deadline exceeded (Client.Timeout exceeded while awaiting headers), context deadline exceeded (Client.Timeout or context cancellation while reading body)
2024-10-20T23:58:24+02:00 INFO [dns] attempting restart in 10s
2024-10-20T23:58:25+02:00 INFO [ip getter] Public IP address is 89.37.95.213 (Spain, Madrid, Madrid)
2024-10-20T23:58:34+02:00 INFO [dns] downloading hostnames and IP block lists
2024-10-20T23:58:40+02:00 ERROR [vpn] cannot get version information: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2024-10-20T23:58:49+02:00 WARN [dns] cannot update filter block lists: Get "https://raw.githubusercontent.com/qdm12/files/master/malicious-hostnames.updated": context deadline exceeded (Client.Timeout exceeded while awaiting headers), Get "https://raw.githubusercontent.com/qdm12/files/master/malicious-ips.updated": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2024-10-20T23:58:49+02:00 INFO [dns] attempting restart in 20s
2024-10-20T23:59:09+02:00 INFO [dns] downloading hostnames and IP block lists
2024-10-20T23:59:24+02:00 WARN [dns] cannot update filter block lists: Get "https://raw.githubusercontent.com/qdm12/files/master/malicious-hostnames.updated": context deadline exceeded (Client.Timeout exceeded while awaiting headers), Get "https://raw.githubusercontent.com/qdm12/files/master/malicious-ips.updated": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2024-10-20T23:59:24+02:00 INFO [dns] attempting restart in 40s
2024-10-21T00:00:04+02:00 INFO [dns] downloading hostnames and IP block lists
2024-10-21T00:00:19+02:00 WARN [dns] cannot update filter block lists: Get "https://raw.githubusercontent.com/qdm12/files/master/malicious-hostnames.updated": context deadline exceeded (Client.Timeout exceeded while awaiting headers), Get "https://raw.githubusercontent.com/qdm12/files/master/malicious-ips.updated": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2024-10-21T00:00:19+02:00 INFO [dns] attempting restart in 1m20s
2024-10-21T00:01:39+02:00 INFO [dns] downloading hostnames and IP block lists
2024-10-21T00:01:49+02:00 WARN [dns] cannot update filter block lists: Get "https://raw.githubusercontent.com/qdm12/files/master/malicious-hostnames.updated": net/http: TLS handshake timeout, Get "https://raw.githubusercontent.com/qdm12/files/master/malicious-ips.updated": net/http: TLS handshake timeout
2024-10-21T00:01:49+02:00 INFO [dns] attempting restart in 2m40s
2024-10-21T00:04:29+02:00 INFO [dns] downloading hostnames and IP block lists
2024-10-21T00:04:39+02:00 WARN [dns] cannot update filter block lists: Get "https://raw.githubusercontent.com/qdm12/files/master/malicious-hostnames.updated": net/http: TLS handshake timeout, Get "https://raw.githubusercontent.com/qdm12/files/master/malicious-ips.updated": net/http: TLS handshake timeout
...

Share your configuration

gluetun:
    env_file:
      - ../.env-global
    image: qmcgaw/gluetun
    container_name: gluetun

    cap_add:
      - NET_ADMIN
    devices:
      - /dev/net/tun:/dev/net/tun
    ports:
      - 8085:8085
      - 5800:5800
      - 8989:8989
      - 7878:7878
      - 9696:9696
      - 6767:6767

    volumes:
      - ./gluetun/:/gluetun
    environment:
      - VPN_SERVICE_PROVIDER=surfshark
      - VPN_TYPE=openvpn
      - OPENVPN_USER=${OPENVPN_USER}
      - OPENVPN_PASSWORD=${OPENVPN_PASSWORD}
      - SERVER_COUNTRIES=Switzerland,Spain,Slovakia,Slovenia
      - UPDATER_PERIOD=24h
    restart: unless-stopped
Contributor

@qdm12 is more or less the only maintainer of this project and works on it in his free time.
Please:

@epic0421

epic0421 commented Oct 21, 2024

I have a similar (and probably related) bug. Also using Surfshark. For me though, explicitly setting it to version 3.39.1 works but setting it to the latest seems to make it break.

gluetun      | 2024-10-20T23:54:47-07:00 WARN [dns] exchanging over DoT connection: read tcp 10.14.0.2:38422->1.0.0.1:853: i/o timeout
gluetun      | 2024-10-20T23:54:47-07:00 WARN [dns] exchanging over DoT connection: read tcp 10.14.0.2:51636->1.1.1.1:853: i/o timeout
...
gluetun      | 2024-10-20T23:54:53-07:00 WARN [dns] exchanging over DoT connection: read tcp 10.14.0.2:38580->1.0.0.1:853: i/o timeout
gluetun      | 2024-10-20T23:54:53-07:00 ERROR [vpn] getting public IP address information: fetching information: Get "https://ipinfo.io/": dial tcp: lookup ipinfo.io on 127.0.0.1:53: server misbehaving
gluetun      | 2024-10-20T23:54:53-07:00 WARN [dns] exchanging over DoT connection: read tcp 10.14.0.2:51770->1.1.1.1:853: i/o timeout
gluetun      | 2024-10-20T23:54:53-07:00 WARN [dns] exchanging over DoT connection: read tcp 10.14.0.2:51784->1.1.1.1:853: i/o timeout
...
gluetun      | 2024-10-20T23:54:56-07:00 WARN [dns] exchanging over DoT connection: read tcp 10.14.0.2:44474->1.0.0.1:853: i/o timeout
gluetun      | 2024-10-20T23:54:56-07:00 WARN [dns] exchanging over DoT connection: read tcp 10.14.0.2:44760->1.1.1.1:853: i/o timeout
gluetun      | 2024-10-20T23:54:57-07:00 INFO [healthcheck] program has been unhealthy for 11s: restarting VPN
gluetun      | 2024-10-20T23:54:57-07:00 INFO [healthcheck] 👉 See https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md
gluetun      | 2024-10-20T23:54:57-07:00 INFO [healthcheck] DO NOT OPEN AN ISSUE UNLESS YOU READ AND TRIED EACH POSSIBLE SOLUTION
gluetun      | 2024-10-20T23:54:57-07:00 INFO [vpn] stopping
gluetun      | 2024-10-20T23:54:57-07:00 WARN [dns] exchanging over DoT connection: read tcp 10.14.0.2:44476->1.0.0.1:853: i/o timeout
gluetun      | 2024-10-20T23:54:57-07:00 INFO [vpn] starting
gluetun      | 2024-10-20T23:54:57-07:00 INFO [firewall] allowing VPN connection...
gluetun      | 2024-10-20T23:54:57-07:00 INFO [wireguard] Using available kernelspace implementation
gluetun      | 2024-10-20T23:54:57-07:00 INFO [wireguard] Connecting to ###########
gluetun      | 2024-10-20T23:54:57-07:00 INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
gluetun      | 2024-10-20T23:54:57-07:00 WARN [dns] exchanging over DoT connection: read tcp 10.14.0.2:44776->1.1.1.1:853: i/o timeout
gluetun      | 2024-10-20T23:54:57-07:00 WARN [dns] exchanging over DoT connection: read tcp 10.14.0.2:44490->1.0.0.1:853: i/o timeout
...
gluetun      | 2024-10-20T23:55:05-07:00 WARN [dns] exchanging over DoT connection: read tcp 10.14.0.2:46218->1.0.0.1:853: i/o timeout
gluetun      | 2024-10-20T23:55:05-07:00 WARN [dns] exchanging over DoT connection: read tcp 10.14.0.2:43486->1.1.1.1:853: i/o timeout
gluetun      | 2024-10-20T23:55:05-07:00 ERROR [vpn] getting public IP address information: fetching information: Get "https://ipinfo.io/": dial tcp: lookup ipinfo.io on 127.0.0.1:53: server misbehaving

@haitham506

haitham506 commented Oct 22, 2024

I have a similar (and probably related) bug. Also using Surfshark. For me though, explicitly setting it to version 3.39.1 works but

I have the same issue too

@Dreadwolf91
Author

Dreadwolf91 commented Oct 22, 2024

Is it Surfshark for you too?

@haitham506

haitham506 commented Oct 23, 2024

v3.39 works fine but latest doesn't work

@frepke
Collaborator

frepke commented Oct 24, 2024

I have the same issue too

Same issue for me with Surfshark/wireguard

But when I run the latest, my log says "You are running 2 commits behind the most recent latest".
When I run v3.39.1, the log says "You are running the latest release v3.39.1".

@screamjojo

This comment was marked as off-topic.

@frepke
Collaborator

frepke commented Oct 25, 2024

@screamjojo you can try to solve the problem yourself for now. Try a specific version tag instead of the latest tag (image: ghcr.io/qdm12/gluetun:v3.39.1 is working for me). If you don't have time to check the changelog or the container log, it's probably not advisable to run the latest tag all the time, because it's always possible that something breaks (@qdm12 does everything on his own and simply cannot check everything after every change he makes).

I don't think you check your logs often; otherwise these warnings would have caught your attention and been resolved already too:

WARN You are using the old environment variable VPN_ENDPOINT_PORT, please consider changing it to WIREGUARD_ENDPOINT_PORT
WARN You are using the old environment variable VPN_ENDPOINT_IP, please consider changing it to WIREGUARD_ENDPOINT_IP
WARN You are using the old environment variable VPN_ENDPOINT_PORT, please consider changing it to OPENVPN_ENDPOINT_PORT
WARN You are using the old environment variable VPN_ENDPOINT_IP, please consider changing it to OPENVPN_ENDPOINT_IP

So, you could solve your problem by changing the version and waiting for @qdm12 to fix the problem in a later update.
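
As an illustration of pinning the tag, a docker-compose service could look roughly like the sketch below. It reuses the layout of the configuration posted at the top of this issue and the tag mentioned above; the top-level services key is an assumption about the surrounding file, and the rest of the service (ports, volumes, environment) is omitted.

```yaml
services:
  gluetun:
    # Pin a release tag instead of relying on the implicit :latest tag
    image: ghcr.io/qdm12/gluetun:v3.39.1
    container_name: gluetun
    cap_add:
      - NET_ADMIN
    devices:
      - /dev/net/tun:/dev/net/tun
    restart: unless-stopped
```

After changing the tag, the container has to be recreated for the pinned image to be used.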

kr.,
Patrick

@screamjojo

This comment was marked as off-topic.

@qdm12
Owner

qdm12 commented Nov 1, 2024

Hello there, thanks @frepke for the help! By the way @frepke are you using surfshark as well? Does it work for both v3.39.1 and the latest image?

v3.39.1 should work almost the same as v3.39.0, but the latest image has substantial changes; in particular, the DNS server/forwarder has been completely changed, so that could be the reason. Maybe try with DOT=off on the latest image?

Regarding

But when I run the latest, my log says You are running 2 commits behind the most recent latest

This happens when the last commits do not trigger an image build, for example documentation or development setup commits. I could eventually fix it, but it rarely happens 😉

PS: I also just rechecked, and it works fine on my side with Mullvad Wireguard, for the sake of narrowing this down.

qdm12 changed the title from "Bug: Dont get a connection when using the current image, on v3.29.0 it works tho" to "Bug: surfshark latest image not working" on Nov 1, 2024
@qdm12

This comment was marked as off-topic.

@frepke
Collaborator

frepke commented Nov 2, 2024

Yeah, still using Surfshark (unfortunately AdguardVPN isn't working with Gluetun 😔)

  • v3.39.1 is running with DOT=on and DOT=off
  • latest is only running when DOT=off
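
For anyone wanting to reproduce the second case, a minimal sketch of the workaround in a docker-compose file is shown below. The DOT variable is the one suggested earlier in this thread; the top-level services key and the trimmed-down service are assumptions, and credentials and other settings are left out.

```yaml
services:
  gluetun:
    image: qmcgaw/gluetun
    cap_add:
      - NET_ADMIN
    environment:
      - VPN_SERVICE_PROVIDER=surfshark
      # Workaround only: turn off DNS over TLS on the latest image
      - DOT=off
```

This is a workaround rather than a fix, since it switches DNS over TLS off instead of making it work.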

If I have to check/test something, let me know 😉

@the-jeffski

I'm having the same issue with Surfshark: the v3.39 tag works fine, anything beyond it does not, and I get the same errors. Using Wireguard as the protocol.

@epic0421

epic0421 commented Nov 2, 2024

I can say that I also see this behavior

@qdm12
Owner

qdm12 commented Nov 2, 2024

Reading all this all over again, there seem to be 2 issues, most likely unrelated:


@Dreadwolf91

You are hitting these two errors:

context deadline exceeded (Client.Timeout exceeded while awaiting headers), context deadline exceeded (Client.Timeout or context cancellation while reading body)
net/http: TLS handshake timeout

This happens even though the VPN connection actually works well enough to get the public IP address and to pass the TCP dial to cloudflare.com (aka the health check):

2024-10-20T23:58:09+02:00 INFO [healthcheck] healthy!
2024-10-20T23:58:24+02:00 WARN [dns] cannot update filter block lists: Get "https://raw.githubusercontent.com/qdm12/files/master/malicious-hostnames.updated": context deadline exceeded (Client.Timeout exceeded while awaiting headers), context deadline exceeded (Client.Timeout or context cancellation while reading body)
2024-10-20T23:58:24+02:00 INFO [dns] attempting restart in 10s
2024-10-20T23:58:25+02:00 INFO [ip getter] Public IP address is 89.37.95.213 (Spain, Madrid, Madrid)

I've seen this behavior before, and it's most likely due to your MTU, so try either:

  1. fiddling with OPENVPN_MSSFIX (see the openvpn mssfix option)
  2. moving to Wireguard and, maybe, fiddling with WIREGUARD_MTU
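
A rough sketch of what either option could look like in a docker-compose file follows. The value 1320 is purely illustrative and not a recommendation; the right number depends on the network path and usually has to be found by trial. The top-level services key is an assumption about the surrounding file.

```yaml
services:
  gluetun:
    image: qmcgaw/gluetun
    environment:
      - VPN_SERVICE_PROVIDER=surfshark
      - VPN_TYPE=openvpn
      # Option 1 (OpenVPN): try a lower mssfix value; 1320 is an illustrative guess
      - OPENVPN_MSSFIX=1320
      # Option 2 (only with VPN_TYPE=wireguard): try a lower MTU than the
      # 1420 shown in the Wireguard settings summaries in this thread
      # - WIREGUARD_MTU=1320
```

Lowering the value in small steps and retesting after each change makes it easier to see whether the TLS handshake timeouts are MTU-related at all.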

Also, please double check whether you can make it work with the image tag :v3.39 (and not v3.29 as you mentioned).
This is unrelated to the other issue below, and almost certainly has nothing to do with the DNS forwarder code.


@epic0421 @haitham506 @frepke @the-jeffski (and more to come likely):

It looks like your error is mostly "exchanging over DoT connection: read tcp localip:localport->1.0.0.1:853: i/o timeout", which indicates that the Cloudflare DNS servers (1.1.1.1 and 1.0.0.1) just don't reply over DNS over TLS, for whatever reason.

Now a few things on this:

  • The new DNS forwarder is quite verbose about i/o timeout errors, whereas the previous one (unbound) would not log them.
  • On my side (Mullvad + Wireguard) I spoke a bit too fast when I said it was working fine. It does in a way, but it internally restarts the VPN quite a bit (4 times in 24 hours). Is this the problem you face as well? Or is it always failing and never able to set up a connection at all?

PS: to see whether it works outside the custom DNS forwarder code, you can try the following:

docker exec gluetun apk add knot-utils
docker exec gluetun kdig -d @1.1.1.1 +tls-ca +tls-host=cloudflare-dns.com github.com

This runs a DNS over TLS query against Cloudflare (1.1.1.1) to resolve github.com. Does this work when gluetun fails to resolve things?

@frepke
Collaborator

frepke commented Nov 2, 2024

For me, with DOT=on with v3.39.1, it's not possible to set up a connection at all.

@epic0421

epic0421 commented Nov 2, 2024

For me, v3.39.1 works fine (DOT on/off).
Latest fails to establish a connection and spams that error message repeatedly when DOT is on.

@epic0421

epic0421 commented Nov 2, 2024

Actually now that I am testing it further, the connection does get established and is initially healthy, but becomes unhealthy very quickly, and then becomes healthy about a minute later. That error message keeps getting spammed though.

ver tls connection: read tcp 10.14.0.2:50528->1.0.0.1:853: i/o timeout
gluetun      | 2024-11-02T12:28:54-07:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:50540->1.0.0.1:853: i/o timeout
gluetun      | 2024-11-02T12:28:54-07:00 ERROR [dns] stopping DoT server: stopping DNS udp server: context deadline exceeded
gluetun      | 2024-11-02T12:28:54-07:00 INFO [dns] falling back on plaintext DNS at address 1.1.1.1
gluetun      | 2024-11-02T12:28:54-07:00 WARN [dns] DNS is not working: after 10 tries: lookup github.com on 127.0.0.1:53: server misbehaving
gluetun      | 2024-11-02T12:28:54-07:00 INFO [dns] attempting restart in 10s
gluetun      | 2024-11-02T12:28:54-07:00 INFO [ip getter] Public IP address is ##### (####### - source: ipinfo)
gluetun      | 2024-11-02T12:28:55-07:00 INFO [vpn] You are running on the bleeding edge of latest!
gluetun      | 2024-11-02T12:28:56-07:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:50960->1.1.1.1:853: i/o timeout
gluetun      | 2024-11-02T12:28:58-07:00 INFO [healthcheck] healthy!
gluetun      | 2024-11-02T12:29:04-07:00 INFO [dns] downloading hostnames and IP block lists
gluetun      | 2024-11-02T12:29:05-07:00 INFO [dns] DNS server listening on [::]:53
gluetun      | 2024-11-02T12:29:07-07:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:34758->1.1.1.1:853: i/o timeout
gluetun      | 2024-11-02T12:29:07-07:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:34744->1.1.1.1:853: i/o timeout

At the end, it does this and then the error messages stop. It then starts doing it again, making the container unhealthy and the cycle repeats.

gluetun      | 2024-11-02T12:30:33-07:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:51990->1.0.0.1:853: i/o timeout
gluetun      | 2024-11-02T12:30:33-07:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:51984->1.0.0.1:853: i/o timeout
gluetun      | 2024-11-02T12:30:34-07:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:46982->1.1.1.1:853: i/o timeout
gluetun      | 2024-11-02T12:30:35-07:00 ERROR [dns] stopping DoT server: stopping DNS udp server: context deadline exceeded
gluetun      | 2024-11-02T12:30:35-07:00 INFO [dns] falling back on plaintext DNS at address 1.1.1.1
gluetun      | 2024-11-02T12:30:35-07:00 WARN [dns] DNS is not working: after 10 tries: lookup github.com on 127.0.0.1:53: server misbehaving
gluetun      | 2024-11-02T12:30:35-07:00 INFO [dns] attempting restart in 20s
gluetun      | 2024-11-02T12:30:36-07:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:51998->1.0.0.1:853: i/o timeout
gluetun      | 2024-11-02T12:30:39-07:00 INFO [healthcheck] healthy!
gluetun      | 2024-11-02T12:30:55-07:00 INFO [dns] downloading hostnames and IP block lists
gluetun      | 2024-11-02T12:30:55-07:00 INFO [dns] DNS server listening on [::]:53
gluetun      | 2024-11-02T12:30:57-07:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:32772->1.0.0.1:853: i/o timeout
gluetun      | 2024-11-02T12:30:57-07:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:32768->1.0.0.1:853: i/o timeout

@haitham506

haitham506 commented Nov 2, 2024

The connection gets established (healthy) and then becomes unhealthy after a few seconds. It restarted 6 times; after that it stayed connected, but the DNS errors keep showing up, though they are no longer spammed.

:latest

========================================
========================================
=============== gluetun ================
========================================
=========== Made with ❤️ by ============
======= https://github.com/qdm12 =======
========================================
========================================

Running version latest built on 2024-10-28T09:25:35.847Z (commit f1f3472)

📣 All control server routes will become private by default after the v3.41.0 release

🔧 Need help? ☕ Discussion? https://github.com/qdm12/gluetun/discussions/new/choose
🐛 Bug? ✨ New feature? https://github.com/qdm12/gluetun/issues/new/choose
💻 Email? quentin.mcgaw@gmail.com
💰 Help me? https://www.paypal.me/qmcgaw https://github.com/sponsors/qdm12
2024-11-02T22:56:18+00:00 INFO [routing] default route found: interface eth0, gateway 172.27.0.1, assigned IP 172.27.0.2 and family v4
2024-11-02T22:56:18+00:00 INFO [routing] local ethernet link found: eth0
2024-11-02T22:56:18+00:00 INFO [routing] local ipnet found: 172.27.0.0/16
2024-11-02T22:56:19+00:00 INFO [firewall] enabling...
2024-11-02T22:56:19+00:00 INFO [firewall] enabled successfully
2024-11-02T22:56:20+00:00 INFO [storage] creating /gluetun/servers.json with 20553 hardcoded servers
2024-11-02T22:56:21+00:00 INFO Alpine version: 3.20.3
2024-11-02T22:56:21+00:00 INFO OpenVPN 2.5 version: 2.5.10
2024-11-02T22:56:21+00:00 INFO OpenVPN 2.6 version: 2.6.11
2024-11-02T22:56:21+00:00 INFO IPtables version: v1.8.10
2024-11-02T22:56:21+00:00 INFO Settings summary:
├── VPN settings:
|   ├── VPN provider settings:
|   |   ├── Name: surfshark
|   |   └── Server selection settings:
|   |       ├── VPN type: wireguard
|   |       ├── Countries: ####
|   |       └── Wireguard selection settings:
|   └── Wireguard settings:
|       ├── Private key: #####
|       ├── Interface addresses:
|       |   └── 10.14.0.2/16
|       ├── Allowed IPs:
|       |   ├── 0.0.0.0/0
|       |   └── ::/0
|       └── Network interface: tun0
|           └── MTU: 1420
├── DNS settings:
|   ├── Keep existing nameserver(s): no
|   ├── DNS server address to use: 127.0.0.1
|   └── DNS over TLS settings:
|       ├── Enabled: yes
|       ├── Update period: every 24h0m0s
|       ├── Upstream resolvers:
|       |   ├── cloudflare
|       |   ├── google
|       |   └── quad9
|       ├── Caching: yes
|       ├── IPv6: no
|       └── DNS filtering settings:
|           ├── Block malicious: yes
|           ├── Block ads: no
|           ├── Block surveillance: no
|           └── Blocked IP networks:
|               ├── 127.0.0.1/8
|               ├── 10.0.0.0/8
|               ├── 172.16.0.0/12
|               ├── 192.168.0.0/16
|               ├── 169.254.0.0/16
|               ├── ::1/128
|               ├── fc00::/7
|               ├── fe80::/10
|               ├── ::ffff:127.0.0.1/104
|               ├── ::ffff:10.0.0.0/104
|               ├── ::ffff:169.254.0.0/112
|               ├── ::ffff:172.16.0.0/108
|               └── ::ffff:192.168.0.0/112
├── Firewall settings:
|   └── Enabled: yes
├── Log settings:
|   └── Log level: info
├── Health settings:
|   ├── Server listening address: 127.0.0.1:9999
|   ├── Target address: cloudflare.com:443
|   ├── Duration to wait after success: 5s
|   ├── Read header timeout: 100ms
|   ├── Read timeout: 500ms
|   └── VPN wait durations:
|       ├── Initial duration: 6s
|       └── Additional duration: 5s
├── Shadowsocks server settings:
|   └── Enabled: no
├── HTTP proxy settings:
|   └── Enabled: no
├── Control server settings:
|   ├── Listening address: :8000
|   ├── Logging: yes
|   └── Authentication file path: /gluetun/auth/config.toml
├── Storage settings:
|   └── Filepath: /gluetun/servers.json
├── OS Alpine settings:
|   ├── Process UID: 1000
|   ├── Process GID: 1000
|   └── Timezone: ####
├── Public IP settings:
|   ├── IP file path: /tmp/gluetun/ip
|   ├── Public IP data base API: ipinfo
|   └── Public IP data backup APIs:
|       ├── ifconfigco
|       ├── ip2location
|       └── cloudflare
├── Server data updater settings:
|   ├── Update period: 24h0m0s
|   ├── DNS address: 1.1.1.1:53
|   ├── Minimum ratio: 0.8
|   └── Providers to update: surfshark
└── Version settings:
    └── Enabled: yes
2024-11-02T22:56:21+00:00 INFO [routing] default route found: interface eth0, gateway 172.27.0.1, assigned IP 172.27.0.2 and family v4
2024-11-02T22:56:21+00:00 INFO [routing] adding route for 0.0.0.0/0
2024-11-02T22:56:21+00:00 INFO [firewall] setting allowed subnets...
2024-11-02T22:56:21+00:00 INFO [routing] default route found: interface eth0, gateway 172.27.0.1, assigned IP 172.27.0.2 and family v4
2024-11-02T22:56:21+00:00 INFO [dns] using plaintext DNS at address 1.1.1.1
2024-11-02T22:56:21+00:00 INFO [http server] http server listening on [::]:8000
2024-11-02T22:56:21+00:00 INFO [firewall] allowing VPN connection...
2024-11-02T22:56:21+00:00 INFO [healthcheck] listening on 127.0.0.1:9999
2024-11-02T22:56:21+00:00 INFO [wireguard] Using available kernelspace implementation
2024-11-02T22:56:21+00:00 INFO [wireguard] Connecting to ####:51820
2024-11-02T22:56:21+00:00 INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
2024-11-02T22:56:21+00:00 INFO [dns] downloading hostnames and IP block lists
2024-11-02T22:56:21+00:00 INFO [healthcheck] healthy!
2024-11-02T22:56:24+00:00 INFO [dns] DNS server listening on [::]:53
2024-11-02T22:56:26+00:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:49004->149.112.112.112:853: i/o timeout
2024-11-02T22:56:26+00:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:59252->1.0.0.1:853: i/o timeout
...
2024-11-02T22:56:33+00:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:58824->1.0.0.1:853: i/o timeout
2024-11-02T22:56:34+00:00 INFO [healthcheck] program has been unhealthy for 6s: restarting VPN
2024-11-02T22:56:34+00:00 INFO [healthcheck] 👉 See https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md
2024-11-02T22:56:34+00:00 INFO [healthcheck] DO NOT OPEN AN ISSUE UNLESS YOU READ AND TRIED EACH POSSIBLE SOLUTION
2024-11-02T22:56:34+00:00 INFO [vpn] stopping
2024-11-02T22:56:34+00:00 ERROR [vpn] getting public IP address information: context canceled
2024-11-02T22:56:34+00:00 ERROR [vpn] cannot get version information: Get "https://api.github.com/repos/qdm12/gluetun/commits": context canceled
2024-11-02T22:56:34+00:00 INFO [vpn] starting
2024-11-02T22:56:34+00:00 INFO [firewall] allowing VPN connection...
2024-11-02T22:56:34+00:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:45652->8.8.8.8:853: i/o timeout
2024-11-02T22:56:34+00:00 INFO [wireguard] Using available kernelspace implementation
2024-11-02T22:56:34+00:00 INFO [wireguard] Connecting to ####:51820
2024-11-02T22:56:34+00:00 INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
2024-11-02T22:56:36+00:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:58832->1.0.0.1:853: i/o timeout
2024-11-02T22:56:36+00:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:50044->8.8.4.4:853: i/o timeout
...
2024-11-02T22:56:39+00:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:57626->1.1.1.1:853: i/o timeout
2024-11-02T22:56:39+00:00 ERROR [vpn] getting public IP address information: fetching information: Get "https://ipinfo.io/": dial tcp: lookup ipinfo.io on 127.0.0.1:53: server misbehaving
2024-11-02T22:56:40+00:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:50080->8.8.4.4:853: i/o timeout
...
2024-11-02T22:56:49+00:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:37400->1.1.1.1:853: i/o timeout
2024-11-02T22:56:49+00:00 INFO [healthcheck] program has been unhealthy for 11s: restarting VPN
2024-11-02T22:56:49+00:00 INFO [healthcheck] 👉 See https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md
2024-11-02T22:56:49+00:00 INFO [healthcheck] DO NOT OPEN AN ISSUE UNLESS YOU READ AND TRIED EACH POSSIBLE SOLUTION
2024-11-02T22:56:49+00:00 INFO [vpn] stopping
2024-11-02T22:56:49+00:00 INFO [vpn] starting
2024-11-02T22:56:49+00:00 INFO [firewall] allowing VPN connection...
2024-11-02T22:56:49+00:00 INFO [wireguard] Using available kernelspace implementation
2024-11-02T22:56:49+00:00 INFO [wireguard] Connecting to ####:51820
2024-11-02T22:56:49+00:00 INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
2024-11-02T22:56:50+00:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:47620->8.8.4.4:853: i/o timeout
...
2024-11-02T22:56:53+00:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:57584->1.0.0.1:853: i/o timeout
2024-11-02T22:56:53+00:00 ERROR [vpn] getting public IP address information: fetching information: Get "https://ipinfo.io/": dial tcp: lookup ipinfo.io on 127.0.0.1:53: server misbehaving
2024-11-02T22:56:55+00:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:42446->8.8.8.8:853: i/o timeout
...
2024-11-02T22:57:08+00:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:55996->8.8.8.8:853: i/o timeout
2024-11-02T22:57:08+00:00 INFO [healthcheck] program has been unhealthy for 16s: restarting VPN
2024-11-02T22:57:08+00:00 INFO [healthcheck] 👉 See https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md
2024-11-02T22:57:08+00:00 INFO [healthcheck] DO NOT OPEN AN ISSUE UNLESS YOU READ AND TRIED EACH POSSIBLE SOLUTION
2024-11-02T22:57:08+00:00 INFO [vpn] stopping
2024-11-02T22:57:08+00:00 INFO [vpn] starting
2024-11-02T22:57:08+00:00 INFO [firewall] allowing VPN connection...
2024-11-02T22:57:08+00:00 INFO [wireguard] Using available kernelspace implementation
2024-11-02T22:57:08+00:00 INFO [wireguard] Connecting to ####:51820
2024-11-02T22:57:08+00:00 INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
2024-11-02T22:57:08+00:00 WARN [dns] exchanging over dns over tls connection: read tcp 10.14.0.2:54496->9.9.9.9:853: i/o timeout
...

@Dreadwolf91
Author

Dreadwolf91 commented Nov 2, 2024

Thanks for the reply. My homelab is currently out of order because of some infrastructure changes I'm making at home; once it's back in action in a couple of days, I will do what you propose.

@qdm12
Owner

qdm12 commented Nov 2, 2024

@Dreadwolf91 in my case lowering WIREGUARD_MTU from the default 1400 to 1320 fixed it. For OpenVPN, you could try OPENVPN_MSSFIX=1320 I think (not exactly the same as WIREGUARD_MTU, but it should work). I'm also running over Wi-Fi right now, so it may be related to that.

Now, I also noticed the error already came up in the v3.39.x releases; it's just that a failed block list update was logged as a warning and not treated as a failure to set up the DNS server, unlike in the latest image. Before, it was just an (obscure) warning:

WARN [dns] context deadline exceeded (Client.Timeout or context cancellation while reading body)

And now it's

WARN [dns] cannot update filter block lists: Get "https://raw.githubusercontent.com/qdm12/files/master/malicious-hostnames.updated": net/http: TLS handshake timeout, Get "https://raw.githubusercontent.com/qdm12/files/master/malicious-ips.updated": net/http: TLS handshake timeout

Plus an attempt to re-setup the DNS server completely.

Others: please try lowering your MTU (WIREGUARD_MTU or OPENVPN_MSSFIX) to see if it helps.

@frepke
Collaborator

frepke commented Nov 2, 2024

With WIREGUARD_MTU=1320 the latest version is working for me

@epic0421

epic0421 commented Nov 3, 2024

WIREGUARD_MTU=1320 also works for me on latest. I was able to raise it to 1370 without any issues.

@qdm12
Owner

qdm12 commented Nov 3, 2024

That's a pretty strange fix, given it was working fine with an MTU of 1400 (for wireguard) with Unbound.
Also my bad, these two issues I was previously treating as separate look related in the end!

Plaintext DNS (aka DOT=off) most likely works fine because it uses a lot less data (just UDP traffic without all the TLS stuff).
I'll dig into my DNS code and how to deal with fragmentation (for the curious it's these few lines), and will most likely end up asking on forums because I have no idea right now 😄 At least we have a workaround (lower the MTU).

@frepke
Collaborator

frepke commented Nov 3, 2024

Maybe this is nonsense (if so, @qdm12, please delete this comment), but would it be possible to build an automatic MTU adjuster like this:

package main

import (
	"context"
	"crypto/tls"
	"fmt"
	"net"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

func findOptimalMTU(serverAddress string) int {
	minMTU, maxMTU := 1200, 1500 // Typical VPN MTU range; adjust as needed
	for minMTU <= maxMTU {
		midMTU := (minMTU + maxMTU) / 2
		if isMTUSupported(serverAddress, midMTU) {
			minMTU = midMTU + 1 // Try larger MTU
		} else {
			maxMTU = midMTU - 1 // Try smaller MTU
		}
	}
	return maxMTU
}

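func isMTUSupported(serverAddress string, mtu int) bool {
	// ping expects a bare host, so strip the port from "host:port" addresses
	host, _, err := net.SplitHostPort(serverAddress)
	if err != nil {
		host = serverAddress // no port in the address
	}
	// Send a single echo request with fragmentation prohibited (-M do, Linux iputils ping).
	// The payload is the MTU minus 28 bytes (20-byte IPv4 header + 8-byte ICMP header).
	// Adjust the command for your system if necessary
	cmd := exec.Command("ping", "-c", "1", "-M", "do", "-s", strconv.Itoa(mtu-28), host)
	output, err := cmd.CombinedOutput()
	if err != nil {
		return false
	}
	return strings.Contains(string(output), "1 packets transmitted, 1 received")
}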

func dialWithOptimalMTU(ctx context.Context, serverAddress, serverName string) (*tls.Conn, error) {
	// Step 1: Find optimal MTU
	optimalMTU := findOptimalMTU(serverAddress)
	fmt.Printf("Optimal MTU found: %d\n", optimalMTU)

	// Step 2: Configure network dialer with MTU if necessary
	// This example doesn’t apply MTU directly to the connection, as Go’s net package does not support direct MTU settings
	// Alternative libraries may be required for true MTU control on dialed connections

	dialer := &net.Dialer{Timeout: 10 * time.Second}
	conn, err := dialer.DialContext(ctx, "tcp", serverAddress)
	if err != nil {
		return nil, err
	}

	// Step 3: Wrap connection with TLS
	tlsConf := &tls.Config{
		MinVersion: tls.VersionTLS12,
		ServerName: serverName,
	}
	return tls.Client(conn, tlsConf), nil
}

func main() {
	ctx := context.Background()
	serverAddress := "example.com:443" // Replace with actual server address
	serverName := "example.com"        // Replace with actual server name

	conn, err := dialWithOptimalMTU(ctx, serverAddress, serverName)
	if err != nil {
		fmt.Println("Failed to connect:", err)
		return
	}
	defer conn.Close()
	fmt.Println("Connection successful with optimal MTU")
}

@qdm12
Owner

qdm12 commented Nov 3, 2024

@frepke I thought about it like 10 minutes ago 😄 That would be a nice addition, even without that bug we are facing. We could do this as soon as the VPN is up and restart the VPN (with the same exact settings, only the MTU changed), that would be cool but would require quite a bit of code changes.

Anyway, before jumping into this (btw nice code!), I would prefer (ideally, if at all possible) to understand why Unbound was communicating over DNS over TLS fine but the new Go code (really just TCP dial with TLS 🤷) doesn't make it, both with the same MTU. Since I cannot reproduce the exact error you have (the i/o timeout ones), can you run a :latest Gluetun container with DOT=off and the MTU left at its default (1400), and then, once the VPN is up, run the commands:

docker exec gluetun apk add knot-utils
docker exec gluetun kdig -d @1.1.1.1 +tls-ca +tls-host=cloudflare-dns.com github.com 

To see if it works (and also how long it takes; the read timeout is currently set to 2 seconds, which may be too low).

@frepke
Collaborator

frepke commented Nov 3, 2024

Thanks for the code compliment, but all credit belongs to ChatGPT 😔

@qdm12
Owner

qdm12 commented Nov 4, 2024

Perfect, thanks @frepke !
Ok so TLS handshaking is faulty for some reason, maybe I need to change the settings of the TLS config; there are similar issues such as kubernetes-sigs/metrics-server#145 resorting to just lowering the MTU.

It really does bug me, though, that HTTPS (= HTTP with TLS) works (downloading hostnames and IP block lists) but DNS over TLS doesn't, even within 5 seconds. Also, Unbound was working fine with that MTU of 1400.

I've asked the Golang subreddit with this post, hopefully a hero comes to rescue us, because I'm kind of stuck here to be honest (except lowering the MTU, but that's so unsatisfying 😄)

@epic0421

epic0421 commented Nov 5, 2024

@qdm12 It is likely not just Surfshark. I just noticed a few discussion posts describing similar TLS issues.

#2548
#2555
#2568

Also, off topic but please check this thread too. #2493

@huskaramok

Facing the same issue with Windscribe WireGuard. Changing the MTU to 1380 fixes it, but DNS responses are very slow.

@qdm12 changed the title from "Bug: surfshark latest image not working" to "Bug: dns over tls timing out on latest image (TLS handshake)" on Nov 5, 2024
@qdm12
Owner

qdm12 commented Nov 5, 2024

TLDR: Please try running the latest image with GODEBUG=tlskyber=0,x509keypairleaf=0 and see if it works. Just yes or no should suffice, no need for logs 😉 The final solution would still be to lower default MTU values, but at least we would have a solid explanation.


More details:

Root causes found, at least on my side: Go 1.23 crypto/tls library changes, more precisely:

The experimental post-quantum key exchange mechanism X25519Kyber768Draft00 is now enabled by default

and

Go 1.23 changed the behavior of X509KeyPair and LoadX509KeyPair to populate the Certificate.Leaf field of the returned Certificate.

Running Gluetun with the environment variable GODEBUG=tlskyber=0,x509keypairleaf=0 solves the issue.

I went down this rabbit hole because I noticed https (not just dns over tls) would fail downloading files (like in the original issue logs) with tls handshake timeout errors. That was hinting the DNS over TLS implementation might be okay, it's just that TLS was not behaving right. Reverting back to Gluetun v3.39 still worked fine regarding https, with the same MTU (1400) setting. So I went to look into the changes since Gluetun v3.39.0 and noticed Go was upgraded from 1.22 to 1.23; then went to check the Go 1.23 release notes; then ran with those few GODEBUG options to check which ones were necessary to make Gluetun great again 😄 And to my surprise, it worked out (at least on my side)!! 🎉

Now if this actually solves the problem with an MTU of 1400, I think the best course of action would be to:

  1. Change default WIREGUARD_MTU to 1320
  2. Set the default mssfix per provider to 1320 if it's not specified by the provider, since some have it specified already (e.g. 1200, 1500 etc.)

Reasons being:

  • other applications (i.e. connected containers) may have trouble with that too-high MTU (whatever communication protocol they would use)
  • those GODEBUG values disable important new features especially security/privacy-wise, such as that post quantum key exchange
  • those GODEBUG values might be a solution for 6-12 months, but these will eventually be removed

qdm12 added a commit that referenced this issue Nov 5, 2024
@qdm12
Owner

qdm12 commented Nov 6, 2024

Also, the latest image now defaults to an MTU of 1320 instead of 1400. Before closing this issue, I'll still implement a "best MTU" mechanism with ICMP pings as @frepke suggested, since it seems like a great feature and would remove a lot of potential issues.

jfroy added a commit to jfroy/flatops that referenced this issue Nov 8, 2024
jfroy added a commit to jfroy/flatops that referenced this issue Nov 10, 2024
It is not reliable? See qdm12/gluetun#2533 maybe.
@qdm12
Owner

qdm12 commented Nov 14, 2024

Just a small update:
I've implemented a standalone code package for now to automatically detect the maximum possible MTU (PR #2586 to resolve issue #2570 created from this issue). I've been working on this for about a week. It's a "path MTU discovery" mechanism using ICMP that works for both IPv4 and IPv6, and it falls back to a brute-force test (trying packets of different sizes) if your VPN server drops MTU discovery ICMP packets, which apparently does happen. I just need another few hours to wire this up within the Gluetun code; I'll let you know once there is a tagged Docker image to test out!
The good news is, whereas OpenVPN's mtu-test can take 3 minutes, what I have takes at most 2-3 seconds 😉

@floriegl

floriegl commented Nov 21, 2024

Not 100% sure if the issue should already be fixed by the new default MTU of 1320 in the latest image, but I get a lot of warnings regarding DoT with the current latest. The log message is not exactly the same, as it also includes "for request IN AAAA #####.#####.#####." (always the same domain, but the container using it doesn't use many domains anyway).

========================================
========================================
=============== gluetun ================
========================================
=========== Made with ❤️ by ============
======= https://github.com/qdm12 =======
========================================
========================================

Running version latest built on 2024-11-18T09:49:16.711Z (commit 68ddbfc)

...
2024-11-20T19:04:31Z INFO [wireguard] Connecting to #####:51820
2024-11-20T19:04:31Z INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
2024-11-20T19:04:31Z INFO [dns] downloading hostnames and IP block lists
2024-11-20T19:04:31Z INFO [healthcheck] healthy!
2024-11-20T19:04:35Z INFO [dns] DNS server listening on [::]:53
2024-11-20T19:04:35Z INFO [dns] ready
2024-11-20T19:04:36Z INFO [ip getter] Public IP address is ##### (##### - source: ipinfo)
2024-11-20T19:04:36Z INFO [vpn] You are running on the bleeding edge of latest!
2024-11-20T19:14:37Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:57930->1.1.1.1:853: i/o timeout
2024-11-20T19:15:32Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:39940->1.0.0.1:853: i/o timeout
2024-11-20T19:16:10Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:56266->1.1.1.1:853: i/o timeout
2024-11-20T19:16:12Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:34022->1.0.0.1:853: i/o timeout
2024-11-20T19:16:12Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:34036->1.0.0.1:853: i/o timeout
2024-11-20T19:16:14Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:34040->1.0.0.1:853: i/o timeout
2024-11-20T19:16:14Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:34044->1.0.0.1:853: i/o timeout
2024-11-20T19:16:15Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:51970->1.1.1.1:853: i/o timeout
2024-11-20T19:16:17Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:34054->1.0.0.1:853: i/o timeout
2024-11-20T19:16:18Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:51980->1.1.1.1:853: i/o timeout
2024-11-20T19:26:15Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:34416->1.1.1.1:853: i/o timeout
2024-11-20T19:44:30Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:38986->1.1.1.1:853: i/o timeout
2024-11-20T19:49:26Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:34176->1.1.1.1:853: i/o timeout
2024-11-20T19:49:29Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:34190->1.1.1.1:853: i/o timeout
2024-11-20T19:49:29Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:49004->1.0.0.1:853: i/o timeout
2024-11-20T19:49:31Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:44968->1.0.0.1:853: i/o timeout
2024-11-20T19:49:31Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:44976->1.0.0.1:853: i/o timeout
2024-11-20T19:49:32Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:51668->1.1.1.1:853: i/o timeout
2024-11-20T19:49:34Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:51682->1.1.1.1:853: i/o timeout
2024-11-20T21:48:35Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:40646->1.1.1.1:853: i/o timeout
2024-11-20T21:57:42Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:60050->1.1.1.1:853: i/o timeout
2024-11-20T22:25:09Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:35200->1.1.1.1:853: i/o timeout
2024-11-20T22:25:12Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:49590->1.1.1.1:853: i/o timeout
2024-11-20T22:25:12Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:49594->1.1.1.1:853: i/o timeout
2024-11-20T22:25:12Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:49606->1.1.1.1:853: i/o timeout
2024-11-20T22:25:14Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:48276->1.0.0.1:853: i/o timeout
2024-11-20T22:25:14Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:49614->1.1.1.1:853: i/o timeout
2024-11-20T22:25:14Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:49630->1.1.1.1:853: i/o timeout
2024-11-20T22:25:15Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:48286->1.0.0.1:853: i/o timeout
2024-11-20T22:25:15Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:49642->1.1.1.1:853: i/o timeout
2024-11-20T22:25:16Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:48292->1.0.0.1:853: i/o timeout
2024-11-20T22:25:17Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:48308->1.0.0.1:853: i/o timeout
2024-11-20T22:25:17Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:49650->1.1.1.1:853: i/o timeout
2024-11-20T22:25:18Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:48326->1.0.0.1:853: i/o timeout
2024-11-20T22:25:19Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:48342->1.0.0.1:853: i/o timeout
2024-11-20T22:25:21Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:56226->1.1.1.1:853: i/o timeout
2024-11-20T22:39:49Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:34724->1.0.0.1:853: i/o timeout
2024-11-20T23:08:21Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:57352->1.0.0.1:853: i/o timeout
2024-11-20T23:32:48Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:49660->1.0.0.1:853: i/o timeout
2024-11-20T23:33:32Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:41662->1.1.1.1:853: i/o timeout
2024-11-20T23:35:00Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:37228->1.0.0.1:853: i/o timeout
2024-11-20T23:35:02Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:37238->1.0.0.1:853: i/o timeout
2024-11-20T23:35:13Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:45028->1.0.0.1:853: i/o timeout
2024-11-20T23:35:17Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:45134->1.0.0.1:853: i/o timeout
2024-11-20T23:35:40Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:52262->1.1.1.1:853: i/o timeout
2024-11-20T23:35:40Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:39504->1.0.0.1:853: i/o timeout
2024-11-20T23:35:42Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:39508->1.0.0.1:853: i/o timeout
2024-11-20T23:35:42Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:36384->1.1.1.1:853: i/o timeout
2024-11-20T23:41:23Z WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:32950->1.1.1.1:853: i/o timeout

docker-compose.yml:

services:
  gluetun:
    image: qmcgaw/gluetun
    cap_add:
      - NET_ADMIN
    devices:
      - /dev/net/tun:/dev/net/tun
    environment:
      - VPN_SERVICE_PROVIDER=nordvpn
      - VPN_TYPE=wireguard
      - WIREGUARD_PRIVATE_KEY=#####
      - SERVER_COUNTRIES=#####

@qdm12
Owner

qdm12 commented Nov 23, 2024

@floriegl

  1. Can you try, replacing thatdomain.com with that domain: docker run --rm alpine:3.20 /bin/sh -c "apk add knot-utils && kdig -t AAAA -d @1.1.1.1 +tls-ca +tls-host=cloudflare-dns.com thatdomain.com" to check if it works? Maybe it's just cloudflare dropping the query? 🤔
  2. When this happens, can you try (assuming gluetun container name is gluetun): docker exec gluetun nslookup github.com to check if DNS resolution works for github.com? The healthcheck should fail if DNS resolution stops working, but the DNS caching might make it still work for some time, despite DNS being no longer functional 🤔
  3. (EDIT) also check your MTU with docker exec gluetun ip link to be sure, it should be mtu 1320 on the tun0 line

@sirjmann92

Also seeing this issue with AirVPN. The MTU on tun0 is 1320 for all three of my Gluetun containers.

@floriegl

floriegl commented Dec 9, 2024

  1. Can you try, replacing thatdomain.com with that domain: docker run --rm alpine:3.20 /bin/sh -c "apk add knot-utils && kdig -t AAAA -d @1.1.1.1 +tls-ca +tls-host=cloudflare-dns.com thatdomain.com" to check if it works? Maybe it's just cloudflare dropping the query? 🤔

Cloudflare returns SERVFAIL for AAAA, but returns the correct IP for A. It seems the domain only has an IPv4 address.

docker run --rm alpine:3.20 /bin/sh -c "apk add knot-utils && kdig -t AAAA -d @1.1.1.1 +tls-ca +tls-host=cloudflare-dns.com #####.#####.#####"
Unable to find image 'alpine:3.20' locally
3.20: Pulling from library/alpine
<some downloading and installing stuff>
Executing busybox-1.36.1-r29.trigger
OK: 20 MiB in 32 packages
;; DEBUG: Querying for owner(#####.#####.#####.), class(1), type(28), server(1.1.1.1), port(853), protocol(TCP)
;; DEBUG: TLS, imported 147 system certificates
;; DEBUG: TLS, received certificate hierarchy:
;; DEBUG:  #1, C=US,ST=California,L=San Francisco,O=Cloudflare\, Inc.,CN=cloudflare-dns.com
;; DEBUG:      SHA-256 PIN: ############################################
;; DEBUG:  #2, C=US,O=DigiCert Inc,CN=DigiCert Global G2 TLS RSA SHA256 2020 CA1
;; DEBUG:      SHA-256 PIN: ############################################
;; DEBUG: TLS, skipping certificate PIN check
;; DEBUG: TLS, The certificate is trusted.
;; TLS session (TLS1.3)-(ECDHE-X25519)-(ECDSA-SECP256R1-SHA256)-(AES-256-GCM)
;; ->>HEADER<<- opcode: QUERY; status: SERVFAIL; id: 37830
;; Flags: qr rd ra; QUERY: 1; ANSWER: 0; AUTHORITY: 0; ADDITIONAL: 1

;; EDNS PSEUDOSECTION:
;; Version: 0; flags: ; UDP size: 1232 B; ext-rcode: NOERROR
;; EDE: 22 (No Reachable Authority): 'at delegation #####.#####.'
;; EDE: 23 (Network Error): 'XXX.XXX.XXX.XXX:53 rcode=REFUSED for #####.#####.##### AAAA'
;; PADDING: 330 B

;; QUESTION SECTION:
;; #####.#####.#####.              IN      AAAA

;; Received 468 B
;; Time 2024-12-09 20:51:24 UTC
;; From 1.1.1.1@853(TLS) in 4134.7 ms

docker run --rm alpine:3.20 /bin/sh -c "apk add knot-utils && kdig -t A -d @1.1.1.1 +tls-ca +tls-host=cloudflare-dns.com #####.#####.#####"
<some downloading and installing stuff>
Executing busybox-1.36.1-r29.trigger
OK: 20 MiB in 32 packages
;; DEBUG: Querying for owner(#####.#####.#####.), class(1), type(1), server(1.1.1.1), port(853), protocol(TCP)
;; DEBUG: TLS, imported 147 system certificates
;; DEBUG: TLS, received certificate hierarchy:
;; DEBUG:  #1, C=US,ST=California,L=San Francisco,O=Cloudflare\, Inc.,CN=cloudflare-dns.com
;; DEBUG:      SHA-256 PIN: ############################################
;; DEBUG:  #2, C=US,O=DigiCert Inc,CN=DigiCert Global G2 TLS RSA SHA256 2020 CA1
;; DEBUG:      SHA-256 PIN: ############################################
;; DEBUG: TLS, skipping certificate PIN check
;; DEBUG: TLS, The certificate is trusted.
;; TLS session (TLS1.3)-(ECDHE-X25519)-(ECDSA-SECP256R1-SHA256)-(AES-256-GCM)
;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 26314
;; Flags: qr rd ra; QUERY: 1; ANSWER: 1; AUTHORITY: 0; ADDITIONAL: 1

;; EDNS PSEUDOSECTION:
;; Version: 0; flags: ; UDP size: 1232 B; ext-rcode: NOERROR
;; PADDING: 405 B

;; QUESTION SECTION:
;; #####.#####.#####.              IN      A

;; ANSWER SECTION:
#####.#####.#####.         86400   IN      A       ###.###.###.###

;; Received 468 B
;; Time 2024-12-09 21:23:49 UTC
;; From 1.1.1.1@853(TLS) in 186.4 ms

  1. When this happens, can you try (assuming gluetun container name is gluetun): docker exec gluetun nslookup github.com to check if DNS resolution works for github.com? The healthcheck should fail if DNS resolution stops working, but the DNS caching might make it still work for some time, despite DNS being no longer functional 🤔

I am not fully sure what you mean by "When this happens". I assume it refers to the phases with the i/o timeouts. I have examined the logs, and it looks like the timeout phases are about five minutes long, which would make it really hard to do manual checks during them. But for completeness I added the output for a phase where I didn't have issues.

docker exec gluetun nslookup github.com
Server:         127.0.0.1
Address:        127.0.0.1:53

Non-authoritative answer:

Non-authoritative answer:
Name:   github.com
Address: 140.82.121.4

I also had some 15-minute phases with a lot of healthcheck issues involving ipinfo.io.lan. and cloudflare.com., but I think those are just general issues with the NordVPN server I am using. The following are just some of the lines that were repeated in that timeframe.

2024-12-06T00:52:46Z INFO [healthcheck] program has been unhealthy for 6s: restarting VPN (healthcheck error: dialing: dial tcp4 XXX.XXX.XXX.XXX:443: i/o timeout)
2024-12-06T00:52:46Z INFO [healthcheck] 👉 See https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md
2024-12-06T00:52:46Z INFO [healthcheck] DO NOT OPEN AN ISSUE UNLESS YOU READ AND TRIED EACH POSSIBLE SOLUTION
2024-12-06T00:52:46Z INFO [vpn] stopping
2024-12-06T00:52:46Z INFO [vpn] starting
2024-12-06T00:52:46Z INFO [firewall] allowing VPN connection...
2024-12-06T00:52:47Z INFO [wireguard] Using available kernelspace implementation
2024-12-06T00:52:47Z INFO [wireguard] Connecting to XXX.XXX.XXX.XXX:XXXXX
2024-12-06T00:52:47Z INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
2024-12-06T00:52:52Z WARN [dns] dialing tls server for request IN A ipinfo.io.: dial tcp 1.0.0.1:853: i/o timeout
2024-12-06T00:52:52Z WARN [dns] dialing tls server for request IN AAAA ipinfo.io.: dial tcp 1.0.0.1:853: i/o timeout
2024-12-06T00:52:57Z WARN [dns] dialing tls server for request IN AAAA ipinfo.io.: dial tcp 1.0.0.1:853: i/o timeout
2024-12-06T00:52:57Z WARN [dns] dialing tls server for request IN A ipinfo.io.: dial tcp 1.1.1.1:853: i/o timeout
2024-12-06T00:52:59Z INFO [healthcheck] program has been unhealthy for 11s: restarting VPN (healthcheck error: dialing: dial tcp4 XXX.XXX.XXX.XXX:443: i/o timeout)
2024-12-06T00:52:59Z INFO [healthcheck] 👉 See https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md
2024-12-06T00:52:59Z INFO [healthcheck] DO NOT OPEN AN ISSUE UNLESS YOU READ AND TRIED EACH POSSIBLE SOLUTION
2024-12-06T00:52:59Z INFO [vpn] stopping
2024-12-06T00:52:59Z ERROR [vpn] getting public IP address information: fetching information: Get "https://ipinfo.io/": context canceled
2024-12-06T00:52:59Z INFO [vpn] starting
2024-12-06T00:52:59Z INFO [firewall] allowing VPN connection...
2024-12-06T00:52:59Z INFO [wireguard] Using available kernelspace implementation
2024-12-06T00:52:59Z INFO [wireguard] Connecting to XXX.XXX.XXX.XXX:XXXXX
2024-12-06T00:52:59Z INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.

  1. (EDIT) also check your MTU with docker exec gluetun ip link to be sure, it should be mtu 1320 on the tun0 line

Looks like that is the case:

docker exec gluetun ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
12: eth0@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff link-netnsid 0
24: tun0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1320 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/none
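
For anyone who wants to script this MTU check instead of eyeballing it, here is a minimal sketch in Python. It assumes the output of docker exec gluetun ip link has been captured as text; the function name parse_mtus and the SAMPLE constant are made up for illustration and are not part of gluetun.

```python
import re

# Matches interface header lines such as:
# "24: tun0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1320 qdisc noqueue ..."
# The optional "@if13" suffix (as in "eth0@if13") is dropped from the name.
IFACE_LINE = re.compile(r"^\d+:\s+([^:@\s]+)(?:@[^:\s]+)?:\s+<[^>]*>\s+mtu\s+(\d+)")


def parse_mtus(ip_link_output: str) -> dict:
    """Return a mapping of interface name to MTU from `ip link` text output."""
    mtus = {}
    for line in ip_link_output.splitlines():
        match = IFACE_LINE.match(line)
        if match:
            mtus[match.group(1)] = int(match.group(2))
    return mtus


# Sample copied from the output posted above (hypothetical constant name).
SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
12: eth0@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff link-netnsid 0
24: tun0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1320 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/none
"""

if __name__ == "__main__":
    mtus = parse_mtus(SAMPLE)
    print("tun0 MTU:", mtus.get("tun0"))
```

Comparing mtus.get("tun0") against the expected 1320 gives a quick pass/fail for the check requested above.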

@syb3ria

syb3ria commented Dec 10, 2024

Not sure if this helps as I'm using k3s, but here is my output of the same command. Please note I'm using Surfshark as a provider.

 fiddler@server: $ kubectl create deployment dns-query --image=alpine:3.20 -- /bin/sh -c "apk add knot-utils && kdig -t AAAA -d @1.1.1.1 +tls-ca +tls-host=cloudflare-dns.com thatdomain.com"
deployment.apps/dns-query created
fiddler@server: $ kubectl logs deployment/dns-query
fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/main/aarch64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/community/aarch64/APKINDEX.tar.gz
(1/18) Installing gmp (6.3.0-r1)
(2/18) Installing nettle (3.9.1-r0)
(3/18) Installing libunistring (1.2-r0)
(4/18) Installing libidn2 (2.3.7-r0)
(5/18) Installing libffi (3.4.6-r0)
(6/18) Installing libtasn1 (4.19.0-r2)
(7/18) Installing p11-kit (0.25.3-r0)
(8/18) Installing gnutls (3.8.5-r0)
(9/18) Installing lmdb (0.9.32-r0)
(10/18) Installing ngtcp2 (1.5.0-r0)
(11/18) Installing ngtcp2-gnutls (1.5.0-r0)
(12/18) Installing knot-libs (3.3.9-r0)
(13/18) Installing ncurses-terminfo-base (6.4_p20240420-r2)
(14/18) Installing libncursesw (6.4_p20240420-r2)
(15/18) Installing libedit (20240517.3.1-r0)
(16/18) Installing nghttp2-libs (1.62.1-r0)
(17/18) Installing userspace-rcu (0.14.0-r2)
(18/18) Installing knot-utils (3.3.9-r0)
Executing busybox-1.36.1-r29.trigger
OK: 22 MiB in 32 packages
;; DEBUG: Querying for owner(thatdomain.com.), class(1), type(28), server(1.1.1.1), port(853), protocol(TCP)
;; DEBUG: TLS, imported 147 system certificates
;; DEBUG: TLS, received certificate hierarchy:
;; DEBUG:  #1, C=US,ST=California,L=San Francisco,O=Cloudflare\, Inc.,CN=cloudflare-dns.com
;; DEBUG:      SHA-256 PIN: 4pqQ+yl3lAtRvKdoCCUR8iDmA53I+cJ7orgBLiF08kQ=
;; DEBUG:  #2, C=US,O=DigiCert Inc,CN=DigiCert Global G2 TLS RSA SHA256 2020 CA1
;; DEBUG:      SHA-256 PIN: Wec45nQiFwKvHtuHxSAMGkt19k+uPSw9JlEkxhvYPHk=
;; DEBUG: TLS, skipping certificate PIN check
;; DEBUG: TLS, The certificate is trusted.
;; TLS session (TLS1.3)-(ECDHE-X25519)-(ECDSA-SECP256R1-SHA256)-(AES-256-GCM)
;; ->>HEADER<<- opcode: QUERY; status: SERVFAIL; id: 25860
;; Flags: qr rd ra; QUERY: 1; ANSWER: 0; AUTHORITY: 0; ADDITIONAL: 1

;; EDNS PSEUDOSECTION:
;; Version: 0; flags: ; UDP size: 1232 B; ext-rcode: NOERROR
;; EDE: 22 (No Reachable Authority): 'at delegation thatdomain.com.'
;; EDE: 23 (Network Error): '[2a00:a500:0:93::78]:53 rcode=REFUSED for thatdomain.com AAAA'
;; PADDING: 319 B

;; QUESTION SECTION:
;; thatdomain.com.              IN      AAAA

;; Received 468 B
;; Time 2024-12-10 20:04:13 UTC
;; From 1.1.1.1@853(TLS) in 370.9 ms

Let me know if there is anything else I can provide to help further.
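
When comparing several kdig runs like the one above, it may help to reduce each output to its status, answer count and EDE codes. The following is a rough sketch in Python, assuming the kdig output was saved as text; summarize_kdig and SAMPLE are hypothetical names, and the regular expressions only cover the line shapes seen in this thread.

```python
import re

# Patterns for the header line and the EDE lines as printed by kdig above.
STATUS = re.compile(r"status:\s*([A-Z]+)")
ANSWERS = re.compile(r"ANSWER:\s*(\d+)")
EDE = re.compile(r"^;; EDE:\s*(\d+)\s*\(([^)]*)\)", re.MULTILINE)


def summarize_kdig(output: str) -> dict:
    """Extract the response status, answer count and EDE codes from kdig text."""
    status = STATUS.search(output)
    answers = ANSWERS.search(output)
    return {
        "status": status.group(1) if status else None,
        "answers": int(answers.group(1)) if answers else None,
        "ede": [(int(code), text) for code, text in EDE.findall(output)],
    }


# Lines copied from the output posted above (hypothetical constant name).
SAMPLE = """\
;; ->>HEADER<<- opcode: QUERY; status: SERVFAIL; id: 25860
;; Flags: qr rd ra; QUERY: 1; ANSWER: 0; AUTHORITY: 0; ADDITIONAL: 1

;; EDNS PSEUDOSECTION:
;; Version: 0; flags: ; UDP size: 1232 B; ext-rcode: NOERROR
;; EDE: 22 (No Reachable Authority): 'at delegation thatdomain.com.'
;; EDE: 23 (Network Error): '[2a00:a500:0:93::78]:53 rcode=REFUSED for thatdomain.com AAAA'
;; PADDING: 319 B
"""

if __name__ == "__main__":
    print(summarize_kdig(SAMPLE))
```

A SERVFAIL status together with EDE 22 and 23 means the TLS exchange itself completed and Cloudflare answered, which is a different situation from the dial or read i/o timeouts in the gluetun logs.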

@Kinsiinoo

Kinsiinoo commented Dec 15, 2024

I'm still having the same issue as @epic0421 with :latest. ProtonVPN + Wireguard + 1320 WIREGUARD_MTU.

@neal421

neal421 commented Dec 19, 2024

  1. Can you try, replacing thatdomain.com with that domain: docker run --rm alpine:3.20 /bin/sh -c "apk add knot-utils && kdig -t AAAA -d @1.1.1.1 +tls-ca +tls-host=cloudflare-dns.com thatdomain.com" to check if it works? Maybe it's just cloudflare dropping the query? 🤔

Cloudflare returns SERVFAIL for AAAA, but returns the correct IP for A. It seems the domain only has an IPv4 address.

docker run --rm alpine:3.20 /bin/sh -c "apk add knot-utils && kdig -t AAAA -d @1.1.1.1 +tls-ca +tls-host=cloudflare-dns.com #####.#####.#####"
Unable to find image 'alpine:3.20' locally
3.20: Pulling from library/alpine
<some downloading and installing stuff>
Executing busybox-1.36.1-r29.trigger
OK: 20 MiB in 32 packages
;; DEBUG: Querying for owner(#####.#####.#####.), class(1), type(28), server(1.1.1.1), port(853), protocol(TCP)
;; DEBUG: TLS, imported 147 system certificates
;; DEBUG: TLS, received certificate hierarchy:
;; DEBUG:  #1, C=US,ST=California,L=San Francisco,O=Cloudflare\, Inc.,CN=cloudflare-dns.com
;; DEBUG:      SHA-256 PIN: ############################################
;; DEBUG:  #2, C=US,O=DigiCert Inc,CN=DigiCert Global G2 TLS RSA SHA256 2020 CA1
;; DEBUG:      SHA-256 PIN: ############################################
;; DEBUG: TLS, skipping certificate PIN check
;; DEBUG: TLS, The certificate is trusted.
;; TLS session (TLS1.3)-(ECDHE-X25519)-(ECDSA-SECP256R1-SHA256)-(AES-256-GCM)
;; ->>HEADER<<- opcode: QUERY; status: SERVFAIL; id: 37830
;; Flags: qr rd ra; QUERY: 1; ANSWER: 0; AUTHORITY: 0; ADDITIONAL: 1

;; EDNS PSEUDOSECTION:
;; Version: 0; flags: ; UDP size: 1232 B; ext-rcode: NOERROR
;; EDE: 22 (No Reachable Authority): 'at delegation #####.#####.'
;; EDE: 23 (Network Error): 'XXX.XXX.XXX.XXX:53 rcode=REFUSED for #####.#####.##### AAAA'
;; PADDING: 330 B

;; QUESTION SECTION:
;; #####.#####.#####.              IN      AAAA

;; Received 468 B
;; Time 2024-12-09 20:51:24 UTC
;; From 1.1.1.1@853(TLS) in 4134.7 ms

docker run --rm alpine:3.20 /bin/sh -c "apk add knot-utils && kdig -t A -d @1.1.1.1 +tls-ca +tls-host=cloudflare-dns.com #####.#####.#####"
<some downloading and installing stuff>
Executing busybox-1.36.1-r29.trigger
OK: 20 MiB in 32 packages
;; DEBUG: Querying for owner(#####.#####.#####.), class(1), type(1), server(1.1.1.1), port(853), protocol(TCP)
;; DEBUG: TLS, imported 147 system certificates
;; DEBUG: TLS, received certificate hierarchy:
;; DEBUG:  #1, C=US,ST=California,L=San Francisco,O=Cloudflare\, Inc.,CN=cloudflare-dns.com
;; DEBUG:      SHA-256 PIN: ############################################
;; DEBUG:  #2, C=US,O=DigiCert Inc,CN=DigiCert Global G2 TLS RSA SHA256 2020 CA1
;; DEBUG:      SHA-256 PIN: ############################################
;; DEBUG: TLS, skipping certificate PIN check
;; DEBUG: TLS, The certificate is trusted.
;; TLS session (TLS1.3)-(ECDHE-X25519)-(ECDSA-SECP256R1-SHA256)-(AES-256-GCM)
;; ->>HEADER<<- opcode: QUERY; status: NOERROR; id: 26314
;; Flags: qr rd ra; QUERY: 1; ANSWER: 1; AUTHORITY: 0; ADDITIONAL: 1

;; EDNS PSEUDOSECTION:
;; Version: 0; flags: ; UDP size: 1232 B; ext-rcode: NOERROR
;; PADDING: 405 B

;; QUESTION SECTION:
;; #####.#####.#####.              IN      A

;; ANSWER SECTION:
#####.#####.#####.         86400   IN      A       ###.###.###.###

;; Received 468 B
;; Time 2024-12-09 21:23:49 UTC
;; From 1.1.1.1@853(TLS) in 186.4 ms
  1. When this happens, can you try (assuming gluetun container name is gluetun): docker exec gluetun nslookup github.com to check if dns resolution works for github.com? The healthcheck should fail if DNS resolution stops working, but the DNS caching might make it still work for some time, despite DNS being no longer functional 🤔

I am not fully sure what you mean by "When this happens". I assume it refers to the phases with the i/o timeouts. I have examined the logs, and the timeout phases appear to last about five minutes, which makes manual checks during a phase really hard. For completeness, I added the output from a phase where I didn't have issues.

docker exec gluetun nslookup github.com
Server:         127.0.0.1
Address:        127.0.0.1:53

Non-authoritative answer:

Non-authoritative answer:
Name:   github.com
Address: 140.82.121.4

I also had some 15-minute phases with a lot of healthcheck issues involving ipinfo.io.lan. and cloudflare.com., but I think those are just general issues with the NordVPN server I am using. The following are some of the lines that were repeated in that timeframe.

2024-12-06T00:52:46Z INFO [healthcheck] program has been unhealthy for 6s: restarting VPN (healthcheck error: dialing: dial tcp4 XXX.XXX.XXX.XXX:443: i/o timeout)
2024-12-06T00:52:46Z INFO [healthcheck] 👉 See https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md
2024-12-06T00:52:46Z INFO [healthcheck] DO NOT OPEN AN ISSUE UNLESS YOU READ AND TRIED EACH POSSIBLE SOLUTION
2024-12-06T00:52:46Z INFO [vpn] stopping
2024-12-06T00:52:46Z INFO [vpn] starting
2024-12-06T00:52:46Z INFO [firewall] allowing VPN connection...
2024-12-06T00:52:47Z INFO [wireguard] Using available kernelspace implementation
2024-12-06T00:52:47Z INFO [wireguard] Connecting to XXX.XXX.XXX.XXX:XXXXX
2024-12-06T00:52:47Z INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
2024-12-06T00:52:52Z WARN [dns] dialing tls server for request IN A ipinfo.io.: dial tcp 1.0.0.1:853: i/o timeout
2024-12-06T00:52:52Z WARN [dns] dialing tls server for request IN AAAA ipinfo.io.: dial tcp 1.0.0.1:853: i/o timeout
2024-12-06T00:52:57Z WARN [dns] dialing tls server for request IN AAAA ipinfo.io.: dial tcp 1.0.0.1:853: i/o timeout
2024-12-06T00:52:57Z WARN [dns] dialing tls server for request IN A ipinfo.io.: dial tcp 1.1.1.1:853: i/o timeout
2024-12-06T00:52:59Z INFO [healthcheck] program has been unhealthy for 11s: restarting VPN (healthcheck error: dialing: dial tcp4 XXX.XXX.XXX.XXX:443: i/o timeout)
2024-12-06T00:52:59Z INFO [healthcheck] 👉 See https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md
2024-12-06T00:52:59Z INFO [healthcheck] DO NOT OPEN AN ISSUE UNLESS YOU READ AND TRIED EACH POSSIBLE SOLUTION
2024-12-06T00:52:59Z INFO [vpn] stopping
2024-12-06T00:52:59Z ERROR [vpn] getting public IP address information: fetching information: Get "https://ipinfo.io/": context canceled
2024-12-06T00:52:59Z INFO [vpn] starting
2024-12-06T00:52:59Z INFO [firewall] allowing VPN connection...
2024-12-06T00:52:59Z INFO [wireguard] Using available kernelspace implementation
2024-12-06T00:52:59Z INFO [wireguard] Connecting to XXX.XXX.XXX.XXX:XXXXX
2024-12-06T00:52:59Z INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
  1. (EDIT) also check your MTU with docker exec gluetun ip link to be sure, it should be mtu 1320 on the tun0 line

Looks like that is the case:

docker exec gluetun ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
12: eth0@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff link-netnsid 0
24: tun0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1320 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/none

I am also getting:
WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:57930->1.1.1.1:853: i/o timeout

Did you ever find anything out? I am using NordVPN with WireGuard, and I have mtu 1320 on the tun0 line.
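
Since the timeout phases are hard to catch by hand, one option is to measure them from the logs afterwards. Below is a minimal sketch in Python, assuming the container logs were exported as text (for example with docker logs gluetun); timeout_phases, parse_timestamp, the 60 second grouping gap and SAMPLE_LOG are all illustrative choices and not anything gluetun provides.

```python
from datetime import datetime, timedelta


def parse_timestamp(line: str) -> datetime:
    """Parse the leading timestamp, e.g. 2024-12-06T00:52:52Z or ...+02:00."""
    stamp = line.split(" ", 1)[0]
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def timeout_phases(lines, max_gap=timedelta(seconds=60)):
    """Group [dns] i/o timeout warnings into phases separated by quiet gaps.

    Returns a list of (start, end, duration) tuples. The max_gap default is
    an arbitrary assumption; tune it to the log at hand.
    """
    phases = []
    for line in lines:
        if "[dns]" not in line or "i/o timeout" not in line:
            continue
        ts = parse_timestamp(line)
        if phases and ts - phases[-1][1] <= max_gap:
            phases[-1][1] = ts  # extend the current phase
        else:
            phases.append([ts, ts])  # start a new phase
    return [(start, end, end - start) for start, end in phases]


# Lines copied from the logs posted above (hypothetical constant name).
SAMPLE_LOG = """\
2024-12-06T00:52:47Z INFO [wireguard] Connecting to XXX.XXX.XXX.XXX:XXXXX
2024-12-06T00:52:52Z WARN [dns] dialing tls server for request IN A ipinfo.io.: dial tcp 1.0.0.1:853: i/o timeout
2024-12-06T00:52:52Z WARN [dns] dialing tls server for request IN AAAA ipinfo.io.: dial tcp 1.0.0.1:853: i/o timeout
2024-12-06T00:52:57Z WARN [dns] dialing tls server for request IN AAAA ipinfo.io.: dial tcp 1.0.0.1:853: i/o timeout
2024-12-06T00:52:57Z WARN [dns] dialing tls server for request IN A ipinfo.io.: dial tcp 1.1.1.1:853: i/o timeout
2024-12-06T00:52:59Z INFO [vpn] stopping
""".splitlines()

if __name__ == "__main__":
    for start, end, duration in timeout_phases(SAMPLE_LOG):
        print(start.isoformat(), end.isoformat(), duration)
```

Run over a full day of logs, this would show whether the timeouts really cluster into phases of roughly five minutes or are spread more evenly.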

@floriegl

@neal421 can you provide the output of docker run --rm alpine:3.20 /bin/sh -c "apk add knot-utils && kdig -t A -d @1.1.1.1 +tls-ca +tls-host=cloudflare-dns.com #####.#####.#####"? My assumption is that gluetun repeatedly queries the IPv6 address of a domain which only has an IPv4 address, because it doesn't register that the domain has no AAAA record.

@neal421

neal421 commented Dec 20, 2024

@floriegl Sure thing, this is what I got.

root@servarr:~# docker run --rm alpine:3.20 /bin/sh -c "apk add knot-utils && kdig -t A -d @1.1.1.1 +tls-ca +tls-host=cloudflare-dns.com oneofthedomains.com"
Unable to find image 'alpine:3.20' locally
3.20: Pulling from library/alpine
da9db072f522: Already exists
Digest: sha256:1e42bbe2508154c9126d48c2b8a75420c3544343bf86fd041fb7527e017a4b4a
Status: Downloaded newer image for alpine:3.20
fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/community/x86_64/APKINDEX.tar.gz
(1/18) Installing gmp (6.3.0-r1)
(2/18) Installing nettle (3.9.1-r0)
(3/18) Installing libunistring (1.2-r0)
(4/18) Installing libidn2 (2.3.7-r0)
(5/18) Installing libffi (3.4.6-r0)
(6/18) Installing libtasn1 (4.19.0-r2)
(7/18) Installing p11-kit (0.25.3-r0)
(8/18) Installing gnutls (3.8.5-r0)
(9/18) Installing lmdb (0.9.32-r0)
(10/18) Installing ngtcp2 (1.5.0-r0)
(11/18) Installing ngtcp2-gnutls (1.5.0-r0)
(12/18) Installing knot-libs (3.3.9-r0)
(13/18) Installing ncurses-terminfo-base (6.4_p20240420-r2)
(14/18) Installing libncursesw (6.4_p20240420-r2)
(15/18) Installing libedit (20240517.3.1-r0)
(16/18) Installing nghttp2-libs (1.62.1-r0)
(17/18) Installing userspace-rcu (0.14.0-r2)
(18/18) Installing knot-utils (3.3.9-r0)
Executing busybox-1.36.1-r29.trigger
OK: 20 MiB in 32 packages
;; DEBUG: Querying for owner(free.btr.kz.), class(1), type(1), server(1.1.1.1), port(853), protocol(TCP)
;; DEBUG: TLS, imported 147 system certificates
;; DEBUG: TLS, received certificate hierarchy:
;; DEBUG: #1, C=US,ST=California,L=San Francisco,O=Cloudflare, Inc.,CN=cloudflare-dns.com
;; DEBUG: SHA-256 PIN: 4pqQ+yl3lAtRvKdoCCUR8iDmA53I+cJ7orgBLiF08kQ=
;; DEBUG: #2, C=US,O=DigiCert Inc,CN=DigiCert Global G2 TLS RSA SHA256 2020 CA1
;; DEBUG: SHA-256 PIN: Wec45nQiFwKvHtuHxSAMGkt19k+uPSw9JlEkxhvYPHk=
;; DEBUG: TLS, skipping certificate PIN check
;; DEBUG: TLS, The certificate is trusted.
;; TLS session (TLS1.3)-(ECDHE-X25519)-(ECDSA-SECP256R1-SHA256)-(AES-256-GCM)
;; ->>HEADER<<- opcode: QUERY; status: SERVFAIL; id: 58557
;; Flags: qr rd ra; QUERY: 1; ANSWER: 0; AUTHORITY: 0; ADDITIONAL: 1

;; EDNS PSEUDOSECTION:

@floriegl

@qdm12 I assume the cause of the WARN [dns] exchanging over tls connection for request IN AAAA #####.#####.#####.: read tcp 10.5.0.2:57930->1.1.1.1:853: i/o timeout problem is that gluetun tries to query the IPv6 address of a domain which only has an IPv4 address. As it somehow treats the SERVFAIL for an AAAA record as an i/o timeout, it retries multiple times until it gives up for some minutes/hours. If you want to replicate it, you could try frogfind.com, as it also does not have an AAAA record (and also no HTTPS).
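
To make the distinction in this hypothesis concrete: a SERVFAIL is a response that arrived, while an i/o timeout means no response arrived at all, so only the latter is a natural candidate for a retry. The sketch below is in Python and is not gluetun's code; rcode_of, should_retry and SERVFAIL_HEADER are hypothetical names, and the header bytes are built by hand from the flags (qr rd ra) and id shown in the kdig output earlier in the thread.

```python
import socket
import struct

# A few DNS response codes from the low 4 bits of the header flags.
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def rcode_of(message: bytes) -> str:
    """Return the response code name from a raw DNS message header."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    _msg_id, flags = struct.unpack("!HH", message[:4])
    rcode = flags & 0x000F
    return RCODE_NAMES.get(rcode, str(rcode))


def should_retry(outcome) -> bool:
    """Decide whether an exchange is worth retrying.

    outcome is either the exception raised by the exchange or the raw
    response bytes. Only a timeout (no response) triggers a retry; any
    received response, including SERVFAIL, is treated as final here.
    """
    return isinstance(outcome, (socket.timeout, TimeoutError))


# Header only: id 25860, flags QR|RD|RA with rcode 2, 1 question,
# 0 answers, 0 authority records, 1 additional record.
SERVFAIL_HEADER = struct.pack("!HHHHHH", 25860, 0x8182, 1, 0, 0, 1)

if __name__ == "__main__":
    print(rcode_of(SERVFAIL_HEADER), should_retry(SERVFAIL_HEADER))
    print(should_retry(socket.timeout("i/o timeout")))
```

If the resolver really does surface a SERVFAIL as a timeout, the two branches above would collapse into one, which would match the repeated retries described. That remains an assumption until it is checked against the actual DNS code.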
