Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very slow response from upstream servers with a lot of context deadline exceeded #7357

Closed
4 tasks done
SukkaW opened this issue Oct 18, 2024 · 15 comments
Closed
4 tasks done
Milestone

Comments

@SukkaW
Copy link

SukkaW commented Oct 18, 2024

Prerequisites

Platform (OS and CPU architecture)

Linux, ARM64

Raspberry Pi 4

sukka@sukka-pi:/tmp $ uname -a
Linux sukka-pi 6.6.51+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.51-1+rpt3 (2024-10-08) aarch64 GNU/Linux

Installation

GitHub releases or script from README

Setup

On one machine

AdGuard Home version

v0.107.53

Action

Replace the following command with the one you're calling or a
description of the failing action:

nslookup -debug -type=a 'www.example.com' '127.0.0.1'
dig www.example.com @127.0.0.1

Expected result

The result returned immdiately.

Actual result

The result returned in about 20 seconds.

Additional information and/or screenshots

I have set following upstream servers in the DNS settings:

https://120.53.53.53/dns-query
https://1.12.12.12/dns-query
tls://120.53.53.53
tls://1.12.12.12

But when I use the Test upstreams button, AdGuardHome says all those servers could not be used:

image

I have tested those servers locally, they return the result very fast, so definitely no internet connectivity issue, and also there is no problem with the upstream servers:

sukka@sukka-pi:/tmp $ time dig www.example.com @120.53.53.53 +https +short
93.184.215.14

real    0m0.342s
user    0m0.060s
sys     0m0.014s
sukka@sukka-pi:/tmp $ time dig www.example.com @120.53.53.53 +tls +short
93.184.215.14

real    0m0.250s
user    0m0.054s
sys     0m0.025s
sukka@sukka-pi:/tmp $ time dig www.example.com @1.12.12.12 +https +short
93.184.215.14

real    0m0.128s
user    0m0.056s
sys     0m0.024s
sukka@sukka-pi:/tmp $ time dig www.example.com @1.12.12.12 +tls +short
93.184.215.14

real    0m0.185s
user    0m0.059s
sys     0m0.020s

I have also tried with curl:

sukka@sukka-pi:/tmp $ time curl https://120.53.53.53/dns-query

real    0m0.212s
user    0m0.147s
sys     0m0.025s

sukka@sukka-pi:/tmp $ time curl https://1.12.12.12/dns-query

real    0m0.214s
user    0m0.149s
sys     0m0.016s

I then enabled log.verbose in the config file (AdGuardHome.yaml), and I got those logs:

2024/10/18 22:29:36.482822 1090#350 [error] dnsproxy: exchange failed upstream=tls://1.12.12.12:853 question=";www.example.com.\tIN\t A" duration=10.011039607s err="getting conn to tls://1.12.12.12:853: connecting to 1.12.12.12: read tcp 10.10.1.200:60174->1.12.12.12:853: i/o timeout"
2024/10/18 22:29:38.492967 1090#485 [debug] started GET 10.10.1.200:11451 /control/status
2024/10/18 22:29:38.494299 1090#485 [debug] finished GET 10.10.1.200:11451 /control/status in 1.363265ms
2024/10/18 22:29:46.495655 1090#350 [error] dnsproxy: exchange failed upstream=tls://120.53.53.53:853 question=";www.example.com.\tIN\t A" duration=10.012209931s err="getting conn to tls://120.53.53.53:853: connecting to 120.53.53.53: read tcp 10.10.1.200:40744->120.53.53.53:853: i/o timeout"
2024/10/18 22:29:46.496177 1090#350 [debug] dnsproxy: resolving err src=upstream err="all upstreams failed to exchange request: requesting https://120.53.53.53:443/dns-query: Get \"https://120.53.53.53:443/dns-query?dns=AAABAAABAAAAAAABA3d3dwdleGFtcGxlA2NvbQAAAQABAAApCAAAAIAAAAA\": context deadline exceeded\nrequesting https://1.12.12.12:443/dns-query: Get \"https://1.12.12.12:443/dns-query?dns=AAABAAABAAAAAAABA3d3dwdleGFtcGxlA2NvbQAAAQABAAApCAAAAIAAAAA\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\ngetting conn to tls://1.12.12.12:853: connecting to 1.12.12.12: read tcp 10.10.1.200:60174->1.12.12.12:853: i/o timeout\ngetting conn to tls://120.53.53.53:853: connecting to 120.53.53.53: read tcp 10.10.1.200:40744->120.53.53.53:853: i/o timeout"
2024/10/18 22:29:46.496360 1090#350 [debug] dnsforward: finished processing upstream
2024/10/18 22:29:46.496644 1090#350 [debug] dnsproxy: out line_num=1 line=";; opcode: QUERY, status: SERVFAIL, id: 32875"
2024/10/18 22:29:46.496905 1090#350 [debug] dnsproxy: out line_num=2 line=";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0"
2024/10/18 22:29:46.497117 1090#350 [debug] dnsproxy: out line_num=3 line=""
2024/10/18 22:29:46.497331 1090#350 [debug] dnsproxy: out line_num=4 line=";; QUESTION SECTION:"
2024/10/18 22:29:46.497545 1090#350 [debug] dnsproxy: out line_num=5 line=";www.example.com.\tIN\t A"
2024/10/18 22:29:46.497752 1090#350 [debug] dnsproxy: out line_num=6 line=""
2024/10/18 22:29:46.498229 1090#350 [debug] dnsproxy: handling dns request proto=udp err="using request handler: all upstreams failed to exchange request: requesting https://120.53.53.53:443/dns-query: Get \"https://120.53.53.53:443/dns-query?dns=AAABAAABAAAAAAABA3d3dwdleGFtcGxlA2NvbQAAAQABAAApCAAAAIAAAAA\": context deadline exceeded\nrequesting https://1.12.12.12:443/dns-query: Get \"https://1.12.12.12:443/dns-query?dns=AAABAAABAAAAAAABA3d3dwdleGFtcGxlA2NvbQAAAQABAAApCAAAAIAAAAA\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\ngetting conn to tls://1.12.12.12:853: connecting to 1.12.12.12: read tcp 10.10.1.200:60174->1.12.12.12:853: i/o timeout\ngetting conn to tls://120.53.53.53:853: connecting to 120.53.53.53: read tcp 10.10.1.200:40744->120.53.53.53:853: i/o timeout"
2024/10/18 22:32:17.276231 1090#486 [debug] started GET 10.10.1.200:11451 /control/status
2024/10/18 22:32:17.277575 1090#486 [debug] finished GET 10.10.1.200:11451 /control/status in 1.398173ms
2024/10/18 22:32:19.917784 1090#486 [debug] started POST 10.10.1.200:11451 /control/test_upstream_dns
2024/10/18 22:32:29.919010 ERROR response received addr=https://1.12.12.12:443/dns-query proto=tcp status="requesting https://1.12.12.12:443/dns-query: Get \"https://1.12.12.12:443/dns-query?dns=AAABAAABAAAAAAAABHRlc3QAAAEAAQ\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
2024/10/18 22:32:29.919133 ERROR response received addr=https://120.53.53.53:443/dns-query proto=tcp status="requesting https://120.53.53.53:443/dns-query: Get \"https://120.53.53.53:443/dns-query?dns=AAABAAABAAAAAAAABHRlc3QAAAEAAQ\": context deadline exceeded"
2024/10/18 22:32:29.933972 1090#486 [debug] finished POST 10.10.1.200:11451 /control/test_upstream_dns in 10.016169079s
2024/10/18 22:32:48.354208 1090#486 [debug] started GET 10.10.1.200:11451 /control/status
2024/10/18 22:32:48.355457 1090#486 [debug] finished GET 10.10.1.200:11451 /control/status in 1.294076ms
2024/10/18 22:34:47.290447 1090#554 [debug] started GET 10.10.1.200:11451 /control/status
2024/10/18 22:34:47.291737 1090#554 [debug] finished GET 10.10.1.200:11451 /control/status in 1.338358ms
2024/10/18 22:34:48.408058 1090#554 [debug] started POST 10.10.1.200:11451 /control/test_upstream_dns
2024/10/18 22:34:58.409273 ERROR response received addr=https://120.53.53.53:443/dns-query proto=tcp status="requesting https://120.53.53.53:443/dns-query: Get \"https://120.53.53.53:443/dns-query?dns=AAABAAABAAAAAAAABHRlc3QAAAEAAQ\": context deadline exceeded"
2024/10/18 22:34:58.409278 ERROR response received addr=https://1.12.12.12:443/dns-query proto=tcp status="requesting https://1.12.12.12:443/dns-query: Get \"https://1.12.12.12:443/dns-query?dns=AAABAAABAAAAAAAABHRlc3QAAAEAAQ\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
2024/10/18 22:34:58.445246 1090#554 [debug] finished POST 10.10.1.200:11451 /control/test_upstream_dns in 10.037199239s

Also average upstream servers response time:

image

So somehow AdGuardHome failed to connect to upstream servers while dig works just fine.

@SukkaW SukkaW changed the title Very slow response from upstream DNS Very slow response from upstream servers with a lot of context deadline exceeded Oct 18, 2024
@xWTF
Copy link

xWTF commented Oct 19, 2024

Same issue, v0.107.53 takes ~20s to resolve through DoH and throws several generic timeout messages. The problem is gone after reverting back to v0.107.52.
Possibly related: #7346 (similar issue from v0.108.0-b.58, which is released between these two versions).

Environment: Linux 4.1.52 aarch64 ASUSWRT-Merlin 3004.388.7

I'm stunned by the huge amount of changes between these two versions, not sure how to investigate what's going on from these 29,523 additions and 27,647 deletions.

@SukkaW
Copy link
Author

SukkaW commented Oct 19, 2024

@xWTF Even w/ verbose logging I can't see any useful information. So I assume the most likely cause would be the bumping of go from 1.22 to 1.23.

@xWTF
Copy link

xWTF commented Oct 19, 2024

@xWTF Even w/ verbose logging I can't see any useful information. So I assume the most likely cause would be the bumping of go from 1.22 to 1.23.

Yep, on my device the log simply says "timeout" without any additional information, appears to be a regular connection problem but curl and old version works fine.
Nothing suspicious from the changelog / milestone, so the version bump could be the cause. Currently I don't have evidence tho, the changes are just too much for me to review.

adguard pushed a commit to AdguardTeam/dnsproxy that referenced this issue Nov 28, 2024
Updates AdguardTeam/AdGuardHome#7357.

Squashed commit of the following:

commit 4170865
Author: Stanislav Chzhen <s.chzhen@adguard.com>
Date:   Thu Nov 28 16:53:39 2024 +0300

    upstream: imp docs

commit 0e89c40
Author: Stanislav Chzhen <s.chzhen@adguard.com>
Date:   Thu Nov 28 15:57:38 2024 +0300

    upstream: doh close iddle conns
@SukkaW
Copy link
Author

SukkaW commented Nov 29, 2024

@schzhn Hello, I see there is a commit mentioning the issue. Is there a nightly release I can download and help you test?

adguard pushed a commit that referenced this issue Nov 29, 2024
Updates #7357.

Squashed commit of the following:

commit 599dee0
Author: Stanislav Chzhen <s.chzhen@adguard.com>
Date:   Fri Nov 29 14:52:16 2024 +0300

    all: upd golibs

commit eb90c49
Author: Stanislav Chzhen <s.chzhen@adguard.com>
Date:   Fri Nov 29 14:32:57 2024 +0300

    all: upd proxy

commit 12b8e51
Author: Stanislav Chzhen <s.chzhen@adguard.com>
Date:   Thu Nov 28 20:04:58 2024 +0300

    all: upd proxy

commit 77fcabc
Author: Stanislav Chzhen <s.chzhen@adguard.com>
Date:   Thu Nov 28 17:13:30 2024 +0300

    all: upd chlog

commit 714f93d
Author: Stanislav Chzhen <s.chzhen@adguard.com>
Date:   Thu Nov 28 17:02:34 2024 +0300

    all: upd proxy
@schzhn
Copy link
Member

schzhn commented Nov 29, 2024

The edge release contains several bug fixes.
Could you please check it?

You can also download the binary for your platform from the following link
https://static.adtidy.org/adguardhome/edge/version.json

@SukkaW
Copy link
Author

SukkaW commented Nov 30, 2024

@schzhn Thanks for your reply! However, I can still re-produce the issue by using dnsproxy alone.

Here is how I started dnsproxy:

./dnsproxy --listen=127.0.0.1 --port=1053 -u tls://1.12.12.12 -u tls://120.53.53.53 -u https://1.12.12.12/dns-query -u https://120.53.53.53/dns-query --verbose

Here is what happened I tried to query example.com A through 127.0.0.1:1053:

sukka@sukka-pi:~ $ dig example.com @127.0.0.1 -p 1053
;; communications error to 127.0.0.1#1053: timed out
;; communications error to 127.0.0.1#1053: timed out
;; communications error to 127.0.0.1#1053: timed out

; <<>> DiG 9.18.28-1~deb12u2-Debian <<>> example.com @127.0.0.1 -p 1053
;; global options: +cmd
;; no servers could be reached

And here is the log output of dnsproxy (it clearly shows I am using the latest version of dnsproxy and dig @127.0.0.1 -p 1053 does reach the dnsproxy):

2024/11/30 12:52:58.824171 INFO dnsproxy starting version=v0.73.4 revision=f901a5f branch=HEAD commit_time=1732813243
2024/11/30 12:52:58.824718 DEBUG hosts files are enabled
2024/11/30 12:52:58.824827 DEBUG hosts files are not specified, using default paths=[etc/hosts]
2024/11/30 12:52:58.825543 DEBUG set upstream idx=0 addr=tls://1.12.12.12:853
2024/11/30 12:52:58.825619 DEBUG set upstream idx=1 addr=tls://120.53.53.53:853
2024/11/30 12:52:58.825694 DEBUG set upstream idx=2 addr=https://1.12.12.12:443/dns-query
2024/11/30 12:52:58.825764 DEBUG set upstream idx=3 addr=https://120.53.53.53:443/dns-query
2024/11/30 12:52:58.825889 INFO upstream mode is set prefix=dnsproxy mode=load_balance
2024/11/30 12:52:58.825922 INFO cache disabled prefix=dnsproxy
2024/11/30 12:52:58.825948 INFO starting dns proxy server prefix=dnsproxy
2024/11/30 12:52:58.826009 INFO creating udp server socket prefix=dnsproxy addr=127.0.0.1:1053
2024/11/30 12:52:58.826262 INFO listening to udp prefix=dnsproxy addr=127.0.0.1:1053
2024/11/30 12:52:58.826326 INFO creating tcp server socket prefix=dnsproxy addr=127.0.0.1:1053
2024/11/30 12:52:58.826882 INFO listening to tcp prefix=dnsproxy addr=127.0.0.1:1053
2024/11/30 12:52:58.827605 INFO entering udp listener loop prefix=dnsproxy addr=127.0.0.1:1053
2024/11/30 12:52:58.827772 INFO entering listener loop prefix=dnsproxy proto=tcp addr=127.0.0.1:1053
2024/11/30 12:53:08.571333 DEBUG handling new udp packet prefix=dnsproxy raddr=127.0.0.1:58687
2024/11/30 12:53:08.572074 DEBUG in prefix=dnsproxy line_num=1 line=";; opcode: QUERY, status: NOERROR, id: 56048"
2024/11/30 12:53:08.572134 DEBUG in prefix=dnsproxy line_num=2 line=";; flags: rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1"
2024/11/30 12:53:08.572165 DEBUG in prefix=dnsproxy line_num=3 line=""
2024/11/30 12:53:08.572230 DEBUG in prefix=dnsproxy line_num=4 line=";; OPT PSEUDOSECTION:"
2024/11/30 12:53:08.572262 DEBUG in prefix=dnsproxy line_num=5 line="; EDNS: version 0; flags:; udp: 1232"
2024/11/30 12:53:08.572291 DEBUG in prefix=dnsproxy line_num=6 line="; COOKIE: c5580290450ae6ce"
2024/11/30 12:53:08.572316 DEBUG in prefix=dnsproxy line_num=7 line=""
2024/11/30 12:53:08.572344 DEBUG in prefix=dnsproxy line_num=8 line=";; QUESTION SECTION:"
2024/11/30 12:53:08.572427 DEBUG in prefix=dnsproxy line_num=9 line=";example.com.\tIN\t A"
2024/11/30 12:53:08.572503 DEBUG in prefix=dnsproxy line_num=10 line=""
2024/11/30 12:53:08.572685 DEBUG handling request prefix=default_handler req=";example.com.\tIN\t A"
2024/11/30 12:53:08.572780 DEBUG no hosts records found prefix=default_handler name=example.com qtype=1
2024/11/30 12:53:08.572853 DEBUG not caching prefix=dnsproxy reason=disabled
2024/11/30 12:53:08.573030 DEBUG dialing prefix=bootstrap addr=120.53.53.53:853 idx=1 total=1
2024/11/30 12:53:08.586024 DEBUG connection succeeded prefix=bootstrap addr=120.53.53.53:853 elapsed=12.897543ms
2024/11/30 12:53:13.576973 DEBUG handling new udp packet prefix=dnsproxy raddr=127.0.0.1:48210
2024/11/30 12:53:13.577196 DEBUG in prefix=dnsproxy line_num=1 line=";; opcode: QUERY, status: NOERROR, id: 56048"
2024/11/30 12:53:13.577281 DEBUG in prefix=dnsproxy line_num=2 line=";; flags: rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1"
2024/11/30 12:53:13.577351 DEBUG in prefix=dnsproxy line_num=3 line=""
2024/11/30 12:53:13.577419 DEBUG in prefix=dnsproxy line_num=4 line=";; OPT PSEUDOSECTION:"
2024/11/30 12:53:13.577487 DEBUG in prefix=dnsproxy line_num=5 line="; EDNS: version 0; flags:; udp: 1232"
2024/11/30 12:53:13.577545 DEBUG in prefix=dnsproxy line_num=6 line="; COOKIE: c5580290450ae6ce"
2024/11/30 12:53:13.577573 DEBUG in prefix=dnsproxy line_num=7 line=""
2024/11/30 12:53:13.577601 DEBUG in prefix=dnsproxy line_num=8 line=";; QUESTION SECTION:"
2024/11/30 12:53:13.577629 DEBUG in prefix=dnsproxy line_num=9 line=";example.com.\tIN\t A"
2024/11/30 12:53:13.577654 DEBUG in prefix=dnsproxy line_num=10 line=""
2024/11/30 12:53:13.577721 DEBUG handling request prefix=default_handler req=";example.com.\tIN\t A"
2024/11/30 12:53:13.577770 DEBUG no hosts records found prefix=default_handler name=example.com qtype=1
2024/11/30 12:53:13.577803 DEBUG not caching prefix=dnsproxy reason=disabled
2024/11/30 12:53:13.577871 DEBUG creating a new http client
2024/11/30 12:53:13.578101 DEBUG got error, switching to http/2 for this upstream err="HTTP3 support is not enabled"
2024/11/30 12:53:13.578277 DEBUG sending request addr=https://1.12.12.12:443/dns-query proto=tcp qtype=A qname=example.com.
2024/11/30 12:53:13.579023 DEBUG dialing prefix=bootstrap addr=1.12.12.12:443 idx=1 total=1
2024/11/30 12:53:13.592625 DEBUG connection succeeded prefix=bootstrap addr=1.12.12.12:443 elapsed=13.502458ms
2024/11/30 12:53:18.582829 DEBUG handling new udp packet prefix=dnsproxy raddr=127.0.0.1:49099
2024/11/30 12:53:18.583666 DEBUG in prefix=dnsproxy line_num=1 line=";; opcode: QUERY, status: NOERROR, id: 56048"
2024/11/30 12:53:18.583827 DEBUG in prefix=dnsproxy line_num=2 line=";; flags: rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1"
2024/11/30 12:53:18.584179 DEBUG in prefix=dnsproxy line_num=3 line=""
2024/11/30 12:53:18.584304 DEBUG in prefix=dnsproxy line_num=4 line=";; OPT PSEUDOSECTION:"
2024/11/30 12:53:18.584434 DEBUG in prefix=dnsproxy line_num=5 line="; EDNS: version 0; flags:; udp: 1232"
2024/11/30 12:53:18.584508 DEBUG in prefix=dnsproxy line_num=6 line="; COOKIE: c5580290450ae6ce"
2024/11/30 12:53:18.584576 DEBUG in prefix=dnsproxy line_num=7 line=""
2024/11/30 12:53:18.584662 DEBUG in prefix=dnsproxy line_num=8 line=";; QUESTION SECTION:"
2024/11/30 12:53:18.585016 DEBUG in prefix=dnsproxy line_num=9 line=";example.com.\tIN\t A"
2024/11/30 12:53:18.585115 DEBUG in prefix=dnsproxy line_num=10 line=""
2024/11/30 12:53:18.585255 DEBUG handling request prefix=default_handler req=";example.com.\tIN\t A"
2024/11/30 12:53:18.585365 DEBUG no hosts records found prefix=default_handler name=example.com qtype=1
2024/11/30 12:53:18.585625 DEBUG not caching prefix=dnsproxy reason=disabled
2024/11/30 12:53:18.586307 DEBUG sending request addr=https://1.12.12.12:443/dns-query proto=tcp qtype=A qname=example.com.
2024/11/30 12:53:18.587150 DEBUG dialing prefix=bootstrap addr=1.12.12.12:443 idx=1 total=1
2024/11/30 12:53:18.587194 ERROR exchange failed prefix=dnsproxy upstream=tls://120.53.53.53:853 question=";example.com.\tIN\t A" duration=10.014163635s err="getting conn to tls://120.53.53.53:853: connecting to 120.53.53.53: read tcp 10.10.1.200:58792->120.53.53.53:853: i/o timeout"
2024/11/30 12:53:18.587748 DEBUG dialing prefix=bootstrap addr=1.12.12.12:853 idx=1 total=1
2024/11/30 12:53:18.598501 DEBUG connection succeeded prefix=bootstrap addr=1.12.12.12:853 elapsed=10.663176ms
2024/11/30 12:53:18.601223 DEBUG connection succeeded prefix=bootstrap addr=1.12.12.12:443 elapsed=13.734842ms
2024/11/30 12:53:23.583600 ERROR response received addr=https://1.12.12.12:443/dns-query proto=tcp status="requesting https://1.12.12.12:443/dns-query: Get \"https://1.12.12.12:443/dns-query?dns=AAABIAABAAAAAAABB2V4YW1wbGUDY29tAAABAAEAACkE0AAAAAAADAAKAAjFWAKQRQrmzg\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
2024/11/30 12:53:23.583781 DEBUG recreating the http client err="requesting https://1.12.12.12:443/dns-query: Get \"https://1.12.12.12:443/dns-query?dns=AAABIAABAAAAAAABB2V4YW1wbGUDY29tAAABAAEAACkE0AAAAAAADAAKAAjFWAKQRQrmzg\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
2024/11/30 12:53:23.583849 DEBUG got error, switching to http/2 for this upstream err="HTTP3 support is not enabled"
2024/11/30 12:53:23.584006 ERROR exchange failed prefix=dnsproxy upstream=https://1.12.12.12:443/dns-query question=";example.com.\tIN\t A" duration=10.006080394s err="requesting https://1.12.12.12:443/dns-query: Get \"https://1.12.12.12:443/dns-query?dns=AAABIAABAAAAAAABB2V4YW1wbGUDY29tAAABAAEAACkE0AAAAAAADAAKAAjFWAKQRQrmzg\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
2024/11/30 12:53:23.584124 DEBUG dialing prefix=bootstrap addr=1.12.12.12:853 idx=1 total=1
2024/11/30 12:53:23.617608 DEBUG connection succeeded prefix=bootstrap addr=1.12.12.12:853 elapsed=33.400298ms
2024/11/30 12:53:28.591839 ERROR response received addr=https://1.12.12.12:443/dns-query proto=tcp status="requesting https://1.12.12.12:443/dns-query: Get \"https://1.12.12.12:443/dns-query?dns=AAABIAABAAAAAAABB2V4YW1wbGUDY29tAAABAAEAACkE0AAAAAAADAAKAAjFWAKQRQrmzg\": context deadline exceeded"
2024/11/30 12:53:28.592428 DEBUG recreating the http client err="requesting https://1.12.12.12:443/dns-query: Get \"https://1.12.12.12:443/dns-query?dns=AAABIAABAAAAAAABB2V4YW1wbGUDY29tAAABAAEAACkE0AAAAAAADAAKAAjFWAKQRQrmzg\": context deadline exceeded"
2024/11/30 12:53:28.592509 DEBUG got error, switching to http/2 for this upstream err="HTTP3 support is not enabled"
2024/11/30 12:53:28.592652 DEBUG sending request addr=https://1.12.12.12:443/dns-query proto=tcp qtype=A qname=example.com.
2024/11/30 12:53:28.593205 DEBUG dialing prefix=bootstrap addr=1.12.12.12:443 idx=1 total=1
2024/11/30 12:53:28.599165 ERROR exchange failed prefix=dnsproxy upstream=tls://1.12.12.12:853 question=";example.com.\tIN\t A" duration=10.011421426s err="getting conn to tls://1.12.12.12:853: connecting to 1.12.12.12: read tcp 10.10.1.200:42712->1.12.12.12:853: i/o timeout"
2024/11/30 12:53:28.599260 DEBUG creating a new http client
2024/11/30 12:53:28.599310 DEBUG got error, switching to http/2 for this upstream err="HTTP3 support is not enabled"
2024/11/30 12:53:28.599374 DEBUG sending request addr=https://120.53.53.53:443/dns-query proto=tcp qtype=A qname=example.com.
2024/11/30 12:53:28.599820 DEBUG dialing prefix=bootstrap addr=120.53.53.53:443 idx=1 total=1
2024/11/30 12:53:28.623968 DEBUG connection succeeded prefix=bootstrap addr=1.12.12.12:443 elapsed=30.678959ms
2024/11/30 12:53:28.628789 DEBUG connection succeeded prefix=bootstrap addr=120.53.53.53:443 elapsed=28.895973ms
2024/11/30 12:53:28.838927 DEBUG response received addr=https://120.53.53.53:443/dns-query proto=tcp status=ok
2024/11/30 12:53:28.839104 DEBUG exchange successfully finished prefix=dnsproxy upstream=https://120.53.53.53:443/dns-query question=";example.com.\tIN\t A" duration=239.747516ms
2024/11/30 12:53:28.839148 DEBUG resolved prefix=dnsproxy src=upstream rtt=20.266226505s
2024/11/30 12:53:28.839245 DEBUG out prefix=dnsproxy line_num=1 line=";; opcode: QUERY, status: NOERROR, id: 56048"
2024/11/30 12:53:28.839277 DEBUG out prefix=dnsproxy line_num=2 line=";; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1"
2024/11/30 12:53:28.839293 DEBUG out prefix=dnsproxy line_num=3 line=""
2024/11/30 12:53:28.839308 DEBUG out prefix=dnsproxy line_num=4 line=";; OPT PSEUDOSECTION:"
2024/11/30 12:53:28.839334 DEBUG out prefix=dnsproxy line_num=5 line="; EDNS: version 0; flags:; udp: 1232"
2024/11/30 12:53:28.839355 DEBUG out prefix=dnsproxy line_num=6 line=""
2024/11/30 12:53:28.839371 DEBUG out prefix=dnsproxy line_num=7 line=";; QUESTION SECTION:"
2024/11/30 12:53:28.839386 DEBUG out prefix=dnsproxy line_num=8 line=";example.com.\tIN\t A"
2024/11/30 12:53:28.839400 DEBUG out prefix=dnsproxy line_num=9 line=""
2024/11/30 12:53:28.839415 DEBUG out prefix=dnsproxy line_num=10 line=";; ANSWER SECTION:"
2024/11/30 12:53:28.839437 DEBUG out prefix=dnsproxy line_num=11 line="example.com.\t1870\tIN\tA\t93.184.215.14"
2024/11/30 12:53:28.839452 DEBUG out prefix=dnsproxy line_num=12 line=""
2024/11/30 12:53:28.843114 DEBUG response received addr=https://1.12.12.12:443/dns-query proto=tcp status=ok
2024/11/30 12:53:28.843212 DEBUG exchange successfully finished prefix=dnsproxy upstream=https://1.12.12.12:443/dns-query question=";example.com.\tIN\t A" duration=10.257100796s
2024/11/30 12:53:28.843261 DEBUG resolved prefix=dnsproxy src=upstream rtt=10.257243238s
2024/11/30 12:53:28.843347 DEBUG out prefix=dnsproxy line_num=1 line=";; opcode: QUERY, status: NOERROR, id: 56048"
2024/11/30 12:53:28.843376 DEBUG out prefix=dnsproxy line_num=2 line=";; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1"
2024/11/30 12:53:28.843392 DEBUG out prefix=dnsproxy line_num=3 line=""
2024/11/30 12:53:28.843407 DEBUG out prefix=dnsproxy line_num=4 line=";; OPT PSEUDOSECTION:"
2024/11/30 12:53:28.843423 DEBUG out prefix=dnsproxy line_num=5 line="; EDNS: version 0; flags:; udp: 1232"
2024/11/30 12:53:28.843443 DEBUG out prefix=dnsproxy line_num=6 line=""
2024/11/30 12:53:28.843465 DEBUG out prefix=dnsproxy line_num=7 line=";; QUESTION SECTION:"
2024/11/30 12:53:28.843481 DEBUG out prefix=dnsproxy line_num=8 line=";example.com.\tIN\t A"
2024/11/30 12:53:28.843495 DEBUG out prefix=dnsproxy line_num=9 line=""
2024/11/30 12:53:28.843509 DEBUG out prefix=dnsproxy line_num=10 line=";; ANSWER SECTION:"
2024/11/30 12:53:28.843528 DEBUG out prefix=dnsproxy line_num=11 line="example.com.\t1870\tIN\tA\t93.184.215.14"
2024/11/30 12:53:28.843544 DEBUG out prefix=dnsproxy line_num=12 line=""
2024/11/30 12:53:33.618226 ERROR exchange failed prefix=dnsproxy upstream=tls://1.12.12.12:853 question=";example.com.\tIN\t A" duration=10.034061513s err="getting conn to tls://1.12.12.12:853: connecting to 1.12.12.12: read tcp 10.10.1.200:42726->1.12.12.12:853: i/o timeout"
2024/11/30 12:53:33.618370 DEBUG sending request addr=https://120.53.53.53:443/dns-query proto=tcp qtype=A qname=example.com.
2024/11/30 12:53:33.648000 DEBUG response received addr=https://120.53.53.53:443/dns-query proto=tcp status=ok
2024/11/30 12:53:33.648113 DEBUG exchange successfully finished prefix=dnsproxy upstream=https://120.53.53.53:443/dns-query question=";example.com.\tIN\t A" duration=29.732068ms
2024/11/30 12:53:33.648163 DEBUG resolved prefix=dnsproxy src=upstream rtt=20.0703218s
2024/11/30 12:53:33.648282 DEBUG out prefix=dnsproxy line_num=1 line=";; opcode: QUERY, status: NOERROR, id: 56048"
2024/11/30 12:53:33.648332 DEBUG out prefix=dnsproxy line_num=2 line=";; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1"
2024/11/30 12:53:33.648411 DEBUG out prefix=dnsproxy line_num=3 line=""
2024/11/30 12:53:33.648449 DEBUG out prefix=dnsproxy line_num=4 line=";; OPT PSEUDOSECTION:"
2024/11/30 12:53:33.648492 DEBUG out prefix=dnsproxy line_num=5 line="; EDNS: version 0; flags:; udp: 1232"
2024/11/30 12:53:33.648519 DEBUG out prefix=dnsproxy line_num=6 line=""
2024/11/30 12:53:33.648546 DEBUG out prefix=dnsproxy line_num=7 line=";; QUESTION SECTION:"
2024/11/30 12:53:33.648577 DEBUG out prefix=dnsproxy line_num=8 line=";example.com.\tIN\t A"
2024/11/30 12:53:33.648602 DEBUG out prefix=dnsproxy line_num=9 line=""
2024/11/30 12:53:33.648628 DEBUG out prefix=dnsproxy line_num=10 line=";; ANSWER SECTION:"
2024/11/30 12:53:33.648657 DEBUG out prefix=dnsproxy line_num=11 line="example.com.\t1865\tIN\tA\t93.184.215.14"
2024/11/30 12:53:33.648682 DEBUG out prefix=dnsproxy line_num=12 line=""

Also to make sure my network is working, I have tried dig using the same DoT & DoH servers:

sukka@sukka-pi:~ $ time dig example.com @120.53.53.53 +tls +short
93.184.215.14

real    0m0.121s
user    0m0.060s
sys     0m0.013s
sukka@sukka-pi:~ $ time dig example.com @120.53.53.53 +https +short
93.184.215.14

real    0m0.083s
user    0m0.023s
sys     0m0.012s
sukka@sukka-pi:~ $ time dig example.com @1.12.12.12 +tls +short
93.184.215.14

real    0m0.134s
user    0m0.051s
sys     0m0.020s
sukka@sukka-pi:~ $ time dig example.com @1.12.12.12 +https +short
93.184.215.14

real    0m0.130s
user    0m0.051s
sys     0m0.030s

@Skyxim
Copy link

Skyxim commented Nov 30, 2024

Same problem, when using doh.pub, there is a long waiting time, and a large number of timeouts appear in the log, which may be related to compatibility adgh.log

It might be related to TLS version support. DNSPod does not support TLS 1.3. Could it be that an issue in the fallback handling is causing significant delays?

Using WireGuard to capture packets with dnsproxy reveals a large number of handshake failures.
dnspoxy.pcapng.zip

q @https://doh.pub/dns-query github.com -v --tls-min-version=1.2 A
DEBU[0000] Name: github.com                             
DEBU[0000] RR types: [A]                                
DEBU[0000] Server(s): [https://doh.pub/dns-query]       
DEBU[0000] Using server https://doh.pub:443/dns-query with transport http 
DEBU[0000] Using HTTP(s) transport: https://doh.pub:443/dns-query 
DEBU[0000] [http] sending GET request to https://doh.pub:443/dns-query?dns=NmUBAAABAAAAAAAABmdpdGh1YgNjb20AAAEAAQ 
github.com. 1m A 20.205.243.166


q @https://doh.pub/dns-query github.com -v --tls-min-version=1.3 A
DEBU[0000] Name: github.com                             
DEBU[0000] RR types: [A]                                
DEBU[0000] Server(s): [https://doh.pub/dns-query]       
DEBU[0000] Using server https://doh.pub:443/dns-query with transport http 
DEBU[0000] Using HTTP(s) transport: https://doh.pub:443/dns-query 
DEBU[0000] [http] sending GET request to https://doh.pub:443/dns-query?dns=w_YBAAABAAAAAAAABmdpdGh1YgNjb20AAAEAAQ 
FATA[0000] requesting https://doh.pub:443/dns-query?dns=w_YBAAABAAAAAAAABmdpdGh1YgNjb20AAAEAAQ: Get "https://doh.pub:443/dns-query?dns=w_YBAAABAAAAAAAABmdpdGh1YgNjb20AAAEAAQ": remote error: tls: handshake failure 
image

@xWTF
Copy link

xWTF commented Nov 30, 2024

@Skyxim The handshake failure within your pcap appears to be a MTU issue, which is not the problem we encounter in agh v0.107.53+, lower your local MTU and try again.

This needs further investigation. Yes it's MTU issue, but not on our local side.

@xWTF
Copy link

xWTF commented Nov 30, 2024

Well, I've confirmed this is indeed an MSS issue (see the additional comments at bottom). I never thought about this possibility before.


In AGH 0.73.2, the TLS client hello is a classic 252 bytes, which is short enough to be sent without any issue.

image

However, in 0.73.3+, a key_share extension is sent, boosting the record size to 1476 bytes, which can cause problems when the MSS is incorrect.

image

image

The change is related to the Golang version bump, as we guessed previously. See here for the relevant changelog.

I can confirm this by setting GODEBUG=tlskyber=0, which disables the key_share extension, and the handshake can be completed normally.


It seems like the clamp to pmtu option in ROS is not working as expected. It's clamping the MSS to 1452, while the maximum possible MSS (tested manually) is 1444.

I'm not sure what's going wrong here since the PMTU discovery is supposed to work reliably; maybe some poorly designed TCP audit device from ISP can't handle these large packets and dropped them.

This strange MSS issue only occurs to Tencent DoH servers (1.12.12.12 and 120.53.53.53). I've tested various other DoH servers (223.5.5.5, 9.9.9.9, 1.1.1.1), and they all function correctly.

The maximum possible MSS to Tencent DoH servers is 8 bytes smaller (1444) than MSS advertised from their side (1452). I suspect that Tencent engineers may have misconfigured their network while setting up TOA (which are commonly used to pass the real IP from a load balancer to VMs/pods), since the TOA header is 8 bytes.

My current solution is to clamp MSS from these two IPs using mangle rules:

image

This is, unfortunately, not an easy to implement solution for most home grade routers.

If you're running AGH on a Linux system, you can do this via iptables and clamp on incoming packets. You may also set the GODEBUG=tlskyber=0 environment, or switch to other DoH/DoT providers without this issue.

@Skyxim
Copy link

Skyxim commented Nov 30, 2024

You may also set the GODEBUG=tlskyber=0 environment, or switch to other DoH/DoT providers without this issue.

OK, it seems that my problem is not related to this issue. Turning off tlskyber is indeed effective. Thanks for your help

@xWTF
Copy link

xWTF commented Nov 30, 2024

You may also set the GODEBUG=tlskyber=0 environment, or switch to other DoH/DoT providers without this issue.

OK, it seems that my problem is not related to this issue. Turning off tlskyber is indeed effective. Thanks for your help

Sorry for the confusion caused by the previous reply. It is indeed the same problem. The MSS configuration error occurs on Tencent's side, not on our local side. You may want to check the edited version of my second comment for more details.

@Skyxim
Copy link

Skyxim commented Nov 30, 2024

You may also set the GODEBUG=tlskyber=0 environment, or switch to other DoH/DoT providers without this issue.

OK, it seems that my problem is not related to this issue. Turning off tlskyber is indeed effective. Thanks for your help好吧,看来我的问题与这个问题无关。关闭 tlskyber 确实有效。感谢您的帮助

Sorry for the confusion caused by the previous reply. It is indeed the same problem. The MSS configuration error occurs on Tencent's side, not on our local side. You may want to check the edited version of my second comment for more details.

Yes, I saw that reply as well. I will also try to report it to Tencent. Thanks again!

@SukkaW
Copy link
Author

SukkaW commented Nov 30, 2024

Similar issue: hashicorp/terraform-provider-aws#39311

cc @xWTF

That issue is caused by the AWS Firewall dropping over stretched TLS client hello (also caused by Go 1.23), resulting in Go 1.23 no longer being able to access AWS API. The true cause is not the same (badly made firewall vs wrong MSS), but the root cause (Go 1.23 w/ overstretched TLS client hello) is the same.

I have submitted a report on V2EX: https://www.v2ex.com/t/1094021

@SukkaW
Copy link
Author

SukkaW commented Nov 30, 2024

Created a few social posts to raise the concern:

@schzhn
Copy link
Member

schzhn commented Dec 5, 2024

It appears that Tencent has updated the TLS configuration of their DNS servers, as I can no longer reproduce the issue on my setup. Thank you all for reporting, testing, and investigating this matter.

@schzhn schzhn closed this as completed Dec 5, 2024
@EugeneOne1 EugeneOne1 modified the milestones: v0.107.55, v0.107.56 Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants