[Enhancement] Support udp:// upstreams #160

Closed
windmsn opened this issue Apr 10, 2024 · 37 comments
Labels
enhancement New feature or request

Comments


windmsn commented Apr 10, 2024

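The topology is as follows:
Domestic: LAN devices -> dnsmasq -> chinadns-ng -> 211.136.192.6/120.196.165.24 (ISP DNS)
Overseas: LAN devices -> dnsmasq -> chinadns-ng -> dns2tcp -> 8.8.8.8
dnsmasq is needed for IPv6, so it cannot be dropped.
dns2tcp can only listen on UDP. Using chinadns-ng's built-in tcp://8.8.8.8 produces a large number of "connection reset by peer" messages, so I have to use dns2tcp.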

Wed Apr 10 21:36:27 2024 kern.warn kernel: [16615901.916000] connection reset by peer.
Wed Apr 10 21:42:10 2024 kern.warn kernel: [16616244.456000] connection reset by peer.
Wed Apr 10 21:42:10 2024 kern.warn kernel: [16616244.476000] connection reset by peer.
Wed Apr 10 21:42:11 2024 kern.warn kernel: [16616245.500000] connection reset by peer.
Wed Apr 10 21:42:11 2024 kern.warn kernel: [16616245.516000] connection reset by peer.
Wed Apr 10 21:42:13 2024 kern.warn kernel: [16616247.524000] connection reset by peer.
Wed Apr 10 21:42:42 2024 kern.warn kernel: [16616276.436000] connection reset by peer.
Wed Apr 10 21:42:42 2024 kern.warn kernel: [16616276.460000] connection reset by peer.
Wed Apr 10 21:42:43 2024 kern.warn kernel: [16616277.492000] connection reset by peer.
Wed Apr 10 21:42:43 2024 kern.warn kernel: [16616277.496000] connection reset by peer.
Wed Apr 10 21:42:43 2024 kern.warn kernel: [16616277.500000] connection reset by peer.
Wed Apr 10 21:42:43 2024 kern.warn kernel: [16616277.508000] connection reset by peer.
Wed Apr 10 21:42:43 2024 kern.warn kernel: [16616277.512000] connection reset by peer.
Wed Apr 10 21:42:43 2024 kern.warn kernel: [16616277.640000] connection reset by peer.
Wed Apr 10 21:42:43 2024 kern.warn kernel: [16616277.644000] connection reset by peer.

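Today I ran into a situation: while someone was browsing Douyin on a phone, the app issued a large number of DNS queries over TCP. dnsmasq listens on port 53 and forwards the requests to chinadns-ng, but all of chinadns-ng's upstreams support only UDP, so dnsmasq started multiple processes while waiting for results.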

[root@ManTou:/root]#netstat -anlp | grep dns
tcp        0      0 0.0.0.0:53              0.0.0.0:*               LISTEN      11779/dnsmasq
tcp        0      0 127.0.0.1:60740         127.0.0.1:5354          ESTABLISHED 11781/dnsmasq
tcp        0      0 127.0.0.1:48100         127.0.0.1:5354          ESTABLISHED 11783/dnsmasq
tcp        0      0 127.0.0.1:42354         127.0.0.1:5354          ESTABLISHED 11784/dnsmasq
tcp        0      0 127.0.0.1:51948         127.0.0.1:5354          ESTABLISHED 11779/dnsmasq
tcp        0      0 10.0.0.1:53             10.0.0.37:51276         ESTABLISHED 11784/dnsmasq
tcp        0      0 10.0.0.1:53             10.0.0.37:51275         ESTABLISHED 11783/dnsmasq
tcp        0      0 127.0.0.1:50012         127.0.0.1:5354          ESTABLISHED 11782/dnsmasq
tcp        0      0 127.0.0.1:54257         127.0.0.1:5354          ESTABLISHED 11785/dnsmasq
tcp        0      0 127.0.0.1:60232         127.0.0.1:5354          ESTABLISHED 11780/dnsmasq
tcp        0      0 10.0.0.1:53             10.0.0.37:51277         ESTABLISHED 11785/dnsmasq
tcp        0      0 10.0.0.1:53             10.0.0.37:51278         ESTABLISHED 11786/dnsmasq
tcp        0      0 127.0.0.1:49734         127.0.0.1:5354          ESTABLISHED 11786/dnsmasq
tcp        0      0 :::5354                 :::*                    LISTEN      26727/chinadns-ng
tcp        0      0 :::53                   :::*                    LISTEN      11779/dnsmasq
tcp        0      0 ::ffff:127.0.0.1:5354   ::ffff:127.0.0.1:48100  ESTABLISHED 26727/chinadns-ng
tcp        0      0 2409:8a55:4ce7:d810::1:53 2409:8a55:4ce7:d810:f451:e5ab:11b4:95e9:49162 ESTABLISHED 11782/dnsmasq
tcp        0      0 ::ffff:127.0.0.1:5354   ::ffff:127.0.0.1:60232  ESTABLISHED 26727/chinadns-ng
tcp        0      0 2409:8a55:4ce7:d810::1:53 2409:8a55:4ce7:d810:f451:e5ab:11b4:95e9:49159 ESTABLISHED 11780/dnsmasq
tcp        0      0 ::ffff:127.0.0.1:5354   ::ffff:127.0.0.1:50012  ESTABLISHED 26727/chinadns-ng
tcp        0      0 ::ffff:127.0.0.1:5354   ::ffff:127.0.0.1:49734  ESTABLISHED 26727/chinadns-ng
tcp        0      0 2409:8a55:4ce7:d810::1:53 2409:8a55:4ce7:d810:f451:e5ab:11b4:95e9:49160 ESTABLISHED 11779/dnsmasq
tcp        0      0 ::ffff:127.0.0.1:5354   ::ffff:127.0.0.1:51948  ESTABLISHED 26727/chinadns-ng
tcp        0      0 2409:8a55:4ce7:d810::1:53 2409:8a55:4ce7:d810:f451:e5ab:11b4:95e9:49161 ESTABLISHED 11781/dnsmasq
tcp        0      0 ::ffff:127.0.0.1:5354   ::ffff:127.0.0.1:60740  ESTABLISHED 26727/chinadns-ng
tcp        0      0 ::ffff:127.0.0.1:5354   ::ffff:127.0.0.1:42354  ESTABLISHED 26727/chinadns-ng
tcp        0      0 ::ffff:127.0.0.1:5354   ::ffff:127.0.0.1:54257  ESTABLISHED 26727/chinadns-ng
udp        0      0 0.0.0.0:44167           0.0.0.0:*                           26727/chinadns-ng
udp        0      0 0.0.0.0:39137           0.0.0.0:*                           26727/chinadns-ng
udp        0      0 0.0.0.0:53              0.0.0.0:*                           11779/dnsmasq
udp        0      0 0.0.0.0:67              0.0.0.0:*                           11779/dnsmasq
udp        0      0 :::5354                 :::*                                26727/chinadns-ng
udp        0      0 :::53                   :::*                                11779/dnsmasq
unix  2      [ ]         DGRAM                    30882539 11779/dnsmasq       

The runtime configuration is as follows:

2024-04-10 16:07:02 I [main.zig:117 main] local listen addr: ::#5354@tcp+udp
2024-04-10 16:07:02 I [main.zig:117 main] china upstream: tcpin://211.136.192.6
2024-04-10 16:07:02 I [main.zig:117 main] china upstream: udpin://211.136.192.6
2024-04-10 16:07:02 I [main.zig:117 main] china upstream: tcpin://120.196.165.24
2024-04-10 16:07:02 I [main.zig:117 main] china upstream: udpin://120.196.165.24
2024-04-10 16:07:02 I [main.zig:117 main] trust upstream: tcpin://127.0.0.1#5353
2024-04-10 16:07:02 I [main.zig:117 main] trust upstream: udpin://127.0.0.1#5353
2024-04-10 16:07:02 I [main.zig:117 main] trust upstream: tcpin://127.0.0.1#5352
2024-04-10 16:07:02 I [main.zig:117 main] trust upstream: udpin://127.0.0.1#5352

When a client queries with the following command:
dig @127.0.0.1 -p5354 +tcp www.youtube.com
chinadns-ng reports the following errors:

2024-04-10 16:07:11 I [server.zig:203 service_tcp] new connection:7 from ::ffff:10.0.0.21#62838
2024-04-10 16:07:11 I [server.zig:302 QueryLog.query] query(id:39920, tag:gfw, qtype:28, 'www.youtube.com') from ::ffff:10.0.0.21#62838
2024-04-10 16:07:11 I [server.zig:349 QueryLog.forward] forward query(qid:1, from:tcp, 'www.youtube.com') to trust group
2024-04-10 16:07:11 I [Upstream.zig:490 Group.send] forward query(qid:1, from:tcp) to upstream tcpin://127.0.0.1#5353
2024-04-10 16:07:11 I [Upstream.zig:490 Group.send] forward query(qid:1, from:tcp) to upstream tcpin://127.0.0.1#5352
2024-04-10 16:07:11 E [Upstream.zig:148 _send_tcp] connect(8, 'tcpin://127.0.0.1#5353') failed: (146) Connection refused
2024-04-10 16:07:11 E [Upstream.zig:148 _send_tcp] connect(9, 'tcpin://127.0.0.1#5352') failed: (146) Connection refused

The client-side dig reports the following error:

[root@ManTou:/root]#dig @127.0.0.1 -p5354 +tcp www.youtube.com
;; Connection to 127.0.0.1#5354(127.0.0.1) for www.youtube.com failed: connection refused.

Likewise, TCP queries for domestic sites fail with a connection timeout, because the ISP DNS servers do not support TCP.

2024-04-10 16:42:37 I [server.zig:302 QueryLog.query] query(id:53791, tag:chn, qtype:1, 'www.taobao.com') from ::ffff:127.0.0.1#35362
2024-04-10 16:42:37 I [server.zig:349 QueryLog.forward] forward query(qid:33, from:tcp, 'www.taobao.com') to china group
2024-04-10 16:42:37 I [Upstream.zig:490 Group.send] forward query(qid:33, from:tcp) to upstream tcpin://211.136.192.6
2024-04-10 16:42:37 I [Upstream.zig:490 Group.send] forward query(qid:33, from:tcp) to upstream tcpin://120.196.165.24
2024-04-10 16:42:42 W [server.zig:827 on_timeout] query(qid:33, id:53791, tag:chn) from tcp://::ffff:127.0.0.1#35362 [timeout]
2024-04-10 16:42:42 E [Upstream.zig:148 _send_tcp] connect(12, 'tcpin://211.136.192.6') failed: (145) Operation timed out
2024-04-10 16:42:42 E [Upstream.zig:148 _send_tcp] connect(13, 'tcpin://120.196.165.24') failed: (145) Operation timed out




[root@ManTou:/root]#dig @127.0.0.1 -p5354 +tcp www.taobao.com
; <<>> DiG 9.9.9-P3 <<>> @127.0.0.1 -p5354 +tcp www.taobao.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

Although the startup log appears to list both udpin and tcpin upstreams, as soon as the client queries over TCP, chinadns-ng forwards only over TCP and never uses UDP.

Author

windmsn commented Apr 10, 2024

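For now, the only workaround is to listen on UDP only:
bind-port 5354@udp
Going forward, I would still like to be able to specify the protocol for each upstream DNS server.
Or, when a TCP query is received, could it be forwarded to both the tcpin and udpin upstreams at the same time?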

Owner

zfl9 commented Apr 11, 2024

#153

Owner

zfl9 commented Apr 11, 2024

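You did remind me of one thing, though: the messages printed at startup.

  • If both TCP and UDP are listened on, an upstream without a protocol qualifier, such as 1.1.1.1, is internally:

    • tcpin://1.1.1.1: used when the client queries over TCP
    • udpin://1.1.1.1: used when the client queries over UDP
  • If only TCP is listened on, an upstream without a protocol qualifier, such as 1.1.1.1, is internally:

    • tcpin://1.1.1.1: used when the client queries over TCP
  • If only UDP is listened on, an upstream without a protocol qualifier, such as 1.1.1.1, is internally:

    • udpin://1.1.1.1: used when the client queries over UDP

Currently both the tcpin and udpin upstream addresses are printed unconditionally, regardless of whether TCP or UDP is actually being listened on.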
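
The expansion rule in the list above can be sketched in Python. This is only an illustration of the rule as described in this comment, not chinadns-ng's actual code (which is written in Zig); the function name `expand_upstream` and its parameters are hypothetical.

```python
from typing import List


def expand_upstream(spec: str, listen_tcp: bool, listen_udp: bool) -> List[str]:
    """Expand one configured upstream into its internal upstream(s).

    Hypothetical helper: an upstream written with an explicit scheme
    (for example tcp://) is kept as written, while a bare address is
    split according to the protocols the server listens on.
    """
    if "://" in spec:
        return [spec]
    expanded = []
    if listen_tcp:
        # Used when the client queries over TCP.
        expanded.append("tcpin://" + spec)
    if listen_udp:
        # Used when the client queries over UDP.
        expanded.append("udpin://" + spec)
    return expanded


if __name__ == "__main__":
    # With "bind-port 5354@udp" only the udpin form would exist.
    print(expand_upstream("211.136.192.6", listen_tcp=False, listen_udp=True))
```

Printing only the entries this function returns would match the listening configuration, which is the logging change suggested here.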

Owner

zfl9 commented Apr 11, 2024

Your problem is exactly the same as #153; I gave a detailed explanation there:

#153 (comment)

Owner

zfl9 commented Apr 11, 2024

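Or, when a TCP query is received, could it be forwarded to both the tcpin and udpin upstreams at the same time?

Not feasible; see #153 for the detailed reasons.

In short:

  • A query that arrives over UDP can be forwarded to an upstream of any protocol (tcp, udp, tls, https). Even if the reply is truncated (TC) because of the message size, the UDP querier will automatically reissue the same query over TCP (the RFCs require this, and all real-world resolvers do it).
  • A query that arrives over TCP can only be forwarded to TCP-based upstreams (tcp, tls, https), because a TCP querier has no retry-on-truncation logic: messages over TCP are not subject to the UDP size limit, so TC should never occur, and that logic simply does not exist.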
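
The two rules above, combined with the tcpin/udpin split described earlier in this thread, can be sketched as a small filter. This is a hedged illustration only: `eligible_upstreams` and `scheme_of` are hypothetical names, and the `udp://` scheme is the feature being requested in this issue, not something that exists at this point in the discussion.

```python
from typing import List

# Schemes that carry DNS over a TCP-based transport.
TCP_BASED = {"tcp", "tls", "https"}


def scheme_of(upstream: str) -> str:
    """Return the scheme part of an upstream such as 'tcp://223.5.5.5'."""
    return upstream.split("://", 1)[0]


def eligible_upstreams(upstreams: List[str], from_tcp: bool) -> List[str]:
    """Select the upstreams a query may be forwarded to (hypothetical helper)."""
    selected = []
    for upstream in upstreams:
        scheme = scheme_of(upstream)
        if scheme == "tcpin":
            ok = from_tcp            # only for queries that arrived over TCP
        elif scheme == "udpin":
            ok = not from_tcp        # only for queries that arrived over UDP
        elif from_tcp:
            ok = scheme in TCP_BASED # a TCP client cannot recover from TC
        else:
            ok = True                # a UDP client retries over TCP on TC
        if ok:
            selected.append(upstream)
    return selected


if __name__ == "__main__":
    group = ["tcpin://211.136.192.6", "udpin://211.136.192.6", "tcp://223.5.5.5"]
    print(eligible_upstreams(group, from_tcp=True))
    print(eligible_upstreams(group, from_tcp=False))
```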

Author

windmsn commented Apr 11, 2024

I also came across #153 yesterday.

My current configuration file is as follows:

bind-addr ::
bind-port 5354
china-dns 211.136.192.6,120.196.165.24,tcp://223.5.5.5,tcp://223.6.6.6
trust-dns 127.0.0.1#5353,127.0.0.1#5352,tcp://2001:4860:4860::8888,tcp://2001:4860:4860::8844,tcp://8.8.8.8,tcp://8.8.4.4
gfwlist-file /root/gfwlist.txt
chnlist-file /root/chnlist.txt
ipset-name4 china
ipset-name6 chnroute6
chnlist-first
add-taggfw-ip china-banned,china-banned6

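I've been wondering whether it could be optimized like this:

I added TCP DNS servers to the China group: tcp://223.5.5.5,tcp://223.6.6.6

When a client issues a TCP query, could chinadns-ng forward it directly to the DNS servers explicitly marked as TCP, rather than to ISP DNS servers such as 211.136.192.6 and 120.196.165.24 that do not support TCP?
When the client queries over UDP, forward to upstreams of any protocol.

According to statistics collected from yesterday until now, UDP queries make up 95% and TCP queries 5%, so UDP queries to the ISP DNS servers are the fastest option.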

Owner

zfl9 commented Apr 11, 2024

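Of course, there is another approach: for queries received over TCP, chinadns-ng could be allowed to forward to upstreams of any protocol (including UDP). If the reply received is truncated (TC), chinadns-ng discards it (without handing it to the TCP client yet) and sends the same query upstream again (this time excluding UDP upstreams). The reply obtained this time will definitely not be truncated, and it is finally returned to the TCP client.

But is this really necessary? Even if it were done this way, there is a precondition: the upstream group must contain at least one upstream that supports TCP queries, otherwise the second query triggered by TC is bound to fail.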
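
The retry idea described above can be sketched as follows. This is a simplified, sequential illustration under stated assumptions, not the project's implementation: upstreams are modelled as hypothetical `(scheme, send)` pairs, the first entry stands in for "whichever upstream replies first", and `make_header` exists only to build sample messages. The position of the TC bit follows the DNS header layout in RFC 1035.

```python
import struct
from typing import Callable, List, Optional, Tuple

TC_FLAG = 0x0200  # TC (truncated) bit in the 16-bit DNS header flags field
TCP_BASED = {"tcp", "tls", "https"}

# Hypothetical upstream model: (scheme, function that sends a query and returns the reply).
Upstream = Tuple[str, Callable[[bytes], bytes]]


def make_header(flags: int) -> bytes:
    """Build a bare 12-byte DNS header, for demonstration only."""
    return struct.pack("!HHHHHH", 0x1234, flags, 1, 0, 0, 0)


def is_truncated(msg: bytes) -> bool:
    """Return True if the TC bit is set in a raw DNS message."""
    if len(msg) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    (flags,) = struct.unpack_from("!H", msg, 2)
    return bool(flags & TC_FLAG)


def resolve_for_tcp_client(query: bytes, upstreams: List[Upstream]) -> Optional[bytes]:
    """Answer a TCP client, retrying without UDP upstreams if the reply is truncated."""
    if not upstreams:
        return None
    # First pass: any upstream may answer, UDP included.
    _, send = upstreams[0]
    reply = send(query)
    if not is_truncated(reply):
        return reply
    # The reply was truncated: drop it and retry on TCP-based upstreams only.
    for scheme, send in upstreams:
        if scheme in TCP_BASED:
            return send(query)
    # Precondition not met: no TCP-capable upstream, so the retry cannot succeed.
    return None


if __name__ == "__main__":
    truncated = make_header(0x8380)  # QR | TC | RD | RA
    complete = make_header(0x8180)   # QR | RD | RA
    group = [("udp", lambda q: truncated), ("tcp", lambda q: complete)]
    print(resolve_for_tcp_client(b"query", group) == complete)
```

The final `return None` branch corresponds to the precondition mentioned above: without a TCP-capable upstream in the group, the second query has nowhere to go.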

Owner

zfl9 commented Apr 11, 2024

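I've been wondering whether it could be optimized like this:

I added TCP DNS servers to the China group: tcp://223.5.5.5,tcp://223.6.6.6

When a client issues a TCP query, could chinadns-ng forward it directly to the DNS servers explicitly marked as TCP, rather than to ISP DNS servers such as 211.136.192.6 and 120.196.165.24 that do not support TCP?

With the configuration you gave, chinadns-ng's current strategy for an upstream group is to query all upstreams concurrently, so what you describe is achievable. What actually happens is this: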

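When a query is received over TCP and forwarded to the china group:

  • 211.136.192.6 has no protocol qualifier, so internally the query is forwarded to tcpin://211.136.192.6. Since it does not support TCP queries, the connection fails (no real impact; a log line is printed).
  • 120.196.165.24 has no protocol qualifier, so internally the query is forwarded to tcpin://120.196.165.24. Since it does not support TCP queries, the connection fails (no real impact; a log line is printed).
  • tcp://223.5.5.5 supports TCP queries, no problem.
  • tcp://223.6.6.6 supports TCP queries, no problem.

The final result is whichever reply arrives first, that is, the one from either 223.5.5.5 or 223.6.6.6. Failures of the ISP DNS servers have no effect (this is the core purpose of allowing multiple upstream DNS servers to be configured).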
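
The "query all upstreams concurrently, accept the first reply, ignore failures" strategy can be sketched with asyncio. This is a minimal simulation, assuming made-up delays and replies; `first_successful` and `fake_upstream` are hypothetical names and no real network traffic is involved.

```python
import asyncio
from typing import Any, Coroutine, Dict, Optional, Tuple


async def fake_upstream(delay: float, reply: bytes = b"", error: Optional[Exception] = None) -> bytes:
    """Simulated upstream: waits, then either fails or returns a reply."""
    await asyncio.sleep(delay)
    if error is not None:
        raise error
    return reply


async def first_successful(
    queries: Dict[str, Coroutine[Any, Any, bytes]]
) -> Optional[Tuple[str, bytes]]:
    """Run all upstream queries concurrently and return the first successful reply."""
    tasks = {asyncio.create_task(coro): name for name, coro in queries.items()}
    pending = set(tasks)
    try:
        while pending:
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            winner = None
            for task in done:
                if task.exception() is not None:
                    # A failed upstream has no effect beyond a log line.
                    print("upstream failed:", tasks[task])
                elif winner is None:
                    winner = (tasks[task], task.result())
            if winner is not None:
                return winner
        return None  # every upstream failed
    finally:
        # Later replies are ignored; cancel whatever is still running.
        for task in pending:
            task.cancel()


async def demo() -> Optional[Tuple[str, bytes]]:
    return await first_successful({
        "tcpin://211.136.192.6": fake_upstream(0.01, error=ConnectionRefusedError()),
        "tcp://223.5.5.5": fake_upstream(0.05, reply=b"slow reply"),
        "tcp://223.6.6.6": fake_upstream(0.02, reply=b"fast reply"),
    })


if __name__ == "__main__":
    print(asyncio.run(demo()))
```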

Owner

zfl9 commented Apr 11, 2024

When the client queries over UDP, forward to upstreams of any protocol.

That is already how it works; no change is needed.

Author

windmsn commented Apr 11, 2024

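Of course, there is another approach: for queries received over TCP, chinadns-ng could be allowed to forward to upstreams of any protocol (including UDP). If the reply received is truncated (TC), chinadns-ng discards it (without handing it to the TCP client yet) and sends the same query upstream again (this time excluding UDP upstreams). The reply obtained this time will definitely not be truncated, and it is finally returned to the TCP client.

But is this really necessary? Even if it were done this way, there is a precondition: the upstream group must contain at least one upstream that supports TCP queries, otherwise the second query triggered by TC is bound to fail.

What I'm actually struggling with is this: when a client's TCP query is sent to the upstream ISP DNS servers (211.136.192.6, 120.196.165.24), tcpin is used, which leads to "Operation timed out", and then dnsmasq hangs for no apparent reason and starts multiple processes.

However, if I set bind-port 5354@udp, or remove the ISP DNS servers (211.136.192.6, 120.196.165.24) and use only 223.5.5.5 and 223.6.6.6, which support both UDP and TCP,

dnsmasq works fine.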

2024-04-11 02:12:12 E [Upstream.zig:148 _send_tcp] connect(32, 'tcpin://211.136.192.6') failed: (145) Operation timed out
2024-04-11 02:12:12 E [Upstream.zig:148 _send_tcp] connect(33, 'tcpin://120.196.165.24') failed: (145) Operation timed out
2024-04-11 02:12:12 I [server.zig:203 service_tcp] new connection:11 from ::ffff:127.0.0.1#50222
2024-04-11 02:12:12 I [server.zig:302 QueryLog.query] query(id:55607, tag:chn, qtype:1, 'www.163.com') from ::ffff:127.0.0.1#50222
2024-04-11 02:12:12 I [server.zig:349 QueryLog.forward] forward query(qid:448, from:tcp, 'www.163.com') to china group
2024-04-11 02:12:12 I [Upstream.zig:490 Group.send] forward query(qid:448, from:tcp) to upstream tcpin://211.136.192.6
2024-04-11 02:12:12 I [Upstream.zig:490 Group.send] forward query(qid:448, from:tcp) to upstream tcpin://120.196.165.24
2024-04-11 02:12:12 I [Upstream.zig:490 Group.send] forward query(qid:448, from:tcp) to upstream tcp://223.5.5.5
2024-04-11 02:12:12 I [Upstream.zig:490 Group.send] forward query(qid:448, from:tcp) to upstream tcp://223.6.6.6
2024-04-11 02:12:12 I [server.zig:531 ReplyLog.reply] reply(qid:448, tag:chn, qtype:1, 'www.163.com') from tcp://223.6.6.6 [accept]
2024-04-11 02:12:12 I [server.zig:531 ReplyLog.reply] reply(qid:448, tag:null, qtype:1, 'www.163.com') from tcp://223.5.5.5 [ignore]
2024-04-11 02:12:12 I [server.zig:203 service_tcp] close connection:11 from ::ffff:127.0.0.1#50222
2024-04-11 02:12:13 E [Upstream.zig:148 _send_tcp] connect(34, 'tcpin://211.136.192.6') failed: (145) Operation timed out
2024-04-11 02:12:13 E [Upstream.zig:148 _send_tcp] connect(35, 'tcpin://120.196.165.24') failed: (145) Operation timed out
2024-04-11 02:12:14 I [server.zig:203 service_tcp] new connection:11 from ::ffff:127.0.0.1#37595
2024-04-11 02:12:14 I [server.zig:302 QueryLog.query] query(id:44846, tag:chn, qtype:1, 'www.163.com') from ::ffff:127.0.0.1#37595
2024-04-11 02:12:14 I [server.zig:349 QueryLog.forward] forward query(qid:449, from:tcp, 'www.163.com') to china group
2024-04-11 02:12:14 I [Upstream.zig:490 Group.send] forward query(qid:449, from:tcp) to upstream tcpin://211.136.192.6
2024-04-11 02:12:14 I [Upstream.zig:490 Group.send] forward query(qid:449, from:tcp) to upstream tcpin://120.196.165.24
2024-04-11 02:12:14 I [Upstream.zig:490 Group.send] forward query(qid:449, from:tcp) to upstream tcp://223.5.5.5
2024-04-11 02:12:14 I [Upstream.zig:490 Group.send] forward query(qid:449, from:tcp) to upstream tcp://223.6.6.6
2024-04-11 02:12:14 I [server.zig:531 ReplyLog.reply] reply(qid:449, tag:chn, qtype:1, 'www.163.com') from tcp://223.6.6.6 [accept]
2024-04-11 02:12:14 I [server.zig:203 service_tcp] close connection:11 from ::ffff:127.0.0.1#37595
2024-04-11 02:12:14 I [server.zig:531 ReplyLog.reply] reply(qid:449, tag:null, qtype:1, 'www.163.com') from tcp://223.5.5.5 [ignore]
2024-04-11 02:12:14 E [Upstream.zig:148 _send_tcp] connect(36, 'tcpin://211.136.192.6') failed: (145) Operation timed out
2024-04-11 02:12:14 E [Upstream.zig:148 _send_tcp] connect(37, 'tcpin://120.196.165.24') failed: (145) Operation timed out

Owner

zfl9 commented Apr 11, 2024

Actually, you could turn off dnsmasq's DNS function and let chinadns-ng handle all DNS. That way you won't have the problem of dnsmasq starting multiple processes because of TCP queries.

-p, --port=<port>
Listen on <port> instead of the standard DNS port (53). Setting this to zero completely disables DNS function, leaving only DHCP and/or TFTP.

Set port to 0 in dnsmasq; this turns off DNS and leaves only DHCP and the other functions.

Owner

zfl9 commented Apr 11, 2024

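The reason is that dnsmasq's implementation of DNS over TCP is currently poor and inefficient: every TCP connection/query forks a new dnsmasq process to handle it. If concurrency gets even a little high, and given that routers have little memory to begin with, the growing number of processes can easily bring the system down.

It is better to let dnsmasq handle DHCP only and leave DNS to other software.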

Author

windmsn commented Apr 11, 2024

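The reason is that dnsmasq's implementation of DNS over TCP is currently poor and inefficient: every TCP connection/query forks a new dnsmasq process to handle it. If concurrency gets even a little high, and given that routers have little memory to begin with, the growing number of processes can easily bring the system down.

It is better to let dnsmasq handle DHCP only and leave DNS to other software.

That's what I've done for now: I turned off dnsmasq's port 53, chinadns-ng now listens on port 53 over TCP+UDP, and I added cache and verdict-cache. I'll watch how it goes. Thanks a lot for the reply!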

Owner

zfl9 commented Apr 11, 2024

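Also, I suggest changing your configuration: there is no need to put the tcp:// qualifier in front of 223.5.5.5/223.6.6.6. Just write them like the ISP DNS servers (without a protocol qualifier). chinadns-ng will then automatically choose the protocol used with the upstream based on the protocol the query arrived on. Since DNS still goes over UDP in most cases, this avoids a lot of unnecessary TCP queries.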

Owner

zfl9 commented Apr 11, 2024

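dns2tcp can only listen on UDP. Using chinadns-ng's built-in tcp://8.8.8.8 produces a large number of "connection reset by peer" messages, so I have to use dns2tcp.

That doesn't seem very likely either; chinadns-ng handles TCP the same way dns2tcp does. If access fails, I'd suggest checking whether it is an iptables rule problem (traffic not going through the proxy?).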

Author

windmsn commented Apr 11, 2024

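dns2tcp can only listen on UDP. Using chinadns-ng's built-in tcp://8.8.8.8 produces a large number of "connection reset by peer" messages, so I have to use dns2tcp.

That doesn't seem very likely either; chinadns-ng handles TCP the same way dns2tcp does. If access fails, I'd suggest checking whether it is an iptables rule problem (traffic not going through the proxy?).

This is actually a rather odd requirement of mine. My ISP is China Mobile, and the ping latency to 8.8.8.8 and 2001:4860:4860::8888 is normally only about 20 ms; late at night it can drop to 10 ms.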

rmbp ~ % ping6 2001:4860:4860::8888
PING6(56=40+8+8 bytes) 2409:8a55:4ced:4530:8e85:90ff:fe50:26d6 --> 2001:4860:4860::8888
16 bytes from 2001:4860:4860::8888, icmp_seq=0 hlim=54 time=22.103 ms
16 bytes from 2001:4860:4860::8888, icmp_seq=1 hlim=54 time=22.634 ms
16 bytes from 2001:4860:4860::8888, icmp_seq=2 hlim=54 time=21.940 ms
16 bytes from 2001:4860:4860::8888, icmp_seq=3 hlim=54 time=22.151 ms
16 bytes from 2001:4860:4860::8888, icmp_seq=4 hlim=54 time=21.955 ms
16 bytes from 2001:4860:4860::8888, icmp_seq=5 hlim=54 time=22.218 ms

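However, when querying 8.8.8.8 or 2001:4860:4860::8888 on UDP port 53, some overseas domains get poisoned. Querying over TCP gives clean results.

I mostly hang around on YouTube. Over IPv6 I can connect directly to googlevideo.com domains such as rr1---sn-i3b7knsd.googlevideo.com.
Resolving googlevideo.com directly via 8.8.8.8 returns Hong Kong IPs, which are fast, whereas my VPS is in the US. If resolution goes out through the proxy, the resolved IPs are in the US and connections are comparatively slow. So I also split the DNS and proxy policy: googlevideo.com uses a direct IPv6 connection, while youtube.com and the like go through the proxy.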

[Screenshot: WX20240411-120443@2x]

[Screenshot: WX20240411-115557@2x]

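Then I saw that chinadns-ng also supports tcp:// directly, so I turned off dns2tcp. After that the router frequently reports the following.

It works, but it keeps reporting this error.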

Thu Apr 11 11:55:58 2024 kern.warn kernel: [ 1064.672000] connection reset by peer.
Thu Apr 11 11:55:58 2024 kern.warn kernel: [ 1064.692000] connection reset by peer.
Thu Apr 11 11:55:58 2024 kern.warn kernel: [ 1064.732000] connection reset by peer.
Thu Apr 11 11:55:58 2024 kern.warn kernel: [ 1064.736000] connection reset by peer.
Thu Apr 11 11:55:58 2024 kern.warn kernel: [ 1064.776000] connection reset by peer.
Thu Apr 11 11:55:58 2024 kern.warn kernel: [ 1064.780000] connection reset by peer.
Thu Apr 11 11:55:58 2024 kern.warn kernel: [ 1064.828000] connection reset by peer.
Thu Apr 11 11:55:59 2024 kern.warn kernel: [ 1066.456000] connection reset by peer.
Thu Apr 11 11:55:59 2024 kern.warn kernel: [ 1066.472000] connection reset by peer.
Thu Apr 11 11:55:59 2024 kern.warn kernel: [ 1066.476000] connection reset by peer.
Thu Apr 11 11:55:59 2024 kern.warn kernel: [ 1066.496000] connection reset by peer.
Thu Apr 11 11:59:22 2024 kern.warn kernel: [ 1269.592000] connection reset by peer.
Thu Apr 11 11:59:22 2024 kern.warn kernel: [ 1269.608000] connection reset by peer.
Thu Apr 11 11:59:22 2024 kern.warn kernel: [ 1269.628000] connection reset by peer.
Thu Apr 11 11:59:22 2024 kern.warn kernel: [ 1269.632000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1269.640000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1269.672000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1269.676000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.056000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.060000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.064000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.068000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.100000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.112000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.116000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.140000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.484000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.500000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.504000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.520000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.524000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.528000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.532000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.540000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.572000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.580000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.580000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.608000] connection reset by peer.
Thu Apr 11 11:59:23 2024 kern.warn kernel: [ 1270.636000] connection reset by peer.
Thu Apr 11 11:59:24 2024 kern.warn kernel: [ 1270.668000] connection reset by peer.
Thu Apr 11 11:59:24 2024 kern.warn kernel: [ 1270.672000] connection reset by peer.
Thu Apr 11 11:59:24 2024 kern.warn kernel: [ 1270.700000] connection reset by peer.
Thu Apr 11 11:59:24 2024 kern.warn kernel: [ 1270.708000] connection reset by peer.
Thu Apr 11 11:59:24 2024 kern.warn kernel: [ 1270.712000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1280.660000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1280.664000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1280.704000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1280.708000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1280.740000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1280.744000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1280.756000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.116000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.128000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.132000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.152000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.160000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.164000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.188000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.192000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.196000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.528000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.536000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.540000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.572000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.576000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.580000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.612000] connection reset by peer.
Thu Apr 11 11:59:34 2024 kern.warn kernel: [ 1281.616000] connection reset by peer.
Thu Apr 11 12:00:00 2024 kern.warn kernel: [ 1306.916000] connection reset by peer.
Thu Apr 11 12:00:00 2024 kern.warn kernel: [ 1306.960000] connection reset by peer.
Thu Apr 11 12:00:00 2024 kern.warn kernel: [ 1307.004000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.828000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.832000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.840000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.844000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.864000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.872000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.876000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.912000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1437.916000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1438.244000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1438.272000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1438.276000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1438.288000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1438.308000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1438.320000] connection reset by peer.
Thu Apr 11 12:02:11 2024 kern.warn kernel: [ 1438.324000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.636000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.644000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.648000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.656000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.684000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.692000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.696000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.724000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.728000] connection reset by peer.
Thu Apr 11 12:02:12 2024 kern.warn kernel: [ 1438.748000] connection reset by peer.
Thu Apr 11 12:03:04 2024 kern.warn kernel: [ 1491.148000] connection reset by peer.
Thu Apr 11 12:03:04 2024 kern.warn kernel: [ 1491.208000] connection reset by peer.
Thu Apr 11 12:03:04 2024 kern.warn kernel: [ 1491.252000] connection reset by peer.
Thu Apr 11 12:03:24 2024 kern.warn kernel: [ 1511.140000] connection reset by peer.
Thu Apr 11 12:03:24 2024 kern.warn kernel: [ 1511.156000] connection reset by peer.
Thu Apr 11 12:03:24 2024 kern.warn kernel: [ 1511.176000] connection reset by peer.
Thu Apr 11 12:03:24 2024 kern.warn kernel: [ 1511.196000] connection reset by peer.
Thu Apr 11 12:03:24 2024 kern.warn kernel: [ 1511.204000] connection reset by peer.
Thu Apr 11 12:03:24 2024 kern.warn kernel: [ 1511.228000] connection reset by peer.
Thu Apr 11 12:03:24 2024 kern.warn kernel: [ 1511.260000] connection reset by peer.
Thu Apr 11 12:03:25 2024 kern.warn kernel: [ 1511.768000] connection reset by peer.
Thu Apr 11 12:03:25 2024 kern.warn kernel: [ 1511.840000] connection reset by peer.
Thu Apr 11 12:03:39 2024 kern.warn kernel: [ 1526.584000] connection reset by peer.

But chinadns-ng (udp) -> udp2tcp -> 8.8.8.8 does not have this problem.

Owner

zfl9 commented Apr 11, 2024

OK, that really is baffling. Let's just leave it then, haha.

Author

windmsn commented Apr 11, 2024

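OK, that really is baffling. Let's just leave it then, haha.

That's why I was thinking it would be even better if UDP could be specified for an upstream, because with my current configuration: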

trust-dns 127.0.0.1#5353,127.0.0.1#5352,tcp://2001:4860:4860::8888,tcp://2001:4860:4860::8844,tcp://8.8.8.8,tcp://8.8.4.4

127.0.0.1#5353,127.0.0.1#5352
dns2tcp (upstream 8.8.8.8) accepts only UDP; when TCP is used it throws connect(8, 'tcpin://127.0.0.1#5353') failed: (146) Connection refused

tcp://2001:4860:4860::8888,tcp://2001:4860:4860::8844,tcp://8.8.8.8,tcp://8.8.4.4
When these TCP upstreams are used, the router throws kern.warn kernel: [ 1526.584000] connection reset by peer.

Owner

zfl9 commented Apr 11, 2024

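Of course, there is another approach: for queries received over TCP, chinadns-ng could be allowed to forward to upstreams of any protocol (including UDP). If the reply received is truncated (TC), chinadns-ng discards it (without handing it to the TCP client yet) and sends the same query upstream again (this time excluding UDP upstreams). The reply obtained this time will definitely not be truncated, and it is finally returned to the TCP client.

But is this really necessary? Even if it were done this way, there is a precondition: the upstream group must contain at least one upstream that supports TCP queries, otherwise the second query triggered by TC is bound to fail.

I'll look into it when I have time. With this approach, udp:// upstreams (UDP queries only) can be supported; such upstreams are disabled for the query that is retried after TC truncation.

Since TC is still fairly rare, this should be fine. It is OK in terms of both logic and efficiency.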

Author

windmsn commented Apr 11, 2024

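Of course, there is another approach: for queries received over TCP, chinadns-ng could be allowed to forward to upstreams of any protocol (including UDP). If the reply received is truncated (TC), chinadns-ng discards it (without handing it to the TCP client yet) and sends the same query upstream again (this time excluding UDP upstreams). The reply obtained this time will definitely not be truncated, and it is finally returned to the TCP client.
But is this really necessary? Even if it were done this way, there is a precondition: the upstream group must contain at least one upstream that supports TCP queries, otherwise the second query triggered by TC is bound to fail.

I'll look into it when I have time. With this approach, udp:// upstreams (UDP queries only) can be supported; such upstreams are disabled for the query that is retried after TC truncation.

Since TC is still fairly rare, this should be fine. It is OK in terms of both logic and efficiency.

Great. Thanks for your attention and reply.

For now, my view is that the MSG SIZE of a DNS response should be controlled by the website, app, or DNS service provider.
For example, suppose I develop a live-streaming app. If I know that the MSG SIZE of a domain's resolution exceeds the UDP MSG SIZE, I would use CNAMEs or similar to keep the MSG SIZE down, or have the app query over TCP. Otherwise, I would always query over UDP. When users in some regions suffer DNS hijacking, the app also needs to fetch DNS results from its own server.
The same goes for DNS service providers. DNSPod's 119.29.29.29 and 119.28.28.28, China Mobile's 211.136.192.6 and 120.196.165.24, and China Telecom's 202.96.128.86 and 202.96.134.33 do not actually support TCP queries either.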

rmbp ~ % dig @10.0.0.1 www.qq.com

; <<>> DiG 9.10.6 <<>> @10.0.0.1 www.qq.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36762
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 3, ADDITIONAL: 14

;; QUESTION SECTION:
;www.qq.com.			IN	A

;; ANSWER SECTION:
www.qq.com.		71	IN	CNAME	ins-r23tsuuf.ias.tencent-cloud.net.
ins-r23tsuuf.ias.tencent-cloud.net. 70 IN A	112.53.42.114
ins-r23tsuuf.ias.tencent-cloud.net. 70 IN A	112.53.42.52

;; AUTHORITY SECTION:
tencent-cloud.net.	40953	IN	NS	ns-open1.qq.com.
tencent-cloud.net.	40953	IN	NS	ns-open3.qq.com.
tencent-cloud.net.	40953	IN	NS	ns-open2.qq.com.

;; ADDITIONAL SECTION:
ns-open1.qq.com.	170748	IN	A	117.135.174.196
ns-open1.qq.com.	170748	IN	A	203.205.236.176
ns-open1.qq.com.	170748	IN	A	59.36.132.139
ns-open2.qq.com.	171717	IN	A	182.254.59.163
ns-open2.qq.com.	171717	IN	A	203.205.195.63
ns-open2.qq.com.	171717	IN	A	203.205.195.122
ns-open2.qq.com.	171717	IN	A	61.241.27.10
ns-open3.qq.com.	59	IN	A	121.51.167.100
ns-open3.qq.com.	59	IN	A	140.207.180.51
ns-open3.qq.com.	59	IN	A	203.205.220.25
ns-open3.qq.com.	59	IN	A	218.68.91.163
ns-open3.qq.com.	59	IN	A	101.227.161.202
ns-open1.qq.com.	171717	IN	AAAA	2402:4e00:111:ffe::3
ns-open2.qq.com.	171717	IN	AAAA	240e:e1:aa00:2001::3

;; Query time: 8 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
;; WHEN: Thu Apr 11 12:42:35 CST 2024
;; MSG SIZE  rcvd: 425

In fact, MSG SIZE is normally not large.

@zfl9 zfl9 added the enhancement New feature or request label Apr 11, 2024
@zfl9 zfl9 changed the title from "Client queries over TCP, but the upstream DNS supports only UDP" to "[Enhancement] Support udp:// upstreams" Apr 11, 2024
Owner

zfl9 commented Apr 12, 2024

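On second thought, there is no need to retry on the chinadns-ng side at all; simply filtering out TC replies is enough (for TCP queries),
because in that case the upstream group is guaranteed to contain at least one TCP-based upstream (tcp://, tcpi://, tls://).

An upstream without a protocol qualifier consists of two "upstreams": tcpi:// (enabled when the querier uses TCP) and udpi:// (enabled when the querier uses UDP).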
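
The simpler filtering rule can be sketched as a single predicate. This is an illustrative sketch under assumptions, not the code on the dev branch; `accept_reply` and `make_header` are hypothetical names, and the TC bit position follows the DNS header layout in RFC 1035.

```python
import struct

TC_FLAG = 0x0200  # TC (truncated) bit in the 16-bit DNS header flags field


def make_header(flags: int) -> bytes:
    """Build a bare 12-byte DNS header, for demonstration only."""
    return struct.pack("!HHHHHH", 0x1234, flags, 1, 0, 0, 0)


def accept_reply(reply: bytes, query_from_tcp: bool) -> bool:
    """Decide whether an upstream reply may be returned to the client."""
    if len(reply) < 12:
        return False  # not even a complete DNS header
    (flags,) = struct.unpack_from("!H", reply, 2)
    truncated = bool(flags & TC_FLAG)
    # A TCP client will not retry on TC, so a truncated reply is dropped and
    # the reply from a TCP-based upstream in the same group is awaited instead.
    if query_from_tcp and truncated:
        return False
    # A UDP client handles TC itself by retrying over TCP.
    return True


if __name__ == "__main__":
    print(accept_reply(make_header(0x8380), query_from_tcp=True))   # truncated reply, TCP client
    print(accept_reply(make_header(0x8380), query_from_tcp=False))  # truncated reply, UDP client
```

Compared with a retry, this needs no second query; it relies on the TCP-based upstream in the group answering the same query in parallel.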


UPDATE: the dev branch has been updated and tests fine.

Author

windmsn commented Apr 12, 2024

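On second thought, there is no need to retry on the chinadns-ng side at all; simply filtering out TC replies is enough (for TCP queries), because in that case the upstream group is guaranteed to contain at least one TCP-based upstream (tcp://, tcpi://, tls://).

An upstream without a protocol qualifier consists of two "upstreams": tcpi:// (enabled when the querier uses TCP) and udpi:// (enabled when the querier uses UDP).

UPDATE: the dev branch has been updated and tests fine.

Waiting for the update. I also found another problem while using it today.

The DNS upstreams are:
china-dns 202.96.128.86,202.96.134.33,tcp://223.5.5.5

When querying the domain cn-beijing-data.aliyundrive.net, the upstream returns a reply with EDNS data whose MSG SIZE reaches 530.
The EDNS reply is probabilistic: not every reply is the 530-byte EDNS one; about 70% of the time a 491-byte reply is returned.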

root@OLAY:~# dig @127.0.0.1 -p 15354 cn-beijing-data.aliyundrive.net

; <<>> DiG 9.18.16 <<>> @127.0.0.1 -p 15354 cn-beijing-data.aliyundrive.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57624
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 2, ADDITIONAL: 15

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;cn-beijing-data.aliyundrive.net. IN    A

;; ANSWER SECTION:
cn-beijing-data.aliyundrive.net. 600 IN CNAME   ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com. 600 IN CNAME ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.203
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.200
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.201

;; AUTHORITY SECTION:
alibabadns.com.         157     IN      NS      ns1.alibabadns.com.
alibabadns.com.         157     IN      NS      ns2.alibabadns.com.

;; ADDITIONAL SECTION:
ns1.alibabadns.com.     472     IN      A       140.205.103.192
ns1.alibabadns.com.     472     IN      A       140.205.122.66
ns1.alibabadns.com.     472     IN      A       47.88.74.38
ns1.alibabadns.com.     472     IN      A       47.241.207.18
ns1.alibabadns.com.     472     IN      A       106.11.35.19
ns1.alibabadns.com.     472     IN      A       106.11.41.157
ns2.alibabadns.com.     573     IN      A       106.11.41.158
ns2.alibabadns.com.     573     IN      A       140.205.103.194
ns2.alibabadns.com.     573     IN      A       140.205.122.77
ns2.alibabadns.com.     573     IN      A       47.88.74.36
ns2.alibabadns.com.     573     IN      A       47.241.207.16
ns2.alibabadns.com.     573     IN      A       106.11.35.18
ns1.alibabadns.com.     524     IN      AAAA    2401:b180:4100::1
ns2.alibabadns.com.     519     IN      AAAA    2401:b180:4100::2

;; Query time: 0 msec
;; SERVER: 127.0.0.1#15354(127.0.0.1) (UDP)
;; WHEN: Fri Apr 12 19:49:43 CST 2024
;; MSG SIZE  rcvd: 530

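At this point chinadns-ng has cached this reply.
Then, when the client (wget) accesses cn-beijing-data.aliyundrive.net, it reports this error:
wget: unable to resolve host address 'cn-beijing-data.aliyundrive.net'

Only after the cache entry has expired, and a new request for cn-beijing-data.aliyundrive.net returns a fresh reply without EDNS (MSG SIZE 491), can cn-beijing-data.aliyundrive.net be accessed normally again.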

root@OLAY:~# dig @127.0.0.1 -p 15354 cn-beijing-data.aliyundrive.net

; <<>> DiG 9.18.16 <<>> @127.0.0.1 -p 15354 cn-beijing-data.aliyundrive.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55717
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 2, ADDITIONAL: 13

;; QUESTION SECTION:
;cn-beijing-data.aliyundrive.net. IN    A

;; ANSWER SECTION:
cn-beijing-data.aliyundrive.net. 600 IN CNAME   ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com. 600 IN CNAME ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.200
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.203
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.201

;; AUTHORITY SECTION:
alibabadns.com.         285     IN      NS      ns1.alibabadns.com.
alibabadns.com.         285     IN      NS      ns2.alibabadns.com.

;; ADDITIONAL SECTION:
ns1.alibabadns.com.     326     IN      A       140.205.122.66
ns1.alibabadns.com.     326     IN      A       47.88.74.38
ns1.alibabadns.com.     326     IN      A       47.241.207.18
ns1.alibabadns.com.     326     IN      A       106.11.35.19
ns1.alibabadns.com.     326     IN      A       106.11.41.157
ns1.alibabadns.com.     326     IN      A       140.205.103.192
ns2.alibabadns.com.     371     IN      A       140.205.103.194
ns2.alibabadns.com.     371     IN      A       140.205.122.77
ns2.alibabadns.com.     371     IN      A       47.88.74.36
ns2.alibabadns.com.     371     IN      A       47.241.207.16
ns2.alibabadns.com.     371     IN      A       106.11.35.18
ns2.alibabadns.com.     371     IN      A       106.11.41.158
ns1.alibabadns.com.     187     IN      AAAA    2401:b180:4100::1

;; Query time: 4 msec
;; SERVER: 127.0.0.1#15354(127.0.0.1) (UDP)
;; WHEN: Fri Apr 12 20:01:53 CST 2024
;; MSG SIZE  rcvd: 491

With dnsmasq, the first (uncached) request also returns a SIZE of 491, as shown below:

root@OLAY:~# dig cn-beijing-data.aliyundrive.net

; <<>> DiG 9.18.16 <<>> cn-beijing-data.aliyundrive.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27716
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 2, ADDITIONAL: 13

;; QUESTION SECTION:
;cn-beijing-data.aliyundrive.net. IN    A

;; ANSWER SECTION:
cn-beijing-data.aliyundrive.net. 600 IN CNAME   ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com. 600 IN CNAME ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.200
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.201
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.203

;; AUTHORITY SECTION:
alibabadns.com.         395     IN      NS      ns2.alibabadns.com.
alibabadns.com.         395     IN      NS      ns1.alibabadns.com.

;; ADDITIONAL SECTION:
ns1.alibabadns.com.     90      IN      A       47.88.74.38
ns1.alibabadns.com.     90      IN      A       47.241.207.18
ns1.alibabadns.com.     90      IN      A       106.11.35.19
ns1.alibabadns.com.     90      IN      A       106.11.41.157
ns1.alibabadns.com.     90      IN      A       140.205.103.192
ns1.alibabadns.com.     90      IN      A       140.205.122.66
ns2.alibabadns.com.     473     IN      A       106.11.41.158
ns2.alibabadns.com.     473     IN      A       140.205.103.194
ns2.alibabadns.com.     473     IN      A       140.205.122.77
ns2.alibabadns.com.     473     IN      A       47.88.74.36
ns2.alibabadns.com.     473     IN      A       47.241.207.16
ns2.alibabadns.com.     473     IN      A       106.11.35.18
ns1.alibabadns.com.     281     IN      AAAA    2401:b180:4100::1

;; Query time: 3 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Fri Apr 12 20:04:47 CST 2024
;; MSG SIZE  rcvd: 491

On the next query, the reply from its cache is only 249 bytes.

root@OLAY:~# dig cn-beijing-data.aliyundrive.net

; <<>> DiG 9.18.16 <<>> cn-beijing-data.aliyundrive.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42531
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;cn-beijing-data.aliyundrive.net. IN    A

;; ANSWER SECTION:
cn-beijing-data.aliyundrive.net. 573 IN CNAME   ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com. 573 IN CNAME ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 573 IN A 49.7.23.203
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 573 IN A 49.7.23.201
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 573 IN A 49.7.23.200

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Fri Apr 12 20:05:14 CST 2024
;; MSG SIZE  rcvd: 249

Looking closely, dnsmasq caches only the ANSWER SECTION, plus an EDNS marker.
chinadns-ng, by contrast, caches the entire reply.

@zfl9
Owner

zfl9 commented Apr 12, 2024

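This is actually a musl DNS issue. Only recent musl versions support TCP fallback (for when a message received over UDP has the TC bit set); earlier versions do not.

530 bytes is just over the default limit of 512 (musl does not send the EDNS0 option, so it can only accept UDP messages of up to 512 bytes).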

https://news.ycombinator.com/item?id=36933028

https://git.musl-libc.org/cgit/musl/commit/?id=51d4669fb97782f6a66606da852b5afd49a08001


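Still, for compatibility with resolvers like this one that lack TCP fallback, I will change the cache logic shortly so that only the answer section is kept and everything else is dropped.

The cached reply itself is fine; it is just that older musl versions cannot handle packets larger than 512 bytes, haha.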

@windmsn
Author

windmsn commented Apr 12, 2024

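I almost overlooked one detail about that wget. The first request (before the reply had been cached) worked. Then I happened to stop wget and start it again, and only then did I notice the problem. It only occurs when wget's lookup hits the chinadns-ng cache.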

@zfl9
Owner

zfl9 commented Apr 12, 2024

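It is actually because your first request came from dig. dig supports EDNS, so the first reply carried an EDNS RR; judging from the log, that RR is probably somewhere between ten and twenty-odd bytes.

chinadns-ng then cached that reply.

When wget made the second request, the size was just over 512, so the reply was truncated, and your musl version happens to lack TCP fallback. That is why resolution failed.

At that point you could test with wget on another host (a glibc build, not musl), or query again with dig; both would work without any problem.

Also, if the first lookup had been issued by musl/glibc, for example by resolving the domain with that same wget (with chinadns-ng then caching the result), the size would have been 491, no truncation would have occurred, and there would have been no problem.


In short, the cause is understood. I will make a change shortly and that should fix it.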
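
The size reasoning above can be sketched as follows. This is only an illustration, not code from musl or chinadns-ng; the helper names are hypothetical, and the value 1232 used for the EDNS-capable client is an assumed advertised size. The 512-byte limit without EDNS0 comes from RFC 1035, and RFC 6891 says advertised sizes below 512 are treated as 512.

```python
from typing import Optional

CLASSIC_UDP_LIMIT = 512  # largest UDP DNS message a client without EDNS0 accepts


def udp_payload_limit(edns_udp_size: Optional[int]) -> int:
    """Largest UDP reply the client can accept.

    edns_udp_size is the payload size advertised in the query's OPT RR,
    or None if the query carried no EDNS0 option at all.
    """
    if edns_udp_size is None:
        return CLASSIC_UDP_LIMIT
    # Advertised values below 512 are treated as 512.
    return max(edns_udp_size, CLASSIC_UDP_LIMIT)


def fits_in_udp(reply_size: int, edns_udp_size: Optional[int]) -> bool:
    """True if the reply can be sent over UDP without setting the TC bit."""
    return reply_size <= udp_payload_limit(edns_udp_size)


# Sizes taken from the dig output in this thread.
print(fits_in_udp(530, None))   # client without EDNS0, cached 530-byte reply
print(fits_in_udp(491, None))   # client without EDNS0, 491-byte reply
print(fits_in_udp(530, 1232))   # EDNS-capable client (assumed advertised size)
```

The first case is the failing one: the reply has to be truncated, and a resolver without TCP fallback then reports a resolution failure.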

@windmsn
Author

windmsn commented Apr 12, 2024

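The sequence was actually like this:

The first wget worked. I had not run dig yet. The first reply should have gone back to wget, and chinadns-ng cached it.
Then the second wget failed. I thought DNS was down, and only then ran dig. dig did resolve the name, with Query time: 0 msec, so that answer should have come from the cache. But wget still failed.

During the second wget, the DNS side should have involved only the resolver and chinadns-ng talking to each other. Is something missing in between?

That said, this problem is hard to reproduce.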

@zfl9
Owner

zfl9 commented Apr 12, 2024

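There should be no need to reproduce it, because the root cause is that older musl does not support messages larger than 512 bytes (specifically, a reply > 512 is truncated with TC set, and musl does not support TCP fallback). musl therefore reports a DNS resolution failure, and wget raises this error.

In that situation, a glibc build of wget, or a browser, works fine. The problem is really unrelated to chinadns-ng.

I am already changing the cache code so that only the answer section is kept when caching (i.e. a minimized response), to avoid the case where the message is too large and the resolver lacks TCP fallback. (This is what dnsmasq does: its cached replies contain only the answer.)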
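
A rough Python sketch of what "keep only the answer section" can look like at the wire level. This is an illustration under stated assumptions, not the chinadns-ng implementation: `skip_name` and `keep_answer_only` are hypothetical helpers, the input is assumed to be a well-formed reply, and the sketch also drops the OPT RR, whereas a real cache would likely need to handle EDNS per query. It relies on the fact that compression pointers only refer to earlier offsets, so cutting the message after the last answer RR leaves the remaining names intact.

```python
import struct


def skip_name(msg: bytes, off: int) -> int:
    """Return the offset just past the (possibly compressed) name at off."""
    while True:
        length = msg[off]
        if length == 0:              # root label terminates the name
            return off + 1
        if length & 0xC0 == 0xC0:    # 2-byte compression pointer ends the name
            return off + 2
        off += 1 + length            # ordinary label: length byte + label bytes


def keep_answer_only(msg: bytes) -> bytes:
    """Drop the authority and additional sections of a well-formed DNS reply."""
    msg_id, flags, qdcount, ancount, _ns, _ar = struct.unpack_from("!6H", msg, 0)
    off = 12
    for _ in range(qdcount):
        off = skip_name(msg, off) + 4                  # QTYPE + QCLASS
    for _ in range(ancount):
        off = skip_name(msg, off)
        (rdlength,) = struct.unpack_from("!H", msg, off + 8)  # after TYPE, CLASS, TTL
        off += 10 + rdlength
    header = struct.pack("!6H", msg_id, flags, qdcount, ancount, 0, 0)
    return header + msg[12:off]


def encode_name(name: str) -> bytes:
    """Encode a dotted name as uncompressed DNS labels."""
    return b"".join(bytes([len(p)]) + p.encode() for p in name.split(".")) + b"\x00"


# Tiny synthetic reply: 1 question, 1 answer, 1 authority RR, 1 OPT RR.
question = encode_name("example.com") + struct.pack("!HH", 1, 1)
answer = b"\xc0\x0c" + struct.pack("!HHIH", 1, 1, 600, 4) + bytes([192, 0, 2, 1])
authority = b"\xc0\x0c" + struct.pack("!HHIH", 2, 1, 300, 2) + b"\xc0\x0c"
additional = b"\x00" + struct.pack("!HHIH", 41, 1232, 0, 0)
reply = struct.pack("!6H", 0x1234, 0x8180, 1, 1, 1, 1) + question + answer + authority + additional

minimal = keep_answer_only(reply)
print(len(reply), len(minimal))
```

The minimized message is what gets stored, so a later client that cannot handle replies over 512 bytes is far less likely to receive a truncated answer from the cache.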

@zfl9
Owner

zfl9 commented Apr 12, 2024

The dev branch has been updated; tests OK.

@windmsn
Author

windmsn commented Apr 12, 2024

dnsmasq added:

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232

The flags line changed from
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 2, ADDITIONAL: 13
to
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

The ;; AUTHORITY SECTION: and ;; ADDITIONAL SECTION: were removed.

But I can't find the ADDITIONAL: 1 in the cached reply; I don't see it anywhere.

The full output looks like this:

root@OLAY:~# dig cn-beijing-data.aliyundrive.net

; <<>> DiG 9.18.16 <<>> cn-beijing-data.aliyundrive.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3603
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 2, ADDITIONAL: 13

;; QUESTION SECTION:
;cn-beijing-data.aliyundrive.net. IN    A

;; ANSWER SECTION:
cn-beijing-data.aliyundrive.net. 600 IN CNAME   ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com. 600 IN CNAME ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.203
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.201
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 600 IN A 49.7.23.200

;; AUTHORITY SECTION:
alibabadns.com.         377     IN      NS      ns1.alibabadns.com.
alibabadns.com.         377     IN      NS      ns2.alibabadns.com.

;; ADDITIONAL SECTION:
ns1.alibabadns.com.     570     IN      A       47.88.74.38
ns1.alibabadns.com.     570     IN      A       47.241.207.18
ns1.alibabadns.com.     570     IN      A       106.11.35.19
ns1.alibabadns.com.     570     IN      A       106.11.41.157
ns1.alibabadns.com.     570     IN      A       140.205.103.192
ns1.alibabadns.com.     570     IN      A       140.205.122.66
ns2.alibabadns.com.     550     IN      A       106.11.35.18
ns2.alibabadns.com.     550     IN      A       106.11.41.158
ns2.alibabadns.com.     550     IN      A       140.205.103.194
ns2.alibabadns.com.     550     IN      A       140.205.122.77
ns2.alibabadns.com.     550     IN      A       47.88.74.36
ns2.alibabadns.com.     550     IN      A       47.241.207.16
ns1.alibabadns.com.     282     IN      AAAA    2401:b180:4100::1

;; Query time: 4 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Fri Apr 12 21:39:38 CST 2024
;; MSG SIZE  rcvd: 491

root@OLAY:~# dig cn-beijing-data.aliyundrive.net

; <<>> DiG 9.18.16 <<>> cn-beijing-data.aliyundrive.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57290
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;cn-beijing-data.aliyundrive.net. IN    A

;; ANSWER SECTION:
cn-beijing-data.aliyundrive.net. 596 IN CNAME   ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com. 596 IN CNAME ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com.
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 596 IN A 49.7.23.200
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 596 IN A 49.7.23.201
ccp-bj29-bj-1592982087.oss-enet-ds.aliyuncs.com.gds.alibabadns.com. 596 IN A 49.7.23.203

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Fri Apr 12 21:39:42 CST 2024
;; MSG SIZE  rcvd: 249

@zfl9
Owner

zfl9 commented Apr 12, 2024

"But I can't find the ADDITIONAL: 1 in the cached reply; I don't see it anywhere."

That is the OPT RR, i.e. the "EDNS: version: 0" line.
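
To make this concrete, here is a small sketch that builds a minimal OPT pseudo-record the way RFC 6891 lays it out. The helper name `build_opt_rr` is hypothetical, and the record carries no EDNS options, which is an assumption about what dnsmasq sends here. dig prints this record in the OPT PSEUDOSECTION rather than the ADDITIONAL SECTION, but it is still counted in the header's ADDITIONAL field.

```python
import struct

OPT_TYPE = 41  # RR type code of the EDNS0 OPT pseudo-record


def build_opt_rr(udp_size: int) -> bytes:
    """Encode a minimal OPT RR (EDNS version 0, no flags, no options)."""
    return b"\x00" + struct.pack(
        "!HHIH",
        OPT_TYPE,   # TYPE
        udp_size,   # CLASS field carries the advertised UDP payload size
        0,          # TTL field carries extended RCODE, version and flags
        0,          # RDLENGTH: zero, since no options are attached
    )


opt = build_opt_rr(1232)
print(len(opt))  # 11
```

So a record like this adds 11 bytes to the message: a 1-byte root name plus 10 bytes of fixed fields.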

@windmsn
Author

windmsn commented Apr 12, 2024

Looking forward to the new version. Could you build a dev version for me to test?

@zfl9
Owner

zfl9 commented Apr 12, 2024

Which platform? Send me the file name of the chinadns-ng release binary you use.

@windmsn
Author

windmsn commented Apr 12, 2024

ChinaDNS-NG 2024.03.27 | target:x86_64-linux-musl | cpu:x86_64_v3 | mode:fast+lto
ChinaDNS-NG 2024.03.27 | target:mipsel-linux-musl | cpu:mips32r5+soft_float | mode:fast+lto

Two platforms.

@zfl9
Owner

zfl9 commented Apr 12, 2024

TEMP.zip

@zfl9
Owner

zfl9 commented Apr 12, 2024

A new release will be published tomorrow.

@windmsn
Author

windmsn commented Apr 12, 2024

So far, testing on both platforms has not turned up any problems.

@zfl9
Owner

zfl9 commented Apr 13, 2024

See the latest release.
