[Compatibility] Try to "reply" to every received query msg, even if it is a bad msg #180

Closed
hapood opened this issue Jul 11, 2024 · 27 comments

hapood commented Jul 11, 2024

Below is an excerpt from my logs. The requests are very frequent and keep coming even while the phone is locked.

2024-07-10 13:44:49 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:44:50 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:44:53 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:44:58 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:45:04 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:45:06 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:45:15 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:45:17 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:45:18 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:45:20 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:45:22 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:45:25 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:45:30 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:45:44 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:45:45 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:45:47 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:45:53 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:45:58 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:46:22 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:46:31 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:46:36 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:46:36 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:46:37 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:46:37 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:46:50 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:46:56 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:47:03 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:47:03 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 13:47:09 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 23:26:47 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 23:26:47 W [server.zig:647 on_reply] dns.check_reply(upstream:udpi://8.8.8.8) failed: invalid reply msg
2024-07-10 23:26:50 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-10 23:26:50 W [server.zig:391 on_query] dns.check_query(fd:6) failed: invalid query msg

zfl9 (Owner) commented Jul 11, 2024

This error means chinadns-ng received a packet that is not valid DNS. Perhaps some program is doing a port scan.

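zfl9 (Owner) commented Jul 11, 2024

I suggest turning on verbose logging first. That makes it easier to find which ip:port is scanning (sending this request) and which device it comes from.

dns.c already has the log statements (in check_msg); uncomment them if you need them. I commented them out because I did not expect this error to occur under normal conditions.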

hapood (Author) commented Jul 11, 2024

Thanks, I'll turn on verbose first.

I have also pulled the code, but I'm not familiar with the Zig toolchain. I installed version 0.10.1, but building the master branch fails:

/mnt/c/Users/hapoo/Projects/chinadns-ng/build.zig:239:34: error: expected ';' after declaration
\ url='{s}'; path='{s}'
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/c/Users/hapoo/Projects/chinadns-ng/build.zig:240:34: note: invalid byte: '$'
\ mkdir -p "$(dirname "$path")"

My environment is WSL2 Ubuntu on Windows, and I ran the zig run build command directly. Is there any other configuration I need in order to build it myself?

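zfl9 (Owner) commented Jul 11, 2024

There are no other dependencies; zig 0.10.1 is all you need.

This error occurs because build.zig contains some shell code (wget downloads the wolfssl dependency), so it can only run in a Unix environment. Windows is not supported.

WSL should work, but you need to enter the WSL environment and run zig build inside it. Alternatively, build directly on Linux.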

zfl9 (Owner) commented Jul 11, 2024

I just tested with WSL1 and it builds fine. Rough steps:

git clone https://github.com/zfl9/chinadns-ng

cd chinadns-ng

# Skip this if you do not need DoT support
# make and Autotools (required by the wolfssl build)
apt install make autoconf automake libtool

zig build -Dwolfssl
zig build # use this when DoT is not needed; the build is faster

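hapood (Author) commented Jul 11, 2024

Yes, it is strange. I checked as well, and it does not look like the script is failing; rather, my zig does not recognize the \\ token. I deleted the \\ lines in the script section, but the other \\ lines still fail.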

hapood@surface7pro:/mnt/c/Users/hapoo/Projects/chinadns-ng$ ~/zig/zig build run
/mnt/c/Users/hapoo/Projects/chinadns-ng/src/opt.zig:17:76: error: expected ';' after declaration
\\usage: chinadns-ng <options...>. the existing options are as follows:
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/mnt/c/Users/hapoo/Projects/chinadns-ng/src/opt.zig:18:74: note: invalid byte: 't'
\\ -C, --config format similar to the long option
^
error: main.zig...
error: The following command exited with error code 1:
/home/hapood/zig/zig build-obj -fstage1 /mnt/c/Users/hapoo/Projects/chinadns-ng/src/main.zig -lc -fstrip -ffunction-sections -OReleaseFast --cache-dir /mnt/c/Users/hapoo/Projects/chinadns-ng/zig-cache --global-cache-dir /mnt/c/Users/hapoo/Projects/chinadns-ng/zig-cache --name main.zig -fsingle-threaded --pkg-begin build_opts /mnt/c/Users/hapoo/Projects/chinadns-ng/zig-cache/options/ExkwlLJeOioCJznJMV67jsx8fNKJej0LwYDcj0tOVRda6s_G7UH6hN1Og7I02uhb --pkg-end -I /mnt/c/Users/hapoo/Projects/chinadns-ng -D LOG_FILENAME="main.zig" -fno-PIE -flto --enable-cache
error: the following build command failed with exit code 1:
/mnt/c/Users/hapoo/Projects/chinadns-ng/zig-cache/o/21abdbe86d248fc1dee140f03b4d018e/build /home/hapood/zig/zig /mnt/c/Users/hapoo/Projects/chinadns-ng /mnt/c/Users/hapoo/Projects/chinadns-ng/zig-cache /home/hapood/.cache/zig run

zfl9 (Owner) commented Jul 11, 2024

Try a different directory, for example /opt? Do a fresh git clone there, then zig build.

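zfl9 (Owner) commented Jul 11, 2024

rather, my zig does not recognize the \\ token. I deleted the \\ lines in the script section, but the other \\ lines still fail.

\\ is Zig's multiline string syntax. Do not modify the sources under src for now; just do a normal git clone and zig build.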

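hapood (Author) commented Jul 11, 2024

Thanks a lot, it worked! The Windows filesystem was what kept the zig compiler from compiling correctly; under WSL2 everything works fine.
One last question: I need to build a binary for the Linux x86_64_v3 architecture, corresponding to this release asset:

chinadns-ng+wolfssl@x86_64-linux-musl@x86_64_v3@fast+lto

How should I build it?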

zfl9 (Owner) commented Jul 11, 2024

# make and Autotools (required by the wolfssl build)
apt install make autoconf automake libtool

zig build -Dwolfssl -Dtarget=x86_64-linux-musl -Dcpu=x86_64_v3

All of this is explained in detail in the 编译 (Build) section of the README.

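hapood (Author) commented Jul 11, 2024

I turned on verbose. The log shows that all requests come from the local host 127.0.0.1 (OpenWrt) rather than the 192.168.x.x addresses I expected. After I disabled OpenWrt's dnsmasq, the IP addresses looked normal. I'll keep watching to see which machine triggers the error.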

2024-07-11 12:18:36 I [server.zig:206 service_tcp] new connection:9 from 127.0.0.1#51908
2024-07-11 12:18:36 I [server.zig:206 service_tcp] new connection:11 from 127.0.0.1#51916
2024-07-11 12:18:36 I [server.zig:206 service_tcp] new connection:12 from 127.0.0.1#51926
2024-07-11 12:18:36 I [server.zig:206 service_tcp] close connection:11 from 127.0.0.1#51916
2024-07-11 12:18:36 I [server.zig:206 service_tcp] close connection:9 from 127.0.0.1#51908
2024-07-11 12:18:36 I [server.zig:206 service_tcp] close connection:12 from 127.0.0.1#51926
2024-07-11 12:19:22 I [server.zig:206 service_tcp] new connection:9 from 127.0.0.1#53208
2024-07-11 12:19:22 I [server.zig:206 service_tcp] new connection:11 from 127.0.0.1#53216
2024-07-11 12:19:22 I [server.zig:206 service_tcp] close connection:9 from 127.0.0.1#53208
2024-07-11 12:19:22 I [server.zig:206 service_tcp] close connection:11 from 127.0.0.1#53216

hapood (Author) commented Jul 12, 2024

I enabled the error logging in dns.c and added a log line for the request source. This is what I captured:

2024-07-12 08:21:26 E [dns.c:111 check_msg] there should be one and only one question: 0
2024-07-12 08:21:26 W [server.zig:404 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-12 08:21:26 W [server.zig:409 on_query] invalid query from 192.168.88.178#35546
2024-07-12 08:21:26 E [dns.c:111 check_msg] there should be one and only one question: 0
2024-07-12 08:21:26 W [server.zig:404 on_query] dns.check_query(fd:6) failed: invalid query msg
2024-07-12 08:21:26 W [server.zig:409 on_query] invalid query from 192.168.88.178#60499

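As you can see, the invalid query errors all come from my Samsung phone (192.168.88.178). I don't know how to write code that prints the DNS request, so I cannot analyze what this request is actually for.

I also captured two other kinds of errors, each triggered multiple times: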

2024-07-12 08:24:41 E [dns.c:100 check_msg] msg length is out of range: 12 [17, 65535]
2024-07-12 08:24:41 W [server.zig:657 on_reply] dns.check_reply(upstream:udpi://8.8.8.8) failed: invalid reply msg

Can this be treated as a network error, i.e. an abnormal response from Google's DNS?

2024-07-12 08:21:23 I [server.zig:595 ReplyLog.add_ip] add answer_ip(qid:140, tag:gfw, qtype:1, 'google.com') to sstp_black,sstp_black6
2024-07-12 08:21:23 I [server.zig:587 ReplyLog.reply] reply(qid:145, tag:none, qtype:1, 'www.goooooooooooooooooooooooooooooooooooooooooooooooooooooooooogle.com') from udpi://8.8.8.8 [accept]
2024-07-12 08:21:23 I [server.zig:620 ReplyLog.cache] add cache(qid:145, tag:none, qtype:1, 'www.goooooooooooooooooooooooooooooooooooooooooooooooooooooooooogle.com') size:136 ttl:82
2024-07-12 08:21:23 I [server.zig:587 ReplyLog.reply] reply(qid:146, tag:none, qtype:1, '*google.com') from udpi://8.8.8.8 [accept]
2024-07-12 08:21:23 I [server.zig:322 QueryLog.query] query(id:3072, tag:gfw, qtype:1, 'google.com') from 192.168.88.178#50770
2024-07-12 08:21:23 I [server.zig:359 QueryLog.cache] hit cache(id:3072, tag:gfw, qtype:1, 'google.com') size:44 ttl:152
2024-07-12 08:21:23 I [server.zig:322 QueryLog.query] query(id:3328, tag:gfw, qtype:1, 'google.com') from 192.168.88.178#37832
2024-07-12 08:21:23 I [server.zig:359 QueryLog.cache] hit cache(id:3328, tag:gfw, qtype:1, 'google.com') size:44 ttl:152
2024-07-12 08:21:23 I [server.zig:322 QueryLog.query] query(id:3584, tag:gfw, qtype:1, 'google.com') from 192.168.88.178#54661
2024-07-12 08:21:23 E [dns.c:186 skip_name] remaining length is less than sizeof(dns_record): -21 < 10
2024-07-12 08:21:23 I [server.zig:359 QueryLog.cache] hit cache(id:3584, tag:gfw, qtype:1, 'google.com') size:44 ttl:152
2024-07-12 08:21:23 I [server.zig:322 QueryLog.query] query(id:4352, tag:none, qtype:1, 'google.com.onion') from 192.168.88.178#52291
2024-07-12 08:21:23 I [server.zig:388 QueryLog.forward] forward query(qid:147, from:udp, 'google.com.onion') to china group
2024-07-12 08:21:23 I [Upstream.zig:939 Group.send] forward query(qid:147, from:udp) to upstream udpi://223.5.5.5
2024-07-12 08:21:23 I [server.zig:388 QueryLog.forward] forward query(qid:147, from:udp, 'google.com.onion') to trust group

Here it reported remaining length is less than sizeof(dns_record): -21 < 10, raised from skip_name. I do not understand this part of the code.

hapood (Author) commented Jul 12, 2024

2024-07-12 08:24:46 W [server.zig:855 on_timeout] query(qid:176, id:512, tag:gfw) from local://0#0 [timeout]
2024-07-12 08:24:46 W [server.zig:855 on_timeout] query(qid:178, id:3584, tag:gfw) from local://0#0 [timeout]

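I don't quite understand this either. Why are there so many timeouts for local? Judging from the code, isn't everything local just the DNS cache?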

hapood closed this as completed Jul 12, 2024
hapood reopened this Jul 12, 2024

zfl9 (Owner) commented Jul 12, 2024

there should be one and only one question: 0

This means the DNS query is malformed. According to the RFC, there should be one question here, so this is a bad msg.
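
For readers who want to see which header field this check refers to, here is a minimal Python sketch that decodes the fixed 12-byte DNS header (layout per RFC 1035, section 4.1.1). It is not the project's code; `parse_dns_header` and the sample bytes are hypothetical, made up to mimic a query with a question count of 0.

```python
import struct

def parse_dns_header(msg: bytes) -> dict:
    """Decode the fixed 12-byte DNS header at the start of msg."""
    if len(msg) < 12:
        raise ValueError("shorter than a DNS header")
    # Six big-endian 16-bit fields: ID, flags, and the four section counts.
    qid, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", msg[:12])
    return {
        "id": qid,
        "qr": flags >> 15,        # 0 = query, 1 = response
        "tc": (flags >> 9) & 1,   # truncation flag
        "rd": (flags >> 8) & 1,   # recursion desired
        "qdcount": qdcount,       # number of questions
        "ancount": ancount,
        "nscount": nscount,
        "arcount": arcount,
    }

# Hypothetical sample: a bare header with RD set and no question section.
sample = struct.pack("!6H", 0x0E00, 0x0100, 0, 0, 0, 0)
header = parse_dns_header(sample)
print(header)
```

A query like `sample` is the kind of msg that would fail a "one and only one question" check, because its `qdcount` is 0.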


For the reply from 8.8.8.8, the error is:

msg length is out of range: 12 [17, 65535]

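It means the msg is too short. A DNS header is exactly 12 bytes, so the question is missing here. Same issue as above: a bad msg. It could be a network error, or the UDP packet could have been modified; there is no way to tell.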
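
The bounds `[17, 65535]` in that log line can be reproduced with a little arithmetic. The sketch below shows one plausible derivation; it is an assumption, not confirmed against dns.c. The lower bound is the header plus the smallest possible question, and the upper bound is the largest value of a 16-bit length field. All constant and function names are hypothetical.

```python
DNS_HEADER_LEN = 12        # fixed header size
MIN_NAME_LEN = 1           # the root name is a single zero byte
QTYPE_QCLASS_LEN = 2 + 2   # QTYPE and QCLASS are 16 bits each

# Smallest msg that can carry one question: 12 + 1 + 4 = 17 bytes.
MSG_MINLEN = DNS_HEADER_LEN + MIN_NAME_LEN + QTYPE_QCLASS_LEN
# Largest msg a 16-bit length prefix (as used by DNS over TCP) can describe.
MSG_MAXLEN = 0xFFFF

def length_in_range(msg_len: int) -> bool:
    """Return True if msg_len could hold a header plus at least one question."""
    return MSG_MINLEN <= msg_len <= MSG_MAXLEN

print(MSG_MINLEN, MSG_MAXLEN)
print(length_in_range(12))   # header-only msg, as in the log above
```

Under these assumptions a 12-byte msg is rejected purely on length, before any field is inspected.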


2024-07-12 08:21:23 I [server.zig:322 QueryLog.query] query(id:3584, tag:gfw, qtype:1, 'google.com') from 192.168.88.178#54661
2024-07-12 08:21:23 E [dns.c:186 skip_name] remaining length is less than sizeof(dns_record): -21 < 10

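skip_name here is used to skip over question.name in the query msg. After skipping, the remaining length turned out to be insufficient (negative), which means this is a bad msg.

UPDATE: To add to this, skip_name is used while parsing a DNS msg. Because domain names are variable-length inside a msg, the function's job is to "parse the next name field in the msg and skip it", where skipping just means advancing ptr and len. There is nothing deep about it; if you are familiar with the DNS msg format, it is easy to follow.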
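
As an illustration of what "parse the next name and skip it" means, here is a small Python sketch. It is not the dns.c implementation: it works with an offset instead of a ptr/len pair, and the function name, error messages, and sample query are all assumptions made for the example.

```python
import struct

def skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past the encoded name that starts at offset."""
    while True:
        if offset >= len(msg):
            raise ValueError("name runs past the end of the msg")
        length = msg[offset]
        if length == 0:
            # The zero-length root label terminates the name.
            return offset + 1
        if (length & 0xC0) == 0xC0:
            # A compression pointer is two bytes long and ends the name.
            if offset + 2 > len(msg):
                raise ValueError("truncated compression pointer")
            return offset + 2
        if length & 0xC0:
            raise ValueError("reserved label type")
        # Ordinary label: one length byte followed by that many bytes.
        offset += 1 + length

# Hypothetical query for google.com: header, name, then QTYPE=1 and QCLASS=1.
header = struct.pack("!6H", 0x0E00, 0x0100, 1, 0, 0, 0)
name = b"\x06google\x03com\x00"
query = header + name + struct.pack("!HH", 1, 1)

end = skip_name(query, 12)
remaining = len(query) - end
print(end, remaining)
```

If the msg is cut short, the name cannot be skipped or too few bytes are left for the fixed-size fields that follow, which is the situation the "remaining length" error describes.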

zfl9 (Owner) commented Jul 12, 2024

2024-07-12 08:24:46 W [server.zig:855 on_timeout] query(qid:176, id:512, tag:gfw) from local://0#0 [timeout]
2024-07-12 08:24:46 W [server.zig:855 on_timeout] query(qid:178, id:3584, tag:gfw) from local://0#0 [timeout]

I don't quite understand this either. Why are there so many timeouts for local? Judging from the code, isn't everything local just the DNS cache?

This is the response-timeout log. The local://0#0 after "from" is the address of whoever originated the query. Because this DNS request was triggered by a DNS cache refresh, there is no client address (udp:// or tcp://), so it is printed as local://0#0. You can ignore this detail.

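hapood (Author) commented Jul 12, 2024

Thanks. My guess is that the Samsung phone uses this abnormal DNS request as a network connectivity test? That would explain why it keeps sending the request when the network is down, resulting in a DDoS-like flood.
As long as the DNS server returns something to the Samsung phone, even an error, the phone considers the network reachable and stops sending the request.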

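zfl9 (Owner) commented Jul 12, 2024

As long as the DNS server returns something to the Samsung phone, even an error, the phone considers the network reachable and stops sending the request.

Maybe so. You could make a small local change to test it: take the bad query msg it sends (the one for which check_msg returns false), modify it, and send it straight back, then see whether the phone stops hammering DNS. If that change fixes the problem, I will add this logic later to avoid similar issues.

The change I mean is: convert the query msg into a "valid" reply msg (without a question section, so it is not actually valid, but the client considers it valid, which is enough). See the dns_empty_reply implementation in dns.c for reference. Note that question_count must be changed to 0 and msg_minlen to sizeof(struct dns_header), i.e. the msg only needs to contain the 12-byte DNS header.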
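
The idea of turning a bad query into a header-only reply can be sketched in Python as follows. This is not dns_empty_reply: the function name, the choice to keep the opcode and RD bits, and the use of RCODE 0 are all assumptions for illustration, and the real C function may set the flags differently.

```python
import struct
from typing import Optional

DNS_HEADER_LEN = 12
QR_BIT = 0x8000           # marks the msg as a response
KEEP_MASK = 0x7800 | 0x0100   # opcode bits and RD, copied from the query

def header_only_reply(query: bytes) -> Optional[bytes]:
    """Build a 12-byte reply that echoes the query ID and has no sections."""
    if len(query) < DNS_HEADER_LEN:
        return None   # not even a full header; nothing sensible to echo
    qid, flags = struct.unpack("!HH", query[:4])
    # Assumption: RCODE is left at 0; every section count is forced to 0.
    reply_flags = QR_BIT | (flags & KEEP_MASK)
    return struct.pack("!6H", qid, reply_flags, 0, 0, 0, 0)

# Hypothetical bad query: a bare header with RD set and no question.
bad_query = struct.pack("!6H", 0x1234, 0x0100, 0, 0, 0, 0)
reply = header_only_reply(bad_query)
print(reply.hex())
```

The reply carries the same ID as the query, so a client that only matches on the ID would accept it as an answer to its probe.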

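zfl9 changed the title from "Could a debug switch be added to print the log.error output in dns.c? My Galaxy phone currently produces a large number of DNS request errors every day, but the cause is unclear" to "[Improve compatibility] Try to "reply" to every received query msg, even if it is a bad msg" Jul 12, 2024
zfl9 changed the title from "[Improve compatibility] Try to "reply" to every received query msg, even if it is a bad msg" to "[Compatibility] Try to "reply" to every received query msg, even if it is a bad msg" Jul 12, 2024

zfl9 (Owner) commented Jul 12, 2024

I'll make the change on the dev branch shortly. Afterwards, switch to the dev branch, build it, and deploy it to try.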

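hapood (Author) commented Jul 12, 2024

From what I observe, the floods of invalid DNS requests happen when the network really has a problem (for example, the ss server is unreachable, or the optical modem's dial-up connection drops). At other times the question count error is triggered only sporadically. This is probably because DNS is not the only way the Samsung phone checks the network; when another check succeeds, it ignores this DNS response.

How do Google's and Cloudflare's DNS handle this kind of abnormal DNS request?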

zfl9 (Owner) commented Jul 12, 2024

How do Google's and Cloudflare's DNS handle this kind of abnormal DNS request?

Hard to say. You would have to write code that constructs such a bad msg and test it to find out.
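
One way such a test could look is sketched below in Python. It builds a header-only query (question count 0) and sends it over UDP to a resolver, returning whatever comes back or None on timeout. The function names are hypothetical, and the sketch makes no claim about how any particular resolver responds.

```python
import socket
import struct
from typing import Optional

def build_header_only_query(qid: int) -> bytes:
    """Build a 12-byte query with RD set and every section count at 0."""
    return struct.pack("!6H", qid, 0x0100, 0, 0, 0, 0)

def probe(server: str, msg: bytes, timeout: float = 2.0) -> Optional[bytes]:
    """Send msg to server on UDP port 53; return the reply or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(msg, (server, 53))
        try:
            reply, _addr = sock.recvfrom(65535)
            return reply
        except socket.timeout:
            # No answer within the timeout: the server silently dropped the msg.
            return None

bad_msg = build_header_only_query(0xBEEF)
print(bad_msg.hex())
```

To try it against a resolver, call for example `probe("8.8.8.8", bad_msg)` and inspect the result: `None` suggests a silent drop, while any bytes returned can be decoded to see which response code the server chose.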

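zfl9 (Owner) commented Jul 12, 2024

Either way, there are only two ways to handle it:

  • Silently drop it and reply with nothing, which is what chinadns-ng does now
  • Send a reply msg back, which is what I am about to do on the dev branch

Based on your feedback, the former may cause the client to keep retrying the request, so I am going to try the latter.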

zfl9 (Owner) commented Jul 12, 2024

Try the dev branch; remember to pull the latest commits.

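hapood (Author) commented Jul 13, 2024

It looks like the earlier problem is indeed solved: the Samsung phone no longer sends that strange DNS request so frequently.
However, a new network device came online today, a TCL TV (192.168.88.234), and it keeps sending strange DNS requests. The log is below.

Also, could the dev code log the source IP for an invalid query? It currently does not.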

2024-07-13 00:37:01 E [dns.c:115 check_msg] query msg should not have the TC flag set
2024-07-13 00:37:01 W [server.zig:394 on_query] dns.check_query(fd:6) failed: invalid query msg from 192.168.88.234#22303
2024-07-13 00:37:01 E [dns.c:115 check_msg] query msg should not have the TC flag set
2024-07-13 00:37:01 W [server.zig:394 on_query] dns.check_query(fd:6) failed: invalid query msg from 192.168.88.234#59359
2024-07-13 00:37:01 E [dns.c:115 check_msg] query msg should not have the TC flag set
2024-07-13 00:37:01 W [server.zig:394 on_query] dns.check_query(fd:6) failed: invalid query msg from 192.168.88.234#5719
2024-07-13 00:37:03 E [dns.c:115 check_msg] query msg should not have the TC flag set
2024-07-13 00:37:03 W [server.zig:394 on_query] dns.check_query(fd:6) failed: invalid query msg from 192.168.88.234#22303
2024-07-13 00:37:03 E [dns.c:115 check_msg] query msg should not have the TC flag set
2024-07-13 00:37:03 W [server.zig:394 on_query] dns.check_query(fd:6) failed: invalid query msg from 192.168.88.234#59359
2024-07-13 00:37:03 E [dns.c:115 check_msg] query msg should not have the TC flag set
2024-07-13 00:37:03 W [server.zig:394 on_query] dns.check_query(fd:6) failed: invalid query msg from 192.168.88.234#5719

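zfl9 (Owner) commented Jul 13, 2024

The TC flag in the query msg can also be handled for compatibility: accept the query and clear the TC flag on the chinadns-ng side.

The srcaddr logging on check failure that you mentioned should indeed be added; it was probably overlooked before.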
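
Clearing the TC flag amounts to masking one bit in the 16-bit flags field of the header. The Python sketch below shows the bit manipulation only; it is not the change made in the project, and `clear_tc` and the sample query are hypothetical.

```python
import struct

TC_BIT = 0x0200   # truncation flag, bit 9 of the flags field

def clear_tc(msg: bytes) -> bytes:
    """Return a copy of msg with the TC flag cleared and all else unchanged."""
    if len(msg) < 12:
        raise ValueError("shorter than a DNS header")
    (flags,) = struct.unpack("!H", msg[2:4])
    # Keep every flag bit except TC; the ID and the rest of the msg are copied.
    new_flags = flags & (0xFFFF ^ TC_BIT)
    return msg[:2] + struct.pack("!H", new_flags) + msg[4:]

# Hypothetical query for the root name with both RD and TC set.
query = struct.pack("!6H", 7, 0x0300, 1, 0, 0, 0) + b"\x00" + struct.pack("!HH", 1, 1)
fixed = clear_tc(query)
print(fixed[2:4].hex())
```

Only bytes 2 and 3 of the msg can change, so the question section is passed through untouched.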

zfl9 added a commit that referenced this issue Jul 13, 2024
zfl9 (Owner) commented Jul 13, 2024

Please keep testing with the latest dev branch; remember to pull the latest commits.

hapood (Author) commented Jul 13, 2024

Thanks, the logs now show no errors at all.

zfl9 (Owner) commented Jul 14, 2024

Good. I'll merge it into master later and bump the version.

zfl9 closed this as completed in 60a435d Jul 16, 2024