v2ray's TLS traffic can be accurately identified by simple signature matching (PoC attached) #704

Closed
p4gefau1t opened this issue May 30, 2020 · 112 comments

@p4gefau1t

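This issue probably doesn't count as a bug report, but I couldn't find a suitable template, so I didn't use one. Sorry.

Conclusion first: the cipher suite field of the TLS Client Hello alone is enough to distinguish v2ray traffic from normal browser traffic very accurately.

PoC (from @DuckSoft): this iptables rule blocks all outbound v2ray TLS traffic where allowInsecureCiphers is set to false (the default), while leaving other TLS traffic unaffected: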

iptables -I OUTPUT -m string --algo kmp --hex-string "|001ecca8cca9c02fc02bc030c02cc027c013c023c009c014c00a130113031302|" -j DROP
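
To make the hex string in the rule easier to read, here is a minimal Python sketch that splits it into its parts. It assumes the string is the Client Hello's cipher_suites vector, that is, a 2-byte big-endian length followed by 2-byte suite IDs. The function name `decode_cipher_suite_vector` is made up for this illustration.

```python
import struct

# Hex string copied from the iptables rule above.
HEX_STRING = "001ecca8cca9c02fc02bc030c02cc027c013c023c009c014c00a130113031302"

def decode_cipher_suite_vector(hex_string):
    """Split a length-prefixed cipher_suites vector into 16-bit suite IDs."""
    raw = bytes.fromhex(hex_string)
    (length,) = struct.unpack_from("!H", raw, 0)  # 2-byte big-endian length prefix
    body = raw[2:]
    if length != len(body) or length % 2:
        raise ValueError("length prefix does not match the payload")
    return list(struct.unpack(f"!{length // 2}H", body))

suites = decode_cipher_suite_vector(HEX_STRING)
print(len(suites), [f"0x{s:04x}" for s in suites])
```

The prefix 0x001e is 30 bytes, so the rule matches exactly 15 suites in this exact order. The last three IDs (0x1301, 0x1303, 0x1302) are the TLS 1.3 suites.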

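Until a new version is released, my suggested mitigation is for clients to set allowInsecureCiphers to true. Servers could reject all TCP connections carrying this signature to force every client to update (though doing so may expose the server to active probing).


Below is how this was discovered, along with the analysis.

The experiment was inspired by the article below, which mentions that a machine-learning model can identify v2ray's tls+ws traffic with an accuracy as high as 0.9999: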

https://fr33land.net/2020/03/12/can-enable-tls-in-v2ray-help/

The model he trained is open source; the repository is:

https://github.com/rickyzhang82/V2Ray-Deep-Packet-Inspection

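Local testing reproduced the result. It is not limited to tls+ws; it works equally well on combinations such as tls+vmess. Other TLS traffic, such as browser traffic, produced no false positives throughout.

So our initial suspicion was that something went wrong with the Client Hello forgery performed by the utls library used in v2ray.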

https://github.com/v2ray/v2ray-core/blob/edb4fed387d27890902e7ee97aae0d97292f912b/transport/internet/tls/config.go#L176-L230

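The cipher suites used here, possibly for security reasons, form a special combination that differs from that of the vast majority of browsers. For comparison, below is the Chrome Client Hello as mimicked by utls.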

https://github.com/refraction-networking/utls/blob/43c36d3c1f57546d5cbb05c066df7b5a78686c51/u_parrots.go#L141-L214

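Packet captures show that the real Chrome Client Hello and the utls one are essentially identical, but both differ substantially from v2ray's, including differences in cipher suites and extensions. After we patched utls's Chrome cipher suites into v2ray, the model could no longer identify v2ray's TLS traffic.

So we can tentatively conclude that the model most likely learned features of the TLS Client Hello, and that this is what allows the traffic to be identified.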

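In fact, identifying a TLS Client Hello does not require machine learning; simple DPI is enough, so deploying this on the GFW would be cheap. And because this set of cipher suites is so unusual, the cipher suites alone suffice for accurate identification.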
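
As a rough illustration of how little work such DPI needs, the following Python sketch extracts the cipher suite list from a single TLS record and compares it with the signature from this issue. It is a simplified sketch, not how any real DPI system is known to work: it assumes the whole Client Hello sits in one record, does only minimal bounds checking, and the names `client_hello_cipher_suites` and `looks_like_v2ray` are made up here.

```python
import struct

# Cipher suite IDs taken from the signature in this issue, in wire order.
V2RAY_SUITES = [0xCCA8, 0xCCA9, 0xC02F, 0xC02B, 0xC030, 0xC02C, 0xC027, 0xC013,
                0xC023, 0xC009, 0xC014, 0xC00A, 0x1301, 0x1303, 0x1302]

def client_hello_cipher_suites(record):
    """Return the cipher suite IDs offered in a TLS ClientHello record."""
    if len(record) < 5 or record[0] != 0x16:   # 0x16 = handshake record
        raise ValueError("not a TLS handshake record")
    body = record[5:]                          # skip the 5-byte record header
    if len(body) < 4 or body[0] != 0x01:       # 0x01 = ClientHello
        raise ValueError("not a ClientHello")
    pos = 4 + 2 + 32                           # handshake header, version, random
    pos += 1 + body[pos]                       # session_id length byte + session_id
    (cs_len,) = struct.unpack_from("!H", body, pos)
    return list(struct.unpack_from(f"!{cs_len // 2}H", body, pos + 2))

def looks_like_v2ray(record):
    """True when the offered suites equal the signature list, order included."""
    return client_hello_cipher_suites(record) == V2RAY_SUITES
```

Comparing the parsed list, rather than searching for raw bytes, avoids accidental matches elsewhere in the packet, at the cost of having to parse the record.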

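Incidentally, the order of the cipher suite list in the code appears to be the reverse of the order in the Client Hello that is actually sent. I don't know whether this is intentional or a bug.

My suggestion: the client should keep using utls but should forge a Chrome/Firefox browser Client Hello. AllowInsecureCiphers should take effect only on the server, and the server should be the one to restrict insecure cipher suites.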

@DuckSoft

DuckSoft commented May 30, 2020

Because the cipher suite signature is so obvious, even a quick memcmp starting at offset 0x90 can identify it accurately and efficiently:
image

"Signature": cca8cca9c02fc02bc030c02cc027c013c023c009c014c00a130113031302
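
The thread does not say where the 0x90 offset comes from. One plausible reading, offered here purely as an assumption, is that it counts from the start of a captured Ethernet frame carrying IPv4, a TCP header with 12 bytes of options, and a Client Hello with a 32-byte session ID. Under those assumed sizes the arithmetic works out as follows:

```python
# Assumed header sizes for a typical captured frame (not stated in the thread).
ETHERNET = 14        # Ethernet II header
IPV4 = 20            # IPv4 header without options
TCP = 20 + 12        # TCP header plus 12 bytes of options (e.g. timestamps)
TLS_RECORD = 5       # content type, version, length
HANDSHAKE = 4        # handshake type plus 3-byte length
CLIENT_VERSION = 2
RANDOM = 32
SESSION_ID = 1 + 32  # length byte plus an assumed 32-byte session ID
SUITES_LENGTH = 2    # length prefix of the cipher_suites vector

offset = (ETHERNET + IPV4 + TCP + TLS_RECORD + HANDSHAKE
          + CLIENT_VERSION + RANDOM + SESSION_ID + SUITES_LENGTH)
print(hex(offset))  # 0x90
```

If any of these sizes differ (IPv6, no TCP options, an empty session ID), the list starts at a different offset, which is one reason a fixed-offset memcmp is more fragile than a substring search or a real parser.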

You can even use iptables to do a plaintext match... I'll leave other PoC approaches to your imagination...

Reference code:

iptables -I OUTPUT -m string --algo kmp --hex-string "|001ecca8cca9c02fc02bc030c02cc027c013c023c009c014c00a130113031302|" -j DROP

Upvote, and similar to v2ray/v2ray-core#1660, v2ray/v2ray-core#2098

@ghost

ghost commented May 30, 2020

Confirmed with V2Ray 4.23 on Arch Linux:

image

image

This happens even when the server is not configured to use TLS.
The issue is in the initial TLS handshake packet, which means the problematic Client Hello message is always sent when AllowInsecureCiphers is set to false, which is the default.

I suggest this issue be given top priority.

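@p4gefau1t p4gefau1t changed the title from "Reproducing the PoC for machine-learning identification of v2ray TLS traffic, and a discussion of the principle" to "v2ray's TLS traffic can be accurately identified by simple signature matching (PoC attached)" May 30, 2020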
@proletarius101

proletarius101 commented May 30, 2020

This issue has been addressed by Tor and Naiveproxy. It's great that someone has noticed this overlooked problem in v2ray. However, it is not easy to remove the attack surface completely unless we integrate the relevant components of a popular browser and keep them up to date, as Tor and Naiveproxy do (because, on closer inspection, connection behavior can itself be a strong fingerprint).

@proletarius101

proletarius101 commented May 30, 2020

Btw, generally speaking, it's simply the fingerprint of most Go programs. So it depends on what we want to mock. Go programs apparently have different connection behavior from Firefox, for example.

@ghost

ghost commented May 30, 2020

IMO, using the TLS library's default settings would resolve this problem: we would then look the same as any other Go program.

> Btw, generally speaking, it's simply the fingerprint of most Go programs. So it depends on what we want to mock. Go programs apparently have different connection behavior from Firefox, for example.

@proletarius101

> IMO, using the TLS library's default settings would resolve this problem: we would then look the same as any other Go program.

Agreed

@hanazaki05

There is a utls-enabled fork that addresses the fingerprint issue; you should inspect the code yourself to make sure it is safe.
https://github.com/emc2314/v2ray-core

@vcptr

vcptr commented May 30, 2020

You don't need machine learning to find something unique in plaintext. The TLS Client Hello itself is sent in plaintext over the network.

Different browsers and versions each have their own unique fingerprints. There is a project that collects TLS fingerprints and compiles statistics on them; see https://tlsfingerprint.io/

@p4gefau1t

p4gefau1t commented May 30, 2020

> IMO, using the TLS library's default settings would resolve this problem: we would then look the same as any other Go program.

> Btw, generally speaking, it's simply the fingerprint of most Go programs. So it depends on what we want to mock. Go programs apparently have different connection behavior from Firefox, for example.

Agreed. V2ray misused the utls library and blocked some "insecure" ciphers, which makes it easy to detect.

The easiest way to solve this is to use the crypto/tls default settings. But I think the best fix is to use the anti-fingerprinting features of utls correctly.

https://github.com/refraction-networking/utls

@darhwa

darhwa commented May 30, 2020

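I'm curious what considerations originally led to adding this default CipherSuites list.

If both the client and the server are under your own control, then whether or not CipherSuites is changed, what gets negotiated will only ever be one of the three TLS 1.3 suites. And if the server itself is insecure, what good does changing this CipherSuites list do?

I suspect that whoever added this never really understood the TLS handshake process. v2ray/v2ray-core#2477 is a recent example.

One more point: Go's default settings have no ClientSessionCache either, so I suggest removing it as well. If only the CipherSuites list is removed, the ClientHello will still carry one more extension, session_ticket, than that of an ordinary Go client.

A second point: the current ALPN settings are also very odd. In several places ALPN is set to h2 only. Which mainstream applications set an h2-only ALPN on the client side? This also counts as a distinctive feature.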

@KevinZonda

@klzgrad you may be interested

@mnihyc

mnihyc commented May 30, 2020

image
Reproduced with v2ws + tls (apache2). I had assumed that putting apache2 in front would be safe, but it turns out it gets passed straight through...

@DuckSoft

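> image
> Reproduced with v2ws + tls (apache2). I had assumed that putting apache2 in front would be safe, but it turns out it gets passed straight through...

In other words, the problem this time is that the client is the teammate letting everyone down: it sends the server a Client Hello with an extremely strong signature.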

@mnihyc

mnihyc commented May 30, 2020

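> image
> Reproduced with v2ws + tls (apache2). I had assumed that putting apache2 in front would be safe, but it turns out it gets passed straight through...

> In other words, the problem this time is that the client is the teammate letting everyone down: it sends the server a Client Hello with an extremely strong signature.

Indeed. It may be time to consider replacing v2; there are quite a few alternative TLS-based options (the more obscure ones, I mean).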

@kotori2

kotori2 commented May 30, 2020

@studentmain But since very few apps use Go, it is still fairly easy to detect from its TLS fingerprint.

@proletarius101

proletarius101 commented May 30, 2020

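> Indeed. It may be time to consider replacing v2; there are quite a few alternative TLS-based options (the more obscure ones, I mean).

This defect also applies to Trojan, which uses OpenSSL with a customized client TLS configuration.

A traditional HTTPS proxy is also potentially detectable because of the small handshake packets in TLS.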

@StarryVoid

StarryVoid commented May 31, 2020

Tested with V2rayN v3.18 + V2ray-core v4.23.1.
Among the various configurations that use TLS, only vmess+h2+tls was found to have no Client Hello by default.
Server-side capture command: tcpdump -ni eth0 "tcp port 443 and (tcp[((tcp[12] & 0xf0) >> 2)] = 0x16)"
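
For readers unfamiliar with the filter expression, here is a small Python sketch of the same test applied to a raw TCP segment (header plus payload). It covers only the payload check; the port 443 condition is left out, and the function name `is_tls_handshake_segment` is made up for this illustration.

```python
def is_tls_handshake_segment(tcp_segment):
    """Mirror the tcpdump test: the first TCP payload byte equals 0x16."""
    if len(tcp_segment) < 20:
        return False
    # Byte 12 holds the data offset in its upper 4 bits, counted in 32-bit words.
    # (b & 0xf0) >> 2 equals (b >> 4) * 4, i.e. the TCP header length in bytes.
    header_len = (tcp_segment[12] & 0xF0) >> 2
    if len(tcp_segment) <= header_len:
        return False  # no payload, e.g. a bare SYN or ACK
    # 0x16 is the TLS record content type for handshake messages.
    return tcp_segment[header_len] == 0x16
```

Note that 0x16 marks any TLS handshake record, not only a Client Hello, so the capture also includes Server Hello packets on the same port.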

@StarryVoid

StarryVoid commented May 31, 2020

Also attached: Cloudflare's reference table of TLS Client Hello cipher suites
https://raw.githubusercontent.com/cloudflare/mitmengine/master/reference_fingerprints/mitmengine/browser.txt

@ghost

ghost commented May 31, 2020

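The current problem is not about whether TLS is implemented with Go or with some other language or library.

The current problem with V2ray-Core is this: when AllowInsecureCiphers is not enabled, it hard-codes a handful of secure TLS cipher suites, which turns the ciphers in the Client Hello into a traffic fingerprint for V2ray.

The problem itself is not the characteristics of the TLS handshake packet, but that V2ray uses the TLS library incorrectly.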

@fdmove

fdmove commented May 31, 2020

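> Incidentally, the order of the cipher suite list in the code appears to be the reverse of the order in the Client Hello that is actually sent. I don't know whether this is intentional or a bug.

This is probably done so that TLS 1.3 is preferred.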

@xiaokangwang

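The current situation is that the TLS library V2 uses is Go's TLS library, so its behavior is bound to differ somewhat from OpenSSL's. What can be done right now is to switch to Go's default cipher suites first.
In the long run, a fairly easy way to solve this is to forward the traffic locally on the client through a program that uses the OpenSSL library. For example: ncat -l localhost 1234 --sh-exec "ncat --ssl v2ray.example.com 1234"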

@GoldJohnKing

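In my humble opinion, using Go's default TLS library is already sufficient. Given how widely Go's default TLS library is used, it is unlikely that all Go programs would be blocked outright... So matching or resembling the TLS fingerprint of other widely used Go programs is enough. Running OpenSSL locally would of course be even better, though.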

@maidmeow4

reproduced
image

@rickyzhang82

@p4gefau1t @DuckSoft

Great job, guys! I'm not an expert on TLS. I really appreciate your finding this!

  • IMO, V2Ray uses the stock TLS implementation from Go. It doesn't use utls.
  • The V2Ray client side needs to make its TLS traffic blend in with that of whatever web browser is popular in mainland China. That gives people a better chance of circumventing the GFW.
  • The V2Ray server side needs to defend against active probing by the CCP, as reported by Shadowsocks researchers here (https://web.archive.org/web/20200416083158/https://gfw.report/). gfw.report is down now, but you can access it from the archive.

So far, the proposals I see only address the client side. Is there any concern that the server side can be probed by the CCP because of what the TLS handshake leaks?

@DuckSoft

> @p4gefau1t @DuckSoft
>
> Great job, guys! I'm not an expert on TLS. I really appreciate your finding this!
>
> * IMO, V2Ray uses the stock TLS implementation from Go. It doesn't use [utls](https://github.com/refraction-networking/utls).
>
> * The V2Ray client side needs to make its TLS traffic blend in with that of whatever web browser is popular in mainland China. That gives people a better chance of circumventing the GFW.
>
> * The V2Ray server side needs to defend against active probing by the CCP, as reported by [Shadowsocks researchers here](https://web.archive.org/web/20200416083158/https://gfw.report/). gfw.report is down now, but you can access it from the archive.
>
> So far, the proposals I see only address the client side. Is there any concern that the server side can be probed by the CCP because of what the TLS handshake leaks?

To be honest, I only contributed to the validation and implementation; the original idea came from @p4gefau1t, so he deserves the credit. The server side may also be probed, but behind nginx/caddy the fingerprint is eliminated. As long as you keep your endpoint address secret, the GFW can't discover any difference.

But the cruel thing is that it's not the server's fault. So far it seems that it is your client that stirs up trouble by shouting at your server: "come on boy, I need to circumvent GFW!" I don't know how this can affect servers, but I can imagine someone broadcasting this very payload to scan competitors' servers. That should also be considered.

@lp123sun

lp123sun commented May 31, 2020

Is WebSocket + TLS + Nginx path-based routing affected as well?

@DuckSoft

> Is WebSocket + TLS + Nginx path-based routing affected as well?

Yes, it is affected.

@icebluey

icebluey commented Jun 1, 2020

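@tomac4t's table also shows that v4.23.2 has not really solved the problem of TLS fingerprint uniqueness. The v4.23.2 fingerprint has been seen fewer than 100 times, and those were probably all submitted by v2ray users. The reasons, as I mentioned earlier, are that ALPN uses a special value and that ClientSessionCache is used.

The following iptables rules precisely block the latest v2ray, v4.23.2, when it uses a TLS connection (whether in ws, h2, or tcp mode), without affecting other Go programs: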

iptables -N GOLANG
iptables -A GOLANG -m string --algo bm ! --hex-string "|0010000e000c02683208687474702f312e31|" -j DROP
iptables -A OUTPUT -m string --algo bm --hex-string "|0026c02fc030c02bc02ccca8cca9c013c009c014c00a009c009d002f0035c012000a130113031302|" -j GOLANG

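Anyone interested can give it a try.

Since whether to use uTLS is still under discussion and testing, I hope the maintainers can merge v2ray/v2ray-core#2521 or a similar patch soon as a stopgap. After all, a sensitive date is approaching, and nobody knows how quickly the wall updates its rules.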
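
The two-stage logic of the iptables rules in this comment can be sketched in Python as follows. This only illustrates the matching logic on a byte string; it is not a replacement for the rules, and the names `SUITES_SIGNATURE`, `ALPN_H2_HTTP11`, and `would_drop` are made up for this example.

```python
# Cipher suite bytes from the OUTPUT rule (length prefix 0x0026 = 19 suites).
SUITES_SIGNATURE = bytes.fromhex(
    "0026c02fc030c02bc02ccca8cca9c013c009c014c00a"
    "009c009d002f0035c012000a130113031302")
# ALPN extension advertising h2 and http/1.1, from the GOLANG chain rule.
ALPN_H2_HTTP11 = bytes.fromhex("0010000e000c02683208687474702f312e31")

def would_drop(packet):
    """True when the OUTPUT rule matches and the GOLANG chain then drops."""
    if SUITES_SIGNATURE not in packet:
        return False  # OUTPUT rule does not match, so the packet passes
    # GOLANG chain: drop unless the h2 + http/1.1 ALPN extension is present.
    return ALPN_H2_HTTP11 not in packet
```

The first stage narrows traffic down to this one cipher suite list, and the second stage lets through only those packets that also advertise both h2 and http/1.1.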

@nicholascw

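Bytes for http/1.1 only:
0010000b000908687474702f312e31
Bytes for h2+http/1.1:
0010000e000c02683208687474702f312e31

The special ALPN value you mentioned should be the http/1.1-only case, right?
Judging from https://tlsfingerprint.io/alpn, the http/1.1-only case is not rare either, so why can it be treated as a signature?
The newly released 4.23.3 is the same as 4.23.2: both use http/1.1, and there seems to be no intention of changing that.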
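
The two byte strings quoted above can be decoded with a short Python sketch. It assumes each string is a complete ALPN extension: extension type 0x0010, a 2-byte extension length, a 2-byte protocol list length, then length-prefixed protocol names. The function name `decode_alpn_extension` is made up for this illustration.

```python
import struct

def decode_alpn_extension(hex_string):
    """Decode an ALPN extension (type 0x0010) into its protocol names."""
    raw = bytes.fromhex(hex_string)
    ext_type, ext_len, list_len = struct.unpack_from("!HHH", raw, 0)
    if ext_type != 0x0010 or ext_len != len(raw) - 4 or list_len != ext_len - 2:
        raise ValueError("not a well-formed ALPN extension")
    names, pos = [], 6
    while pos < len(raw):
        n = raw[pos]  # 1-byte length of the next protocol name
        names.append(raw[pos + 1:pos + 1 + n].decode("ascii"))
        pos += 1 + n
    return names

print(decode_alpn_extension("0010000b000908687474702f312e31"))        # ['http/1.1']
print(decode_alpn_extension("0010000e000c02683208687474702f312e31"))  # ['h2', 'http/1.1']
```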

@darhwa

@darhwa

darhwa commented Jun 2, 2020

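@icebluey It's true that quite a few clients use http/1.1 only, but those are basically all older browser versions. How many Go programs use http/1.1 only? More precisely, how many Go programs with TLS 1.3 enabled still use http/1.1 only?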


@rickyzhang82

> You said that "not everyone who circumvents the wall is 'against censorship'". Do you yourself think that is logical? If they are not against censorship, why aren't they content with the GFW's censorship, and why do they circumvent the wall to get information from outside?

Thank you! Finally, I've met someone who can reason logically, unlike the fifty-cent party.

I'm done with the discussion here. Some accounts smell fishy to me.

I want to state that the current fix is a suboptimal solution, given that there are extremely few Go client apps inside or outside the GFW. The commit only replaced the fixed, ordered list of hard-coded ciphers with Go's default one. Thus, it makes V2Ray traffic blend in as a Go app.

But please name me some client apps written in Go that the GFW doesn't want to ban. Docker CLI? Keep naming them. There are NONE, no?

We need to investigate utls and see if it can help.

@proletarius101

> We need to investigate utls and see if it can help.

Parroting is supposed to work. The leaked information is limited to the ClientHello (in this TLS-related defect), so parroting it is enough to fix this problem. Further investigation should go into fully parroting Chrome, or whatever browser is popular in China, since connection behavior, including connection termination, error handling, etc., can also serve as a fingerprint.

Randomization, on the contrary, makes you look suspicious, unless the GFW really relies on blacklisting. The GFW's approach is more like multi-layered whitelisting: it scores the traffic and reacts.


@ghost

ghost commented Jun 28, 2020

Did anyone mention naiveproxy already?

klzgrad/naiveproxy#94

klzgrad/naiveproxy#94 (comment)

#754

#754 (comment)

@github-actions

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days
