splithttp transport #3412
Conversation
**how it works:**

* each transported stream is identified by a random UUID `<uuid>`
* download using a long-living chunked transfer from `GET /?session=<uuid>`
* upload using `POST /?session=<uuid>`

**personal design goals:**

* a transport that can pass more CDNs; the CDN does not have to support websocket
* support not only CDNs, but anything that can proxy HTTP
* someday, in the future: the ability to split upload/download across different CDNs (without dialerProxy/loopback tricks)
* simpler than meek

**implementation notes, concerns:**

* putting the session ID in the URL achieves two goals: cache-busting, and avoiding the requirement that the HTTP proxy forwards HTTP headers correctly. this can be debated.
* no decoupling of connection lifetime from inner stream lifetime. in particular, CDNs may enforce a total response timeout, which would affect connection quality. on the other hand, this makes it easier to prevent DoS attacks: the in-memory map for storing sessions is always bounded by the number of physical TCP connections
* connection upload can be hijacked **without active MITM**, by reading the UUID passively and using it from another machine. I don't think this is a practical issue; the answer is always to use TLS in any case.
* I noticed a lot of HTTP-ish transports write to the socket directly. However, I need HTTP connection reuse for performance, so I want to use http.Client.
* I had a lot of trouble getting `http.Client` to work correctly with uTLS. as a result, ALPN protocol negotiation is not correctly implemented, and HTTP/2 prior knowledge is assumed for https://. For http://, HTTP/1.1 is used anyway.

**usage notes:**

On server and client:

```
"streamSettings": {
    "network": "splithttp",
    "splithttpSettings": {
        "host": "example.com",
        "path": "/"
    }
}
```

* if the connection hangs, most likely the CDN is buffering the response body of `GET`. I found that nginx and apache respond to the `X-Accel-Buffering: no` response header and disable buffering.
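To make the request flow above concrete, here is a minimal, self-contained sketch of what the server side of this scheme could look like: a session map keyed by the UUID, one long-lived chunked `GET` per session for the downlink (with `X-Accel-Buffering: no` set, as noted above), and `POST`s for the uplink. All names are illustrative assumptions, not the actual Xray-core code, and the downlink here simply echoes the uploaded data; the real transport forwards the inner proxied stream instead.

```
package main

import (
	"io"
	"net/http"
	"sync"
)

// session carries upstream bytes from POST handlers to the long-lived GET response.
type session struct {
	upload chan []byte
}

var (
	mu       sync.Mutex
	sessions = map[string]*session{}
)

// getSession returns the session for a UUID, creating it on first use.
func getSession(id string) *session {
	mu.Lock()
	defer mu.Unlock()
	s, ok := sessions[id]
	if !ok {
		s = &session{upload: make(chan []byte, 16)}
		sessions[id] = s
	}
	return s
}

func handler(w http.ResponseWriter, r *http.Request) {
	id := r.URL.Query().Get("session")
	s := getSession(id)

	switch r.Method {
	case http.MethodGet:
		// Long-lived chunked download: ask reverse proxies / CDNs not to buffer.
		w.Header().Set("X-Accel-Buffering", "no")
		w.WriteHeader(http.StatusOK)
		flusher, _ := w.(http.Flusher)
		// For demonstration, stream the uploaded chunks back down; a real
		// transport would stream the proxied connection's downlink here.
		for chunk := range s.upload {
			w.Write(chunk)
			if flusher != nil {
				flusher.Flush()
			}
		}
	case http.MethodPost:
		// Each POST carries one chunk of upstream data for this session.
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "bad body", http.StatusBadRequest)
			return
		}
		s.upload <- body
		w.WriteHeader(http.StatusOK)
	}
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe("127.0.0.1:8080", nil)
}
```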
Add test file
How many CDNs have you tested?
In theory, this could solve many problems in Iran.
I have tested two of the major ones + [redacted]. I will check the tests you added.
@mmmray Don't worry about the test failure. It didn't happen in the new test file; it's sometimes like this, not your problem((((
docs: XTLS/Xray-docs-next#514
it seems that the tests actually fail because they attempt to use the same ports as the websocket tests, while running concurrently with them.
I will hardcode different port numbers, but I wonder if pulling in another dependency (go-reuseport) is OK for testing?
Thanks for your great work @mmmray!
lol that is fast @mmmray
The idea of PickPort() is that it tries to listen on a port, then quickly releases it and passes that port number to the user. It is a simple yet proven solution. I highly recommend it over magic like SO_REUSEPORT ;)
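For illustration, the pick-a-port approach described above can be sketched in a few lines (this is not the actual PickPort code, just the idea): bind to port 0 so the OS assigns a free port, record the number, and release the listener so the test server can bind it immediately afterwards.

```
package main

import (
	"fmt"
	"net"
)

// pickPort binds to an OS-assigned free port, records the number,
// and releases the listener so a test server can bind it right after.
// Note the small inherent race: another process could grab the port in between.
func pickPort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	port := l.Addr().(*net.TCPAddr).Port
	if err := l.Close(); err != nil {
		return 0, err
	}
	return port, nil
}

func main() {
	port, err := pickPort()
	if err != nil {
		panic(err)
	}
	fmt.Println("picked port:", port)
}
```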
My bad. I already tried it before and saw some "address already in use" errors in the logs. I can't reproduce them anymore, so my code must have been faulty.
Thanks a lot for the PR! I think the biggest challenge in implementing a transport layer like this is that the client's requests do not necessarily arrive at the server in order. Is there any handling for this case?
Also, for h2, does each connection currently occupy its own h2 connection, or do multiple connections multiplex over the same h2 connection?
the start of one HTTP request is hard-blocked on the successful 200 OK response of the previous data chunk. basically, there is a single
ah, currently there is one entire
This approach is fairly simple, but it is very inefficient and cannot cope with applications that have high real-time requirements, so its practicality suffers. Try implementing a sequence-number scheme with a server-side cache (though for h1 it seems it can only stay the way it is now).
With the approach you described, multiple connections indeed cannot multiplex over the same h2 connection; with the approach I described, they can. See the h2 transport code for reference.
For example, you could have the same connection use a shared UUID plus a sequence number: 0 for the downlink, 1 2 3 4 5... for the uplink, sent continuously without waiting for the server to return 200.
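To illustrate the proposed sequence-number idea (a hypothetical sketch, not code from this PR): uplink chunks carry seq = 1, 2, 3, ... and the server caches out-of-order arrivals until the gap is filled, delivering them to the inner stream strictly in order. Only the uplink reassembly is shown; the downlink (seq 0) is omitted.

```
package main

import (
	"fmt"
	"sync"
)

// reorderBuffer delivers chunks to the inner stream strictly in sequence
// order, caching out-of-order arrivals until the gap is filled. This is a
// hypothetical sketch of the "sequence number + server-side cache" idea.
type reorderBuffer struct {
	mu      sync.Mutex
	next    uint64            // next sequence number expected by the inner stream
	pending map[uint64][]byte // chunks that arrived early
	out     func([]byte)      // delivery callback, e.g. write to the proxied connection
}

func newReorderBuffer(out func([]byte)) *reorderBuffer {
	return &reorderBuffer{next: 1, pending: map[uint64][]byte{}, out: out}
}

// push accepts a chunk with its sequence number (1, 2, 3, ... for the uplink).
func (b *reorderBuffer) push(seq uint64, data []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.pending[seq] = data
	// Drain every consecutive chunk that is now available.
	for {
		chunk, ok := b.pending[b.next]
		if !ok {
			return
		}
		delete(b.pending, b.next)
		b.out(chunk)
		b.next++
	}
}

func main() {
	b := newReorderBuffer(func(p []byte) { fmt.Printf("%s", p) })
	// Chunks may arrive out of order; the output is still "hello world".
	b.push(2, []byte(" world"))
	b.push(1, []byte("hello"))
}
```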
Reducing the bandwidth overhead is already attempted by batching fast sequential writes. I find the upload speedtest to be okay, but it is definitely lower than websocket.
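As a rough sketch of what "batching fast sequential writes" could mean (hypothetical names, not the PR's actual code): writes that arrive within a short window are coalesced into one buffer and flushed together, so many small sequential writes become one upload request instead of one POST each.

```
package main

import (
	"bytes"
	"fmt"
	"time"
)

// batchingWriter coalesces writes arriving within a short window into a
// single flush. Hypothetical sketch; flush would send one POST in practice.
type batchingWriter struct {
	in chan []byte
}

func newBatchingWriter(window time.Duration, flush func([]byte)) *batchingWriter {
	w := &batchingWriter{in: make(chan []byte)}
	go func() {
		var buf bytes.Buffer
		var timer <-chan time.Time // nil channel blocks, i.e. no pending flush
		for {
			select {
			case p, ok := <-w.in:
				if !ok {
					if buf.Len() > 0 {
						flush(buf.Bytes())
					}
					return
				}
				buf.Write(p)
				if timer == nil {
					timer = time.After(window)
				}
			case <-timer:
				flush(buf.Bytes())
				buf = bytes.Buffer{}
				timer = nil
			}
		}
	}()
	return w
}

func (w *batchingWriter) Write(p []byte) (int, error) {
	w.in <- append([]byte(nil), p...)
	return len(p), nil
}

func main() {
	w := newBatchingWriter(50*time.Millisecond, func(b []byte) {
		fmt.Printf("flush %d bytes: %q\n", len(b), b)
	})
	w.Write([]byte("hel"))
	w.Write([]byte("lo")) // both writes land in a single flush
	time.Sleep(200 * time.Millisecond)
	close(w.in)
	time.Sleep(50 * time.Millisecond)
}
```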
Ah, I see. I'm new to Go, so I didn't really understand how things like this can be used as map keys. I will just copy it.
It's not about that. Rather, if the client has to wait for the server's reply before sending the second packet, it causes high latency and low bandwidth, which is exactly why TCP does not wait for an ACK before sending the second packet.
Looks like I was looking in the wrong place; the relevant code is here, take a look.
It seems the answer is that golang does a lot of dynamic type checking to determine equality: https://stackoverflow.com/a/48289238/1544347
```
type dialerConf struct {
	net.Destination
	*internet.MemoryStreamConfig
}
```

Actually, for example, if two outbounds have the same configuration, the pointers will certainly be different, so they will still be treated as two outbounds.
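To make the map-key behavior concrete, here is a self-contained illustration (placeholder types stand in for net.Destination and *internet.MemoryStreamConfig; this is not the actual Xray-core code): two keys built from identical configurations but different pointers do not compare equal, so they occupy separate map entries.

```
package main

import "fmt"

// Stand-ins for net.Destination and internet.MemoryStreamConfig.
type destination struct {
	network string
	address string
	port    uint16
}

type memoryStreamConfig struct {
	protocolName string
}

// Comparable struct, so usable as a map key. Equality compares the
// destination field-by-field, but the config only by pointer identity.
type dialerConf struct {
	destination
	*memoryStreamConfig
}

func main() {
	dest := destination{"tcp", "example.com", 443}
	a := &memoryStreamConfig{"splithttp"}
	b := &memoryStreamConfig{"splithttp"} // same contents, different pointer

	m := map[dialerConf]string{}
	m[dialerConf{dest, a}] = "first"
	m[dialerConf{dest, b}] = "second"

	// Two entries: identical configs behind different pointers are distinct keys.
	fmt.Println(len(m)) // 2
}
```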
I have run a single test file several times on my own device without any issues. I forgot that using the same port may cause them to fight 🙈
It can be read, but it must not block the next request; otherwise it needs to be changed.
That possibility exists. The server may also occasionally miss one or two pieces of data, so the client would need to spawn a goroutine to wait for the response and resend the data, say at most 3 times. This mechanism can apply to both H2 and HTTP/1.1, but it uses some memory, and the same goes for the server-side sequence-number-based cache mechanism.
However, whether this client-side retransmission mechanism is necessary needs real-world testing. I think that when the traffic is not being treated as an attack, the server basically will not miss any data, and when it is being treated as an attack, retransmission is useless anyway.
it can be tried.. should it be done in this PR? I think in order to implement this, http retransmission is automatically done by |
I will give it a try, but am not confident I will succeed quickly. I believe the code complexity will increase by a lot, and I think the code is already too complex. |
@yuhan6665 forwarded that information to me. I want to reduce the number of HTTP/1.1 connections because it's the right way to go: reducing the number of connections is not only more elegant, it also lowers resource requirements so the transport can work under harsher conditions, and it may attract a bit less attention from the GFW. But if this is really hard to achieve in the short term and Iranians urgently need this new transport, we can merge this PR first and publish a new release. @mmmray, you decide.
@RPRX the upload optimization is now implemented for HTTP/1.1 specifically. For HTTP/2, the code is unchanged, because it never had lots of connections on upload anyway, and I am a bit afraid of regressions. I think the transport can now be merged. I think in the future this code needs some cleanup, but I want to defer more work until there is more feedback.
Looks good to me!
LGTM. Although I haven't read the code, code that runs is good code. Time to release, plus update the share-link standard. However, I suggest you describe the implementation details in the documentation so that other software can implement this transport.
thanks rprx for releasing! thanks rprx and yuhan for reviewing and contributing! thanks fangliding for hotfixing the docs!! EDIT: the notes for implementors (but maybe also for people who need to debug their CDN) can be moved to developer documentation, up to you.
After reading @mmmray's documentation and @yuhan6665's release notes, I marked the new version as a pre-release. A few points:
That's all.
that's unfortunate. I think those issues should have been caught in code review. Or maybe I should have updated the documentation more actively while the PR was still open, I am not sure.
The RTT issue is true. This was done intentionally to prevent somebody from opening arbitrary amount of session IDs and never closing them (similar to http2 rapid reset attack). This issue can be solved two ways:
which approach is preferred?
The initial prototype was this way, but I moved it to the querystring for consistency with other transports (because no other transport uses subpaths for anything). I was not sure whether subpaths would be more difficult for some users to route correctly. I only encountered one CDN that needed explicit configuration.
I hear upload is much worse than websocket in Iran. Given throttling behavior of Iran GFW based on detected features, I did not want to make definitive statements about performance just yet.
This is true, but I tried v2fly meek as a user to form my own opinion of what kind of protocol is viable. I think it's ok to give credit to that research even if this design and code are different. In general, what do you think about fixing these issues in later releases with more settings? For example, a setting to change from querystring to paths, and a setting to opt into 0-RTT could be added in 1.8.16 if desired. It is currently not clear to me whether those issues are important, and more feedback from Iran is really needed. They are currently just waiting for mobile clients to update. Currently it seems the biggest issue is upload bandwidth.
I don't think the current implementation can "prevent somebody from opening arbitrary amount of session IDs and never closing them" anyway; after all, the GET has no authentication either. So remove this unnecessary 1-RTT first, and we can implement the logic for preventing malicious requests in a later version. Also, just switch the parameter passing to a path, and update the documentation the way I described (upload efficiency is already as high as it can be). I hope we can publish the next version tomorrow.
What I mean is: in order to have N concurrent sessions, currently the attacker needs to open N concurrent TCP connections (in the case of HTTP/1.1) or N h2 streams, because when the GET is closed, the session ID is also closed. But if upload can also open sessions, only N POSTs are needed to create N sessions; they do not need to be concurrent and can potentially be performed on a single TCP connection. The cost is different: say you want 6 million open sessions, currently you have to maintain 6 million TCP connections, which is a lot of resources for the attacker, but 6 million POST requests with a different UUID each is feasible. Anyway, you are right that we can fix it later. I will open sessions on either POST or GET, and close them when the GET is closed. EDIT: I will still have to implement a session TTL to prevent memory leaks. The client does not have to be malicious; it could simply lose internet connectivity after an early POST. I can prepare the PR and doc fix tomorrow at the earliest, so I think the deadline may not be met, unfortunately.
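A session TTL of the kind mentioned above could look roughly like the following sketch (hypothetical names, not the eventual implementation): every GET or POST refreshes a last-seen timestamp, and a reaper goroutine drops sessions that stay idle past the limit, so a session opened by an early POST but never attached to a GET does not leak memory.

```
package main

import (
	"fmt"
	"sync"
	"time"
)

// ttlSessions expires entries that have not been touched within maxIdle.
type ttlSessions struct {
	mu       sync.Mutex
	lastSeen map[string]time.Time
	maxIdle  time.Duration
}

func newTTLSessions(maxIdle time.Duration) *ttlSessions {
	s := &ttlSessions{lastSeen: map[string]time.Time{}, maxIdle: maxIdle}
	go s.reap()
	return s
}

// touch marks a session as alive; called on every GET or POST for its UUID.
func (s *ttlSessions) touch(id string) {
	s.mu.Lock()
	s.lastSeen[id] = time.Now()
	s.mu.Unlock()
}

// reap periodically drops sessions that have been idle for too long.
func (s *ttlSessions) reap() {
	for range time.Tick(s.maxIdle / 2) {
		s.mu.Lock()
		for id, t := range s.lastSeen {
			if time.Since(t) > s.maxIdle {
				delete(s.lastSeen, id)
			}
		}
		s.mu.Unlock()
	}
}

func main() {
	sessions := newTTLSessions(200 * time.Millisecond)
	sessions.touch("some-uuid")
	time.Sleep(500 * time.Millisecond)
	sessions.mu.Lock()
	fmt.Println("live sessions:", len(sessions.lastSeen)) // 0 after expiry
	sessions.mu.Unlock()
}
```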
I believe HTTP/2 is our primary concern, and with our current code the cost of an attack there is already very low. But of course it's not bad to have a "session TTL", and you have enough time to prepare the PR and doc fix. Just ASAP, thanks!
I think everything is addressed in XTLS/Xray-docs-next#516 except this:
this is in ZH docs, need somebody else to do it
My bad, I was blind. When I read the English version, I realized the description had been translated over from the ws one, so for consistency of wording I copied it across, not expecting to miss a spot while editing. I read it two or three times myself and still didn't notice.
Nice work, it's great to finally see an implementation that splits TCP into multiple HTTP requests. All I would know how to do is convert the traffic to HTTP/3 and then scatter the UDP packets across different connections in any order, since HTTP/3 takes care of everything anyway.