Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

splithttp transport #3412

Merged
merged 32 commits into from
Jun 18, 2024
Merged

splithttp transport #3412

merged 32 commits into from
Jun 18, 2024

Conversation

mmmray
Copy link
Collaborator

@mmmray mmmray commented Jun 1, 2024

how it works:

  • each transported stream is identified by a random UUID <uuid>
  • download using long-living chunked-transfer from GET /?session=<uuid>
  • upload using POST /?session=<uuid>

personal design goals:

implementation notes, concerns:

  • putting session ID in URL achieves two goals: cache-busting, and avoiding requirement that the HTTP proxy forwards HTTP headers correctly. this can be debated.

  • no decoupling of connection lifetime from inner stream lifetime. particularly, CDNs may enforce a total response timeout, which would affect connection quality.

    on the other hand, this makes it easier to prevent DoS attacks: the in-memory map for storing sessions is always bounded by the number of physical TCP connections

  • connection upload can be hijacked without active MITM, by reading the UUID passively and using it from another machine. I don't think this is a practical issue, the answer is always to use TLS in any case.

  • i noticed a lot of HTTP-ish transports write to the socket directly. However, I need HTTP connection reuse for performance, so I want to use http.Client.

  • I have a lot of trouble getting http.Client to work correctly with uTLS. as a result, ALPN protocol negotiation is not correctly implemented, and HTTP2 prior knowledge is assumed for https://. For http://, HTTP/1.1 is used anyway.

usage notes:

On server and client:

      "streamSettings": {
        "network": "splithttp",
        "splithttpSettings": {
          "host": "example.com",
          "path": "/"
        }
      }
  • if the connection hangs, most likely the CDN is buffering the response body of GET. i found that nginx and apache respond to X-Accel-Buffering: no response header and disable buffering.

mmmray and others added 4 commits June 2, 2024 00:28
**how it works:**

* each transported stream is identified by a random UUID `<uuid>`
* download using long-living chunked-transfer from `GET /?session=<uuid>`
* upload using `POST /?session=<uuid>`

**personal design goals:**

* transport that can pass more CDNs, CDN does not have to support
  websocket
* support of not only CDNs, but anything that can proxy HTTP
* someday, in the future: ability to split up/download to different CDN
  (without dialerProxy/loopback tricks)
* simpler than meek

**implementation notes, concerns:**

* putting session ID in URL achieves two goals: cache-busting, and
  avoiding requirement that the HTTP proxy forwards HTTP headers
  correctly. this can be debated.

* no decoupling of connection lifetime from inner stream lifetime.
  particularly, CDNs may enforce a total response timeout, which would
  affect connection quality.

  on the other hand, this makes it easier to prevent DoS attacks: the
  in-memory map for storing sessions is always bounded by the number of
  physical TCP connections

* connection upload can be hijacked **without active MITM**, by reading
  the UUID passively and using it from another machine. I don't think
  this is a practical issue, the answer is always to use TLS in any
  case.

* i noticed a lot of HTTP-ish transports write to the socket directly.
  However, I need HTTP connection reuse for performance, so I want to
  use http.Client.

* I have a lot of trouble getting `http.Client` to work correctly with
  uTLS. as a result, ALPN protocol negotiation is not correctly
  implemented, and HTTP2 prior knowledge is assumed for https://. For
  http://, HTTP/1.1 is used anyway.

**usage notes:**

On server and client:

```
      "streamSettings": {
        "network": "splithttp",
        "splithttpSettings": {
          "host": "example.com",
          "path": "/"
        }
      }
```

* if the connection hangs, most likely the CDN is buffering the response
  body of `GET`. i found that nginx and apache respond to
  `X-Accel-Buffering: no` response header and disable buffering.
@Fangliding
Copy link
Member

How many tested CDNs?

@irgfw
Copy link

irgfw commented Jun 2, 2024

In theory, this could solve many problems in Iran.

@mmmray
Copy link
Collaborator Author

mmmray commented Jun 2, 2024

How many tested CDNs?

I have tested two of the major ones + [redacted]

I will check the tests you added

@Fangliding
Copy link
Member

Fangliding commented Jun 2, 2024

@mmmray Don't mind about the test fail. It didn't happen in new test file, it's sometimes like this, not your problem((((

@mmmray
Copy link
Collaborator Author

mmmray commented Jun 2, 2024

docs: XTLS/Xray-docs-next#514

@mmmray
Copy link
Collaborator Author

mmmray commented Jun 3, 2024

it seems that the tests actually fail because they attempt to use the same ports as the websocket tests, while running concurrently with them

testing.servers.tcp.port.PickPort appears to be the correct solution (used in http2 transport), but the implementation is broken for me on Ubuntu. It is missing SO_REUSEPORT, but the implementation of that is platform-specific.

I will hardcode different port numbers, but I wonder if pulling in another dependency go-reuseport is ok for testing?

@yuhan6665
Copy link
Member

Thanks for your great work @mmmray !
@Fangliding I've been improving xray's test for a long time. I'm 99.9% sure the test failure are caused by your test. I notice sometimes Test_listenSHAndDial fail and sometimes Test_listenWSAndDial fail. Maybe they share something in common and it is due to run them in parallel? Please help to fix it
@RPRX please take a look before we can merge

@yuhan6665
Copy link
Member

lol that is fast @mmmray

@yuhan6665
Copy link
Member

it seems that the tests actually fail because they attempt to use the same ports as the websocket tests, while running concurrently with them

testing.servers.tcp.port.PickPort appears to be the correct solution (used in http2 transport), but the implementation is broken for me on Ubuntu. It is missing SO_REUSEPORT, but the implementation of that is platform-specific.

I will hardcode different port numbers, but I wonder if pulling in another dependency go-reuseport is ok for testing?

The idea of PickPort() is that it tries to listen a port, and then quickly release self and pass this port number to the user. It is a simple yet proven solution. I highly recommend it over magic like SO_REUSEPORT ;)

@mmmray
Copy link
Collaborator Author

mmmray commented Jun 3, 2024

My bad. i already tried it before and saw some "address already in use" errors in logs. I can't reproduce them anymore, my code must have been faulty.

@RPRX
Copy link
Member

RPRX commented Jun 3, 2024

非常感谢 PR!我觉得实现这样的传输层,最大的挑战在于 客户端的请求不一定会按顺序到达服务端,有针对这种情况做处理吗?

@RPRX
Copy link
Member

RPRX commented Jun 3, 2024

另外目前对于 h2 的情况是一个连接占一条 h2 还是多个连接复用同一条 h2

@mmmray
Copy link
Collaborator Author

mmmray commented Jun 3, 2024

may not necessarily arrive at the server in order . Is there any handling for this situation?

the start of one HTTP request is hard-blocked on the successful 200 OK response of the previous data chunk. basically, there is a single for { .. } goroutine in dialer.go that sends http requests synchronously.

is one connection occupying one h2 or multiple connections reuse the same h2?

ah, currently there is one entire http.Client per connection. this could be improved, but I am not sure if there is another way than global sync.Map? I think the http.Client would have to be created per-outbound somehow.

@RPRX
Copy link
Member

RPRX commented Jun 3, 2024

may not necessarily arrive at the server in order . Is there any handling for this situation?

the start of one HTTP request is hard-blocked on the successful 200 OK response of the previous data chunk. basically, there is a single for { .. } goroutine in dialer.go that sends http requests synchronously.

这样处理比较简单,但效率非常低,且无法应对实时性要求比较高的应用,实用性会打折扣,试试实现一个序号与服务端缓存机制

(不过对于 h1 好像只能是现在这样)

is one connection occupying one h2 or multiple connections reuse the same h2?

ah, currently there is one entire http.Client per connection. this could be improved, but I am not sure if there is another way than global sync.Map? I think the http.Client would have to be created per-outbound somehow.

目前是你说的那种处理方式的话确实无法多个连接复用同一条 h2,换成我说的那种方式的话可以,代码参考 h2 transport

@RPRX
Copy link
Member

RPRX commented Jun 3, 2024

比如说你可以让同一个连接用一个共同的 UUID 再加上序号,0 代表下行,1 2 3 4 5... 代表上行,一直发,不用等服务端返回 200

@mmmray
Copy link
Collaborator Author

mmmray commented Jun 3, 2024

Reducing the bandwidth overhead is already attempted by batching fast sequential conn.Write calls into fewer, larger HTTP requests. However, there is a (hardcoded) limit to how large those HTTP requests can be, and it forces a tradeoff between bandwidth and latency. If I understood you right, the concurrent uploads could help with this, so I will try it.

I find the upload speedtest to be okay, but it is definitely lower than websocket.

For the code, please refer to h2 transport.

Ah, I see. I'm new to Go, so I didn't really understand how things like this can be used as map keys. I will just copy it.

@RPRX
Copy link
Member

RPRX commented Jun 3, 2024

Reducing the bandwidth overhead

不是为了这件事,而是客户端要等服务端回复后再发第二个包的话,会造成高延迟和低带宽,所以 TCP 不会等 ACK 再发第二个包

@RPRX
Copy link
Member

RPRX commented Jun 3, 2024

For the code, please refer to h2 transport.

Ah, I see. I'm new to Go, so I didn't really understand how things like this can be used as map keys. I will just copy it.

好像找错地方了,相关代码在 这里,查看 globalDialerMap

@mmmray
Copy link
Collaborator Author

mmmray commented Jun 3, 2024

I have seen globalDialerMap, I am just confused because it can use dialerConf as key, and dialerConf contains *MemoryStreamConfig, which indirectly contains a bunch of interfaces. How can a hashmap use this as a key at all? But it is more of a Golang beginner question, as said I will just copy it.

it seems the answer is, golang does a lot of dynamic type checking to determine equality: https://stackoverflow.com/a/48289238/1544347

@RPRX
Copy link
Member

RPRX commented Jun 3, 2024

type dialerConf struct {
	net.Destination
	*internet.MemoryStreamConfig
}

其实 *internet.MemoryStreamConfig 就是个指针,用于区分不同的出站,这里应该是只对比了指针本身,没有深层对比

比如说两个出站的配置一样,但指针肯定不一样,还是会被认定为两个出站,深层对比的话认定为同一个出站就不好了

@Fangliding
Copy link
Member

Thanks for your great work @mmmray ! @Fangliding I've been improving xray's test for a long time. I'm 99.9% sure the test failure are caused by your test. I notice sometimes Test_listenSHAndDial fail and sometimes Test_listenWSAndDial fail. Maybe they share something in common and it is due to run them in parallel? Please help to fix it @RPRX please take a look before we can merge

I have run a single test file several times on my own device without any issues. I forgot that using the same port may cause them to fight 🙈

@Fangliding Fangliding mentioned this pull request Jun 3, 2024
1 task
@RPRX
Copy link
Member

RPRX commented Jun 12, 2024

it uses http.Client which always reads response headers after writing

可以读取但不能阻塞下一个请求,否则需要被改变

i think it is (theoretically) possible that it is perceived as attack by some servers

是有这种可能,还有可能服务端偶尔会漏收一两个数据,所以客户端需要新建协程来等待 response 并重发数据,比如最多 3 次

对于 H2 和 HTTP/1.1 都可以有这个机制,但是会有点占内存,服务端那个基于序号的缓存机制也是如此

@RPRX
Copy link
Member

RPRX commented Jun 12, 2024

不过客户端这个重发机制是否有必要,需要实测,我觉得没被视作攻击时服务端基本上不会漏收数据,而被视作攻击时重发也没用

@mmmray
Copy link
Collaborator Author

mmmray commented Jun 12, 2024

Can be read but cannot block the next request, otherwise it needs to be changed

it can be tried.. should it be done in this PR? I think in order to implement this, http.Client needs to be removed entirely from HTTP/1.1 mode, it is a bigger change. client.Do does not allow to write and read separately, it does it all at once.

http retransmission is automatically done by http.Client (without extra code), if it is removed, it may need to be reimplemented.

@RPRX
Copy link
Member

RPRX commented Jun 12, 2024

我不熟悉 Golang 的这些库,但我觉得对于 HTTP/1.1 应该有方便的可并发且不阻塞的 request 方法吧,不然的确很 inelegant

@mmmray
Copy link
Collaborator Author

mmmray commented Jun 12, 2024

I will give it a try, but am not confident I will succeed quickly. I believe the code complexity will increase by a lot, and I think the code is already too complex.

@RPRX
Copy link
Member

RPRX commented Jun 15, 2024

@yuhan6665 将那些信息转发给我了。我希望减少 HTTP/1.1 的连接数是因为 it's the right way to go,减少连接数不仅更优雅,还可以减少对资源的需求从而可以在更苛刻的条件下工作,当然还可能可以少吸引些来自 GFW 的注意。但如果这件事实在难以在短期内实现且伊朗人急需这个新 transport,我们可以先合并了这个 PR 并发一个新版本,@mmmray 你来决定。

@mmmray
Copy link
Collaborator Author

mmmray commented Jun 16, 2024

@RPRX the upload optimization is now implemented for HTTP/1.1 specifically. for HTTP/2, the code is unchanged, because it never had lots of connections on upload anyway, and I am a bit afraid of regressions. I think the transport can now be merged.

I think in the future this code needs some cleanup, but I want to defer more work until there is more feedback.

Copy link
Member

@yuhan6665 yuhan6665 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@RPRX
Copy link
Member

RPRX commented Jun 18, 2024

LGTM 虽然我没有看代码 但是代码能跑就是好样的 这就发版+更新分享链接标准

不过 建议你在文档中写一下实现细节 方便其它软件实现这个 transport

@RPRX RPRX merged commit c10bd28 into XTLS:main Jun 18, 2024
34 checks passed
@RPRX
Copy link
Member

RPRX commented Jun 18, 2024

https://github.com/XTLS/Xray-core/releases/tag/v1.8.15 #716

@mmmray mmmray deleted the splithttp-pr branch June 18, 2024 06:59
@mmmray
Copy link
Collaborator Author

mmmray commented Jun 18, 2024

thanks rprx for releasing!

thanks rprx and yuhan for reviewing and contributing!

thanks fangliding for hotfixing docs!!

EDIT:
https://xtls.github.io/en/config/transports/splithttp.html#protocol-details

for implementors (but maybe also for people who need to debug their CDN)

can be moved to developer documentation, up to you

@RPRX
Copy link
Member

RPRX commented Jun 20, 2024

看了 @mmmray 的文档和 @yuhan6665 的 release note 之后,我把新版标注为了 pre-release,有几个地方:

  1. 据文档的“协议细节”所述,客户端会等到 GET 返回 200 后才发送上行数据,这会徒增 1-RTT,此处需要修改为直接开始发
  2. “CDN必须禁用缓存, 或者缓存不忽略查询字符串”,这里需要一个 breaking change,把所有 ? 都改为路径,即 POST prefix/UUID/0,这样的话 CDN 天然不会认为是同一个路径从而不会缓存,就是说全用路径而不要用任何传参
  3. 文档中关于 ALPN 的“已知问题”是过期描述,现在就是这么设计的,应该移到“协议细节”
  4. 文档中关于效率的问题,下行效率肯定拉满,需要标注,只是上行效率可能不如 WebSocket,但也不会差太多。@yuhan6665 提到了 meek 和 v2fly 的实现,就是说 SplitHTTP 和 meek 除了都能穿透不支持 WebSocket 的 CDN 外有什么很大的共同点吗?SplitHTTP 是基于我提出的实现方式,实现细节也是我在这里说的,我没有仔细研究过 meek 且我相信 @mmmray 的实现没有参考过半点 v2fly 的 meek 代码,就是说可以给 credit 但不要乱给 credit,此外文档需要标注至少下行效率比 meek 更高
  5. 文档中有两处 WebSocket 需要改为 SplitHTTP

以上。

@mmmray
Copy link
Collaborator Author

mmmray commented Jun 20, 2024

that's unfortunate. I think those issues should be caught in code review. Or maybe I should have updated the documentation more actively while the PR was still open, I am not sure.

According to the "Protocol Details" of the document, the client will wait until GET returns 200 before sending upstream data. This will increase 1-RTT. This needs to be modified to start sending directly.

The RTT issue is true. This was done intentionally to prevent somebody from opening arbitrary amount of session IDs and never closing them (similar to http2 rapid reset attack). This issue can be solved two ways:

  • put TTL and a total limit on open sessions, then POST requests can be sent before GET without DoS concerns
  • initial GET request contains early data, like websocket, but GET must still come before POST. It's not clear how this would work with the "no use of headers" constraint though.

which approach is preferred?

"The CDN must disable caching, or the cache does not ignore the query string", a breaking change is needed here to change all ?All are changed to paths, that is, POST prefix/UUID/0, in this case the CDN will naturally not think it is the same path and will not cache it. That is to say, use the entire path without any parameters.

The initial prototype was this way, but moved it to querystring for consistency with other transports (because no other transport uses subpaths for anything). I was not sure if subpaths would be more difficult for some users to route correctly. I only encountered one CDN that needed explicit configuration. Perhaps the documentation is too intimidating, pessimistic and preemptively mentions too many potential problems. Should it be moved to a Troubleshooting section?

Regarding efficiency issues in the document, the downlink efficiency is definitely full and needs to be marked.

I hear upload is much worse than websocket in Iran. Given throttling behavior of Iran GFW based on detected features, I did not want to make definitive statements about performance just yet.

The implementation does not refer to the meek code of v2fly

This is true, but I tried v2fly meek as a user to make my own opinion of what protocol is viable. I think it's ok to give credit to research even if this design and code is different.

In general, what do you think about fixing these issues in later releases with more settings? For example, a setting to change from querystring to paths, and a setting to opt into 0-RTT could be added in 1.8.16 if desired. It is currently not clear to me whether those issues are important, and more feedback from Iran is really needed. They are currently just waiting for mobile clients to update. Currently it seems the biggest issue is upload bandwidth.

@RPRX
Copy link
Member

RPRX commented Jun 20, 2024

我不认为现有的实现能够“prevent somebody from opening arbitrary amount of session IDs and never closing them”,毕竟 GET 也没有任何的鉴权,所以先把这个不必要的 1-RTT 减掉,防止恶意请求的逻辑我们可以放到以后的版本中再实现

此外把传参改成路径即可,还有文档照我说的修改即可(上行效率已经尽可能高了),希望我们能于明天发布下一个版本

@mmmray
Copy link
Collaborator Author

mmmray commented Jun 20, 2024

What i mean is, in order to have N concurrent sessions, currently the attacker needs to open N concurrent TCP connections (in case of HTTP/1.1) or N h2 streams. Because when the GET is closed, the session ID is also closed.

But if upload can also open sessions, only N POSTs are needed to create N sessions, they do not need to be concurrent and can potentially be performed on a single TCP connection.

The cost is different. Let's say you want 6 million open sessions, currently you have to maintain 6 million TCP connections, it's a lot of resources for the attacker. But 6 million POST requests with different UUID each is feasible.

Anyway, you are right we can fix it later. I will open sessions on either POST or GET, and close when GET is closed.

EDIT: I will still have to implement session TTL to prevent memory leaks. The client does not have to be malicious, it could simply lose internet connection after early POST.

I can prepare the PR and doc fix tomorrow at earliest, I think the deadline may not be met unfortunately.

@RPRX
Copy link
Member

RPRX commented Jun 20, 2024

I believe HTTP/2 is our prior concern, and therefore the cost of attack is very low, considering our current code.

But, of course it's not bad to have "session TTL", and you have enough time to prepare the PR and doc fix. Just ASAP, thanks!

@mmmray
Copy link
Collaborator Author

mmmray commented Jun 20, 2024

I think everything is addressed in XTLS/Xray-docs-next#516 XTLS/Xray-docs-next#516

except this:

There are two places in the document where WebSocket needs to be changed to SplitHTTP

this is in ZH docs, need somebody else to do it

@RPRX
Copy link
Member

RPRX commented Jun 20, 2024

this is in ZH docs, need somebody else to do it

@Fangliding

@Fangliding
Copy link
Member

是我瞎了 我看英文版本的时候意识到这是ws翻译过来的描述为了行文统一就去复制过来了没想到改漏了地方 我自己读了两三遍还没发现

@liushengqi000
Copy link

干得好,终于看到tcp拆成多个http的实现了。我只会把流量转成http3,然后把udp包在不同连接上乱发,反正一切都有http3来维护

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants