-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add token into stream v2 #47
base: master
Are you sure you want to change the base?
Conversation
…owReadBlocking test
Codecov Report
@@ Coverage Diff @@
## master #47 +/- ##
==========================================
+ Coverage 87.5% 89.83% +2.33%
==========================================
Files 4 4
Lines 392 531 +139
==========================================
+ Hits 343 477 +134
- Misses 44 49 +5
Partials 5 5
Continue to review full report at Codecov.
|
给力给力! |
https://github.com/cs8425/smux/blob/move-token-to-stream2/stream.go#L252 |
没办法啊,比如说服务器端与客户端这样的业务,同时存在新旧的客户端。 |
继续之前的讨论,你说要 “保證每個stream cmdSYN >> cmdPSH >> cmdFIN的封包順序”, cmdSYN 优先第一,这个现在代码就能保证了吧?因为只有 cmdSYN 之后,才能 OpenStream 或者 AcceptStream,后面只要保证先 Write 的 cmdPSH 比后 Write 的 cmdFIN 快就行了吧? 思路不变,但补充:
|
目前改進的有:
就算伺服端更新了新版 我認為2.2是比較好的選擇 至於1完全不相容是因為舊版使用比較激進的錯誤處理方式(強制斷線) |
除非是新特性实在跟旧客户端冲突,免得兼容的意义还是挺大的。毕竟强制所有用户更新客户端,从产品或业务的角度讲,伤害很大。我想的是如何更好的较缓和的升级服务器端。 兼容这个问题可以先放着,等解决了所有问题咱们再回头来看,看能达到什么兼容程度。 “保證cmdSYN >> cmdPSH >> cmdFIN的封包順序” 这个问题现在解决了吗? |
“保證cmdSYN >> cmdPSH >> cmdFIN的封包順序”這個應該是解決了
|
恩,看起来很简结,给力给力。 |
感觉用 go test -race 还是会死锁,不知道问题在哪里。代码看起来都很 OK 的。 |
-race我測是有過的啊 另外go v1.9.x的 |
感谢提供思路,我测试是超过 10 分钟没输出结果,就自动 killed 的了。 |
--- FAIL: TestSlowReadBlocking (3.57s)
总结下,卡住的地方有: |
我好像又發現原版的一個race問題 此問題容易發生在session.bucket很小 |
关于这一点:"我好像又發現原版的一個race問題"——好像没问题。 我把 |
不對 應該是不算... 補個卡住時候的輸出:
|
bucketNotify 毕竟是为 1 的 channel,只要发送信号,recvLoop 肯定能接收一次的。 session_test.go:1406: session.bucket 0 session.streams.len 4 这一行看,是不是 session.bucket 为 0 所以卡在 <-s.bucketNotify,这样是合理的。 内心想法是这样: 反过来如果现实 session.bucket 被卡住且为 0,那么就只能是所有的 stream 都没有去读,让 bucket 数据耗空了。 stream.chReadEvent 也是 size 为 1 的 channel,也就是说上一次 session.bucket 为 0 之前,肯定往一个 stream 发送了一次 chReadEvent 信号,且这个 stream 没有再去读数据,才造成 session.bucket 为 0 这个 stream 没有去读数据的可能,一种是本来 cpu 很慢超时了,一种是被 Close 掉了。如果是 close 掉则也会在 session.sessionClosed 先增加 session.bucket 的 总结哈,算起来整个代码逻辑应该还是对的。如果真出现了这种情况,是不是 token 在哪个环节原子操作没处理好,丢失了 token,一点一点丢到最后空了? |
我是覺得蠻奇怪的 我有試著把 |
TestSmallBufferReadWrite 我测试一直都测试通过! if strings.Contains(err.Error(), "i/o timeout") { |
I am trying to follow along using Google Translate but am not sure that it's working. I am wondering if this discussion is pertinent to an issue that we're encountering. We are using static connections between peers. We set the KeepAliveInterval to 5 seconds and the KeepAliveTimeout to 16 seconds. Smux sessions are frequently dropping, despite good network conditions. We've traced the cause down to ping timeouts... could this be related? |
@jannson @WinstonPrivacy
When you set KeepAliveInterval to 5 seconds and the KeepAliveTimeout to 16 seconds. |
implement stream token, fix for fast-write-slow-read stream cause whole
session freeze, *NOT compatible* with old version
Is this a problem when KCPTun is used as the underlying connection?
When you set KeepAliveInterval to 5 seconds and the KeepAliveTimeout to
16 seconds.
The *old keepalive implement* only check if there have any frame at 16 * N
seconds,
and the ping frame will send at 5 * N seconds.
So, on *the 80 seconds*, will *send* a ping frame,
This is interesting. I was under the impression that with 5/16 as the
settings, three pings would be sent (once every 5 seconds) and if a
response wasn't received within 16 seconds, the session would be dropped.
In fact, we have some unit tests built around this which seem to support
the case. Perhaps I should go back and check them.
We just merged the most recent ping code (master) and this appears to have
stabilized our connections. We are still seeing very occasional timeouts
but it is much improved.
Given that this improvement is a breaking change, we'll have to think
carefully about whether to incorporate it.
…On Thu, Mar 21, 2019 at 5:24 PM cs8425 ***@***.***> wrote:
@jannson <https://github.com/jannson> 抱歉 最近比較忙一點
原本是想說用delay跟固定大小的資料
來達成讀/寫的速度控制
後來發現這種寫法好像不夠精準
無法反應實際情況
我還要再想想該怎寫....
@WinstonPrivacy <https://github.com/WinstonPrivacy>
Yes, this branch fix:
1. fix buggy keepalive implement, compatible with old version in some
condition
2. race between Parallel Stream.Write(), compatible with old version
3. implement stream token, fix for fast-write-slow-read stream cause
whole session freeze, *NOT compatible* with old version
When you set KeepAliveInterval to 5 seconds and the KeepAliveTimeout to 16
seconds.
The *old keepalive implement* only check if there have any frame at 16 *
N seconds,
and the ping frame will send at 5 * N seconds.
So, on *the 80 seconds*, will *send* a ping frame,
*and check* if there are any frame receive on the *same time*.
At this situation,
if you session is *idle*, only ping frame sending, and *RTT big enough*,
then smux will *drop the session*.
This is what you met.
I will push a new branch that only have new keepalive implement latter.
You can take a try, will need to update both server and client side.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#47 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/Ai89gSS0KNUusstgXXmjsFovWhUWYA6jks5vZAa2gaJpZM4bmnpE>
.
|
@WinstonPrivacy 'fast-write-slow-read stream cause whole session freeze' issue not only on KCPTun, but also any program that use smux. |
breaking changes WILL NOT be merged, but discussion on how to to prevent streams from starving is ENCOURAGED, a compatible change by turning on new feature is also acceptable. |
@xtaci We have:
|
@cs8425 忙就暂停点时间,好事多磨。最近一直在使用你的分支体验,明显感觉是比 master 分支体验好,没有突然卡顿的现象。 |
the essence of this problem is |
But the question is that we can't do it without add new packet ID. |
@cs8425 今天有一个场景确实需要 CloseWrite/CloseRead 的功能,跟你说一声。之前我不知道在哪个地方回复说不需要。打脸了。 |
接續 #19
重新提交PR
TestSlowReadBlocking()
這個test可以復現 #18用
net.Pipe()
測試過程中有踩到類似 #37 的問題發送
cmdACK
跟cmdFUL
會卡住(已透過另外開個控制包專用channel解決)有空再試試可否透過test復現
另外改變了keepalive的實作方式
最明顯的改變是KeepAliveInterval可以大於KeepAliveTimeout
送一個包會等KeepAliveTimeout的長度再去確認無回應才斷線
如果回應在timeout之前收到
則會重設timeout跟下次發包的時機
換句話說
ping的時間點不再是固定的KeepAliveInterval
而是KeepAliveInterval + RTT
recvLoop()
的部份收到不明的cmd ID不再強制斷線
可以直接於SMUX這層加入雜訊且不干擾stream token的估測
或者提高之後引入新ID的相容性