net: enable TCP keepalive on new connections from net.Dial #23459
Seems like a good idea. More concretely, I think the
But the default behavior should be: once the connection reads something successfully, the read deadline is reset to the current time plus the timeout. So that
See Slowloris.
@cznic interesting. A default network timeout would make a simple task like downloading very large files more complex, so we cannot use the same simple configuration both for downloading very large files and for avoiding the Slowloris problem. I was also wrong about the write timeout: it is useful when the server sends data to a disconnected client. Resetting the read/write deadline to the current time plus the timeout is a bad idea in some cases once a Slowloris attack is taken into account.
I'll claim that all things should have timeouts, but some may need to be set to specific values instead of what you might prefer. I tend to use the idiom
And thanks, by the way: I'm now off to make sure all my library timeouts are set properly.
The net API doesn't have timeouts at all right now. It has deadlines, which are simpler in various ways but not really great for the problem you described. Would it be enough to set TCP keepalive on by default, as we do in the HTTP server, instead of setting some kind of persistent timeout in Go itself?
As a possible smaller change, I think it may be worth considering adding a variant of
Just to frame my post in a bit more context: servers don't just accept connections, they also open connections, both in response to events (e.g. incoming requests) and independently as part of their lifecycles. That is to say, the "listener side" is certainly in scope, but it's not the full picture. The point I was making in my initial proposal is more about the general approach/policy than about how to implement it, especially since I have the feeling that a "holistic solution" will be out of reach in go1 (as @rsc's comment seems to imply). Maybe the go1 implementation concerns should be split into a separate proposal? Or maybe this proposal could be about the "general approach/policy", and then (if an agreement is reached) we can open go1 and go2 issues to discuss how to implement the policy?
Yes, let's discuss a general approach first.
FWIW, this is one of the problems which the
One problem is that it's currently quite difficult to hook a context's cancellation signal to low-level network operations' deadline-based cancellation; see #20280, and in particular the post linked from @zombiezen's comment at #20280 (comment).
@neild agreed, although if you replace "add an explicit timeout" with "correctly wire a context" the point (that the default for network ops should not be "wait forever") still stands:
Let's enable TCP keepalive by default and then see what problems still remain to be solved. It seems like TCP keepalives would handle essentially all "accidental" (not malicious) long-lived connections. Does anyone object to enabling TCP keepalive by default?
I have no strong opinion on enabling TCP keepalive by default. The following are just random thoughts.
@rsc I am in favor of enabling TCP keepalives by default. I actually think they will solve part of the problem but, just to clarify, I don't think we can say that
TCP keepalives are normally handled by the TCP implementation in the kernel, so a "stuck" application would not be detected (until it's killed or crashes). Also, TCP keepalives have the downside of being applicable only to TCP. 😅 These are the reasons I think starting with TCP keepalives is a good first step.
OK, let's do TCP keepalives on by default.
Great! Should I also open a follow-up go2 proposal to review the net APIs in light of what was discussed in this issue?
I was giving this a shot but I ran into something unexpected. Currently in
Not being able to control this parameter means we can't precisely set the target total time before the keepalive mechanism considers the connection broken, since the total time is (initial delay + interval * count), and in Go the initial delay equals the interval. What is the desired course of action here?
[1] As a side note, it seems that the OpenBSD support for keepalives states that OpenBSD doesn't support keepalives, but this does not seem to be the case.
@rsc if you have nothing against this, I will try to go with the third and simplest option ("use the same interval regardless of the platform").
Change https://golang.org/cl/107196 mentions this issue:
This is just the first step in attempting to make all network connections have timeouts as a "safe default". TCP keepalives only protect against certain classes of network and host issues (e.g. server/OS crash), but do nothing against application-level issues (e.g. an application that accepts connections but then fails to serve requests).

The actual keep-alive duration (15s) is chosen to cause broken connections to be closed after 2~3 minutes (depending on the OS, see golang#23549 for details).

We don't make the actual default value part of the public API for a number of reasons:

- because it's not very useful by itself: as discussed in golang#23549 the actual "timeout" after which the connection is torn down is duration*(KEEPCNT+1), and we use the OS-wide value for KEEPCNT because there's currently no way to set it from Go.
- because it may change in the future: if users need to rely on a specific value they should explicitly set this value instead of relying on the default.

Fixes golang#23459

Change-Id: I348c03be97588d5001e6de0f377e7a93b51957fd
Does using net.ipv4.tcp_keepalive_time as the default keepalive interval sound like a good idea to you?
@zhyon404 we don't do that (and this issue is already closed, so you may want to open a new one if you found a different problem...) Update: are you suggesting we should be using the OS default?
In our experience, missing network timeouts are one of the leading sources of resiliency bugs in server applications. Unfortunately, these bugs rarely surface during development and are not easy to write tests for. As a result, we often see components hanging because somewhere in a dependency three layers down someone forgot to add an explicit timeout to a network connection/request/call. This is the result of many factors, one being that many languages (Go included) do not have default timeouts for network operations, even though a default timeout would better match the common case (how often does a network operation actually need to have no timeout at all?)
I realize this may be seen as an over-reaction to the problem, but I would argue that if the Go standard packages had default timeouts set on all network operations a whole class of resiliency problems would, for the majority of uses, disappear.
A couple of notes:
To summarize the proposal:
http.DefaultClient
I'm unsure whether this would break Go 1 compatibility, so feel free to classify it as Go2 if required.