TCPWriter: Close connections on network timeout errors #267
@@ -106,7 +106,7 @@ func (d *TCPWriter) Write(p []byte) (n int, err error) {
 		n, err := conn.Write(envelope[total:])
 		total += n
 		if err != nil {
-			if nerr, ok := err.(net.Error); !(ok && (nerr.Temporary() || nerr.Timeout())) {
+			if nerr, ok := err.(net.Error); !ok || !nerr.Temporary() {
I am slightly nervous that a timeout error could also be considered temporary, but looking at the tls.Conn code it doesn't seem like that is the case. Mind double checking this logic?
I just went ahead and added the explicit check for a timeout as well to be safe.
Reviewed 2 of 3 files at r2, 1 of 1 files at r3.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @tabboud)
Before this PR
Connections could get stuck in a "connection timed out" state, and the underlying net.Conn would never be reset, causing logs to be dropped.
According to the docs for tls.Conn.SetDeadline() (https://pkg.go.dev/crypto/tls#Conn.SetDeadline), after a write has timed out once, the TLS state is corrupt and thus the same error is returned. My hypothesis is that once we hit the timeout, we got stuck in this state since we never closed the connection.
After this PR
==COMMIT_MSG==
Close TCP connections if there are network timeouts
==COMMIT_MSG==
Notes
I wasn't able to repro the exact issues we were seeing below, since the "connection timed out" error comes from the ETIMEDOUT errno on Linux.
However, when digging into syscall.ETIMEDOUT, I realized it happens to implement the net.Error interface and returns true for Timeout() and ALSO returns true for Temporary() if there was a timeout!
Relevant issue: golang/go#31449
Possible downsides?