Avoid reporting socket errors via Sentry observer #1026
Merged
After merging #1009, we started getting a small number of socket errors reported via Sentry:
I traced this down to two potential causes, neither of which is user-impacting:
Cause 1: The Sentry observer reports unhandled exceptions from greenlets, even though we handle the exceptions
The Sentry observer hooks into gevent's `print_exception` function in order to report all unhandled exceptions to Sentry:

baseplate.py/baseplate/observers/sentry.py
Lines 123 to 131 in efd4459
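The embedded snippet isn't reproduced here. As a rough sketch of the mechanism (not baseplate's actual code), the hook below wraps the hub's `print_exception` and passes each exception to a hypothetical `report` callback standing in for the Sentry client. Because gevent's hub calls `print_exception` as soon as a greenlet dies with an error, the `OSError` is recorded even though the caller later retrieves it with `get()` and handles it.

```python
import gevent

# Stand-in for the Sentry client: collects whatever the hook "reports".
reported = []


def install_print_exception_hook(report):
    """Wrap the hub's print_exception so every exception the hub treats as
    unhandled is also passed to report() (a hypothetical callback)."""
    hub = gevent.get_hub()
    original = hub.print_exception

    def print_exception(context, exc_type, value, tb):
        report(value)
        original(context, exc_type, value, tb)  # keep gevent's stderr traceback

    hub.print_exception = print_exception


def read_requestline():
    # Hypothetical stand-in for a read on a socket the peer has reset.
    raise OSError("Connection reset by peer")


install_print_exception_hook(reported.append)

reader = gevent.spawn(read_requestline)
try:
    reader.get()  # re-raises the greenlet's exception in the caller...
except OSError:
    pass  # ...which handles it (e.g. by closing the connection)

print(len(reported))  # 1: the handled error was still reported
```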
This means that any error that escapes a greenlet always gets reported via Sentry, even though the `read_requestline` function is intended to raise socket errors, which are handled in the calling function:

https://github.com/gevent/gevent/blob/a50ca62df1bb84c2960ba34c655f18129d845b19/src/gevent/pywsgi.py#L769-L776
(`socket.error` is an alias for `OSError`, so this catches all `OSError` exceptions and uses them as a signal to close the connection.)

To fix this, I wrapped the function so that the greenlet returns a tuple `(result, exception)` rather than raising the exception, so the exception no longer gets reported to Sentry.

Cause 2: There's a possible race condition where the socket can be closed before the `read_requestline()` greenlet runs

Imagine the shutdown event becomes set. The "main" greenlet for the connection does the following:
baseplate.py/baseplate/server/wsgi.py
Lines 80 to 83 in efd4459
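The embedded lines aren't reproduced here. The following is a simplified, hypothetical model of the pattern, not baseplate's code: `Connection`, `read_requestline` with a forced `gevent.sleep()` delay, and `wait_for_requestline` are illustrative names, and the model omits the `kill` step. The main greenlet returns `None` as soon as the shutdown event is set, the caller closes the connection, and the still-running reader then reads from the closed connection and dies with an `OSError` that the hub treats as unhandled.

```python
import gevent
from gevent.event import Event


class Connection:
    """Hypothetical stand-in for a client connection (not baseplate's API)."""

    def __init__(self):
        self.closed = False
        self.reader = None  # the read_requestline greenlet, kept for inspection

    def readline(self):
        if self.closed:
            raise OSError("read from closed socket")
        return b"GET / HTTP/1.1\r\n"


def read_requestline(conn):
    gevent.sleep(0.05)  # forced delay, like the gevent.sleep() calls used to provoke the race
    return conn.readline()


def wait_for_requestline(conn, shutdown_event):
    """Hypothetical main greenlet: wait for a request line or for shutdown."""
    conn.reader = gevent.spawn(read_requestline, conn)
    gevent.wait([conn.reader, shutdown_event], count=1)  # whichever is ready first
    if shutdown_event.is_set():
        return None  # caller closes the connection; the reader is still running
    return conn.reader.get()


shutdown = Event()
shutdown.set()
conn = Connection()
if wait_for_requestline(conn, shutdown) is None:
    conn.closed = True  # the caller closes the connection
gevent.sleep(0.2)  # give the leftover reader greenlet a chance to run
print(repr(conn.reader.exception))  # OSError('read from closed socket')
```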
Returning `None` causes the caller to close the connection, which is normally not a problem. However, depending on the order in which the greenlets execute and when the `kill` signal is delivered, it could close the connection while the `read_requestline` greenlet is still running, which causes that greenlet to try to read from a closed socket.

I was able to reproduce this by adding `gevent.sleep()` calls to force the race condition, but I suspect this is not the cause of the production errors. To fix this, I added a `join()` on the `read_requestline` greenlet to ensure it finishes executing before we return.

Validation
I was able to reproduce both of the above cases locally using the test I added, but it's difficult to reproduce the issues consistently in a test, since it's hard to force race conditions without adding `gevent.sleep()` calls in random places.
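Both fixes can be sketched together in a small, self-contained model. This is an illustration under stated assumptions, not the PR's test or baseplate's implementation: `Connection`, `return_exception`, and `wait_for_requestline` are hypothetical names, the reader is delayed with `gevent.sleep()` to provoke the race, and the main greenlet is assumed to kill the reader with `kill(block=False)` on shutdown. The hub's `print_exception` is replaced with a recorder so the sketch can check that nothing would reach Sentry.

```python
import gevent
from gevent.event import Event

# Record what the hub would pass to print_exception (and so to Sentry).
reported = []
gevent.get_hub().print_exception = lambda context, t, value, tb: reported.append(value)


class Connection:
    """Hypothetical stand-in for a client connection."""

    def __init__(self, closed=False):
        self.closed = closed

    def readline(self):
        if self.closed:
            raise OSError("read from closed socket")
        return b"GET / HTTP/1.1\r\n"


def read_requestline(conn):
    gevent.sleep(0.05)  # forced delay, as when reproducing the race
    return conn.readline()


def return_exception(func, *args):
    """Fix 1: hand the exception back as data instead of raising it in the greenlet."""
    try:
        return func(*args), None
    except Exception as exc:  # GreenletExit derives from BaseException, so kill() still works
        return None, exc


def wait_for_requestline(conn, shutdown_event):
    reader = gevent.spawn(return_exception, read_requestline, conn)
    gevent.wait([reader, shutdown_event], count=1)
    if shutdown_event.is_set():
        reader.kill(block=False)  # assumption: the reader is killed without blocking
        reader.join()  # Fix 2: ensure the reader has finished before the caller closes conn
        return None
    result, exc = reader.get()
    if exc is not None:
        raise exc  # raised in the caller's greenlet, where it is handled normally
    return result


# Shutdown while a read is pending: no stray read from a closed connection.
shutdown = Event()
shutdown.set()
print(wait_for_requestline(Connection(), shutdown))  # None

# Peer resets the connection: the caller still sees and handles the OSError.
try:
    wait_for_requestline(Connection(closed=True), Event())
except OSError as exc:
    print("handled:", exc)  # handled: read from closed socket

print(reported)  # []
```

Returning the exception as data keeps the hub from ever seeing it, while re-raising it in the main greenlet preserves the existing `except socket.error` handling in the caller.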