More info for debugging EOFErrors #199

Closed
rofinn opened this issue Feb 16, 2018 · 5 comments · Fixed by #236

Comments


rofinn commented Feb 16, 2018

I keep hitting random EOFErrors while making HTTP.post requests to a server. Is there a way to get HTTP.jl to print out whatever it received before the EOFError?

samoconnor (Contributor) commented:

Hi @rofinn,

Passing verbose=2 to HTTP.post will give you more debug info.

- `verbose = 0`, set to `1` or `2` for extra message logging.

Editing DEBUG_LEVEL in src/HTTP.jl to 1, 2, 3 or 4 will also yield more info.

const DEBUG_LEVEL = 0

Please copy/paste the debug output, the stack trace from the EOFError, and the arguments you're passing to HTTP.post here and I'll take a look.
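
For reference, a minimal sketch of a request with the extra logging enabled; the URL, headers and body are placeholders, not taken from this issue:

using HTTP

# Placeholder URL, headers and body -- not from the original report.
url     = "https://example.com/Main/query"
headers = ["Content-Type" => "application/json"]
body    = """{"QueryId":"..."}"""

# verbose=1 or verbose=2 prints extra message logging for this request only,
# without editing DEBUG_LEVEL in src/HTTP.jl.
resp = HTTP.post(url, headers, body; verbose=2)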


rofinn commented Feb 16, 2018

Awesome, thanks. Setting the DEBUG_LEVEL to 4 gives me the following on a failed post request.

DEBUG: a345 AWS4 canonical_form: POST
/Main/query

content-length:50
content-md5:3rb1HT7+W0ovAXGBuQhUCQ==
content-type:application/json
host:xxxxxx.execute-api.us-east-1.amazonaws.com
x-amz-content-sha256:<sha256>
x-amz-date:20180216T023751Z

content-length;content-md5;content-type;host;x-amz-content-sha256;x-amz-date
<sha256>
DEBUG: a345 AWS4 string_to_sign: AWS4-HMAC-SHA256
20180216T023751Z
20180216/us-east-1/execute-api/aws4_request
xxxxxxxxxxxxxxxxxxxxx
DEBUG: a345 AWS4 signature: <Signature>
DEBUG: a345 ♻️  Idle:           🔗    1↑     1↓    xxxxxx.us-east-1.amazonaws.com:443:52144 ≣16 RawFD(27)
DEBUG: a345 👁  Start write:T1  🔗    1↑     1↓    xxxxxx.execute-api.us-east-1.amazonaws.com:443:52144 ≣16 RawFD(27)
HTTP.Messages.Request:
"""
POST /Main/query HTTP/1.1
Content-Type: application/json
Host: xxxxxx.execute-api.us-east-1.amazonaws.com
Content-Length: 50
x-amz-content-sha256: <sha256>
x-amz-date: 20180216T023751Z
Content-MD5: 3rb1HT7+W0ovAXGBuQhUCQ==
Authorization: AWS4-HMAC-SHA256 Credential=AKIAXXXXXXXXXXXX/20180216/us-east-1/execute-api/aws4_request, SignedHeaders=content-length;content-md5;content-type;host;x-amz-content-sha256;x-amz-date, Signature=<sha256>

{"QueryId":"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx"}"""
DEBUG: a345 👁  Start read: T1  🔗    1↑🔒    1↓    xxxxxx.execute-api.us-east-1.amazonaws.com:443:52144 ≣16 RawFD(27)
DEBUG: dce9 🔒  POST non-idempotent, holding write lock: T1  🔗    1↑🔒    1↓    xxxxxx.execute-api.us-east-1.amazonaws.com:443:52144 ≣16 RawFD(27)
DEBUG: a345 eof(::Transaction) -> eof(typeof(c.io)): T1  🔜💀   1↑🔒    1↓🔒   xxxxxxx.execute-api.us-east-1.amazonaws.com:443:52144 ≣16 RawFD(27)
DEBUG: a345 ❗️  ConnectionLayer EOFError(). Closing: T1  💀    1↑🔒    1↓🔒   xxxxxx.execute-api.us-east-1.amazonaws.com:443:52144 ≣16 RawFD(-1)
DEBUG: a345 🗣  Write done: T1  💀    2↑     1↓🔒   xxxxxx.execute-api.us-east-1.amazonaws.com:443:52144 ≣16 RawFD(-1)
DEBUG: a345 ✉️  Read done:  T1  💀    2↑     2↓    xxxxxx.execute-api.us-east-1.amazonaws.com:443:52144 ≣16 RawFD(-1)
DEBUG: a345 🚷  No Retry: POST /Main/query HTTP/1.1, POST non-idempotent


samoconnor commented Feb 19, 2018

Hi @rofinn,

I think I understand what is going on here.

♻️ Idle: 🔗 1↑ 1↓
There is a pooled connection that has been sitting idle for a while.

Start write:T1 🔗 1↑ 1↓
The request is sent out...

eof(::Transaction)...: T1 🔜💀 1↑🔒 1↓🔒
When the read loop calls eof() we see that the socket has been closed by the server (probably some time ago, but we only just noticed).

🚷 No Retry: POST /Main/query HTTP/1.1, POST non-idempotent
The RetryLayer refuses to retry because POST is a non-idempotent method.
"MUST NOT automatically retry a request with a non-idempotent method"
https://tools.ietf.org/html/rfc7230#section-6.3.1

In this particular case it may be possible to figure out that the request was never really sent, so it would be safe to retry; however, that isn't possible in general. There may be an edge case where the connection failure happens after the request is sent but looks the same as this one, in which case we shouldn't retry automatically, per section 6.3.1.

If you know at the application level that your requests are safe to retry (i.e. you won't be depositing money in the account twice by retrying), then you should either have an application-level retry loop that retries on EOFError (see the sketch after the code excerpt below), or you could pass retry_non_idempotent=true to HTTP.post.

isrecoverable(e, req, retry_non_idempotent) =
    isrecoverable(e) &&
    !(req.body === body_was_streamed) &&
    !(req.response.body === body_was_streamed) &&
    (retry_non_idempotent || isidempotent(req))
    # "MUST NOT automatically retry a request with a non-idempotent method"
    # https://tools.ietf.org/html/rfc7230#section-6.3.1
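
If it helps, here is a rough sketch of such an application-level retry loop; the function name, arguments and attempt count are placeholders, and depending on the HTTP.jl version the EOFError may arrive wrapped in another exception type:

using HTTP

# Rough sketch of an application-level retry loop, assuming the request is
# known to be safe to repeat. All names and values here are placeholders.
function post_with_retry(url, headers, body; attempts=3)
    for attempt in 1:attempts
        try
            return HTTP.post(url, headers, body)
        catch e
            # Retry only on EOFError (it may be wrapped depending on the
            # HTTP.jl version); rethrow anything else and the final failure.
            if e isa EOFError && attempt < attempts
                continue
            end
            rethrow()
        end
    end
end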

I have reproduced the behaviour you were seeing and confirmed that passing retry_non_idempotent=true resolves the problem:

DEBUG: 4e1d 👁  Start read: T13 🔗   13🔒   13
DEBUG: 1651 🔒  POST non-idempotent, holding write lock: T13 🔗   13🔒   13
DEBUG: 4e1d ❗️  ConnectionLayer EOFError(). Closing: T13 💀   13🔒   13🔒 
DEBUG: 4e1d 🗣  Write done: T13 💀   14    13🔒   
DEBUG: 4e1d ✉️  Read done:  T13 💀   14    14
DEBUG: 4e1d 🔄  Retry IOError(EOFError() during request()
DEBUG: 4e1d 🗑  Deleted:        💀   14    14
DEBUG: 4e1d SSL connect: 
DEBUG: 4e1d TCP connect:
DEBUG: 4e1d 🔗  New:0     0
DEBUG: 4e1d 👁  Start write:T0  ⏸    0     0
➡️   "POST /Main/query HTTP/1.1\r\n"
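
For reference, the keyword is passed like any other request option; a sketch with placeholder arguments:

using HTTP

# Opt in to retrying a non-idempotent request (placeholder URL/headers/body).
resp = HTTP.post("https://example.com/Main/query",
                 ["Content-Type" => "application/json"],
                 """{"QueryId":"..."}""";
                 retry_non_idempotent=true, verbose=2)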

samoconnor (Contributor) commented:

Also note: I've added a verbose = 3 option that prints out everything that is sent/received. https://github.com/JuliaWeb/HTTP.jl/blob/master/src/DebugRequest.jl


rofinn commented Feb 20, 2018

Awesome, thanks for the info @samoconnor. It would be nice if we could distinguish between idle connections timing out normally and an EOF condition (maybe via some kind of client-side timeout setting on the connection pool?), but this should be enough to solve my issues.

rofinn closed this as completed on Feb 20, 2018
vtjnash pushed a commit that referenced this issue Apr 17, 2018
When a connection is returned to the (read) pool,
add a monitor to it for receiving unexpected data (or EOF),
and kill / close the Connection object if any activity occurs
before the next write (when it should have simply been waiting idle in the pool)

per JuliaLang/MbedTLS.jl#145 (comment)

closes #214
closes #199
closes #220
closes JuliaWeb/GitHub.jl#106
quinnj pushed a commit that referenced this issue May 31, 2018
* ConnectionPool: monitor idle connections

When a connection is returned to the (read) pool,
add a monitor to it for receiving unexpected data (or EOF),
and kill / close the Connection object if any activity occurs
before the next write (when it should have simply been waiting idle in the pool)

per JuliaLang/MbedTLS.jl#145 (comment)

closes #214
closes #199
closes #220
closes JuliaWeb/GitHub.jl#106

*  - Encapsulate read|writebusy/sequence/count logic in new isbusy function.
 - Move close() on eof() || !isbusy() to new monitor_idle_connection function.
 - Make monitor_idle_connection() a noop for ::Connection{SSLContext}

* require Julia 0.6.3 #236 (comment)
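
The fix referenced in the commits above can be pictured roughly as follows. This is a sketch of the idea only, not the actual HTTP.jl implementation; the IdleConnection type and its io/busy fields are hypothetical:

# Sketch of the idea: while a connection sits idle in the pool, watch its
# socket; if EOF or unexpected data arrives before the connection is handed
# out again, close it so it is never reused for a new request.
mutable struct IdleConnection
    io::IO        # the underlying socket (hypothetical field)
    busy::Bool    # true once the connection is handed out for a new request
end

function monitor_idle_connection(c::IdleConnection)
    @async try
        eof(c.io)        # blocks until EOF or data arrives on the socket
        if !c.busy       # still idle => the activity was unexpected
            close(c.io)
        end
    catch
        # the socket may already have been closed elsewhere
    end
end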