Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

test: address timing issues in simple http tests #9432

Closed
wants to merge 1 commit into from
Closed

test: address timing issues in simple http tests #9432

wants to merge 1 commit into from

Conversation

gireeshpunathil
Copy link
Member

simple tests test-http-request-end.js, test-http-default-encoding.js
hangs in AIX. The root cause for both the failures is related to the
timing with which packets are sent between the client and server.
On the client side, one factor that affects the timing is Nagle's
algorithm. With Nagle enabled there may be a delay between two packets
as the stack may wait until either:
a. An acknowledgement for the first packet is received, or
b. 200 ms elapses.
before sending the second packet.

Similarly at the server side 2 sequential packages can be delivered to
the application either together or separately.

On AIX we see that they are delivered separately to the server, while on
Linux delivered together. If we change the timing, for example disabling
Nagle on AIX we see the 2 packets delivered together and the tests pass.

In the test case simple/test-http-request-end.js, the client request
handler of the server receives and stores the data in a data callback,
closes the server in a request end callback, and writes to the client
and ends the response, in-line with the request receipt. An HTTP parser
module parses the incoming message, and invokes callback routines which
are registered for HTTP events (such as header, body, end etc.)

Because the termination sequence arrive in a separate packet, there is a
delay in parsing that message and identify that the client request ended
(and thereby invoke the request end call back handler). Due to this delay,
the response close happens first, which in-turn destroys the server
socket leading to the fd and watcher removal from the uv loop abandoning
further events on this connection, and end call back never being called,
causing the reported hang. simple/test-http-default-encoding.js suffers
from the same problem.

This pull request is to make the test case behave consistently,
irrespective of the timing window between the execution of asynchronous
call back and the synchronous termination of response, as well as
irrespective of the characteristics of the packet reception - coalesced
or in separate packets.

Adding a constant delay (such as 200 ms) before ending the server
response resolves he problem, but for program correctness and
consistency, the below approach is proposed - the request end callback
is a place where the request end is guaranteed, and is safe to write
back to the client.

@mhdawson
Copy link
Member

I had discussed this with Gireesh ealier. lgtm.

@mhdawson mhdawson added this to the 0.12.2 milestone Mar 18, 2015
@jasnell
Copy link
Member

jasnell commented Mar 18, 2015

+1. LGTM

@mhdawson mhdawson modified the milestones: 0.12.2, 0.12.3 Mar 19, 2015
@tjfontaine
Copy link

Thanks for the excellent write up, and for improving the reliability of our tests!

Frankly, I'm of the opinion that tests that have their own "exit early" timers should have those timers removed. So from that standpoint I would have you remove that timer. The test runner already has a timer and it should be the authority on nuking a test that's exceeding the timeout value.

If you want you can remove this timer now, or someone can come back later and do that, I'm fine either way.

Otherwise LGTM.

This does raise an important point, and I will file it as a separate issue, but we clearly have a need for (at least the test suite) to verify that our core data structures have indeed transitioned through their state machines. Arguably this we would want this at run time as well.

@gireeshpunathil
Copy link
Member Author

@tjfontaine, thanks for the comments and suggestion - I have removed the timer code.

@mhdawson
Copy link
Member

Ok. Given the approvals above I'll pull this in.

@mhdawson
Copy link
Member

Gireesh there are failing tests, I can see the current pull request that you missed one reference to timeout. Please update and validate that the modified test passes.

@mhdawson
Copy link
Member

Gireesh you'll also have to squash into 1 commit using git rebase -i

@gireeshpunathil
Copy link
Member Author

I did some mess-up with the rebase, let me correct those.

simple tests test-http-request-end.js, test-http-default-encoding.js
hangs in AIX. The root cause for both the failures is related to the
timing with which packets are sent between the client and server.
On the client side, one factor that affects the timing is Nagle's
algorithm. With Nagle enabled there may be a delay between two packets
as the stack may wait until either:
  a. An acknowledgement for the first packet is received, or
  b. 200 ms elapses.
before sending the second packet.

Similarly at the server side 2 sequential packages can be delivered to
the application either together or separatly.

On AIX we see that they are delivered separately to the server, while on
Linux delivered together. If we change the timing, for example disabling
Nagle on AIX we see the 2 packets delivered together and the tests pass.

In the test case simple/test-http-request-end.js, the client request
handler of the server receives and stores the data in a data callback,
closes the server in a request end callback, and writes to the client
and ends the response, in-line with the request receipt. An HTTP parser
module parses the incoming message, and invokes callback routines which
are registered for HTTP events (such as header, body, end etc.)

Because the termination sequence arrive in a separate packet, there is a
delay in parsing that message and identify that the client request ended
(and thereby invoke the request end call backhandler). Due to this delay,
the response close happens first, which in-turn destroys the server
socket leading to the fd and watcher removal from the uv loop abandoning
further events on this connection, and end call back never being called,
causing the reported hang.  simple/test-http-default-encoding.js suffers
from the same problem.

Also, remove the timer logic from the test case. Test harness anyways
contain a timer which controls the individual tests so remove such
controls from the test case, as suggested by @tjfontaine
@gireeshpunathil
Copy link
Member Author

@mdawsonibm, squashed the commits into one (7b07e9f), re-ran the tests successfully. Thanks.

mhdawson pushed a commit that referenced this pull request Mar 27, 2015
simple tests test-http-request-end.js, test-http-default-encoding.js
hangs in AIX. The root cause for both the failures is related to the
timing with which packets are sent between the client and server.
On the client side, one factor that affects the timing is Nagle's
algorithm. With Nagle enabled there may be a delay between two packets
as the stack may wait until either:
  a. An acknowledgement for the first packet is received, or
  b. 200 ms elapses.
before sending the second packet.

Similarly at the server side 2 sequential packages can be delivered to
the application either together or separatly.

On AIX we see that they are delivered separately to the server, while on
Linux delivered together. If we change the timing, for example disabling
Nagle on AIX we see the 2 packets delivered together and the tests pass.

In the test case simple/test-http-request-end.js, the client request
handler of the server receives and stores the data in a data callback,
closes the server in a request end callback, and writes to the client
and ends the response, in-line with the request receipt. An HTTP parser
module parses the incoming message, and invokes callback routines which
are registered for HTTP events (such as header, body, end etc.)

Because the termination sequence arrive in a separate packet, there is a
delay in parsing that message and identify that the client request ended
(and thereby invoke the request end call backhandler). Due to this delay,
the response close happens first, which in-turn destroys the server
socket leading to the fd and watcher removal from the uv loop abandoning
further events on this connection, and end call back never being called,
causing the reported hang.  simple/test-http-default-encoding.js suffers
from the same problem.

Also, remove the timer logic from the test case. Test harness anyways
contain a timer which controls the individual tests so remove such
controls from the test case, as suggested by @tjfontaine

Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
PR-URL: #9432
@mhdawson
Copy link
Member

Landed as 13e1131

@mhdawson mhdawson closed this Mar 31, 2015
brendanashworth pushed a commit to brendanashworth/io.js that referenced this pull request Aug 4, 2015
simple tests test-http-request-end.js, test-http-default-encoding.js
hangs in AIX. The root cause for both the failures is related to the
timing with which packets are sent between the client and server.
On the client side, one factor that affects the timing is Nagle's
algorithm. With Nagle enabled there may be a delay between two packets
as the stack may wait until either:
  a. An acknowledgement for the first packet is received, or
  b. 200 ms elapses.
before sending the second packet.

Similarly at the server side 2 sequential packages can be delivered to
the application either together or separatly.

On AIX we see that they are delivered separately to the server, while on
Linux delivered together. If we change the timing, for example disabling
Nagle on AIX we see the 2 packets delivered together and the tests pass.

In the test case simple/test-http-request-end.js, the client request
handler of the server receives and stores the data in a data callback,
closes the server in a request end callback, and writes to the client
and ends the response, in-line with the request receipt. An HTTP parser
module parses the incoming message, and invokes callback routines which
are registered for HTTP events (such as header, body, end etc.)

Because the termination sequence arrive in a separate packet, there is a
delay in parsing that message and identify that the client request ended
(and thereby invoke the request end call backhandler). Due to this delay,
the response close happens first, which in-turn destroys the server
socket leading to the fd and watcher removal from the uv loop abandoning
further events on this connection, and end call back never being called,
causing the reported hang.  simple/test-http-default-encoding.js suffers
from the same problem.

Also, remove the timer logic from the test case. Test harness anyways
contain a timer which controls the individual tests so remove such
controls from the test case, as suggested by @tjfontaine

Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
PR-URL: nodejs/node-v0.x-archive#9432

PORT-FROM: joyent/node @ 13e1131
PR-URL: nodejs#2294
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
brendanashworth pushed a commit to nodejs/node that referenced this pull request Aug 6, 2015
simple tests test-http-request-end.js, test-http-default-encoding.js
hangs in AIX. The root cause for both the failures is related to the
timing with which packets are sent between the client and server.
On the client side, one factor that affects the timing is Nagle's
algorithm. With Nagle enabled there may be a delay between two packets
as the stack may wait until either:
  a. An acknowledgement for the first packet is received, or
  b. 200 ms elapses.
before sending the second packet.

Similarly at the server side 2 sequential packages can be delivered to
the application either together or separatly.

On AIX we see that they are delivered separately to the server, while on
Linux delivered together. If we change the timing, for example disabling
Nagle on AIX we see the 2 packets delivered together and the tests pass.

In the test case simple/test-http-request-end.js, the client request
handler of the server receives and stores the data in a data callback,
closes the server in a request end callback, and writes to the client
and ends the response, in-line with the request receipt. An HTTP parser
module parses the incoming message, and invokes callback routines which
are registered for HTTP events (such as header, body, end etc.)

Because the termination sequence arrive in a separate packet, there is a
delay in parsing that message and identify that the client request ended
(and thereby invoke the request end call backhandler). Due to this delay,
the response close happens first, which in-turn destroys the server
socket leading to the fd and watcher removal from the uv loop abandoning
further events on this connection, and end call back never being called,
causing the reported hang.  simple/test-http-default-encoding.js suffers
from the same problem.

Also, remove the timer logic from the test case. Test harness anyways
contain a timer which controls the individual tests so remove such
controls from the test case, as suggested by @tjfontaine

Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
PR-URL: nodejs/node-v0.x-archive#9432

PORT-FROM: joyent/node @ 13e1131
PR-URL: #2294
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Fishrock123 pushed a commit to Fishrock123/node that referenced this pull request Aug 11, 2015
simple tests test-http-request-end.js, test-http-default-encoding.js
hangs in AIX. The root cause for both the failures is related to the
timing with which packets are sent between the client and server.
On the client side, one factor that affects the timing is Nagle's
algorithm. With Nagle enabled there may be a delay between two packets
as the stack may wait until either:
  a. An acknowledgement for the first packet is received, or
  b. 200 ms elapses.
before sending the second packet.

Similarly at the server side 2 sequential packages can be delivered to
the application either together or separatly.

On AIX we see that they are delivered separately to the server, while on
Linux delivered together. If we change the timing, for example disabling
Nagle on AIX we see the 2 packets delivered together and the tests pass.

In the test case simple/test-http-request-end.js, the client request
handler of the server receives and stores the data in a data callback,
closes the server in a request end callback, and writes to the client
and ends the response, in-line with the request receipt. An HTTP parser
module parses the incoming message, and invokes callback routines which
are registered for HTTP events (such as header, body, end etc.)

Because the termination sequence arrive in a separate packet, there is a
delay in parsing that message and identify that the client request ended
(and thereby invoke the request end call backhandler). Due to this delay,
the response close happens first, which in-turn destroys the server
socket leading to the fd and watcher removal from the uv loop abandoning
further events on this connection, and end call back never being called,
causing the reported hang.  simple/test-http-default-encoding.js suffers
from the same problem.

Also, remove the timer logic from the test case. Test harness anyways
contain a timer which controls the individual tests so remove such
controls from the test case, as suggested by @tjfontaine

Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
PR-URL: nodejs/node-v0.x-archive#9432

PORT-FROM: joyent/node @ 13e1131
PR-URL: nodejs#2294
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
rvagg pushed a commit to nodejs/node that referenced this pull request Aug 17, 2015
simple tests test-http-request-end.js, test-http-default-encoding.js
hangs in AIX. The root cause for both the failures is related to the
timing with which packets are sent between the client and server.
On the client side, one factor that affects the timing is Nagle's
algorithm. With Nagle enabled there may be a delay between two packets
as the stack may wait until either:
  a. An acknowledgement for the first packet is received, or
  b. 200 ms elapses.
before sending the second packet.

Similarly at the server side 2 sequential packages can be delivered to
the application either together or separatly.

On AIX we see that they are delivered separately to the server, while on
Linux delivered together. If we change the timing, for example disabling
Nagle on AIX we see the 2 packets delivered together and the tests pass.

In the test case simple/test-http-request-end.js, the client request
handler of the server receives and stores the data in a data callback,
closes the server in a request end callback, and writes to the client
and ends the response, in-line with the request receipt. An HTTP parser
module parses the incoming message, and invokes callback routines which
are registered for HTTP events (such as header, body, end etc.)

Because the termination sequence arrive in a separate packet, there is a
delay in parsing that message and identify that the client request ended
(and thereby invoke the request end call backhandler). Due to this delay,
the response close happens first, which in-turn destroys the server
socket leading to the fd and watcher removal from the uv loop abandoning
further events on this connection, and end call back never being called,
causing the reported hang.  simple/test-http-default-encoding.js suffers
from the same problem.

Also, remove the timer logic from the test case. Test harness anyways
contain a timer which controls the individual tests so remove such
controls from the test case, as suggested by @tjfontaine

Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
PR-URL: nodejs/node-v0.x-archive#9432

PORT-FROM: joyent/node @ 13e1131
PR-URL: #2294
Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants