connect_timeout and inactivity_timeout not working with eventmachine 1.0.1 #222

kenn · 2013-03-08T04:07:07Z

Here's the source to reproduce the problem:

# gem 'eventmachine', '1.0.0'
require 'em-http-request'

puts EM::VERSION

EM.run do
  start_at = Time.now
  url = 'http://bot.lingr.com/delay/6'
  http = EventMachine::HttpRequest.new(url, :connect_timeout => 2, :inactivity_timeout => 2).head
  http.callback do
    puts "success! elapsed #{Time.now - start_at}"
    EM.stop
  end
  http.errback do
    puts "error! elapsed #{Time.now - start_at}"
    EM.stop
  end
end

The server returns a response in 6 seconds, while the timeout values are set an earlier point.

When I run this, I get

1.0.1
success! elapsed 6.280314

As you can see, it's not timing out and wait (indefinitely) until we get the response.

However, when I activate the gem pinning to 1.0.0, then run

1.0.0
error! elapsed 4.003471

So it times out after 4 seconds as expected.

Probably it's a regression in EventMachine 1.0.1 but I wasn't able to find out what change is causing this. Any advice on fix / workaround?

The text was updated successfully, but these errors were encountered:

igrigorik · 2013-03-08T05:46:53Z

Nope, needs to fixed in EM. /cc @tmm1 ... Short of adding extra timers to handle this yourself, this is not something em-http should address.

tmm1 · 2013-03-08T05:50:27Z

The only related commit in 1.0.1 is eventmachine/eventmachine@3c39365 (eventmachine/eventmachine#374). Can you try with and without that commit?

kenn · 2013-03-08T06:19:33Z

ok, I tried and confirmed that reverting eventmachine/eventmachine@3c39365 fixed the problem. So, it seems like a regression indeed.

tmm1 · 2013-03-08T06:46:12Z

Interesting. If you use :connect_timeout => 2, :inactivity_timeout => 1, does it time out in 3 seconds?

kenn · 2013-03-08T07:57:42Z

Here's the result matrix - I find it's quite interesting indeed.

Note that I built "fixed" EM as 1.0.1.1, and used http://bot.lingr.com/delay/8 as the endpoint so that the server returns a response in 8 seconds.

connect_timeout inactivity_timeout    1.0.0     1.0.1    1.0.1.1
              1                  1    2.019553  success  2.040749
              1                  2    3.018771  success  3.003296
              1                  3    4.002432  success  4.003432
              1                  4    5.019328  success  5.017295
              2                  1    2.021929  success  2.003186
              2                  2    4.020711  success  4.003911
              2                  3    5.023549  success  5.020934
              2                  4    6.018697  success  6.271681
              3                  1    2.003149  success  2.022989
              3                  2    4.020217  success  4.003389
              3                  3    6.003739  success  6.019589
              3                  4    7.002843  success  7.003071

tmm1 · 2013-03-08T09:10:17Z

Ok, this is clearly broken in 1.0.1. There's even a failing test on master. I'll try to come up with a solution that fixes the test and the original issue in eventmachine/eventmachine#374

See also #149

tmm1 · 2013-03-08T09:18:38Z

I believe this patch will fix all the timeout weirdness. Can you try your combination test again?

diff --git a/ext/ed.cpp b/ext/ed.cpp
index 9ebd4b1..48f3c5c 100644
--- a/ext/ed.cpp
+++ b/ext/ed.cpp
@@ -460,8 +460,7 @@ ConnectionDescriptor::SetConnectPending
 void ConnectionDescriptor::SetConnectPending(bool f)
 {
        bConnectPending = f;
-       if (f == false && NextHeartbeat)
-               MyEventMachine->ClearHeartbeat(NextHeartbeat, this);
+       MyEventMachine->QueueHeartbeat(this);
        _UpdateEvents();
 }

this reverts a change made for #374, which inadvertently disabled inactivity timeouts altogether. instead we always QueueHeartbeat when bConnectPending changes, so GetNextHeartbeat can calculate a timeout based on the new connection state. i confirmed this fixes the broken test on master, and the original "20s delay" bug (using the repro in #393). this should also address the timeout stacking issue reported in igrigorik/em-http-request#149, as well as the original report in igrigorik/em-http-request#222

kenn · 2013-03-08T10:52:05Z

@tmm1 Good news - it worked! Also, the most accurate timeout ever - results are almost exactly the same as inactivity_timeout. 👍

connect_timeout inactivity_timeout    1.0.0     1.0.1    1.0.1.1   1.0.1.2
              1                  1    2.019553  success  2.040749  1.134821
              1                  2    3.018771  success  3.003296  2.100689
              1                  3    4.002432  success  4.003432  3.098528
              1                  4    5.019328  success  5.017295  4.095405
              2                  1    2.021929  success  2.003186  1.106963
              2                  2    4.020711  success  4.003911  2.130382
              2                  3    5.023549  success  5.020934  3.09778
              2                  4    6.018697  success  6.271681  4.099069
              3                  1    2.003149  success  2.022989  1.122953
              3                  2    4.020217  success  4.003389  2.097198
              3                  3    6.003739  success  6.019589  3.096631
              3                  4    7.002843  success  7.003071  4.098878

tmm1 · 2013-03-08T12:28:57Z

Excellent. I've released EM v1.0.2.

igrigorik · 2013-03-08T18:12:46Z

Awesome - great job guys! I take it, we can close this bug then? :)

kenn · 2013-03-08T18:15:40Z

@tmm1 @igrigorik I've just deployed EM v1.0.3 to production and confirmed the problems is fixed for real. Thank you!

# Changelog ## 1.0.x * EM.system was broken in 1.0.2 release [#413] ## 1.0.2 (March 8, 2013) * binary win32 gems now include fastfilereader shim [#222] * fix long-standing connection timeout issues [27fdd5b, igrigorik/em-http-request#222] * http and line protocol cleanups [#193, #151] * reactor return value cleanup [#225] * fix double require from gemspec [#284] * fix smtp server reset behavior [#351] * fix EM.system argument handling [#322] * ruby 1.9 compat in smtp server and stomp protocols [#349, #315] * fix pause from post_init [#380] ## 1.0.1 (February 27, 2013) * use rb_wait_for_single_fd() on ruby 2.0 to fix rb_thread_select() deprecation * fix epoll/kqueue mode in ruby 2.0 by removing calls to rb_enable_interrupt() [#248, #389] * fix memory leak when verifying ssl cerificates [#403] * fix initial connection delay [#393, #374]

kenn closed this as completed Mar 8, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

connect_timeout and inactivity_timeout not working with eventmachine 1.0.1 #222

connect_timeout and inactivity_timeout not working with eventmachine 1.0.1 #222

kenn commented Mar 8, 2013

igrigorik commented Mar 8, 2013

tmm1 commented Mar 8, 2013

kenn commented Mar 8, 2013

tmm1 commented Mar 8, 2013

kenn commented Mar 8, 2013

tmm1 commented Mar 8, 2013

tmm1 commented Mar 8, 2013

kenn commented Mar 8, 2013

tmm1 commented Mar 8, 2013

igrigorik commented Mar 8, 2013

kenn commented Mar 8, 2013

connect_timeout and inactivity_timeout not working with eventmachine 1.0.1 #222

connect_timeout and inactivity_timeout not working with eventmachine 1.0.1 #222

Comments

kenn commented Mar 8, 2013

igrigorik commented Mar 8, 2013

tmm1 commented Mar 8, 2013

kenn commented Mar 8, 2013

tmm1 commented Mar 8, 2013

kenn commented Mar 8, 2013

tmm1 commented Mar 8, 2013

tmm1 commented Mar 8, 2013

kenn commented Mar 8, 2013

tmm1 commented Mar 8, 2013

igrigorik commented Mar 8, 2013

kenn commented Mar 8, 2013