Use a socket timeout instead of Ruby's timeout #121
Thanks for the report. We will look into this. My personal recollection is that IO.select was used at one time (a number of years ago), but was abandoned because: a) It failed on Windows, or
Out of curiosity, what profiling tool(s) were used to obtain your results?
The gem that I used is MemoryProfiler. So far, it has been the best gem that I can find for memory profiling. That is a shame if
is called in five places in the gem code (ignoring unit tests). For each of:
Are your results 'localized', or for all calls?
Unfortunately, the results that I was looking at were generalized to
For the code that I was running, I was persisting the client and looping over a set of messages. Most likely, because of how this was running, the connect and start timeouts would be negligible. The code executed most often would be frame parsing and socket reads. I did memory profile a modified version of my code that reconnected for every message that was sent; this tripled the allocated memory compared with a persisted connection. I can attempt to rerun a profile excluding Ruby source code. I do not know whether this will hide the results from Ruby source or cause all of the results to point one level higher. As far as I know, there should be socket-level settings to handle connect and read timeouts. I would think there is also a socket setting for frame parsing. Looking at the code, I assume that the main reason for the start timeout in
I have not forgotten about this. I spent some time this weekend starting to put together a test bed in order to get some before-and-after results. I confused myself for a while because of the similarity of gem names:
In any case, I would like a little more information from you.
I really think I need to know a little more about the overall design and implementation of your application. Is the application source available to the public? Your statement that Timeout.timeout was written in Ruby surprised me. I am looking at a ruby 2.3.2p113 system and you are indeed correct. My surprise was because I have looked at Timeout.timeout before, a few years ago. At one point (1.8 or 1.9 maybe) I am pretty confident it was written in C. I wonder why the Ruby developers changed that.
Unfortunately, the project that I am working on is closed source. I will work on putting together a generic implementation of the code to give to you. To give you some more information, the reason that I started looking into this is that we found that our ActiveMQ workers were leaking memory. Running memory_profiler, the retained memory pointed to
Without a more detailed example right now, the closest example to our setup is the publisher example. We initialize a connection, then rely on Resque to retrieve messages from a queue that will be sent out.
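We publish a message and wait for a receipt. The only messages received by this worker are receipts. The implementation is:

```ruby
receipt = false
# The block runs when the broker's RECEIPT arrives and stores it in `receipt`.
client.publish(destination, message, headers) { |r| receipt = r }
# Spin, yielding to other threads, until the receipt callback has fired.
Thread.pass until receipt
```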
Here is an example of our connection parameters:

```ruby
Stomp::Client.new(:connect_headers => {"accept-version" => "1.1", "host" => "host", "heart-beat" => "15000,20000"},
                  :hosts => [{:host => "host", :port => port, :login => "login", :passcode => "passcode"},
                             {:host => "host", :port => port, :login => "login", :passcode => "passcode"}],
                  :reliable => true,
                  :randomize => false,
                  :closed_check => true,
                  :initial_reconnect_delay => 1,
                  :max_reconnect_delay => 30,
                  :max_reconnect_attempts => 0,
                  :max_hbread_fails => 2,
                  :max_hbrlck_fails => 2,
                  :fast_hbs_adjust => 0.1,
                  :use_exponential_back_off => false,
                  :connect_timeout => 5,
                  :start_timeout => nil,
                  :logger => stomp_logger) # stomp_logger: an instance of our StompLogger class
```
I had to open up resque and wrap a chunk of their work code in memory_profiler. I will work on getting a better example.
5.13.3
The average number of messages sent in a week is 3.4 million, which comes to about 20,000 per hour, 337 per minute, and 5.6 per second. During busy times we can peak closer to 40,000 messages per hour.
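A quick check of the rate arithmetic above, as a small sketch whose only inputs are the 3.4 million weekly total and 168 hours per week:

```ruby
# Re-derives the per-hour, per-minute, and per-second rates from the weekly total.
WEEKLY_MESSAGES = 3_400_000
HOURS_PER_WEEK = 7 * 24

per_hour   = WEEKLY_MESSAGES.fdiv(HOURS_PER_WEEK) # about 20,238
per_minute = per_hour / 60                         # about 337
per_second = per_minute / 60                       # about 5.6

puts format("%.0f/hour, %.0f/minute, %.1f/second", per_hour, per_minute, per_second)
# prints "20238/hour, 337/minute, 5.6/second"
```

At the quoted peak of 40,000 messages per hour, the same division gives roughly 11 messages per second.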
This may take some digging to get a better number. Without measuring the exact size with a networking tool, one of our messages is about 4K of text: a JSON message with about 196 fields. I hope that this helps. I will work on putting together a generic version of this code to post.
Very good information, all that will be helpful, I am sure. Thanks much. You say:
How does the worker get data to send? If you can supply a little more technical detail, I would appreciate that. I am thinking: file? dbus? zmq? .... how? You also say:
And later:
A very simple example of the memory_profiler wrapping would be very good. I think I need that. I also need to think about your use of Resque. That might (or might not) change what I need to be thinking about regarding this issue. And last but not least, thanks for the detailed connect parameters. That alone has given me several items to think about. Thanks, Guy
You will get tired of my questions. Is yours a Rails application?
More questions are a good thing. I found out that the average message size is 15993 bytes (15.62K), the minimum message size is 1292 bytes, and the maximum message size is 67782 bytes. Yes, this is a Rails application. We are running Ruby 2.2.4 with Rails 3.2.22.2 (working on upgrading to 4.2.6) and Resque 1.25.2.
We get the data from Resque and Redis. In our application we build a message, enqueue it onto a Redis queue, and let Resque hand the work to the worker. We did this to move direct communication with ActiveMQ out of our web app and web services.
We are experiencing what I believe to be two issues.
I am putting together a couple of examples now. I am hoping to post them soon.
Here is a gist of a generic version of our code. The code will not run exactly as is, but I think it is close enough to see what is going on. Here is a modified version of the publisher example that uses memory_profiler and a setup similar to our code. I think that this should show the same issues. Let me know if you have any other questions.
Thanks much. I have not looked at those yet ..... but I will.
So ...... a brief update. I set up a test bed for this, and it works at this point. The good news is that I can recreate the effect that you are seeing: the memory profiler reports extremely high numbers for the file timeout.rb. I went a bit overboard and have tested with both memory profiler gems (the underscore one, which you are using, and the one with a dash in the name). I want very much to understand why I am seeing what I see. Each call to Timeout::timeout has a specific purpose in the gem. Consider:
There are a couple of other calls in the unit tests, under the test directory. I am ignoring those. I know these are the only calls because of:
This leads me to think that the extreme timeout memory use must be triggered by ..... the 'receipt listener' logic, though that is a guess at this point. I want to investigate some more, and report again later.
Just to note that the memory balloon is confirmed. I can recreate it using Client#publish with a receipt listener. If no receipt listener is used, Timeout memory is much smaller. I wanted to rule out the threading logic in Client, and I did that using Connection#publish: if the #publish headers ask for a receipt, and an immediate call to Connection#receive is used to retrieve the RECEIPT frame, memory use increases considerably. Timeout is at the top again. At this point I will consider exactly what would need to change to eliminate the memory explosion. I want to consider whether there are alternatives to the idea behind socket timeouts.
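At the protocol level, the reproduction above boils down to a SEND frame carrying a `receipt` header, followed by the broker's RECEIPT frame whose `receipt-id` echoes that value. The sketch below shows that exchange using STOMP 1.1 framing (command line, `key:value` header lines, blank line, body, NUL terminator). `build_frame` and `parse_frame` are hypothetical helpers, not the stomp gem's API, and they deliberately skip header escaping, `content-length`, and heart-beat EOLs.

```ruby
# Hypothetical helpers illustrating STOMP 1.1 framing; not the stomp gem's API.
# Header escaping, content-length, and heart-beat EOLs are deliberately omitted.
def build_frame(command, headers, body = "")
  lines = [command]
  headers.each { |key, value| lines << "#{key}:#{value}" }
  lines.join("\n") + "\n\n" + body + "\0"
end

def parse_frame(raw)
  frame = raw.chomp("\0")                 # drop the NUL terminator
  head, body = frame.split("\n\n", 2)     # headers end at the first blank line
  command, *header_lines = head.split("\n")
  headers = {}
  header_lines.each do |line|
    key, value = line.split(":", 2)
    headers[key] = value unless headers.key?(key) # STOMP 1.1: first occurrence wins
  end
  [command, headers, body.to_s]
end

# A SEND that asks for a receipt; the broker answers with a matching RECEIPT.
send_frame = build_frame("SEND", { "destination" => "/queue/test", "receipt" => "msg-1" }, "hello")
receipt_frame = "RECEIPT\nreceipt-id:msg-1\n\n\0"
```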
I am glad that you were able to reproduce the bloated memory issue, and that it wasn't just something weird in our setup. I would be glad to help code some solutions to this. My only reservation is that I don't feel I know the ActiveMQ protocol well enough to be sure that a change I make won't have some other side effect.
Always good to have a second opinion.
I agree with this. It seems to be the only recurring call to
What are your reservations about using socket timeouts? Are there any other ideas that you have had? I was thinking that socket timeouts were the way to go to fix this, since they are implemented at a lower level. There should be settings for different operations, allowing separate values for connect, read, write, and anything else that might be needed.
Looking at the code, these all appear to use the same timeout value, which may reduce the complexity of a change.
Mainly recreating the effect documented in issue stompgem#121.

* towork:
  * Update README for adhoc directory.
  * Demo issue stompgem#121 using Stomp::Conection.
  * Use different memory profiler.
  * Enhance reporting by sender.
  * Attempt to homogenize examples 01 and 02.
  * Initial cut at part 02 of testing.
  * Remove unused and incorrect code.
  * Cleanup/remove extraneous comments.
  * Update ignore list.
  * Add first cut at a message payhload generator.
  * First real memory profiling: Very initial work on issue stompgem#121 analysis.
  * Initial README for the adhoc directory.
  * Initial ignore file for adhoc directory.
Regarding help with the coding ..... please fire away if you have the bandwidth. Send me a proposal, and we can talk about it. Push a test branch to your GitHub clone of the gem; I can fetch it from there. Regarding reservations ..... At a 10K foot level, I need to be concerned about impacts to existing gem users. The gem needs to function on:
On a more detailed level:
I will also mention that the gem needs to be robust with:
I almost always test against those four brokers (Artemis being fairly recent). One can write spec tests forever ..... and still not know how client code will function in front of a running server. I am sure there are other factors I should consider, but I just have not thought of them yet. I will document those as they occur to me. My approach is going to be:
Alternatives: one, at least, is that what I really want is a semaphore/lock implementation with a timeout mechanism.
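A minimal sketch of that idea in plain Ruby, assuming only Mutex and `ConditionVariable#wait(mutex, timeout)`: a counting semaphore whose `acquire` gives up after a deadline instead of being interrupted by Timeout.timeout. `TimedSemaphore` is a hypothetical name, not something in the gem.

```ruby
require "thread"

# Hypothetical counting semaphore with a timed acquire, built on
# Mutex + ConditionVariable (no Timeout.timeout involved).
class TimedSemaphore
  def initialize(permits)
    @permits = permits
    @mutex = Mutex.new
    @cond = ConditionVariable.new
  end

  # Takes a permit and returns true, or returns false once timeout seconds pass.
  def acquire(timeout)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @mutex.synchronize do
      while @permits.zero?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        return false if remaining <= 0
        @cond.wait(@mutex, remaining) # may wake early; the loop re-checks
      end
      @permits -= 1
      true
    end
  end

  # Returns a permit and wakes one waiting thread, if any.
  def release
    @mutex.synchronize do
      @permits += 1
      @cond.signal
    end
  end
end
```

With zero initial permits, the same object could stand in for the `Thread.pass until receipt` spin described earlier in the thread: the receipt callback calls `release`, and the worker calls `acquire(30)` and treats `false` as a missing receipt. The deadline uses the monotonic clock because `ConditionVariable#wait` can return before the timeout, so every wakeup re-checks both the permit count and the remaining time.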
Confirmed that issue stompgem#121 also occurs on Windows.

- Win 7, Home Premium
- ruby 2.3.0p0 (2015-12-25 revision 53290) [x64-mingw32]

Commits:

- Make file names work anywhere
- Remove unnecessary require of rubygems
- Attempt to get perms straignt for msys
A short note to confirm that the excessive memory use by timeout is also present on:
This is to document the expectations for a solution to this issue. I show the kind of results I would like to see. Two different scenarios are shown below. Both scenarios use:
Scenario 1: the gem as it is today. The beginning of the profiler report is here.
Scenario 2: here the calls to Timeout::timeout in the _receive method (netio.rb) were simply commented out. No other changes were made. The beginning of the profiler report follows.
The timeout.rb file supplied by Ruby does not appear in the second report at all.
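The same comparison can be approximated without a profiler gem by counting object allocations around a trivial block, with and without Timeout.timeout. This is a rough sketch: `GC.stat(:total_allocated_objects)` is a process-wide MRI counter (so other busy threads inflate it), and the absolute numbers depend heavily on the Ruby version, since the timeout.rb shipped with Ruby 2.x starts a helper thread on every call. The `allocations` helper is hypothetical.

```ruby
require "timeout"

# Counts objects allocated (process-wide, MRI 2.2+) while running the block
# `iterations` times. Run it with no other threads doing work.
def allocations(iterations)
  GC.start
  before = GC.stat(:total_allocated_objects)
  iterations.times { yield }
  GC.stat(:total_allocated_objects) - before
end

ITERATIONS = 1_000

plain   = allocations(ITERATIONS) { 1 + 1 }
wrapped = allocations(ITERATIONS) { Timeout.timeout(5) { 1 + 1 } }

puts "plain block:            #{plain} objects"
puts "inside Timeout.timeout: #{wrapped} objects"
```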
If I toss you a gem over the wall in a few days, are you in a position to smoke test it?
Yes, I can make some time to smoke test a new version of the gem.
Is an e-mail attachment OK, to the address you have here on GitHub? File name: stomp-1.4.0.gem Points:
Let me know about e-mail delivery, please. I have given it a pretty good shakedown on Linux and OSX. I still have some todos, so a new gem release won't be immediate, but reasonably soon, partly depending on your results, of course. My todos are:
On a more technical note: the solution being put together is based on IO::select. The socket timeout route (SO_RCVTIMEO) will not work. Why? The receive and send timeout functionality is broken, and has been since the early 1.9 versions, apparently since the Ruby developers switched socket handling to epoll.
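A minimal sketch of an IO.select-based read timeout, for illustration only (it is not the code that went into the gem): try a non-blocking read first, and only if nothing is available, wait in IO.select for whatever time remains. `read_with_timeout` is a hypothetical helper; it assumes a plain socket or pipe, since an `OpenSSL::SSL::SSLSocket` can also raise `IO::WaitWritable` during reads, which this sketch does not handle.

```ruby
# Hypothetical helper: reads up to maxlen bytes, waiting at most `timeout`
# seconds for data. Returns nil at EOF; raises Errno::ETIMEDOUT on timeout.
def read_with_timeout(io, maxlen, timeout)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  begin
    io.read_nonblock(maxlen) # returns buffered or immediately available bytes
  rescue IO::WaitReadable
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    raise Errno::ETIMEDOUT, "no data within #{timeout}s" if remaining <= 0
    IO.select([io], nil, nil, remaining) # sleep until readable or time runs out
    retry
  rescue EOFError
    nil
  end
end
```

Reading before selecting is deliberate: bytes already held in the IO object's internal buffer are returned immediately instead of waiting on a select that cannot see them.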
An email attachment will work great for me. That is interesting about the broken implementation of socket timeouts. It doesn't surprise me too much, though.
Thanks for sending me a copy of the gem. I am seeing a huge improvement in the amount of allocated memory: the new version of the gem allocates about 1/3 of the memory. I have run a couple thousand messages through the connection and all of the messages were successfully sent. I plan on wiring up a stress test for sending and receiving messages. Memory stats are based on 1000 messages and the setup that I mentioned before. These were run using Ruby 2.2.4 on Red Hat 6.
stomp 1.3.5
stomp 1.4.0
I have a few commits that I will make a PR for shortly that should help with testing (wiring up Travis CI and Code Climate).
I will push the 1.4.0 version out ..... as soon as I have resolved some difficulties with JRuby. As usual, JRuby behavior is not consistent with Ruby, and not even consistent across JRuby versions.
That is really annoying about JRuby. What kind of inconsistencies are you running into?
The primary difficulty is determining whether (or not) a socket has data to read. In normal Ruby, using:
gives that information. JRuby is ..... well, different. Some versions return true/false for that, as you would wish, but some versions return a Fixnum (supposedly an indication of how much data there is to read). Additionally, calls to .ready? can be inconsistent: call it once and it returns a JRuby version of true; call it again immediately and it returns a JRuby false. Much of how to deal with the implementation is ..... trial and error. The result of that is that I tend to ignore it for long periods of time. With the magnitude of this change, I felt I needed to review/test with a couple of different versions, making changes according to the results. The last time I really tested was with a 1.6 version. Current versions are 1.7.x (corresponding to Ruby 1.9.x) and 9.x (corresponding to Ruby 2.x). They are different from 1.6, and different from each other. Look at the checks for @jruby in netio.rb to get a flavor. And there are more to come, this time around. I am close, but not quite there yet.
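One way to get a strict boolean on MRI is sketched below; both helpers are hypothetical. `readable_now?` asks the question with a zero-timeout IO.select, and `normalize_ready` folds the differing `ready?` return types into true/false, assuming (as described above) that an Integer result is a byte count, so 0 means nothing to read. Whether JRuby's IO.select is any more consistent than `ready?` on the versions mentioned is not established here.

```ruby
# Hypothetical helper: true if the IO can be read without blocking right now.
# A select-based check cannot see bytes already buffered inside the IO object.
def readable_now?(io)
  !IO.select([io], nil, nil, 0).nil?
end

# Hypothetical helper: maps the various ready? results to true/false.
# An Integer is treated as a byte count (note that 0 is truthy in Ruby,
# so it must be compared explicitly); anything else is plain truthiness.
def normalize_ready(result)
  result.is_a?(Integer) ? result > 0 : !!result
end
```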
OK, version 1.4.0 is available on rubygems. Reference: 78446f5. Install in the usual way. One and all are urged to test this release carefully.
Awesome. After reading your first message, I was worried that it was going to take significantly longer to get the build working with JRuby.
So was I .... and I make no guarantees other than the specific releases
Given that no serious issues have resulted from this change, I am closing this issue. If you have concerns with this, please document them.
Sounds good to me. Thanks again for your help on this.
Ruby's `timeout` has some serious issues. Rather than using Ruby's `timeout` for socket timeouts, it would be better to use a setting on the socket itself or `IO.select`.

I was recently profiling a worker that we have that sends messages out to ActiveMQ. During the profiling, I found that nearly 70% of all the memory being allocated for this worker comes from Ruby's `timeout`. It is a very simple Resque worker that gets a message and writes it to ActiveMQ using stomp. We maintain the same `Stomp::Client` for all of the messages that we send. I profiled this worker with 1000 messages. The total allocated memory was 1527649456 bytes. The total memory allocated by `timeout` was 1052953152 bytes (about 69%).

Example of using timeout on the socket - https://www.mikeperham.com/2009/03/15/socket-timeouts-in-ruby/
Example using `IO.select` - https://spin.atomicobject.com/2013/09/30/socket-connection-timeout-ruby/
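For connection establishment, the approach in the second link can be sketched as a non-blocking connect followed by `IO.select` for writability. This is a hedged, IPv4-only illustration, and `connect_with_timeout` is a hypothetical name; note that `Socket.sockaddr_in` resolves `host` itself, and that lookup is not covered by the timeout.

```ruby
require "socket"

# Hypothetical helper: opens an IPv4 TCP connection, waiting at most
# `timeout` seconds, without using Timeout.timeout.
def connect_with_timeout(host, port, timeout)
  sockaddr = Socket.sockaddr_in(port, host)
  sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
  begin
    sock.connect_nonblock(sockaddr)
  rescue IO::WaitWritable
    # Connection in progress: wait until the socket becomes writable.
    unless IO.select(nil, [sock], nil, timeout)
      raise Errno::ETIMEDOUT, "connect to #{host}:#{port} timed out"
    end
    begin
      sock.connect_nonblock(sockaddr) # surfaces the final result of the connect
    rescue Errno::EISCONN
      # Already connected: success.
    end
  end
  sock
rescue StandardError
  sock.close if sock && !sock.closed?
  raise
end
```

The second `connect_nonblock` after `IO.select` is what reports the outcome: `Errno::EISCONN` means the connection succeeded, while a failure surfaces as an exception such as `Errno::ECONNREFUSED`.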