
back to io/wait #368

Closed
wants to merge 29 commits into from

Conversation

@HoneyryderChuck

should take care of #357

@HoneyryderChuck (Author)

@tarcieri we can continue the discussion here.

Checking Ruby's source code, I see the same "faulty" wait_* method handling that caused the known bug. The main difference is that net/http closes connections by default.

In the case of this gem, the preferred default is Keep-Alive, which is a design choice.

This is the sequence of events that randomly triggers the issue (it refers to https://github.com/httprb/http/blob/master/lib/http/timeout/per_operation.rb#L61-L72):

  • read_nonblock(size, :exception => false) returns :wait_readable
  • wait_readable(read_timeout) returns nil.
    • the current code (like net/http) interprets nil as a timeout when, according to the documentation, it can also mean "EOF is reached". As it is, this ends in a timeout even though the whole response has already been buffered (a simplified sketch of this loop follows right after this list).
    • if I instead ignore the nil and let it run (interpreting it as "EOF is reached"), the next read_nonblock(size, :exception => false) returns nil, which also means "EOF is reached", so the read completes successfully.
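
A simplified sketch of the current pattern described above (assuming a @socket and read_timeout as in per_operation.rb; this is not the gem's actual code), showing where the nil gets misread:

def readpartial(size)
  loop do
    result = @socket.read_nonblock(size, :exception => false)

    # with :exception => false, nil means EOF and :wait_readable means "no data yet"
    return result unless result == :wait_readable

    # nil from wait_readable is ambiguous: it can mean "timed out", but per the
    # docs it can also mean "EOF reached"; treating it as a timeout is what fails
    unless @socket.to_io.wait_readable(read_timeout)
      raise TimeoutError, "Read timed out after #{read_timeout} seconds"
    end
  end
end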

So, this might have already been fixed in Ruby trunk (according to the code it now only handles timeouts, although the implementation could mean something else), which means we can't rely on the false return value.

So, I see these possible solutions:

  • the first nil from wait_readable is not treated as a timeout right away, only "marked for timeout"; if the next read_nonblock then delivers anything other than nil, we trigger the timeout (otherwise it's EOF):
tout = false
loop do
  result = @socket.read_nonblock(size, :exception => false)

  return :eof if result.nil?        # nil from read_nonblock means EOF
  raise TimeoutError, "Read timed out after #{read_timeout} seconds" if tout
  return result if result != :wait_readable

  # nil from wait_readable only *marks* the timeout; the next read decides
  tout = true unless @socket.to_io.wait_readable(read_timeout)
end

What do you think? The first option seems to demand less overhead; the second seems more correct but might require more internal refactoring.

@tarcieri (Member) commented Aug 9, 2016

@TiagoCardoso1983 for what it's worth, a GSoC student has been working on an implementation of a Java-like (fixed-size) ByteBuffer type which bypasses Ruby I/O and does native I/O on off-heap buffers (or at least that's the goal; some implementations are still pending, but GSoC isn't over yet): https://github.com/celluloid/nio4r/pull/95/files

I think, implemented correctly, this is probably the best way to get consistent I/O semantics. There are backends in C, Java, and pure Ruby.

@HoneyryderChuck (Author)

@tarcieri that's good to hear, but does it supersede this fix? IMO it shouldn't be hard to get something that just works with the stdlib, even with the current weird io/wait semantics. What do you think of the two suggestions?

@tarcieri (Member)

@TiagoCardoso1983 I don't entirely understand what you're proposing in regard to the HTTP semantics you describe. I guess I'll have to further study your links to have an opinion.

Regarding MRI async I/O errata, this is the sort of thing that really feels like it needs regression tests. I would really like to capture the previous regressions we've seen in an integration test.

@HoneyryderChuck (Author)

Going forward with it, and judging by this and this, the Response object has access to the response headers before reading the HTTP response body, hence the 'Message-Length'. We could therefore pass this value to the Response Body object, so that the response body would know when to stop reading without relying on the IO events.
Anyway, this is all theoretical for now; I'll try to add a proof-of-concept for the failing request.
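
A rough sketch of that idea (the BodyReader name and its interface are hypothetical, not this gem's API): the Response would hand the parsed length header value to the body reader, which then stops after that many bytes instead of waiting on socket events.

class BodyReader
  def initialize(stream, length = nil)
    @stream   = stream
    @length   = length # nil when the header is absent
    @consumed = 0
  end

  def readpartial(size = 16_384)
    return nil if @length && @consumed >= @length # length reached: stop reading

    chunk = @stream.readpartial(size)
    @consumed += chunk.bytesize if chunk
    chunk
  end
end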

@tarcieri (Member)

@TiagoCardoso1983 I can't find any information on this Message-Length header you keep talking about. Can you point to a specific part of the HTTP RFC where it is described?

Regardless, you still need to handle cases where these sorts of headers aren't provided by non-compliant servers.

@HoneyryderChuck (Author)

I already linked it in a message above. And it's Content-Length.

@tarcieri (Member)

@TiagoCardoso1983 we already support Content-Length, however it's not always available

@tarcieri (Member)

Ok, looking at the code again it seems the response Content-Length is no longer used when reading the response body. Once upon a time it used to be.

That said, not all servers send it or even know it in advance, so at best it can be used opportunistically.

@HoneyryderChuck (Author)

@tarcieri can you have a look at the RFC link I posted a few messages above? There are some corner cases regarding the response body length which this gem might not address, besides that one.

Regarding the example from the related issue (the https link to google play store), which issues a 404 status code, the response comes with "Transfer-Encoding: chunked". The corrected(?) link which responds to 200 has the Content-Length set. Use curl for these two links to see it for yourself:

The way I see it, even without being fully compliant with the RFC, this gem could at least handle these two cases: when the length is present you only read that much, and when the response is chunked you read until the termination character (here's how it's defined in Rack). A sketch of that dispatch follows at the end of this comment.

I don't agree with what you said. Servers which don't know the content length in advance encode the response as chunked. If you decide to ignore those headers because some non-compliant servers exist, you open yourself to a DoS vector, in that you read the body indefinitely until the "malicious" server stops sending data (right?).
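
To illustrate the two cases above, here is a hedged sketch of the dispatch (read_chunked and the header lookups are placeholders, not this gem's methods): with a Content-Length you read exactly that many bytes, with a chunked response you read until the terminating zero-size chunk, and only otherwise do you fall back to reading until the connection closes.

def read_body(headers, socket)
  if headers["Transfer-Encoding"].to_s.include?("chunked")
    read_chunked(socket)         # stop at the terminating 0\r\n chunk
  elsif (length = headers["Content-Length"])
    socket.read(Integer(length)) # stop after exactly `length` bytes
  else
    socket.read                  # read until the server closes the connection
  end
end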

@HoneyryderChuck (Author)

Just for comparison's sake, here's how net/http reads the body: https://github.com/ruby/ruby/blob/trunk/lib/net/http/response.rb#L280-L301

@tarcieri (Member) commented Aug 13, 2016

Yes, it should use Content-Length when available (I believe previous implementations did)

@HoneyryderChuck (Author)

@tarcieri I think I got it this time, along with the failing example from #298.

So, the Content-Length issue is now handled, but the failing case which caused the revert was on a chunked response. I investigated how this happens here, and it seems that the chunks are read by the connection and passed to the parser (HTTP::Parser instance). This handles the chunked responses internally and gives back a proper string, which is then added to the contents.

The caveat on the http side is that it relies on IO events to decide what to do next. I don't think that is correct, but I don't know what the correct alternative is. So from here on I'll be shooting in the dark:

  • The http-parser object should provide a way to say it's done handling the chunks. I saw that the output coming from the socket contained the termination character 0\r\n and the parser correctly concatenated the output, but it doesn't provide a hook stating that body parsing is complete. If it did, the client could terminate body parsing as per the spec instead of "waiting on the socket". Do you know if it supports this already?
  • Either io/wait doesn't support the case in which the socket EOFs, or there is a race condition between reading/waiting which IO.select somehow bypasses. Check this example:
require 'socket'
require 'io/wait'

Thread.new do
  server = TCPServer.new 2000 # Server bind to port 2000
  loop do
    client = server.accept    # Wait for a client to connect
    client.puts "Hello !"
    client.puts "Time is #{Time.now}"
    sleep 3
    client.close
  end
end

puts "server is starting..."
sleep 2

puts "connecting now"
s = TCPSocket.new 'localhost', 2000

puts "connected"

10.times do
  k = s.wait_readable(2)
  puts "wait: #{k.inspect}"
  v = s.read_nonblock(8, exception: false)
  puts "val: #{v.inspect}"

  break if v.nil? # read_nonblock returns nil on EOF
end

I will file a bug to see what the core team thinks, but this indeed makes leveraging io/wait harder.


@tarcieri (Member)

@TiagoCardoso1983 I believe the on_message_complete callback should fire at the end of each chunk, and an empty chunk should signify the end of the body.
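
A hedged sketch of that approach, assuming the http_parser.rb API this gem wraps (Http::Parser with on_body/on_message_complete callbacks): stop reading once on_message_complete fires, i.e. once the parser has seen the terminating zero-size chunk, rather than waiting on further socket events.

require "http/parser"

parser   = Http::Parser.new
body     = String.new
finished = false

parser.on_body             = proc { |chunk| body << chunk }
parser.on_message_complete = proc { finished = true }

# `socket` stands in for the connection; readpartial raises EOFError if the
# server closes the connection before the message is complete
parser << socket.readpartial(16_384) until finished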

@@ -20,4 +20,7 @@ class TimeoutError < Error; end

# Header name is invalid
class InvalidHeaderNameError < Error; end

# Header value is of unexpected format (similar to Net::HTTPHeaderSyntaxError)
class HeaderSyntaxError < Error; end
Member

What do you think of changing this to:

class HeaderError < Error; end
class InvalidHeaderNameError < HeaderError; end
class InvalidHeaderValueError < HeaderError; end

Author

Hmm, this one is tricky... I think the class of errors that can come from headers is not covered by these two. Maybe the separator is wrong, for example. Also, this is in theory a ResponseError, as it's not a request header, and so on.

IMO the best solution would be to just create a simple HeaderError, use it everywhere, and add the details in the message. Unless I'm missing the motive for so much error granularity.

Let me know what you think; I can change to your suggestion as soon as I come back to this. I work behind a proxy during the week which concatenates chunked responses, thereby blocking my use case.

@ixti (Member) Aug 17, 2016

Mmm. I tend to agree that a single class HeaderError < Error; end should be enough.
But in order to make it backward compatible, I propose to do this:

class HeaderError < Error; end

# @deprecated will be removed in v3.0.0
InvalidHeaderNameError = HeaderError

@HoneyryderChuck (Author)

So I've reverted the exception for content length: it returns nil now, and the response can be instantiated. Which means that the body will be fetched until the end of the connection, potentially ad aeternum.

Besides the security concern there, I think that this behaviour doesn't conform to the HTTP/1.1 spec, and even net/http does it. I don't know why one needs to instantiate the response if the response/headers are invalid, but then again, I've only contributed a few commits and you know more about the usage history of this gem.

In case one wants to satisfy both requirements, the way the response is instantiated has to change, and that should maybe be a task for the core team.

@tarcieri (Member) commented Aug 23, 2016

@TiagoCardoso1983 chalk it up to Postel's law. That said, the "security" issue you're describing also applies to a compliant implementation following an endless chunked response, and exists in every compliant HTTP/1.1 client.

Looks like there are some conflicts that need to be resolved.

value = @headers[Headers::CONTENT_LENGTH]
return unless value

begin
  Integer(value)
rescue ArgumentError
  nil
end
Member

Can you please return nil here? It will make RuboCop a bit more pleased ;))

@HoneyryderChuck (Author)

@tarcieri that is correct; I meant it more as an unexpected security concern, as the protocol defines those use cases. My goal is just to document it in case future issues arise from it.

# Clause 3: "If a message is received with both a Transfer-Encoding
# and a Content-Length header field, the Transfer-Encoding overrides the Content-Length."
return nil if @headers.include?(Headers::TRANSFER_ENCODING) ||
              !@headers.include?(Headers::CONTENT_LENGTH)
@ixti (Member) Aug 24, 2016

I would split those into two separate statements for the sake of simplicity (it took me a while to understand that the statement is correct as it was -- I was pretty sure there was a mistake):

return nil if @headers.include?(Headers::TRANSFER_ENCODING)

value = @headers[Headers::CONTENT_LENGTH]
return nil unless value

# ...

Calling #include? and then #[] involves an extra call in the "happy path" case. Although #include? is more efficient than #[], when you end up needing both it's worth keeping only #[] (at least in this particular place). Just to explain:

class Headers
  def include?(name)
    @headers_array.any? { |key, val| key == name }
  end

  def [](name)
    values = @headers_array.each_with_object([]) { |(k, v), a| a << v if k == name }
    return nil if values.empty?
    return values.first if 1 == values.count
    values
  end
end

Author

I see. Actually what one needs is something equivalent to Enumerable#find. As Headers is Enumerable, this shouldn't be an issue.

Member

Yeah. You can call find on a Headers instance:

HTTP.get("https://ya.ru").headers.find { |key, val| "Content-Length" == key } 
# => ["Content-Length", "9578"]

Member

In fact I'm leaning toward changing the way headers are kept internally; that will make all operations on Headers fast and efficient. Right now it's an array of [key, val] arrays.
The only good thing about this format is that it allows recreating the headers list as-is. BUT. We normalize header names, so that's not actually guaranteed anyway. So I lean towards changing the data under the hood to be Hash{String, Array<String>}. That will make most operations O(1).
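
A rough sketch of that proposal (illustrative only, not a patch): keep a Hash of normalized name => Array of values instead of an array of [key, val] pairs, so lookups become O(1) while multi-value headers still round-trip.

class Headers
  def initialize
    @pile = {} # "Header-Name" => ["value1", "value2", ...]
  end

  def add(name, value)
    (@pile[normalize(name)] ||= []) << value
  end

  def [](name)
    values = @pile.fetch(normalize(name), [])
    return nil          if values.empty?
    return values.first if values.one?
    values
  end

  def include?(name)
    @pile.key?(normalize(name))
  end

  private

  # simplified normalization; the real gem's rules may differ
  def normalize(name)
    name.to_s.split(/[-_]/).map(&:capitalize).join("-")
  end
end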

Member

The only thing is that if that change to the underlying data structure happens, it will be released only in the next major version, since it introduces backward-incompatible changes.

Member

I might also try extracting the timeout code into a separate gem...

Member

I was pretty sure @zanker was doing something related to that, although I might have misunderstood something.

Member

Pretty sure he abandoned his efforts.

Member

Oh. Ok.

Author

Indeed. Just out of curiosity, what's the idea of normalizing the header key?

@tarcieri (Member)

@HoneyryderChuck this still looks like good stuff (at least for 3.0). Mind rebasing?

@ixti ixti self-requested a review January 30, 2017 15:12
@ixti (Member) left a comment

Minor tweaks and I'll be happy to merge!

# Header value is of unexpected format (similar to Net::HTTPHeaderSyntaxError)
class HeaderError < Error; end

# @deprecated will be removed in v3.0.0
Member

As we're going to release this PR as part of v3.0.0, it's worth changing this to say "will be removed in v3.1.0".

Author

But if it is to be removed in 3.0.0 and this PR is about 3.0.0, then why not just go ahead and remove it? I'd leave this tweak for upstream though; it's beyond the scope of this PR, IMO.

Member

Mmm. I guess I agree. Just remove this deprecated class then.

@@ -53,7 +53,10 @@ def to_s
@streaming = false
@contents = String.new("").force_encoding(encoding)

while (chunk = @stream.readpartial)
length = @length || Float::INFINITY
Member

I think we should follow your initial idea of limiting the amount of data to the given Content-Length when available, as it should be respected according to the RFC.
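
A minimal sketch of what respecting Content-Length could look like in the to_s loop above (assuming @length holds the parsed header value, or nil when the header is absent); not the final implementation:

remaining = @length || Float::INFINITY

while remaining > 0 && (chunk = @stream.readpartial)
  @contents << chunk
  remaining -= chunk.bytesize
end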

@HoneyryderChuck (Author)

Can I make a few suggestions regarding RuboCop? If you look at the ratio of useful commits to RuboCop tweaks in this PR, you can see how annoying it has been. I get the usefulness of maintaining a standard, but there are some inconsistencies that make one lose focus, the main one being the hash-rocket insistence. Main reasons:

  • you can't use kwargs and hash-rockets at the same time.
# syntax error
def bla(:a => nil, :b => nil)
  • You can pass hash-rocket-style arguments to a method defined with kwargs
# if I don't call #bla like this, rubocop complains
bla(:a => 1, :b => 2)

I also suggest dropping this check, given that the minimum requirement of this gem is Ruby 2.0 and the "new" hash syntax has existed since 1.9.

@ixti (Member) commented Feb 6, 2017

Will merge this today.
Re hash syntax: #342

@HoneyryderChuck (Author)

Will merge this today.

Hopefully you make it, as "before the weekend" just passed ;) Not to sound like an a-hole, but I've been mostly fighting conflicts and RuboCop for the last week, which is time- and nerve-consuming, for a gem I haven't been using lately.

@tarcieri (Member) commented Feb 6, 2017

@ixti remember to push master to 2-x-stable prior to merging this, as it's a breaking change; with this, master will officially become the 3.x development branch.

We should probably also add some README text to clarify the version numbers and development state of each branch (I can take care of that)

@HoneyryderChuck I'm 👍 on moving to the new hash syntax across the board, @ixti is the detractor there.

re: RuboCop, are you aware of rubocop -a? (which will autofix your code)

ixti pushed a commit that referenced this pull request Feb 6, 2017
This is a squashed commit of PR: #368
@HoneyryderChuck (Author)

@tarcieri I actually wasn't, thx for the tip 👍

@ixti (Member) commented Feb 6, 2017

I have squashed and merged this: cb44c5d

@ixti ixti closed this Feb 6, 2017
@ixti (Member) commented Feb 6, 2017

@tarcieri 2-x-stable was actually created as soon as we released v2.0.0, and now it's in sync with v2.2.1, while master contains this PR.

@ixti (Member) commented Feb 6, 2017

Re: hash syntax. Yes, I am strongly against the new hash syntax.

@tarcieri (Member) commented Feb 6, 2017

@ixti yeah, just making sure it was up-to-date before merging this. I'll add some clarifying information to the READMEs tonight.

re: hash syntax, I'm only weakly in favor of the new syntax, so I guess your strong objection wins 😜

@HoneyryderChuck (Author)

@ixti I respect your opinion, and as one of the moderators of this project, yours counts the most. In terms of creating arbitrary hashes, I have to say I don't have a strong enough counter-argument, since, as stated in the linked issue, you can't represent non-symbol keys with the new syntax.

However, I think one could "relax" the policy and make an exception, namely when passing options directly against a method's signature. From what I understood, you're not against kwargs. I'd suggest "enforcing" them in the internal http API (at least), as they add argument (not type) checks at call time which map to more meaningful error messages, and they remove custom options-hash handling like this one bit here (see the #fetch calls). That makes a maintainer's life much easier and lessens "primitive obsession".

That being said, I think that insisting on the old hash-rocket for method calls creates confusion, as there is a dissonance between defining a method and calling it:

def person(first:, last:)
  # ...
end

# matches the definition's syntax
person(first: "Dirty", last: "Harry")
# does not match
person(:first => "Dirty", :last => "Harry")

To summarize, I agree with you on the hash thing, but the fact that you can't define kwargs in a hash-rocket-compatible way forces you to handle the hash-syntax inconsistency one way or another. The only way to prevent this would be not to use kwargs at all, but I hope I made a strong argument for why you should use them. Therefore, I think the new syntax could be accepted as long as it is only used directly in method calls and not to instantiate hashes. Whether RuboCop handles such an exception is another topic.

Those are my 2 cents. As stated, I respect the logic of the choice.

@ixti (Member) commented Feb 7, 2017

@HoneyryderChuck We're all for kwargs. It's just that we dropped old Ruby versions relatively recently, so nobody has worked on rewriting the internal API yet. I believe that's a great idea (getting rid of opts hashes) for the 3.0.0 release.

I agree with you that the current restriction introduces a small confusion (method calls vs. method definitions), but I find it not so bad; after all, method definitions differ from method calls in general, e.g.:

def say(what = "wut?")
  # ...
end

# you can:
say("hi!")

# but can't (obviously)
say(what = "hi")

So I prefer to look at this particular problem (the kwargs definition/usage difference) through this prism :D Relaxing RuboCop on hash syntax would add an extra burden on reviewers to check styling, as one would have to verify whether a given hash is actually a hash or kwargs.

@tarcieri (Member) commented Feb 7, 2017

I think, at least, we can enable the OptionHash cop so as to forbid option hashes:

https://github.com/bbatsov/rubocop/pull/2078/files
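
For context, a hedged illustration of what that cop targets (the method names here are made up, not from this gem): it flags a trailing options = {} parameter, nudging the code toward keyword arguments instead.

# flagged by the OptionHash cop
def request(verb, uri, options = {})
  follow = options.fetch(:follow_redirects, false)
end

# preferred: keyword arguments
def request(verb, uri, follow_redirects: false)
  # a typo such as request(:get, uri, folow_redirects: true) raises ArgumentError
end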

tarcieri added a commit that referenced this pull request Feb 9, 2017
We should start favoring keyword arguments

See: #368 (comment)
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Dec 9, 2017
pkgsrc changes:
- sort DEPENDS

Upstream changes (from CHANGES.md):

## 3.0.0 (2017-10-01)

* Drop support of Ruby `2.0` and Ruby `2.1`.
  ([@ixti])

* [#410](httprb/http#410)
  Infer `Host` header upon redirects.
  ([@janko-m])

* [#409](httprb/http#409)
  Enables request body streaming on any IO object.
  ([@janko-m])

* [#413](httprb/http#413),
  [#414](httprb/http#414)
  Fix encoding of body chunks.
  ([@janko-m])

* [#368](httprb/http#368),
  [#357](httprb/http#357)
  Fix timeout issue.
  ([@HoneyryderChuck])