
back to io/wait #368

Closed
wants to merge 29 commits into from

Conversation

@HoneyryderChuck

should take care of #357

@HoneyryderChuck (Author)

@tarcieri we can continue the discussion here.

Checking Ruby's source code, I see the same "faulty" wait_* method handling that caused the known bug. The main difference is that net/http closes connections by default.

In the case of this gem, the preferred default is Keep-Alive, which is a design choice.

This is the sequence of events that randomly triggers the issue (it refers to https://github.com/httprb/http/blob/master/lib/http/timeout/per_operation.rb#L61-L72):

  • read_nonblock(size, :exception => false) returns :wait_readable
  • wait_readable(read_timeout) returns nil.
    • the current code (like net/http) interprets nil as a timeout when, according to the documentation, it can also mean "EOF is reached". As it is, this ends in a timeout even though the whole response has already been buffered (a simplified sketch of this loop follows right after this list).
    • if I instead ignore the nil and let it run (interpreting it as "EOF is reached"), the next read_nonblock(size, :exception => false) returns nil, which also means "EOF is reached", so the read completes successfully.
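
A simplified sketch of the current pattern described above (assuming a @socket and read_timeout as in per_operation.rb; this is not the gem's actual code), showing where the nil gets misread:

def readpartial(size)
  loop do
    result = @socket.read_nonblock(size, :exception => false)

    # with :exception => false, nil means EOF and :wait_readable means "no data yet"
    return result unless result == :wait_readable

    # nil from wait_readable is ambiguous: it can mean "timed out", but per the
    # docs it can also mean "EOF reached"; treating it as a timeout is what fails
    unless @socket.to_io.wait_readable(read_timeout)
      raise TimeoutError, "Read timed out after #{read_timeout} seconds"
    end
  end
end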

So, this might have already been fixed in Ruby trunk (according to the code it now only handles timeouts, although the implementation could mean something else), which means we can't rely on the false return value.

So, I see these possible solutions:

  • the first nil from wait_readable is not treated as a timeout right away, only "marked for timeout"; if the next read_nonblock then delivers anything other than nil, we trigger the timeout (otherwise it's EOF):
tout = false
loop do
  result = @socket.read_nonblock(size, :exception => false)

  return :eof if result.nil?        # nil from read_nonblock means EOF
  raise TimeoutError, "Read timed out after #{read_timeout} seconds" if tout
  return result if result != :wait_readable

  # nil from wait_readable only *marks* the timeout; the next read decides
  tout = true unless @socket.to_io.wait_readable(read_timeout)
end

What do you think? The first option seems to demand less overhead; the second seems more correct but might require more internal refactoring.

@tarcieri (Member) commented Aug 9, 2016

@TiagoCardoso1983 for what it's worth, a GSoC student has been working on an implementation of a Java-like (fixed-size) ByteBuffer type which bypasses Ruby I/O and does native I/O on off-heap buffers (or at least that's the goal; some implementations are still pending, but GSoC isn't over yet): https://github.com/celluloid/nio4r/pull/95/files

I think, implemented correctly, this is probably the best way to get consistent I/O semantics. There are backends in C, Java, and pure Ruby.

@HoneyryderChuck (Author)

@tarcieri that's good to hear, but does it supersede this fix? IMO it shouldn't be hard to get something that just works with the stdlib, even with the current weird io/wait semantics. What do you think of the two suggestions?

@tarcieri (Member)

@TiagoCardoso1983 I don't entirely understand what you're proposing in regard to the HTTP semantics you describe. I guess I'll have to further study your links to have an opinion.

Regarding MRI async I/O errata, this is the sort of thing that really feels like it needs regression tests. I would really like to capture the previous regressions we've seen in an integration test.

@HoneyryderChuck (Author)

Going forward with it, and judging by this and this, the Response object has access to the response headers before reading the HTTP response body, hence the 'Message-Length'. We could therefore pass this value to the Response Body object, so that the response body would know when to stop reading without relying on the IO events.
Anyway, this is all theoretical for now; I'll try to add a proof-of-concept for the failing request.
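
A rough sketch of that idea (the BodyReader name and its interface are hypothetical, not this gem's API): the Response would hand the parsed length header value to the body reader, which then stops after that many bytes instead of waiting on socket events.

class BodyReader
  def initialize(stream, length = nil)
    @stream   = stream
    @length   = length # nil when the header is absent
    @consumed = 0
  end

  def readpartial(size = 16_384)
    return nil if @length && @consumed >= @length # length reached: stop reading

    chunk = @stream.readpartial(size)
    @consumed += chunk.bytesize if chunk
    chunk
  end
end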

@tarcieri (Member)

@TiagoCardoso1983 I can't find any information on this Message-Length header you keep talking about. Can you point to a specific part of the HTTP RFC where it is described?

Regardless, you still need to handle cases where these sorts of headers aren't provided by non-compliant servers.

@HoneyryderChuck (Author)

I already linked it in a message above. And it's Content-Length.

@tarcieri (Member)

@TiagoCardoso1983 we already support Content-Length, however it's not always available

@tarcieri (Member)

Ok, looking at the code again it seems the response Content-Length is no longer used when reading the response body. Once upon a time it used to be.

That said, not all servers send it or even know it in advance, so at best it can be used opportunistically.

@HoneyryderChuck (Author)

@tarcieri can you have a look at the RFC link I posted a few messages above? There are some corner cases regarding the response body length which this gem might not address, besides that one.

Regarding the example from the related issue (the https link to google play store), which issues a 404 status code, the response comes with "Transfer-Encoding: chunked". The corrected(?) link which responds to 200 has the Content-Length set. Use curl for these two links to see it for yourself:

The way I see it, even without being fully compliant with the RFC, this gem could at least handle these two cases: when the length is present you only read that much, and when the response is chunked you read until the termination character (here's how it's defined in Rack). A sketch of that dispatch follows at the end of this comment.

I don't agree with what you said. Servers which don't know the content length in advance encode the response as chunked. If you decide to ignore those headers because some non-compliant servers exist, you open yourself to a DoS vector, in that you read the body indefinitely until the "malicious" server stops sending data (right?).
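
To illustrate the two cases above, here is a hedged sketch of the dispatch (read_chunked and the header lookups are placeholders, not this gem's methods): with a Content-Length you read exactly that many bytes, with a chunked response you read until the terminating zero-size chunk, and only otherwise do you fall back to reading until the connection closes.

def read_body(headers, socket)
  if headers["Transfer-Encoding"].to_s.include?("chunked")
    read_chunked(socket)         # stop at the terminating 0\r\n chunk
  elsif (length = headers["Content-Length"])
    socket.read(Integer(length)) # stop after exactly `length` bytes
  else
    socket.read                  # read until the server closes the connection
  end
end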

@HoneyryderChuck (Author)

Just for comparison's sake, here's how net/http reads the body: https://github.com/ruby/ruby/blob/trunk/lib/net/http/response.rb#L280-L301

@tarcieri (Member) commented Aug 13, 2016

Yes, it should use Content-Length when available (I believe previous implementations did)

@HoneyryderChuck (Author)

@tarcieri I think I got it this time, along with the failing example from #298.

So, the Content-Length issue is now handled, but the failing case which caused the revert was on a chunked response. I investigated how this happens here, and it seems that the chunks are read by the connection and passed to the parser (HTTP::Parser instance). This handles the chunked responses internally and gives back a proper string, which is then added to the contents.

The caveat on the http side is that it relies on IO events to decide what to do next. I don't think that is correct, but I don't know what the correct alternative is. So from here on I'll be shooting in the dark:

  • The http-parser object should provide a way to say it's done handling the chunks. I saw that the output coming from the socket contained the termination character 0\r\n and the parser correctly concatenated the output, but it doesn't provide a hook stating that body parsing is complete. If it did, the client could terminate body parsing as per the spec instead of "waiting on the socket". Do you know if it supports this already?
  • Either io/wait doesn't support the case in which the socket EOFs, or there is a race condition between reading/waiting which IO.select somehow bypasses. Check this example:
require 'socket'
require 'io/wait'

Thread.new do
  server = TCPServer.new 2000 # Server bind to port 2000
  loop do
    client = server.accept    # Wait for a client to connect
    client.puts "Hello !"
    client.puts "Time is #{Time.now}"
    sleep 3
    client.close
  end
end

puts "server is starting..."
sleep 2

puts "connecting now"
s = TCPSocket.new 'localhost', 2000

puts "connected"

10.times do
  k = s.wait_readable(2)
  puts "wait: #{k.inspect}"
  v = s.read_nonblock(8, exception: false)
  puts "val: #{v.inspect}"

  break if v.nil? # read_nonblock returns nil on EOF
end

I will file a bug to see what the core team thinks, but this indeed makes leveraging io/wait harder.


@tarcieri (Member)

@TiagoCardoso1983 I believe the on_message_complete callback should fire at the end of each chunk, and an empty chunk should signify the end of the body.
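
A hedged sketch of that approach, assuming the http_parser.rb API this gem wraps (Http::Parser with on_body/on_message_complete callbacks): stop reading once on_message_complete fires, i.e. once the parser has seen the terminating zero-size chunk, rather than waiting on further socket events.

require "http/parser"

parser   = Http::Parser.new
body     = String.new
finished = false

parser.on_body             = proc { |chunk| body << chunk }
parser.on_message_complete = proc { finished = true }

# `socket` stands in for the connection; readpartial raises EOFError if the
# server closes the connection before the message is complete
parser << socket.readpartial(16_384) until finished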

@@ -20,4 +20,7 @@ class TimeoutError < Error; end

# Header name is invalid
class InvalidHeaderNameError < Error; end

# Header value is of unexpected format (similar to Net::HTTPHeaderSyntaxError)
class HeaderSyntaxError < Error; end
Member

What do you think of changing this to:

class HeaderError < Error; end
class InvalidHeaderNameError < HeaderError; end
class InvalidHeaderValueError < HeaderError; end

Author

Hmm, this one is tricky... I think the class of errors that can come from headers is not covered by these two. Maybe the separator is wrong, for example. Also, this is in theory a ResponseError, as it's not a request header, and so on.

IMO the best solution would be to just create a simple HeaderError, use it everywhere, and add the details in the message. Unless I'm missing the motive for so much error granularity.

Let me know what you think; I can change to your suggestion as soon as I come back to this. I work behind a proxy during the week which concatenates chunked responses, thereby blocking my use case.

@ixti (Member) Aug 17, 2016

Mmm. I tend to agree that a single class HeaderError < Error; end should be enough.
But in order to make it backward compatible, I propose to do this:

class HeaderError < Error; end

# @deprecated will be removed in v3.0.0
InvalidHeaderNameError = HeaderError

@HoneyryderChuck (Author)

So I've reverted the exception for content length: it returns nil now, and the response can be instantiated. Which means that the body will be fetched until the end of the connection, potentially ad aeternum.

Besides the security concern there, I think that this behaviour doesn't conform to the HTTP/1.1 spec, and even net/http does it. I don't know why one needs to instantiate the response if the response/headers are invalid, but then again, I've only contributed a few commits and you know more about the usage history of this gem.

In case one wants to satisfy both requirements, the way the response is instantiated has to change, and that should maybe be a task for the core team.

@tarcieri (Member) commented Aug 23, 2016

@TiagoCardoso1983 chalk it up to Postel's law. That said, the "security" issue you're describing also applies to a compliant implementation following an endless chunked response, and exists in every compliant HTTP/1.1 client.

Looks like there are some conflicts that need to be resolved.

value = @headers[Headers::CONTENT_LENGTH]
return unless value

begin
  Integer(value)
rescue ArgumentError
  nil
end
Member

Can you please return nil here? It will make RuboCop a bit more pleased ;))

@HoneyryderChuck (Author)

@tarcieri that is correct; I meant it more as an unexpected security concern, as the protocol defines those use cases. My goal is just to document it in case future issues arise from it.

# Clause 3: "If a message is received with both a Transfer-Encoding
# and a Content-Length header field, the Transfer-Encoding overrides the Content-Length."
return nil if @headers.include?(Headers::TRANSFER_ENCODING) ||
              !@headers.include?(Headers::CONTENT_LENGTH)
@ixti (Member) Aug 24, 2016

I would split those into two separate statements for the sake of simplicity (it took me a while to understand that the statement is correct as it was -- I was pretty sure there was a mistake):

return nil if @headers.include?(Headers::TRANSFER_ENCODING)

value = @headers[Headers::CONTENT_LENGTH]
return nil unless value

# ...

Calling #include? and then #[] involves an extra call in the "happy path" case. Although #include? is more efficient than #[], when you end up needing both it's worth keeping only #[] (at least in this particular place). Just to explain:

class Headers
  def include?(name)
    @headers_array.any? { |key, val| key == name }
  end

  def [](name)
    values = @headers_array.each_with_object([]) { |(k, v), a| a << v if k == name }
    return nil if values.empty?
    return values.first if 1 == values.count
    values
  end
end

Author

I see. Actually what one needs is something equivalent to Enumerable#find. As Headers is Enumerable, this shouldn't be an issue.

Member

Yeah. You can call find on a Headers instance:

HTTP.get("https://ya.ru").headers.find { |key, val| "Content-Length" == key } 
# => ["Content-Length", "9578"]

Member

In fact I'm leaning toward changing the way headers are kept internally; that will make all operations on Headers fast and efficient. Right now it's an array of [key, val] arrays.
The only good thing about this format is that it allows recreating the headers list as-is. BUT. We normalize header names, so that's not actually guaranteed anyway. So I lean towards changing the data under the hood to be Hash{String, Array<String>}. That will make most operations O(1).
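
A rough sketch of that proposal (illustrative only, not a patch): keep a Hash of normalized name => Array of values instead of an array of [key, val] pairs, so lookups become O(1) while multi-value headers still round-trip.

class Headers
  def initialize
    @pile = {} # "Header-Name" => ["value1", "value2", ...]
  end

  def add(name, value)
    (@pile[normalize(name)] ||= []) << value
  end

  def [](name)
    values = @pile.fetch(normalize(name), [])
    return nil          if values.empty?
    return values.first if values.one?
    values
  end

  def include?(name)
    @pile.key?(normalize(name))
  end

  private

  # simplified normalization; the real gem's rules may differ
  def normalize(name)
    name.to_s.split(/[-_]/).map(&:capitalize).join("-")
  end
end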

Member

The only thing is that if that change to the underlying data structure happens, it will be released only in the next major version, since it introduces backward-incompatible changes.

Member

I might also try extracting the timeout code into a separate gem...

Member

I was pretty sure @zanker was doing something related to that, although I might have misunderstood something.

Member

Pretty sure he abandoned his efforts.

Member

Oh. Ok.

Author

Indeed. Just out of curiosity, what's the idea of normalizing the header key?

@tarcieri (Member)

@HoneyryderChuck this still looks like good stuff (at least for 3.0). Mind rebasing?

@ixti ixti self-requested a review January 30, 2017 15:12
@ixti (Member) left a comment

Minor tweaks and I'll be happy to merge!

# Header value is of unexpected format (similar to Net::HTTPHeaderSyntaxError)
class HeaderError < Error; end

# @deprecated will be removed in v3.0.0
Member

As we're going to release this PR as part of v3.0.0, it's worth changing this to say "will be removed in v3.1.0".

Author

But if it is to be removed in 3.0.0 and this PR is about 3.0.0, then why not just go ahead and remove it? I'd leave this tweak for upstream though; it's beyond the scope of this PR, IMO.

Member

Mmm. I guess I agree. Just remove this deprecated class then.

@@ -53,7 +53,10 @@ def to_s
@streaming = false
@contents = String.new("").force_encoding(encoding)

while (chunk = @stream.readpartial)
length = @length || Float::INFINITY
Member

I think we should follow your initial idea of limiting the amount of data to the given Content-Length when available, as it should be respected according to the RFC.
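
A minimal sketch of what respecting Content-Length could look like in the to_s loop above (assuming @length holds the parsed header value, or nil when the header is absent); not the final implementation:

remaining = @length || Float::INFINITY

while remaining > 0 && (chunk = @stream.readpartial)
  @contents << chunk
  remaining -= chunk.bytesize
end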

@HoneyryderChuck (Author)

Can I make a few suggestions regarding RuboCop? If you look at the ratio of useful commits to RuboCop tweaks in this PR, you can see how annoying it has been. I get the usefulness of maintaining a standard, but there are some inconsistencies that make one lose focus, the main one being the hash-rocket insistence. Main reasons:

  • you can't use kwargs and hash-rockets at the same time.
# syntax error
def bla(:a => nil, :b => nil)
  • You can pass hash-rocket-style arguments to a method defined with kwargs
# if I don't call #bla like this, rubocop complains
bla(:a => 1, :b => 2)

I also suggest dropping this check, given that the minimum requirement of this gem is Ruby 2.0 and the "new" hash syntax has existed since 1.9.

@ixti (Member) commented Feb 6, 2017

Will merge this today.
Re hash syntax: #342

@HoneyryderChuck (Author)

Will merge this today.

Hopefully you make it, as "before the weekend" just passed ;) Not to sound like an a-hole, but I've been mostly fighting conflicts and RuboCop for the last week, which is time- and nerve-consuming, for a gem I haven't been using lately.

@tarcieri (Member) commented Feb 6, 2017

@ixti remember to push master to 2-x-stable prior to merging this, as it's a breaking change; with this, master will officially become the 3.x development branch.

We should probably also add some README text to clarify the version numbers and development state of each branch (I can take care of that)

@HoneyryderChuck I'm 👍 on moving to the new hash syntax across the board, @ixti is the detractor there.

re: RuboCop, are you aware of rubocop -a? (which will autofix your code)

ixti pushed a commit that referenced this pull request Feb 6, 2017
This is a squashed commit of PR: #368
@HoneyryderChuck (Author)

@tarcieri I actually wasn't, thx for the tip 👍

@ixti (Member) commented Feb 6, 2017

I have squashed and merged this: cb44c5d

@ixti ixti closed this Feb 6, 2017
@ixti (Member) commented Feb 6, 2017

@tarcieri 2-x-stable was actually created as soon as we released v2.0.0, and now it's in sync with v2.2.1, while master contains this PR.

@ixti (Member) commented Feb 6, 2017

Re: hash syntax. Yes, I am strongly against the new hash syntax.

@tarcieri (Member) commented Feb 6, 2017

@ixti yeah, just making sure it was up-to-date before merging this. I'll add some clarifying information to the READMEs tonight.

re: hash syntax, I'm only weakly in favor of the new syntax, so I guess your strong objection wins 😜

@HoneyryderChuck (Author)

@ixti I respect your opinion, and as one of the moderators of this project, yours counts the most. In terms of creating arbitrary hashes, I have to say I don't have a strong enough counter-argument, since, as stated in the linked issue, you can't represent non-symbol keys with the new syntax.

However, I think one could "relax" the policy and make an exception, namely when passing options directly against a method's signature. From what I understood, you're not against kwargs. I'd suggest "enforcing" them in the internal http API (at least), as they add argument (not type) checks at call time which map to more meaningful error messages, and they remove custom options-hash handling like this one bit here (see the #fetch calls). That makes a maintainer's life much easier and lessens "primitive obsession".

That being said, I think that insisting on the old hash-rocket for method calls creates confusion, as there is a dissonance between defining a method and calling it:

def person(first:, last:)
  # ...
end

# matches the definition's syntax
person(first: "Dirty", last: "Harry")
# does not match
person(:first => "Dirty", :last => "Harry")

To summarize, I agree with you on the hash thing, but the fact that you can't define kwargs in a hash-rocket-compatible way forces you to handle the hash-syntax inconsistency one way or another. The only way to prevent this would be not to use kwargs at all, but I hope I made a strong argument for why you should use them. Therefore, I think the new syntax could be accepted as long as it is only used directly in method calls and not to instantiate hashes. Whether RuboCop handles such an exception is another topic.

Those are my 2 cents. As stated, I respect the logic of the choice.

@ixti (Member) commented Feb 7, 2017

@HoneyryderChuck We're all for kwargs. It's just that we dropped old Ruby versions relatively recently, so nobody has worked on rewriting the internal API yet. I believe that's a great idea (getting rid of opts hashes) for the 3.0.0 release.

I agree with you that the current restriction introduces a small confusion (method calls vs. method definitions), but I find it not so bad; after all, method definitions differ from method calls in general, e.g.:

def say(what = "wut?")
  # ...
end

# you can:
say("hi!")

# but can't (obviously)
say(what = "hi")

So I prefer to look at this particular problem (the kwargs definition/usage difference) through this prism :D Relaxing RuboCop on hash syntax would add an extra burden on reviewers to check styling, as one would have to verify whether a given hash is actually a hash or kwargs.

@tarcieri (Member) commented Feb 7, 2017

I think, at least, we can enable the OptionHash cop so as to forbid option hashes:

https://github.com/bbatsov/rubocop/pull/2078/files
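
For context, a hedged illustration of what that cop targets (the method names here are made up, not from this gem): it flags a trailing options = {} parameter, nudging the code toward keyword arguments instead.

# flagged by the OptionHash cop
def request(verb, uri, options = {})
  follow = options.fetch(:follow_redirects, false)
end

# preferred: keyword arguments
def request(verb, uri, follow_redirects: false)
  # a typo such as request(:get, uri, folow_redirects: true) raises ArgumentError
end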

tarcieri added a commit that referenced this pull request Feb 9, 2017
We should start favoring keyword arguments

See: #368 (comment)
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Dec 9, 2017
pkgsrc changes:
- sort DEPENDS

Upstream changes (from CHANGES.md):

## 3.0.0 (2017-10-01)

* Drop support of Ruby `2.0` and Ruby `2.1`.
  ([@ixti])

* [#410](httprb/http#410)
  Infer `Host` header upon redirects.
  ([@janko-m])

* [#409](httprb/http#409)
  Enables request body streaming on any IO object.
  ([@janko-m])

* [#413](httprb/http#413),
  [#414](httprb/http#414)
  Fix encoding of body chunks.
  ([@janko-m])

* [#368](httprb/http#368),
  [#357](httprb/http#357)
  Fix timeout issue.
  ([@HoneyryderChuck])