Handle Range requests in HTTP::StaticFileHandler #12886
Conversation
I'd like to see some additional specs for an empty file and for the case where the start byte is in range but the end byte is not (I suppose it should clamp to the end of the data?).

The Multipart.parse method is an independent feature. Could you extract that into a separate PR, please? We should try to separate concerns.
if (range = context.request.headers["Range"]?) && (match = range.match(/bytes=([0-9\-,]+)/))
  ranges = match[1].split(',').map do |range|
    start, finish = range.strip.split('-', 2)
This parsing mechanism looks like it doesn't do proper error handling. It breaks on invalid range values such as bytes=1 or bytes=1-2-3.

Also, there is some duplication between matching the overall (incomplete) format via regex and then splitting into substrings.
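For illustration, here is a stricter parser sketched in Ruby (the handler itself is Crystal, whose syntax is very close) that returns nil for the malformed values mentioned above instead of raising or misreading them. The helper name `parse_byte_ranges` is hypothetical and not part of the PR:

```ruby
# Hypothetical strict parser: returns an array of [start, finish] pairs
# (nil meaning "open end"), or nil for any value it cannot fully parse.
RANGE_SPEC = /\A(\d*)-(\d*)\z/

def parse_byte_ranges(header)
  return nil unless header.start_with?("bytes=")

  specs = header.delete_prefix("bytes=").split(",", -1)
  return nil if specs.empty?

  specs.map do |spec|
    m = RANGE_SPEC.match(spec.strip)
    return nil unless m                          # rejects "1" and "1-2-3"
    return nil if m[1].empty? && m[2].empty?     # rejects a bare "-"
    [m[1].empty? ? nil : m[1].to_i, m[2].empty? ? nil : m[2].to_i]
  end
end
```

With this shape, bytes=1 and bytes=1-2-3 both yield nil, which the handler could turn into an HTTP error response or ignore.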
> This parsing mechanism looks like it doesn't do proper error handling. It breaks on invalid range values such as bytes=1 or bytes=1-2-3.
This is not actionable feedback. Define “breaks”. What do you think the expected and actual behaviors are here?
> Also, there is some duplication between matching the overall (incomplete) format via regex and then splitting into substrings.
What do you mean here?
Additionally, how likely is this header to be malformed? How critical is it to get this precisely correct before the feature is merged at all? If it has to be perfect before it gets merged I’ll just close the PR and publish it as a shard.
bytes=1 raises IndexError, and bytes=1-2-3 is interpreted the same as bytes=1-. I think both kinds of errors should cause an HTTP error response.
> Also, there is some duplication between matching the overall (incomplete) format via regex and then splitting into substrings.
>
> What do you mean here?
It's just that validation is essentially split between the regex and the string methods, so some of it is redundant. For example, the regex would be unnecessary if the parsing were a bit more precise.
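To illustrate the duplication point, here is a Ruby sketch (hypothetical names, not the PR's code) where a single anchored pattern performs all of the validation, so no separate split-and-check pass is needed:

```ruby
# One anchored regex validates the entire header; requiring at least one
# digit per side rules out a bare "-", and anchoring rules out "1-2-3".
BYTE_RANGES_HEADER = /\Abytes=(?:\d+-\d*|-\d+)(?:\s*,\s*(?:\d+-\d*|-\d+))*\z/

def byte_ranges(header)
  return nil unless BYTE_RANGES_HEADER.match?(header)

  # After validation, extraction is trivial and cannot fail.
  header.scan(/(\d*)-(\d*)/).map do |s, f|
    [s.empty? ? nil : s.to_i, f.empty? ? nil : f.to_i]
  end
end
```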
> Additionally, how likely is this header to be malformed? How critical is it to get this precisely correct before the feature is merged at all? If it has to be perfect before it gets merged I'll just close the PR and publish it as a shard.
It doesn't have to be perfect. But the basics should be covered, and I think proper error handling is necessary for that.
As said below, I can take care of finishing it up and I expect this shouldn't take long to be merged.
The base implementation looks solid, just needs some adjustments to handle edge cases / errors properly.
I was sure I had written a test for omitting the end of the range. Looks like it got deleted or rewritten into one of the ones I did end up committing. But it does clamp to the end of the file as written.
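That clamping behavior matches RFC 7233 §2.1, where a last-byte-pos at or past the end of the representation is treated as the last byte. A minimal Ruby sketch of that step (hypothetical helper, not the PR's code):

```ruby
# Clamp a parsed range against the actual file size. Returns a Range of
# byte offsets, or nil when the range is unsatisfiable (start past EOF).
def clamp_range(start, finish, file_size)
  return nil if start >= file_size                       # caller sends 416
  finish = file_size - 1 if finish.nil? || finish >= file_size
  start..finish
end
```

Note that a zero-byte file makes every start offset unsatisfiable under this rule, which is one answer to the empty-file spec question raised earlier.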
It was required to be able to parse the response for multiple ranges. This PR can't exist without it.

This PR can depend on the other one.

If you would like to go to all that effort yourself of creating multiple PRs and wiring up their dependencies, you're more than welcome to, but I will not be splitting PRs for the sake of accounting that ultimately makes no difference. I've already spent over a day on this PR between trying to understand how these requests work, implementing the feature itself, and doing additional testing with a browser, not counting the time I spent on a previous attempt at this several months ago.
Sure, no problem. I can take care of these administrative aspects. Thanks for all the work you have already put into it. I realize that it seems a bit cumbersome to go the extra steps, but we've realized that it's worth it in the long run for the project to have small and self-contained changes. And we'll take on the burden of this extra work (it's not that much, actually).
I extracted the implementation of
I said it before and I'll say it again: the Ruby core team happily merges incomplete or almost-good PRs. Then they introduce extra commits (here we'd use extra PRs) with those additional changes. The end result is that contributors are happier, they don't get frustrated, they don't have to do extra work, and as a bonus, they don't get the final word on how things should be. The last one might be a downside to them, but I'm sure the happiness of contributing is greater (I contributed to Ruby a couple of times and that was my experience).
Like, instead of saying "please do this and that", it's so much easier to do it ourselves with exactly the code that we have in mind.

That said, everything depends on the chosen process. If contributing involves friction, providing a diff of the intended change could also work. That way the rest of the work will surely depend on the core team. Happy new year everyone!
Force-pushed from 1641e22 to 65b6f26
Also extract implementation into helper method
Co-authored-by: Jamie Gaskins <jgaskins@hey.com>
Going through the RFC again, there are a couple more edge cases missing. I'll address that later.

I don't know if that's expected, but CI is failing for these specs already.

Yes, it's because I applied one of the suggestions, which only changed the spec, not the implementation.
it "start >= file_size" do
  headers = HTTP::Headers{"Range" => "bytes=12-"}
  response = handle HTTP::Request.new("GET", "/range.txt", headers)

  response.status_code.should eq(416)
  response.headers["Content-Range"]?.should eq "bytes */12"
end
Then this should also be grouped into the above describe, because if that one refers to syntactic validity, it would be confusing if this describe refers to semantic invalidity.
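The spec above pins down the unsatisfiable case. Sketched in Ruby with a hypothetical helper (not the handler's actual code), the decision for a single start offset looks roughly like:

```ruby
# Per RFC 7233 §4.4, an unsatisfiable range gets 416 with
# "Content-Range: bytes */<size>"; otherwise 206 with the served span.
def range_status(start, file_size)
  if start >= file_size
    {status: 416, content_range: "bytes */#{file_size}"}
  else
    {status: 206, content_range: "bytes #{start}-#{file_size - 1}/#{file_size}"}
  end
end
```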
When fetching a large file, such as a media file, that gets picked up by HTTP::StaticFileHandler, the handler copies the entire file to the response, even if the client is just trying to fetch metadata about the file. This can make that metadata request take a lot longer and lead to extra bandwidth being used, which can be expensive.

At the moment, this PR does not handle the If-Range header, which it looks like is used to ensure that multiple Range requests to get small chunks of a file will return chunks of the same file, falling back to returning the entire file in case of an Etag or Last-Modified mismatch.

Fixes #8954