Handle Range requests in HTTP::StaticFileHandler #12886
Conversation
I'd like to see some additional specs for an empty file and for the case where the start byte is in range but the end byte is not (I suppose it should clamp to the end of the data?).

The Multipart.parse method is an independent feature. Could you extract that into a separate PR, please? We should try to separate concerns.
if (range = context.request.headers["Range"]?) && (match = range.match(/bytes=([0-9\-,]+)/))
  ranges = match[1].split(',').map do |range|
    start, finish = range.strip.split('-', 2)
This parsing mechanism looks like it doesn't do proper error handling. It breaks on invalid range values such as bytes=1 or bytes=1-2-3.

Also, there is some duplication between matching the overall (incomplete) format via regex and then splitting into substrings.
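For illustration, here is a stricter parser sketched in Ruby (the handler itself is Crystal, whose syntax is very close) that returns nil for the malformed values mentioned above instead of raising or misreading them. The helper name `parse_byte_ranges` is hypothetical and not part of the PR:

```ruby
# Hypothetical strict parser: returns an array of [start, finish] pairs
# (nil meaning "open end"), or nil for any value it cannot fully parse.
RANGE_SPEC = /\A(\d*)-(\d*)\z/

def parse_byte_ranges(header)
  return nil unless header.start_with?("bytes=")

  specs = header.delete_prefix("bytes=").split(",", -1)
  return nil if specs.empty?

  specs.map do |spec|
    m = RANGE_SPEC.match(spec.strip)
    return nil unless m                          # rejects "1" and "1-2-3"
    return nil if m[1].empty? && m[2].empty?     # rejects a bare "-"
    [m[1].empty? ? nil : m[1].to_i, m[2].empty? ? nil : m[2].to_i]
  end
end
```

With this shape, bytes=1 and bytes=1-2-3 both yield nil, which the handler could turn into an HTTP error response or ignore.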
> This parsing mechanism looks like it doesn't do proper error handling. It breaks on invalid range values such as bytes=1 or bytes=1-2-3.
This is not actionable feedback. Define “breaks”. What do you think the expected and actual behaviors are here?
> Also, there is some duplication between matching the overall (incomplete) format via regex and then splitting into substrings.
What do you mean here?
Additionally, how likely is this header to be malformed? How critical is it to get this precisely correct before the feature is merged at all? If it has to be perfect before it gets merged I’ll just close the PR and publish it as a shard.
bytes=1 raises IndexError, and bytes=1-2-3 is interpreted the same as bytes=1-. I think both kinds of errors should cause an HTTP error response.
> Also, there is some duplication between matching the overall (incomplete) format via regex and then splitting into substrings.
>
> What do you mean here?
It's just that validation is essentially split between the regex and the string methods, so some of it is redundant. For example, the regex would be unnecessary if the parsing were a bit more precise.
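To illustrate the duplication point, here is a Ruby sketch (hypothetical names, not the PR's code) where a single anchored pattern performs all of the validation, so no separate split-and-check pass is needed:

```ruby
# One anchored regex validates the entire header; requiring at least one
# digit per side rules out a bare "-", and anchoring rules out "1-2-3".
BYTE_RANGES_HEADER = /\Abytes=(?:\d+-\d*|-\d+)(?:\s*,\s*(?:\d+-\d*|-\d+))*\z/

def byte_ranges(header)
  return nil unless BYTE_RANGES_HEADER.match?(header)

  # After validation, extraction is trivial and cannot fail.
  header.scan(/(\d*)-(\d*)/).map do |s, f|
    [s.empty? ? nil : s.to_i, f.empty? ? nil : f.to_i]
  end
end
```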
> Additionally, how likely is this header to be malformed? How critical is it to get this precisely correct before the feature is merged at all? If it has to be perfect before it gets merged I'll just close the PR and publish it as a shard.
It doesn't have to be perfect. But the basics should be covered, and I think proper error handling is necessary for that.
As said below, I can take care of finishing it up and I expect this shouldn't take long to be merged.
The base implementation looks solid, just needs some adjustments to handle edge cases / errors properly.
I was sure I had written a test for omitting the end of the range. Looks like it got deleted or rewritten into one of the ones I did end up committing. But it does clamp to the end of the file as written.
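That clamping behavior matches RFC 7233 §2.1, where a last-byte-pos at or past the end of the representation is treated as the last byte. A minimal Ruby sketch of that step (hypothetical helper, not the PR's code):

```ruby
# Clamp a parsed range against the actual file size. Returns a Range of
# byte offsets, or nil when the range is unsatisfiable (start past EOF).
def clamp_range(start, finish, file_size)
  return nil if start >= file_size                       # caller sends 416
  finish = file_size - 1 if finish.nil? || finish >= file_size
  start..finish
end
```

Note that a zero-byte file makes every start offset unsatisfiable under this rule, which is one answer to the empty-file spec question raised earlier.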
It was required to be able to parse the response for multiple ranges. This PR can't exist without it.

This PR can depend on the other one.

If you would like to go to all that effort yourself of creating multiple PRs and wiring up their dependencies, you're more than welcome to, but I will not be splitting PRs for the sake of accounting that ultimately makes no difference. I've already spent over a day on this PR between trying to understand how these requests work, implementing the feature itself, and doing additional testing with a browser, not counting the time I spent on a previous attempt at this several months ago.
Sure, no problem. I can take care of these administrative aspects. Thanks for all the work you have already put into it. I realize that it seems a bit cumbersome to go the extra steps, but we've realized that it's worth it in the long run for the project to have small and self-contained changes. And we'll take on the burden of this extra work (it's not that much, actually).
I extracted the implementation of
I said it before and I'll say it again: the Ruby core team happily merges incomplete or almost-good PRs. Then they introduce extra commits (here we'd use extra PRs) with those additional changes. The end result is that contributors are happier, they don't get frustrated, they don't have to do extra work, and as a bonus, they don't get the final word on how things should be. The last one might be a downside to them, but I'm sure the happiness of contributing is greater (I contributed to Ruby a couple of times and that was my experience).
Like, instead of saying "please do this and that", it's so much easier to do it ourselves with exactly the code that we have in mind.

That said, everything depends on the chosen process. If contributing involves friction, providing a diff of the intended change could also work. That way the rest of the work will surely depend on the core team. Happy new year everyone!
Force-pushed from 1641e22 to 65b6f26
Also extract implementation into helper method
Co-authored-by: Jamie Gaskins <jgaskins@hey.com>
Going through the RFC again, there are a couple more edge cases missing. I'll address that later.

I don't know if that's expected, but CI is failing for these specs already.

Yes, it's because I applied one of the suggestions, which only changed the spec, not the implementation.
it "start >= file_size" do
  headers = HTTP::Headers{"Range" => "bytes=12-"}
  response = handle HTTP::Request.new("GET", "/range.txt", headers)

  response.status_code.should eq(416)
  response.headers["Content-Range"]?.should eq "bytes */12"
end
Then this should also be grouped into the above describe, because if that one refers to syntactic validity, it would be confusing if this describe refers to semantic invalidity.
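The spec above pins down the unsatisfiable case. Sketched in Ruby with a hypothetical helper (not the handler's actual code), the decision for a single start offset looks roughly like:

```ruby
# Per RFC 7233 §4.4, an unsatisfiable range gets 416 with
# "Content-Range: bytes */<size>"; otherwise 206 with the served span.
def range_status(start, file_size)
  if start >= file_size
    {status: 416, content_range: "bytes */#{file_size}"}
  else
    {status: 206, content_range: "bytes #{start}-#{file_size - 1}/#{file_size}"}
  end
end
```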
When fetching a large file, such as a media file, that gets picked up by HTTP::StaticFileHandler, the handler copies the entire file to the response, even if the client is just trying to fetch metadata about the file. This can make that metadata request take a lot longer and lead to extra bandwidth being used, which can be expensive.

At the moment, this PR does not handle the If-Range header, which it looks like is used to ensure that multiple Range requests to get small chunks of a file will return chunks of the same file, falling back to returning the entire file in case of an Etag or Last-Modified mismatch.

Fixes #8954