Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for HTTP Range to FileResponse #2697

Merged
merged 4 commits into from
Sep 23, 2024

Conversation

Kludex
Copy link
Member

@Kludex Kludex commented Sep 22, 2024

Inspired by Baize and copied from my own implementation: https://github.com/Kludex/file-response .

Copy link
Member

@abersheeran abersheeran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hahaha, I like this.

@abersheeran abersheeran merged commit 69ed26a into master Sep 23, 2024
6 checks passed
@abersheeran abersheeran deleted the add-http-range-to-fileresponse branch September 23, 2024 01:20
@trim21
Copy link
Contributor

trim21 commented Sep 23, 2024

We should pre-compile regex pattern. Althrough re has a built-in cache, but it has size limit and for project using many regex it still may be invalidated and pattern get re-compiled.

@Kludex
Copy link
Member Author

Kludex commented Sep 23, 2024

We should pre-compile regex pattern. Althrough re has a built-in cache, but it has size limit and for project using many regex it still may be invalidated and pattern get re-compiled.

PR welcome. 👍

file_size: int,
send_header_only: bool,
) -> None:
boundary = "".join(random_choices("abcdefghijklmnopqrstuvwxyz0123456789", k=13))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned that this uses predictable randomness

I'd be tempted to go for 19 characters using SystemRandom.choices to get 96 bits of entropy

Copy link
Contributor

@trim21 trim21 Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why join(random_choices) anyway?

It's also slower than secrets.token_hex(6/7/13) unless 13 has some special meaning here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do other frameworks do something different from this?

Would you like to open a PR for it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

13 has some special meaning here

Nothing special. It's just a random number I picked, and the spec doesn't make any specific requirements on this length.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned that this uses predictable randomness

I'd be tempted to go for 19 characters using SystemRandom.choices to get 96 bits of entropy

This is predicted to be secure...it does not serve any cryptographic purpose.

Copy link
Member

@graingert graingert Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Urllib3 does this https://github.com/urllib3/urllib3/blob/2458bfcd3dacdf6c196e98d077fc6bb02a5fc1df/src/urllib3/filepost.py#L26 (it's for form data but same sort of concept)

nginx is just totally weird https://github.com/nginx/nginx/blob/b1e07409b1071564e8dd8c70436cba5da19a1575/src/core/ngx_file.c#L17 it uses a random number generator based on successive additions of 123456

Apache is bizarre also, it's using predictable randomness, but it's a constant defined each time config is loaded https://github.com/apache/httpd/blob/64deb516d6a3b8d872dab9d27a1f0cba17548190/modules/http/http_core.c#L350

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, urllib3 implementation is just equivalent to secrets.token_hex()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'd probably go for secrets.token_hex()

async with await anyio.open_file(self.path, mode="rb") as file:
for start, end in ranges:
await file.seek(start)
chunk = await file.read(min(self.chunk_size, end - start))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it send all content within the range?

Copy link
Member Author

@Kludex Kludex Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the question... Could you rephrase it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this statement may send chunk with self.chunk_size, not the full content between end - start, and I didn't find similar logic to that in handle_single_range(pasted below) to loop until no more body to send:

while more_body:
    chunk = await file.read(min(self.chunk_size, end - start))
    start += len(chunk)
    more_body = len(chunk) == self.chunk_size and start < end
    await send({"type": "http.response.body", "body": chunk, "more_body": more_body})

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may benefit from some comments/docstrings around here. But I don't see anything wrong?

Could you try to come up with a test scenario that you think it would be problematic here?

This is a basic test for multiple ranges (to help):

def test_file_response_merge_ranges(file_response_client: TestClient) -> None:
    response = file_response_client.get("/", headers={"Range": "bytes=0-100, 50-200"})
    assert response.status_code == 206
    assert response.headers["content-length"] == "201"
    assert response.headers["content-range"] == f"bytes 0-200/{len(README.encode('utf8'))}"

Copy link
Contributor

@frostming frostming Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the above test works but needs to adjust the chunk_size to less than 100.

I thought you would immediately see my points. This statement

chunk = await file.read(min(self.chunk_size, end - start))

may not send full content if the chunk_size is smaller, and you didn't send the rest anywhere in this branch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh... I see your point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR is welcome here, otherwise I'll work on it later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @frostming 🙏

nixroxursox pushed a commit to nixroxursox/starlette that referenced this pull request Sep 30, 2024
* Add support for HTTP Range to `FileResponse`

* Remove pragmas

* Single line
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for HTTP range requests (for FileResponse)
5 participants