Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Size snapshot #46340

Closed
wants to merge 5 commits into from
Closed

Size snapshot #46340

wants to merge 5 commits into from

Conversation

sunfishcode
Copy link
Member

@sunfishcode sunfishcode commented Nov 28, 2017

This introduces Read::size_snapshot() and it modifies read_to_end to allocate exactly the number of required bytes up front, and read exactly the number of bytes at once, when possible.

size_snapshot() is not treated as a hint. In the common case, this allows it to do a single read syscall (compared to two in #45928, with the second one to confirm that EOF is reached), which roughly makes up for doing an extra syscall to get the size, and to request exactly the right size buffer (compared to requesting the size + 1 in #45928).

The main downside of this patch compared to #45928 is that it is possible to observe subtle behavior differences due to obtaining the size up front rather than waiting until read returns 0 in some specialized cases. That said, they all involve read_to_end racing with some independent writer, and the new consequences are just that read_to_end might return more of the old data than it should, or less of the new data than it should, if one knows enough about the underlying OS to reason about what "should" be possible.

This patch also depends on metadata() being reliable and not memoized. On OS's I'm familiar with it's reliable (when the size isn't 0). And in the current codebase, it's not memoized.

EDITED: #45928 now requests size_hint + 1 bytes, fixing the main performance difference.

@rust-highfive
Copy link
Collaborator

r? @aidanhs

(rust_highfive has picked a reviewer for you, use r? to override)

@SimonSapin
Copy link
Contributor

As mentioned in #45928, a file could grow after its size was obtained and before the final read.

@sunfishcode
Copy link
Member Author

If the file grows after its size is obtained, it's already a race whether read_to_end sees the extra bytes, because if gets to the old EOF before the write happens, it'll stop and it won't see the write.

@kennytm kennytm added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 28, 2017
@aidanhs
Copy link
Member

aidanhs commented Nov 28, 2017

r? @sfackler

@rust-highfive rust-highfive assigned sfackler and unassigned aidanhs Nov 28, 2017
@SimonSapin
Copy link
Contributor

Except for slightly different implementation strategy this is mostly the same as #45928. Both should be discussed together.

@aidanhs
Copy link
Member

aidanhs commented Dec 7, 2017

ping @sfackler for review! Also pinging on IRC

@aidanhs
Copy link
Member

aidanhs commented Dec 7, 2017

Oh, also @sunfishcode I see there are merge conflicts to resolve. I'll keep this as to-review though since it sounds like this may be a multipart discussion.

…apshot

In the case of reading a normal file, this creates a buffer of the desired
size up front so it avoids over-allocation and re-allocation.

It adds an fstat syscall, but in the case of reading a normal file, this
eliminates the final 0-byte read syscall at the end.

This also changes the default read size pattern from 32,64,128,256,... to
32,32,64,128,... so that default allocation sizes and file reads are aligned
at power-of-two boundaries.
@sunfishcode
Copy link
Member Author

Merge conflicts now resolved.

@shepmaster shepmaster added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 9, 2017
@shepmaster
Copy link
Member

I'm going to make an executive decision and mark this and #45928 as waiting on a team decision as it seems that they cannot both exist together. Let me know if you disagree.

@carols10cents carols10cents added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jan 9, 2018
@sfackler
Copy link
Member

The @rust-lang/libs team talked about this and #45928 during our triage meeting and decided that we're not looking to expand the surface area of Read in this way just yet. Instead, we can modify the implementation of fs::read to do a length lookup which should cover 99% of the use cases for this kind of thing.

@sfackler sfackler closed this Jan 10, 2018
kennytm added a commit to kennytm/rust that referenced this pull request Jan 11, 2018
Pre-allocate in fs::read and fs::read_string

This is a simpler alternative to rust-lang#46340 and rust-lang#45928, as requested by the libs team.
kennytm added a commit to kennytm/rust that referenced this pull request Jan 11, 2018
Pre-allocate in fs::read and fs::read_string

This is a simpler alternative to rust-lang#46340 and rust-lang#45928, as requested by the libs team.
kennytm added a commit to kennytm/rust that referenced this pull request Jan 12, 2018
Pre-allocate in fs::read and fs::read_string

This is a simpler alternative to rust-lang#46340 and rust-lang#45928, as requested by the libs team.
@sunfishcode sunfishcode deleted the size_snapshot branch July 16, 2021 00:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants