Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add File constructors that nudge people toward buffered I/O #446

Closed
cuviper opened this issue Sep 23, 2024 · 7 comments
Closed

Add File constructors that nudge people toward buffered I/O #446

cuviper opened this issue Sep 23, 2024 · 7 comments
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@cuviper
Copy link
Member

cuviper commented Sep 23, 2024

Proposal

Problem statement

Using unbuffered I/O is a common performance footgun which has users asking why their Rust program is so slow.

Motivating examples or use cases

Issue #445 opened this discussion, "Attempt to help users avoid unbuffered small reads":

Users regularly come to the /r/rust subreddit looking for help when their Rust code doesn't perform as well as they expect. The second most common culprit (after debug vs release build) is doing unbuffered small reads from a file.

I'm wondering if there's anything we could do to help users catch or avert this more easily.

Solution sketch

Add two File constructors that wrap the result in buffers:

impl File {
    pub fn open_buffered<P: AsRef<Path>>(path: P) -> io::Result<io::BufReader<File>> {
        File::open(path).map(io::BufReader::new)
    }

    pub fn create_buffered<P: AsRef<Path>>(path: P) -> io::Result<io::BufWriter<File>> {
        File::create(path).map(io::BufWriter::new)
    }
}

These are trivial, but the main advantage is that such explicit methods will appear in auto-complete lists. Hopefully that will raise more programmer awareness that the basic open and create methods are not buffered, by omission, so they may consider if that's something they want.

Users with deeper OpenOptions will still need to decide which kind of buffering makes sense in their case. Similarly, if we want create_new_buffered, we would probably choose BufWriter, whereas create_new is opened in read-write mode.

Alternatives

  • The status quo is that File documents this:

    File does not buffer reads and writes. For efficiency, consider wrapping the file in a BufReader or BufWriter when performing many small read or write calls, unless unbuffered reads and writes are required.

    We could deem that sufficient.

  • Attempt to help users avoid unbuffered small reads #445 has several other ideas like debug assertions, new lints, or cargo tooling for performance footguns. Any of those could still be implemented in addition to this proposal.

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
@cuviper cuviper added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Sep 23, 2024
@the8472
Copy link
Member

the8472 commented Sep 23, 2024

I have a bit of an issue with the motivation / framing. Basically it amounts to "people don't read the docs, so we move the dos into method names". This doesn't scale, so it shouldn't be the policy.

Perhaps a better angle is adding a convenience for the thing that's one commonly wants anyway.

Similarly, if we want create_new_buffered, we would probably choose BufWriter,

We could change Buf{Reader,Writer} to support both read and write buffering when nested. Or a full-duplex buffer that flushes on mode-switching.

@cuviper
Copy link
Member Author

cuviper commented Sep 23, 2024

I have a bit of an issue with the motivation / framing. Basically it amounts to "people don't read the docs, so we move the dos into method names". This doesn't scale, so it shouldn't be the policy.

Perhaps a better angle is adding a convenience for the thing that's one commonly wants anyway.

I think we can combine both angles, "people don't read the docs, so they miss the thing one commonly wants." It's mildly convenient for experienced Rustaceans too, but saving a single map() is not justified on its own IMO.

Similarly, if we want create_new_buffered, we would probably choose BufWriter,

We could change Buf{Reader,Writer} to support both read and write buffering when nested. Or a full-duplex buffer that flushes on mode-switching.

That's enough to make me not want to include create_new yet, but buffered R/W could be its own proposal.

@the8472
Copy link
Member

the8472 commented Sep 23, 2024

We can do better than a simple map, by turning allocation failures into errors instead of panics.

@cuviper
Copy link
Member Author

cuviper commented Sep 23, 2024

We can do better than a simple map, by turning allocation failures into errors instead of panics.

True, but then that should also be made possible manually, like Buf{Reader,Writer}::try_{new,with_capacity}.

@programmerjake
Copy link
Member

We could change Buf{Reader,Writer} to support both read and write buffering when nested. Or a full-duplex buffer that flushes on mode-switching.

a duplex buffer could share the buffer between read and write modes, improving memory use by a bit. it could also optionally forward bytes from the write buffer to reads, avoiding extra syscalls in some cases (wrapped types would have to opt-in, e.g. if a File was wrapping a FIFO then it shouldn't forward bytes)

@cuviper
Copy link
Member Author

cuviper commented Sep 23, 2024

by turning allocation failures into errors instead of panics.

Hmm, ideally the allocation should happen before the file open, so the filesystem isn't affected yet if it fails. I also imagine any try constructors for buffers would need to give you the file back on error, so you can deal with that somehow.

@the8472
Copy link
Member

the8472 commented Sep 24, 2024

We discussed this during today's libs-API meeting and we're on board with adding these methods.
It was also agreed that this decision should not be seen as a precedent of solving documentation problems (or the documentation not being read) by adding signpost methods.

@the8472 the8472 added the ACP-accepted API Change Proposal is accepted (seconded with no objections) label Sep 24, 2024
@cuviper cuviper closed this as completed Sep 24, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 25, 2024
Add `File` constructors that return files wrapped with a buffer

In addition to the light convenience, these are intended to raise visibility that buffering is something you should consider when opening a file, since unbuffered I/O is a common performance footgun to Rust newcomers.

ACP: rust-lang/libs-team#446
Tracking Issue: rust-lang#130804
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Sep 26, 2024
Add `File` constructors that return files wrapped with a buffer

In addition to the light convenience, these are intended to raise visibility that buffering is something you should consider when opening a file, since unbuffered I/O is a common performance footgun to Rust newcomers.

ACP: rust-lang/libs-team#446
Tracking Issue: #130804
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

3 participants