Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for std::io::read_to_string #80218

Closed
4 tasks done
camelid opened this issue Dec 20, 2020 · 15 comments · Fixed by #100337
Closed
4 tasks done

Tracking Issue for std::io::read_to_string #80218

camelid opened this issue Dec 20, 2020 · 15 comments · Fixed by #100337
Assignees
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@camelid
Copy link
Member

camelid commented Dec 20, 2020

Feature gate: #![feature(io_read_to_string)]

This is a tracking issue for the std::io::read_to_string function.

The equivalent of std::fs::read_to_string, but generalized to all
Read impls.

As the documentation on std::io::read_to_string says, the advantage of
this function is that it means you don't have to create a variable first
and it provides more type safety since you can only get the buffer out
if there were no errors. If you use Read::read_to_string, you have to
remember to check whether the read succeeded because otherwise your
buffer will be empty.

It's friendlier to newcomers and better in most cases to use an explicit
return value instead of an out parameter.

Public API

pub fn read_to_string<R: Read>(reader: R) -> Result<String>;

Steps / History

Unresolved Questions

@camelid camelid changed the title Tracking Issue for io::read_to_string Tracking Issue for std::io::read_to_string Dec 20, 2020
@camelid camelid added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` labels Dec 20, 2020
@m-ou-se m-ou-se added the Libs-Tracked Libs issues that are tracked on the team's project board. label Jan 6, 2021
@kartva
Copy link
Contributor

kartva commented Jun 4, 2021

What is the status on stabilizing this addition to the library?

@fee1-dead
Copy link
Member

fee1-dead commented Jun 23, 2021

@m-ou-se / @rust-lang/libs

Can we start an FCP?

@m-ou-se
Copy link
Member

m-ou-se commented Jul 5, 2021

Thinking a bit about the signature: Considering Read is already implemented for &mut impl Read, shouldn't this function just take an R instead of &mut R?

@camelid
Copy link
Member Author

camelid commented Jul 9, 2021

Thinking a bit about the signature: Considering Read is already implemented for &mut impl Read, shouldn't this function just take an R instead of &mut R?

I modeled it after Read::read_to_string, which takes an &mut self, but that has to use &mut self because it's defined as part of the trait. I don't know enough about the idiom for std here, so if that works, it seems okay to me.

@jkugelman
Copy link
Contributor

I'm working on a related PR #89582 to have File::read_to_string enlarge the string buffer before reading to accommodate the full file size. A nice side effect is that it will make io::read_to_string more performant when the reader is a file.

If it lands do you think there's anything in this comment that should be updated?

Performance

The downside of this function's increased ease of use and type safety is that it gives you less control over performance. For example, you can't pre-allocate memory like you can using String::with_capacity and Read::read_to_string. Also, you can't re-use the buffer if an error occurs while reading.

In many cases, this function's performance will be adequate and the ease of use and type safety tradeoffs will be worth it. However, there are cases where you need more control over performance, and in those cases you should definitely use Read::read_to_string directly.

@camelid
Copy link
Member Author

camelid commented Oct 6, 2021

I'm working on a related PR #89582 to have File::read_to_string enlarge the string buffer before reading to accommodate the full file size. A nice side effect is that it will make io::read_to_string more performant when the reader is a file.

That's great; thanks for working on this!

If it lands do you think there's anything in this comment that should be updated?

Perhaps we could add another paragraph at the end like this:

Note that in some special cases, such as when reading files, this function will pre-allocate memory based on the size of the input is reading. In those cases, the performance should be as good as if you had used Read::read_to_string() with a manually pre-allocated buffer.

@jkugelman
Copy link
Contributor

Done.

@jkugelman
Copy link
Contributor

jkugelman commented Oct 6, 2021

It seems like std::io::read_to_string should be paired with a std::io::read_to_end method to match the other pairings in the standard library:

Would it be read or read_to_end? I'd say read_to_end to match the Read trait since the reader might not be at the beginning.

jkugelman added a commit to jkugelman/rust that referenced this issue Oct 7, 2021
Reading a file into an empty vector or string buffer can incur
unnecessary `read` syscalls and memory re-allocations as the buffer
"warms up" and grows to its final size. This is perhaps a necessary evil
with generic readers, but files can be read in smarter by checking the
file size and reserving that much capacity.

`std::fs::read` and `read_to_string` already perform this optimization:
they open the file, reads its metadata, and call `with_capacity` with
the file size. This ensures that the buffer does not need to be resized
and an initial string of small `read` syscalls.

However, if a user opens the `File` themselves and calls
`file.read_to_end` or `file.read_to_string` they do not get this
optimization.

```rust
let mut buf = Vec::new();
file.read_to_end(&mut buf)?;
```

I searched through this project's codebase and even here are a *lot* of
examples of this. They're found all over in unit tests, which isn't a
big deal, but there are also several real instances in the compiler and
in Cargo. I've documented the ones I found in a comment here:

rust-lang#89516 (comment)

Most telling, the `Read` trait and the `read_to_end` method both show
this exact pattern as examples of how to use readers. What this says to
me is that this shouldn't be solved by simply fixing the instances of it
in this codebase. If it's here it's certain to be prevalent in the wider
Rust ecosystem.

To that end, this commit adds specializations of `read_to_end` and
`read_to_string` directly on `File`. This way it's no longer a minor
footgun to start with an empty buffer when reading a file in.

A nice side effect of this change is that code that accesses a `File` as
a bare `Read` constraint or via a `dyn Read` trait object will benefit.
For example, this code from `compiler/rustc_serialize/src/json.rs`:

```rust
pub fn from_reader(rdr: &mut dyn Read) -> Result<Json, BuilderError> {
    let mut contents = Vec::new();
    match rdr.read_to_end(&mut contents) {
```

Related changes:

- I also added specializations to `BufReader` to delegate to
  `self.inner`'s methods. That way it can call `File`'s optimized
  implementations if the inner reader is a file.

- The private `std::io::append_to_string` function is now marked
  `unsafe`.

- `File::read_to_string` being more efficient means that the performance
  note for `io::read_to_string` can be softened. I've added @camelid's
  suggested wording from:

  rust-lang#80218 (comment)
bors added a commit to rust-lang-ci/rust that referenced this issue Oct 9, 2021
…r=joshtriplett

Optimize File::read_to_end and read_to_string

Reading a file into an empty vector or string buffer can incur unnecessary `read` syscalls and memory re-allocations as the buffer "warms up" and grows to its final size. This is perhaps a necessary evil with generic readers, but files can be read in smarter by checking the file size and reserving that much capacity.

`std::fs::read` and `std::fs::read_to_string` already perform this optimization: they open the file, reads its metadata, and call `with_capacity` with the file size. This ensures that the buffer does not need to be resized and an initial string of small `read` syscalls.

However, if a user opens the `File` themselves and calls `file.read_to_end` or `file.read_to_string` they do not get this optimization.

```rust
let mut buf = Vec::new();
file.read_to_end(&mut buf)?;
```

I searched through this project's codebase and even here are a *lot* of examples of this. They're found all over in unit tests, which isn't a big deal, but there are also several real instances in the compiler and in Cargo. I've documented the ones I found in a comment here:

rust-lang#89516 (comment)

Most telling, the documentation for both the `Read` trait and the `Read::read_to_end` method both show this exact pattern as examples of how to use readers. What this says to me is that this shouldn't be solved by simply fixing the instances of it in this codebase. If it's here it's certain to be prevalent in the wider Rust ecosystem.

To that end, this commit adds specializations of `read_to_end` and `read_to_string` directly on `File`. This way it's no longer a minor footgun to start with an empty buffer when reading a file in.

A nice side effect of this change is that code that accesses a `File` as `impl Read` or `dyn Read` will benefit. For example, this code from `compiler/rustc_serialize/src/json.rs`:

```rust
pub fn from_reader(rdr: &mut dyn Read) -> Result<Json, BuilderError> {
    let mut contents = Vec::new();
    match rdr.read_to_end(&mut contents) {
```

Related changes:

- I also added specializations to `BufReader` to delegate to `self.inner`'s methods. That way it can call `File`'s optimized  implementations if the inner reader is a file.

- The private `std::io::append_to_string` function is now marked `unsafe`.

- `File::read_to_string` being more efficient means that the performance note for `io::read_to_string` can be softened. I've added `@camelid's` suggested wording from rust-lang#80218 (comment).

r? `@joshtriplett`
camelid added a commit to camelid/rust that referenced this issue Jan 13, 2022
`@m-ou-se` [realized][1] that because `Read` is implemented for `&mut impl
Read`, there's no need to take `&mut` in `io::read_to_string`.

Removing the `&mut` from the signature allows users to remove the `&mut`
from their calls (and thus pass an owned reader) if they don't use the
reader later.

[1]: rust-lang#80218 (comment)
@camelid
Copy link
Member Author

camelid commented Jan 14, 2022

Now that the PR to remove the &mut from the signature is in the merge queue, should a stabilization FCP be proposed?

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 15, 2022
…ou-se

Remove `&mut` from `io::read_to_string` signature

``@m-ou-se`` [realized][1] that because `Read` is implemented for `&mut impl
Read`, there's no need to take `&mut` in `io::read_to_string`.

Removing the `&mut` from the signature allows users to remove the `&mut`
from their calls (and thus pass an owned reader) if they don't use the
reader later.

r? `@m-ou-se`

[1]: rust-lang#80218 (comment)
@camelid
Copy link
Member Author

camelid commented May 20, 2022

I will open a stabilization PR next week or so.

@camelid
Copy link
Member Author

camelid commented Jun 21, 2022

Actually, I think a libs team FCP needs to be completed first. @m-ou-se could you kick that off? (Or do I need to do something before that?)

@camelid camelid self-assigned this Jun 24, 2022
@joshtriplett
Copy link
Member

@rfcbot merge

@rfcbot
Copy link

rfcbot commented Jul 12, 2022

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Jul 12, 2022
@rfcbot rfcbot added the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Aug 9, 2022
@rfcbot
Copy link

rfcbot commented Aug 9, 2022

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Aug 9, 2022
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Aug 19, 2022
@rfcbot
Copy link

rfcbot commented Aug 19, 2022

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Aug 27, 2022
…g, r=JohnTitor

Stabilize `std::io::read_to_string`

Closes rust-lang#80218. 🎉

**Blocked on FCP finishing**
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 28, 2022
…g, r=JohnTitor

Stabilize `std::io::read_to_string`

Closes rust-lang#80218. 🎉

This PR stabilizes the `std::io::read_to_string` function, with the following public API:

```rust
pub fn read_to_string<R: Read>(reader: R) -> Result<String>;
```

It's analogous to `std::fs::read_to_string` for files, but it works on anything that implements `io::Read`, including `io::stdin()`.

See the tracking issue (rust-lang#80218) or documentation for details.
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Aug 29, 2022
…g, r=JohnTitor

Stabilize `std::io::read_to_string`

Closes rust-lang#80218. 🎉

This PR stabilizes the `std::io::read_to_string` function, with the following public API:

```rust
pub fn read_to_string<R: Read>(reader: R) -> Result<String>;
```

It's analogous to `std::fs::read_to_string` for files, but it works on anything that implements `io::Read`, including `io::stdin()`.

See the tracking issue (rust-lang#80218) or documentation for details.
@bors bors closed this as completed in 9f7e20b Aug 29, 2022
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Sep 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants