io: rewrite read_to_end and read_to_string #2560

Darksonn · 2020-05-25T09:26:31Z

This PR fixes #2544.

This PR rewrites most of the read_to_end code. The new implementation makes two main changes:

The ReadToEnd and ReadToString structs must now promise read_to_end_internal that the buffer has been prepared for use with the AsyncRead.
The AsyncRead is now provided a slice into the unused capacity of the vector, and the length is increased after the call to poll_read, instead of first increasing the length, reading, and then decreasing the length.

The first change introduces more unsafe blocks to propagate that promise upwards, but it is necessary to fix #2544 unless we initialize the buffer ourselves. prepare_uninitialized_buffer might not initialize the buffer, so relying on it requires the containers to promise to call prepare.

The second change is not strictly necessary, but I think it makes it more obvious that we can never expose the caller to uninitialized memory in case of rogue panics or calls to mem::forget, as the bytes in the vector up to the length are always initialized at every stage of the algorithm. Additionally, increasing the length first is technically in violation of the safety section on set_len, and the documentation on Vec says this:

There is one case which we will not break, however: using unsafe code to write to the excess capacity, and then increasing the length to match, is always valid.

The second change also removes the need for a drop guard, as the vector length has the correct value in case of panics. One disadvantage is that creating a slice to the excess capacity is rather annoying.
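
The ordering described above (write into the excess capacity first, then increase the length) can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: it uses the synchronous `std::io::Read` instead of `AsyncRead`, the helper name `read_once_into_spare` is hypothetical, and it zeroes the spare capacity itself so that handing out a `&mut [u8]` is sound.

```rust
use std::io::Read;
use std::mem::MaybeUninit;

/// Hypothetical helper: performs one read into the unused capacity of `buf`
/// and only afterwards increases the length by the number of bytes read.
fn read_once_into_spare<R: Read>(reader: &mut R, buf: &mut Vec<u8>) -> std::io::Result<usize> {
    // Make sure there is some unused capacity to read into.
    buf.reserve(32);

    let spare: &mut [MaybeUninit<u8>] = buf.spare_capacity_mut();
    // Assumption for this sketch: we initialize the spare capacity ourselves.
    for byte in spare.iter_mut() {
        *byte = MaybeUninit::new(0);
    }
    let spare_len = spare.len();

    // SAFETY: every byte of `spare` was initialized just above.
    let slice: &mut [u8] = unsafe { &mut *(spare as *mut [MaybeUninit<u8>] as *mut [u8]) };

    // If this call panics or returns an error, `buf.len()` is still the old
    // length, so no drop guard is needed to restore it.
    let n = reader.read(slice)?;

    // Guard against a reader that reports more bytes than it was given.
    assert!(n <= spare_len);

    let new_len = buf.len() + n;
    // SAFETY: `new_len <= capacity`, and bytes `..new_len` are initialized.
    unsafe { buf.set_len(new_len) };
    Ok(n)
}
```

Calling it with a byte slice as the reader, for example `read_once_into_spare(&mut &b"hello"[..], &mut buf)`, appends the bytes read to whatever `buf` already contained.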

The new implementation changes the behaviour so that set_len is called after poll_read. The motivation for this change is that it makes it much more obvious that a rogue panic won't give the caller access to a vector that exposes uninitialized memory. The new implementation also makes sure not to zero memory twice. Additionally, it makes the various implementations more consistent with each other regarding the naming of variables and whether we store how many bytes we have read or how many were in the container originally. Fixes: tokio-rs#2544

Darksonn · 2020-06-11T09:11:29Z

I realized that my get_unused_capacity is just a reimplementation of the bytes_mut method in the BufMut impl for Vec, and replaced it with a call to that method.

carllerche

Thanks for this and sorry for the delay reviewing.

I read through this change and it looks correct to me. I asked others to also take a look. I believe @hawkw is also reviewing.

Because this touches safety, it would be nice if we could get some miri tests to cover this. This does not have to be a blocker for this PR to land.

hawkw

This looks good to me overall, though I agree with Carl that Miri tests would be nice to have.

I commented on a few minor things.

hawkw · 2020-07-22T22:38:05Z

tokio/src/io/util/read_to_end.rs

+        // safety: There are two situations:
+        //
+        // 1. The AsyncRead has not overridden `prepare_uninitialized_buffer`.
+        //
+        // In this situation, the default implementation of that method will have
+        // zeroed the unused capacity. This means that setting the length will
+        // never expose uninitialized memory in the vector.
+        //
+        // Note that the assert! below ensures that we don't set the length to
+        // something larger than the capacity, which malicious implementors might
+        // try to have us do.
+        //
+        // 2. The AsyncRead has overridden `prepare_uninitialized_buffer`.
+        //
+        // In this case, the safety of the `set_len` call below relies on this
+        // guarantee from the documentation on `prepare_uninitialized_buffer`:
+        //
+        // > This function isn't actually unsafe to call but unsafe to implement.
+        // > The implementer must ensure that either the whole buf has been zeroed
+        // > or poll_read() overwrites the buffer without reading it and returns
+        // > correct value.
+        //
+        // Note that `prepare_uninitialized_buffer` is unsafe to implement, so this
+        // is a guarantee we can rely on in unsafe code.
+        //
+        // The assert!() is technically only necessary in the first case.


this comment is lovely, thank you <3
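
The bounds check mentioned in the safety comment above can be illustrated in isolation. This is a hedged sketch rather than the PR's code: `LyingReader` and `checked_new_len` are hypothetical names, the reader is a synchronous `std::io::Read`, and the check returns `None` where the code under review uses `assert!` and panics.

```rust
use std::io::{self, Read};

/// Hypothetical misbehaving reader: it reports one byte more than the
/// buffer it was given could possibly hold.
struct LyingReader;

impl Read for LyingReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        Ok(buf.len() + 1)
    }
}

/// Hypothetical helper: returns the new vector length if `reported` bytes
/// fit into the unused capacity of `buf`, and `None` otherwise.
fn checked_new_len(buf: &Vec<u8>, reported: usize) -> Option<usize> {
    let spare = buf.capacity() - buf.len();
    if reported <= spare {
        Some(buf.len() + reported)
    } else {
        // Calling `set_len` here would move the length past the capacity.
        None
    }
}
```

Without such a check, a byte count like the one returned by `LyingReader` would be passed straight to `set_len`, which is why the check matters even when the unused capacity has been zeroed.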

hawkw · 2020-07-22T23:20:29Z

tokio/src/io/util/read_to_end.rs

+    let slice: &mut [u8] = {
+        let ptr = unused_capacity.as_mut_ptr().cast::<u8>();
+        let len = unused_capacity.len();
+        std::slice::from_raw_parts_mut(ptr, len)


For my own understanding: this looks equivalent to transmute-ing the uninitialized slice to an initialized slice, but with more steps. My guess is that we're doing this instead because it is less powerful than transmute, and can't (for instance) reinterpret the length field as some entirely distinct type accidentally?

Yes, it's just a transmute. I'm doing it like this because using transmutes for pointer casts is usually rather strongly discouraged. On further thought, we can also do it like this:

let slice: &mut [u8] = { &mut *(unused_capacity as *mut [MaybeUninit<u8>] as *mut [u8]) };
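
The two forms discussed in this thread can be placed side by side to show that they yield the same slice. This is a small sketch with hypothetical function names; both functions are `unsafe` because the caller has to guarantee that every element is already initialized.

```rust
use std::mem::MaybeUninit;

/// Form from the diff: rebuild the slice from a pointer and a length.
///
/// # Safety
/// Every element of `s` must already be initialized.
unsafe fn assume_init_via_raw_parts(s: &mut [MaybeUninit<u8>]) -> &mut [u8] {
    let ptr = s.as_mut_ptr().cast::<u8>();
    let len = s.len();
    unsafe { std::slice::from_raw_parts_mut(ptr, len) }
}

/// Form from the reply: cast the slice pointer directly. The length travels
/// along as part of the fat pointer and is never reinterpreted.
///
/// # Safety
/// Every element of `s` must already be initialized.
unsafe fn assume_init_via_cast(s: &mut [MaybeUninit<u8>]) -> &mut [u8] {
    unsafe { &mut *(s as *mut [MaybeUninit<u8>] as *mut [u8]) }
}
```

Neither form can change the element count or produce a differently shaped type by accident, which is the advantage over `mem::transmute` raised in the question above.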

tokio/src/io/util/read_to_string.rs

Co-authored-by: Eliza Weisman <eliza@buoyant.io>

Darksonn · 2020-07-23T07:56:35Z

I ran into some trouble getting miri tests to work. Running the command from the CI config gives me a whole bunch of errors like this:

error[E0433]: failed to resolve: could not find `test` in `tokio`
 --> tokio/tests/macros_select.rs:9:10
  |
9 | #[tokio::test]
  |          ^^^^ could not find `test` in `tokio`

Darksonn added 2 commits May 24, 2020 23:39

Merge read_to_string and io_read_to_string tests

672ecd8

Darksonn added C-maintenance Category: PRs that clean code up or issues documenting cleanup. A-tokio Area: The main tokio crate M-io Module: tokio/io labels May 25, 2020

Darksonn requested a review from taiki-e May 25, 2020 09:26

Darksonn self-assigned this May 25, 2020

taiki-e added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 11, 2020

taiki-e changed the title ~~Read to end prepare~~ io: rewrite read_to_end and read_to_string Jun 11, 2020

use bytes_mut for get_unused_capacity

6a52938

Merge remote-tracking branch 'upstream/master' into read-to-end-prepare

8016efc

carllerche approved these changes Jul 22, 2020

hawkw approved these changes Jul 22, 2020

Apply suggestions from code review

a54bfe4

Co-authored-by: Eliza Weisman <eliza@buoyant.io>

taiki-e removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 24, 2020

Merge 'upstream/master' into 'Darksonn/read-to-end-prepare'

5fbf56e

carllerche merged commit 03b68f4 into tokio-rs:master Jul 28, 2020

Darksonn mentioned this pull request Nov 10, 2020

Prepare 0.2.23 release #3114

Merged

Darksonn deleted the read-to-end-prepare branch February 19, 2023 10:06

HyeonuPark mentioned this pull request Nov 19, 2023

JoinHandle variant that abort on drop. #6160

Closed
