Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

io: rewrite read_to_end and read_to_string #2560

Merged
merged 6 commits into from
Jul 28, 2020

Conversation

Darksonn
Copy link
Contributor

This PR is introduced to solve #2544.

This PR rewrites most of the read_to_end code. The new implementation changes the following main features about the implementation:

  1. The ReadToEnd and ReadToString structs must now promise read_to_end_internal that the buffer has been prepared for use with the AsyncRead.
  2. The AsyncRead is now provided a slice into the unused capacity of the vector, and the length is increased after the call to poll_read, instead of first increasing the length, reading, and then decreasing the length.

The first change introduces more unsafe blocks to propagate that promise upwards, but is necessary to fix #2544 unless we initialize it ourselves. Relying on prepare_uninitialized_buffer might not initialize it, so using that requires the containers to promise to call prepare.

The second change is not strictly necessary, but I think it makes it more obvious that we can never expose the caller to uninitialized memory in case of rouge panics or calls to mem::forget, as the bytes in the vector up to the length are always initialized at every stage in the algorithm. Additionally, increasing the length first is technically in violation with the safety section on set_len, and the documentation on Vec says this:

There is one case which we will not break, however: using unsafe code to write to the excess capacity, and then increasing the length to match, is always valid.

The second change also removes the need for a drop guard, as the vector length has the correct value in case of panics. One disadvantage is that creating a slice to the excess capacity is rather annoying.

Darksonn added 2 commits May 24, 2020 23:39
The new implementation changes the behaviour such that set_len is called
after poll_read. The motivation of this change is that it makes it much
more obvious that a rouge panic wont give the caller access to a vector
containing exposed uninitialized memory. The new implementation also
makes sure to not zero memory twice.

Additionally it makes the various implementations more consistent with
each other regarding naming of variables, and whether we store how many
bytes we have read, or how many were in the container originally.

Fixes: tokio-rs#2544
@Darksonn Darksonn added C-maintenance Category: PRs that clean code up or issues documenting cleanup. A-tokio Area: The main tokio crate M-io Module: tokio/io labels May 25, 2020
@Darksonn Darksonn requested a review from taiki-e May 25, 2020 09:26
@Darksonn Darksonn self-assigned this May 25, 2020
@taiki-e taiki-e added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 11, 2020
@taiki-e taiki-e changed the title Read to end prepare io: rewrite read_to_end and read_to_string Jun 11, 2020
@Darksonn
Copy link
Contributor Author

I realized that my get_unused_capacity is just a reimplementation of the bytes_mut method in the BufMut impl for Vec, and replaced it with a call to that method.

Copy link
Member

@carllerche carllerche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this and sorry for the delay reviewing.

I read through this change and it looks correct to me. I asked others to also take a look. I believe @hawkw is also reviewing.

Because this touches safety, it would be nice if we could get some miri tests to cover this. This does not have to be a blocker for this PR to land.

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me overall, though I agree with Carl that Miri tests would be nice to have.

I commented on a few minor things.

tokio/src/io/util/read_to_end.rs Outdated Show resolved Hide resolved
tokio/src/io/util/read_to_end.rs Outdated Show resolved Hide resolved
tokio/src/io/util/read_to_end.rs Show resolved Hide resolved
Comment on lines +86 to +111
// safety: There are two situations:
//
// 1. The AsyncRead has not overriden `prepare_uninitialized_buffer`.
//
// In this situation, the default implementation of that method will have
// zeroed the unused capacity. This means that setting the length will
// never expose uninitialized memory in the vector.
//
// Note that the assert! below ensures that we don't set the length to
// something larger than the capacity, which malicious implementors might
// try to have us do.
//
// 2. The AsyncRead has overriden `prepare_uninitialized_buffer`.
//
// In this case, the safety of the `set_len` call below relies on this
// guarantee from the documentation on `prepare_uninitialized_buffer`:
//
// > This function isn't actually unsafe to call but unsafe to implement.
// > The implementer must ensure that either the whole buf has been zeroed
// > or poll_read() overwrites the buffer without reading it and returns
// > correct value.
//
// Note that `prepare_uninitialized_buffer` is unsafe to implement, so this
// is a guarantee we can rely on in unsafe code.
//
// The assert!() is technically only necessary in the first case.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is lovely, thank you <3

Comment on lines 78 to 81
let slice: &mut [u8] = {
let ptr = unused_capacity.as_mut_ptr().cast::<u8>();
let len = unused_capacity.len();
std::slice::from_raw_parts_mut(ptr, len)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own understanding: this looks equivalent to transmute-ing the uninitialized slice to an initialized slice, but with more steps. My guess is that we're doing this instead because it is less powerful than transmute, and can't (for instance) reinterpret the length field as some entirely distinct type accidentally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's just a transmute. I'm doing it like this because using transmutes for pointer casts is usually rather strongly discouraged. At a bit more thought, we can also do it like this:

let slice: &mut [u8] = {
    &mut *(unused_capacity as *mut [MaybeUninit<u8>] as *mut [u8])
};

tokio/src/io/util/read_to_string.rs Outdated Show resolved Hide resolved
tokio/src/io/util/read_to_string.rs Show resolved Hide resolved
Co-authored-by: Eliza Weisman <eliza@buoyant.io>
@Darksonn
Copy link
Contributor Author

Darksonn commented Jul 23, 2020

I ran into some trouble getting miri tests to work. Running the command from the CI config gives me a whole bunch of

error[E0433]: failed to resolve: could not find `test` in `tokio`
 --> tokio/tests/macros_select.rs:9:10
  |
9 | #[tokio::test]
  |          ^^^^ could not find `test` in `tokio`

@taiki-e taiki-e removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 24, 2020
@carllerche carllerche merged commit 03b68f4 into tokio-rs:master Jul 28, 2020
@Darksonn Darksonn mentioned this pull request Nov 10, 2020
@Darksonn Darksonn deleted the read-to-end-prepare branch February 19, 2023 10:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate C-maintenance Category: PRs that clean code up or issues documenting cleanup. M-io Module: tokio/io
Projects
None yet
Development

Successfully merging this pull request may close these issues.

read_to_end rezeroes the internal buffer many times
4 participants