-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
io: rewrite read_to_end and read_to_string #2560
Conversation
The new implementation changes the behaviour such that set_len is called after poll_read. The motivation of this change is that it makes it much more obvious that a rouge panic wont give the caller access to a vector containing exposed uninitialized memory. The new implementation also makes sure to not zero memory twice. Additionally it makes the various implementations more consistent with each other regarding naming of variables, and whether we store how many bytes we have read, or how many were in the container originally. Fixes: tokio-rs#2544
I realized that my |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this and sorry for the delay reviewing.
I read through this change and it looks correct to me. I asked others to also take a look. I believe @hawkw is also reviewing.
Because this touches safety, it would be nice if we could get some miri tests to cover this. This does not have to be a blocker for this PR to land.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me overall, though I agree with Carl that Miri tests would be nice to have.
I commented on a few minor things.
// safety: There are two situations: | ||
// | ||
// 1. The AsyncRead has not overriden `prepare_uninitialized_buffer`. | ||
// | ||
// In this situation, the default implementation of that method will have | ||
// zeroed the unused capacity. This means that setting the length will | ||
// never expose uninitialized memory in the vector. | ||
// | ||
// Note that the assert! below ensures that we don't set the length to | ||
// something larger than the capacity, which malicious implementors might | ||
// try to have us do. | ||
// | ||
// 2. The AsyncRead has overriden `prepare_uninitialized_buffer`. | ||
// | ||
// In this case, the safety of the `set_len` call below relies on this | ||
// guarantee from the documentation on `prepare_uninitialized_buffer`: | ||
// | ||
// > This function isn't actually unsafe to call but unsafe to implement. | ||
// > The implementer must ensure that either the whole buf has been zeroed | ||
// > or poll_read() overwrites the buffer without reading it and returns | ||
// > correct value. | ||
// | ||
// Note that `prepare_uninitialized_buffer` is unsafe to implement, so this | ||
// is a guarantee we can rely on in unsafe code. | ||
// | ||
// The assert!() is technically only necessary in the first case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment is lovely, thank you <3
tokio/src/io/util/read_to_end.rs
Outdated
let slice: &mut [u8] = { | ||
let ptr = unused_capacity.as_mut_ptr().cast::<u8>(); | ||
let len = unused_capacity.len(); | ||
std::slice::from_raw_parts_mut(ptr, len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my own understanding: this looks equivalent to transmute
-ing the uninitialized slice to an initialized slice, but with more steps. My guess is that we're doing this instead because it is less powerful than transmute
, and can't (for instance) reinterpret the length field as some entirely distinct type accidentally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's just a transmute. I'm doing it like this because using transmutes for pointer casts is usually rather strongly discouraged. At a bit more thought, we can also do it like this:
let slice: &mut [u8] = {
&mut *(unused_capacity as *mut [MaybeUninit<u8>] as *mut [u8])
};
Co-authored-by: Eliza Weisman <eliza@buoyant.io>
I ran into some trouble getting miri tests to work. Running the command from the CI config gives me a whole bunch of
|
This PR is introduced to solve #2544.
This PR rewrites most of the
read_to_end
code. The new implementation changes the following main features about the implementation:ReadToEnd
andReadToString
structs must now promiseread_to_end_internal
that the buffer has been prepared for use with theAsyncRead
.AsyncRead
is now provided a slice into the unused capacity of the vector, and the length is increased after the call topoll_read
, instead of first increasing the length, reading, and then decreasing the length.The first change introduces more unsafe blocks to propagate that promise upwards, but is necessary to fix #2544 unless we initialize it ourselves. Relying on
prepare_uninitialized_buffer
might not initialize it, so using that requires the containers to promise to call prepare.The second change is not strictly necessary, but I think it makes it more obvious that we can never expose the caller to uninitialized memory in case of rouge panics or calls to
mem::forget
, as the bytes in the vector up to the length are always initialized at every stage in the algorithm. Additionally, increasing the length first is technically in violation with the safety section onset_len
, and the documentation onVec
says this:The second change also removes the need for a drop guard, as the vector length has the correct value in case of panics. One disadvantage is that creating a slice to the excess capacity is rather annoying.