-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
read_to_end rezeroes the internal buffer many times #2544
Comments
Changing tokio version from 0.2.21 to branch from PR #2525 can drastically improve performance. |
Yeah, that makes ChildStdout/err not initialize the buffer at all, which avoids the issue in read_to_end entirely, for this particular repro. |
Does this patch take into consideration for tokio/tokio/src/io/util/read_to_end.rs Lines 35 to 41 in 2da15b5
|
It's just diff --git a/tokio/src/io/util/read_to_end.rs b/tokio/src/io/util/read_to_end.rs
index a2cd99be..477b76e5 100644
--- a/tokio/src/io/util/read_to_end.rs
+++ b/tokio/src/io/util/read_to_end.rs
@@ -54,12 +56,15 @@ pub(super) fn read_to_end_internal<R: AsyncRead + ?Sized>(
cx: &mut Context<'_>,
buf: &mut Vec<u8>,
start_len: usize,
+ init_len: &mut usize,
) -> Poll<io::Result<usize>> {
let mut g = Guard {
len: buf.len(),
buf,
};
let ret;
+
+ unsafe { g.buf.set_len(*init_len); }
loop {
if g.len == g.buf.len() {
unsafe {
@@ -70,6 +75,7 @@ pub(super) fn read_to_end_internal<R: AsyncRead + ?Sized>(
let b = &mut *(&mut g.buf[g.len..] as *mut [u8] as *mut [MaybeUninit<u8>]);
rd.prepare_uninitialized_buffer(b);
+ *init_len = g.buf.len();
}
} ie saving/restoring the buf.len() state as well, but only within the Guard region. Another way would be to assume + ensure that len()..capacity() has always been prepare_uninitialize_buffer()ed by also putting something in read_to_end() and read_to_string(), in which case read_to_end_internal could just assume that |
@talchas Thanks. Looks like it's working correctly for the issue I was concerned about. Also, it seems that actually using in tokio will require some minor fixes.
EDIT: The second thing is only needed for |
For the first, yeah; for the second I think you're fine because ReadToEnd holds a borrow on the input buffer, and having returned success once shouldn't change anything in terms of what's initialized. |
Oops, sorry. The second thing is only needed for |
Ah yes, ReadToString would have to be careful there. |
The new implementation changes the behaviour such that set_len is called after poll_read. The motivation of this change is that it makes it much more obvious that a rouge panic wont give the caller access to a vector containing exposed uninitialized memory. The new implementation also makes sure to not zero memory twice. Additionally it makes the various implementations more consistent with each other regarding naming of variables, and whether we store how many bytes we have read, or how many were in the container originally. Fixes: tokio-rs#2544
This will be fixed by #2541. |
The new implementation changes the behaviour such that set_len is called after poll_read. The motivation of this change is that it makes it much more obvious that a rouge panic wont give the caller access to a vector containing exposed uninitialized memory. The new implementation also makes sure to not zero memory twice. Additionally it makes the various implementations more consistent with each other regarding naming of variables, and whether we store how many bytes we have read, or how many were in the container originally. Fixes: tokio-rs#2544
The new implementation changes the behavior such that set_len is called after poll_read. The motivation of this change is that it makes it much more obvious that a rouge panic won't give the caller access to a vector containing exposed uninitialized memory. The new implementation also makes sure to not zero memory twice. Additionally, it makes the various implementations more consistent with each other regarding the naming of variables, and whether we store how many bytes we have read, or how many were in the container originally. Fixes: #2544
Version
tokio master f480659
tokio-macros 0.2.5
Platform
Linux 4.19.82-gentoo x86_64
Description
read_to_end
/ReadToEnd::poll
/read_to_end_internal
don't keep track of what part of the unused Vec has been initialized across poll() calls, leading to tons of unnecessary zeroing. (The code adapted from std's read_to_end keeps track of this, but only within a single poll() call)The original repro from a discord question for linux (create a ~200 mb file in /dev/shm/input.txt):
A hacked up patch to keep an extra initialized_len in ReadToEnd and keep it and g.buf.len() in sync while inside of read_to_end_internal fixes this and cuts the runtime from 20s to 0.5s for me.
(A side issue from the original complaint is that ChildStdout/ChildStderr could use uninitialized buffers, which looks to be being fixed by #2525)
The text was updated successfully, but these errors were encountered: