-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
unix: reduce the size of DirEntry #94750
Conversation
On platforms where we call `readdir` instead of `readdir_r`, we store the name as an allocated `CString` for variable length. There's no point carrying around a full `dirent64` with its fixed-length `d_name` too.
(rust-highfive has picked a reviewer for you, use r? to override) |
@bors r+ |
📌 Commit e8b9ba8 has been approved by |
☀️ Test successful - checks-actions |
Finished benchmarking commit (163c207): comparison url. Summary: This benchmark run did not return any relevant results. If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression |
Eliminate 280-byte memset from ReadDir iterator This guy: https://github.com/rust-lang/rust/blob/1536ab1b383f21b38f8d49230a2aecc51daffa3d/library/std/src/sys/unix/fs.rs#L589 It turns out `libc::dirent64` is quite big—https://docs.rs/libc/0.2.135/libc/struct.dirent64.html. In rust-lang#103135 this memset accounted for 0.9% of the runtime of iterating a big directory. Almost none of the big zeroed value is ever used. We memcpy a tiny prefix (19 bytes) into it, and then read just 9 bytes (`d_ino` and `d_type`) back out. We can read exactly those 9 bytes we need directly from the original entry_ptr instead. ## History This code got added in rust-lang#93459 and tweaked in rust-lang#94272 and rust-lang#94750. Prior to rust-lang#93459, there was no memset but a full 280 bytes were being copied from the entry_ptr. <table><tr><td>copy 280 bytes</td></tr></table> This was not legal because not all of those bytes might be initialized, or even allocated, depending on the length of the directory entry's name, leading to a segfault. That PR fixed the segfault by creating a new zeroed dirent64 and copying just the guaranteed initialized prefix into it. <table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td></tr></table> However this was still buggy because it used `addr_of!((*entry_ptr).d_name)`, which is considered UB by Miri in the case that the full extent of entry_ptr is not in bounds of the same allocation. (Arguably this shouldn't be a requirement, but here we are.) The UB got fixed by rust-lang#94272 by replacing `addr_of` with some pointer manipulation based on `offset_from`, but still fundamentally the same operation. <table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td></tr></table> Then rust-lang#94750 noticed that only 9 of those 19 bytes were even being used, so we could pick out only those 9 to put in the ReadDir value. <table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td><td>copy 9 bytes</td></tr></table> After my PR we just grab the 9 needed bytes directly from entry_ptr. <table><tr><td>copy 9 bytes</td></tr></table> The resulting code is more complex but I believe still worthwhile to land for the following reason. This is an extremely straightforward thing to accomplish in C and clearly libc assumes that; literally just `entry_ptr->d_name`. The extra work in comparison to accomplish it in Rust is not an example of any actual safety being provided by Rust. I believe it's useful to have uncovered that and think about what could be done in the standard library or language to support this obvious operation better. ## References - https://man7.org/linux/man-pages/man3/readdir.3.html
Eliminate 280-byte memset from ReadDir iterator This guy: https://github.com/rust-lang/rust/blob/1536ab1b383f21b38f8d49230a2aecc51daffa3d/library/std/src/sys/unix/fs.rs#L589 It turns out `libc::dirent64` is quite big—https://docs.rs/libc/0.2.135/libc/struct.dirent64.html. In rust-lang#103135 this memset accounted for 0.9% of the runtime of iterating a big directory. Almost none of the big zeroed value is ever used. We memcpy a tiny prefix (19 bytes) into it, and then read just 9 bytes (`d_ino` and `d_type`) back out. We can read exactly those 9 bytes we need directly from the original entry_ptr instead. ## History This code got added in rust-lang#93459 and tweaked in rust-lang#94272 and rust-lang#94750. Prior to rust-lang#93459, there was no memset but a full 280 bytes were being copied from the entry_ptr. <table><tr><td>copy 280 bytes</td></tr></table> This was not legal because not all of those bytes might be initialized, or even allocated, depending on the length of the directory entry's name, leading to a segfault. That PR fixed the segfault by creating a new zeroed dirent64 and copying just the guaranteed initialized prefix into it. <table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td></tr></table> However this was still buggy because it used `addr_of!((*entry_ptr).d_name)`, which is considered UB by Miri in the case that the full extent of entry_ptr is not in bounds of the same allocation. (Arguably this shouldn't be a requirement, but here we are.) The UB got fixed by rust-lang#94272 by replacing `addr_of` with some pointer manipulation based on `offset_from`, but still fundamentally the same operation. <table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td></tr></table> Then rust-lang#94750 noticed that only 9 of those 19 bytes were even being used, so we could pick out only those 9 to put in the ReadDir value. <table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td><td>copy 9 bytes</td></tr></table> After my PR we just grab the 9 needed bytes directly from entry_ptr. <table><tr><td>copy 9 bytes</td></tr></table> The resulting code is more complex but I believe still worthwhile to land for the following reason. This is an extremely straightforward thing to accomplish in C and clearly libc assumes that; literally just `entry_ptr->d_name`. The extra work in comparison to accomplish it in Rust is not an example of any actual safety being provided by Rust. I believe it's useful to have uncovered that and think about what could be done in the standard library or language to support this obvious operation better. ## References - https://man7.org/linux/man-pages/man3/readdir.3.html
On platforms where we call
readdir
instead ofreaddir_r
, we storethe name as an allocated
CString
for variable length. There's no pointcarrying around a full
dirent64
with its fixed-lengthd_name
too.