-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify fd_readdir impl between *nixes #613
Conversation
This commit unifies the implementation of `fd_readdir` between Linux and BSD hosts. In particular, it re-uses the `Dirent`, `Entry`, and `Dir` (among others) building blocks introduced recently when `fd_readdir` was being implemented on Windows. Notable changes: * on BSD, wraps `readdir` syscall in an `Iterator` of the mutex-locked `Dir` struct * on BSD, removes `DirStream` struct from `OsFile`; `OsFile` now holds a mutex to `Dir` * makes `Dir` iterators implementation specific (Linux has its own, and so does BSD)
3e1481e
to
fc23cde
Compare
// descriptor, or to modify the state of the associated description other | ||
// than by means of closedir(), readdir(), readdir_r(), or rewinddir(), | ||
// the behaviour is undefined. | ||
let fd = (*os_file).try_clone()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If osfile.dir
already is Some
, then you're making a useless syscall for dup
with a throwaway result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate? This leg is only executed if os_file.dir
is not set, i.e., None
, so I'm not sure what you mean here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I focused on the get_or_insert
too much :) It's fine then!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:-)
// control of the system, and if any attempt is made to close the file | ||
// descriptor, or to modify the state of the associated description other | ||
// than by means of closedir(), readdir(), readdir_r(), or rewinddir(), | ||
// the behaviour is undefined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to be mostly using the American English variant behavior
, so I think it's best to choose one variant and stay consistent :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a British citizen I have to respectfully decline :-P
// new items may not be returned to the caller. | ||
if cookie == wasi::__WASI_DIRCOOKIE_START { | ||
log::trace!(" | fd_readdir: doing rewinddir"); | ||
dir.lock().unwrap().rewind(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please lock the mutex only once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
})) | ||
} | ||
|
||
struct DirIter<'a>(MutexGuard<'a, Dir>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we possibly have a common DirIter
implementation for all *nixes?
On Linux, nothing will prevent the caller to execute concurrent fd_readdir
's on the same directory stream, so I don't see why we should prevent it on BSD. In particular readdir(2)
on Linux says: (emphasis added)
In cases where multiple threads must read from the same directory stream, using readdir() with external synchronization is still preferable to the use of the deprecated readdir_r(3) function
This means, that code which doesn't have do any synchronization itself for readdir
ing a single directory stram from multiple threads will be invalid anyway, so we can probably get away with not having mutexes here. Unfortunately, this would probably require adding some unsafe
code, but we might want to do it for performance reasons.
In particular, when threads are introduced to WASI, thread-safety of various function will have to be evaluated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer to leave it behind a Mutex
as-is for now. This change was not introduced in this PR, but only refactored out from the actual implementation of fd_readdir
on BSD that we've had until now. I'm OK with some code duplication between Linux and BSD, especially since on BSD we still need to recoved the current location using telldir
as it's not available to be read directly from the dirent
entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we could possibly have a system-dependent implementation of Entry
which would reuse the d_off
value on Linux and call telldir
on other nixes, or just call telldir everywhere (it's just a glibc call, so while it's not free, it's cheap).
But I'm fine with addressing this in a subsequent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed to address this in a subsequent PR. :-)
@@ -98,7 +97,7 @@ pub(crate) fn fd_readdir_impl( | |||
dir.seek(loc); | |||
} | |||
|
|||
Ok(dir.into_iter().map(|entry| { | |||
Ok(DirIter(dir).map(|entry| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
into_iter
seems more idiomatic to me, is there any particular reason why you switched to literal DirIter
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's an internal structure, I wanted to keep DirIter
private to the module. By implementing an IntoIterator
for Dir
we have to leak the struct outside of the module we seems like a lot of noise for little to be gained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough!
if errno != Errno::last() { | ||
// According to POSIX man (for Linux though!), there was an error | ||
// if the errno value has changed at some point during the sequence | ||
// of readdir calls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linux man says something different:
To distinguish end of stream and from an error, set errno to zero before calling readdir() and then check the value of errno if NULL is returned.
POSIX says the same:
Applications wishing to check for error situations should set errno to 0 before calling readdir(). If errno is set to non-zero on return, an error occurred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, since this reference says something different:
If the end of the directory stream is reached, NULL is returned and errno is not changed. If an error occurs, NULL is returned and errno is set appropriately.
Also, please note that this bit of code was introduced in 4982878 as a fix to a subtle bug that only emerged when testing the crate in release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The version I quoted is from
This page is part of release 5.03 of the Linux man-pages project.
and
This is POSIX.1-2008 with the 2013 Technical Corrigendum 1 applied.
respectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, interesting. Perhaps this is a BSD-only thing (we won't know for sure until we rewrite fd_readdir
on Linux using readdir
rather than readdir_r
. Anyhow, as a compromise, I can remove the mention of Linux in the comment.
#[derive(Debug)] | ||
pub(crate) struct OsFile { | ||
pub(crate) file: fs::File, | ||
pub(crate) dir_stream: Option<Mutex<DirStream>>, | ||
pub(crate) dir: Option<Mutex<Dir>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment why we're storing it in OsFile
(or mentioned where the reason is described in more detail)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a reference that BSD nixes require the client to do the caching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing. It's actually taken directly from the man pages. Let me add that in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
#[derive(Debug)] | ||
pub(crate) struct OsFile { | ||
pub(crate) file: fs::File, | ||
pub(crate) dir_stream: Option<Mutex<DirStream>>, | ||
pub(crate) dir: Option<Mutex<Dir>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a reference that BSD nixes require the client to do the caching?
I'm gonna go ahead and merge this so that I don't necessarily block other avenues including #520. Feel free to reopen or submit an issue if you feel something is amiss though! |
This commit unifies the implementation of
fd_readdir
between Linuxand BSD hosts. In particular, it re-uses the
Dirent
,Entry
, andDir
(among others) building blocks introduced recently whenfd_readdir
was being implemented on Windows.Notable changes:
readdir
syscall in anIterator
of the mutex-lockedDir
structDirStream
struct fromOsFile
;OsFile
now holds amutex to
Dir
Dir
iterators implementation specific (Linux has its own,and so does BSD)
cc @marmistrz
Note: this PR depends on #612 as a by-product of me accidentally disabling the WASI tests in #600 (sorry!).This has now been sorted out.