Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify fd_readdir impl between *nixes #613

Merged
merged 3 commits into from
Nov 24, 2019

Conversation

kubkon
Copy link
Member

@kubkon kubkon commented Nov 21, 2019

This commit unifies the implementation of fd_readdir between Linux
and BSD hosts. In particular, it re-uses the Dirent, Entry, and
Dir (among others) building blocks introduced recently when
fd_readdir was being implemented on Windows.

Notable changes:

  • on BSD, wraps readdir syscall in an Iterator of the mutex-locked
    Dir struct
  • on BSD, removes DirStream struct from OsFile; OsFile now holds a
    mutex to Dir
  • makes Dir iterators implementation specific (Linux has its own,
    and so does BSD)

cc @marmistrz

Note: this PR depends on #612 as a by-product of me accidentally disabling the WASI tests in #600 (sorry!). This has now been sorted out.

@kubkon kubkon added the wasi:impl Issues pertaining to WASI implementation in Wasmtime label Nov 21, 2019
This commit unifies the implementation of `fd_readdir` between Linux
and BSD hosts. In particular, it re-uses the `Dirent`, `Entry`, and
`Dir` (among others) building blocks introduced recently when
`fd_readdir` was being implemented on Windows.

Notable changes:
* on BSD, wraps `readdir` syscall in an `Iterator` of the mutex-locked
  `Dir` struct
* on BSD, removes `DirStream` struct from `OsFile`; `OsFile` now holds a
  mutex to `Dir`
* makes `Dir` iterators implementation specific (Linux has its own,
  and so does BSD)
// descriptor, or to modify the state of the associated description other
// than by means of closedir(), readdir(), readdir_r(), or rewinddir(),
// the behaviour is undefined.
let fd = (*os_file).try_clone()?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If osfile.dir already is Some, then you're making a useless syscall for dup with a throwaway result.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate? This leg is only executed if os_file.dir is not set, i.e., None, so I'm not sure what you mean here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I focused on the get_or_insert too much :) It's fine then!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:-)

// control of the system, and if any attempt is made to close the file
// descriptor, or to modify the state of the associated description other
// than by means of closedir(), readdir(), readdir_r(), or rewinddir(),
// the behaviour is undefined.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be mostly using the American English variant behavior, so I think it's best to choose one variant and stay consistent :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a British citizen I have to respectfully decline :-P

// new items may not be returned to the caller.
if cookie == wasi::__WASI_DIRCOOKIE_START {
log::trace!(" | fd_readdir: doing rewinddir");
dir.lock().unwrap().rewind();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please lock the mutex only once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}))
}

struct DirIter<'a>(MutexGuard<'a, Dir>);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we possibly have a common DirIter implementation for all *nixes?

On Linux, nothing will prevent the caller to execute concurrent fd_readdir's on the same directory stream, so I don't see why we should prevent it on BSD. In particular readdir(2) on Linux says: (emphasis added)

In cases where multiple threads must read from the same directory stream, using readdir() with external synchronization is still preferable to the use of the deprecated readdir_r(3) function

This means, that code which doesn't have do any synchronization itself for readdiring a single directory stram from multiple threads will be invalid anyway, so we can probably get away with not having mutexes here. Unfortunately, this would probably require adding some unsafe code, but we might want to do it for performance reasons.

In particular, when threads are introduced to WASI, thread-safety of various function will have to be evaluated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer to leave it behind a Mutex as-is for now. This change was not introduced in this PR, but only refactored out from the actual implementation of fd_readdir on BSD that we've had until now. I'm OK with some code duplication between Linux and BSD, especially since on BSD we still need to recoved the current location using telldir as it's not available to be read directly from the dirent entry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we could possibly have a system-dependent implementation of Entry which would reuse the d_off value on Linux and call telldir on other nixes, or just call telldir everywhere (it's just a glibc call, so while it's not free, it's cheap).

But I'm fine with addressing this in a subsequent PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed to address this in a subsequent PR. :-)

@@ -98,7 +97,7 @@ pub(crate) fn fd_readdir_impl(
dir.seek(loc);
}

Ok(dir.into_iter().map(|entry| {
Ok(DirIter(dir).map(|entry| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

into_iter seems more idiomatic to me, is there any particular reason why you switched to literal DirIter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's an internal structure, I wanted to keep DirIter private to the module. By implementing an IntoIterator for Dir we have to leak the struct outside of the module we seems like a lot of noise for little to be gained.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough!

if errno != Errno::last() {
// According to POSIX man (for Linux though!), there was an error
// if the errno value has changed at some point during the sequence
// of readdir calls
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linux man says something different:

To distinguish end of stream and from an error, set errno to zero before calling readdir() and then check the value of errno if NULL is returned.

POSIX says the same:

Applications wishing to check for error situations should set errno to 0 before calling readdir(). If errno is set to non-zero on return, an error occurred.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, since this reference says something different:

If the end of the directory stream is reached, NULL is returned and errno is not changed. If an error occurs, NULL is returned and errno is set appropriately.

Also, please note that this bit of code was introduced in 4982878 as a fix to a subtle bug that only emerged when testing the crate in release.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version I quoted is from

This page is part of release 5.03 of the Linux man-pages project.

and

This is POSIX.1-2008 with the 2013 Technical Corrigendum 1 applied.

respectively.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, interesting. Perhaps this is a BSD-only thing (we won't know for sure until we rewrite fd_readdir on Linux using readdir rather than readdir_r. Anyhow, as a compromise, I can remove the mention of Linux in the comment.

#[derive(Debug)]
pub(crate) struct OsFile {
pub(crate) file: fs::File,
pub(crate) dir_stream: Option<Mutex<DirStream>>,
pub(crate) dir: Option<Mutex<Dir>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment why we're storing it in OsFile (or mentioned where the reason is described in more detail)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a reference that BSD nixes require the client to do the caching?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. It's actually taken directly from the man pages. Let me add that in.

Copy link
Contributor

@marmistrz marmistrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

#[derive(Debug)]
pub(crate) struct OsFile {
pub(crate) file: fs::File,
pub(crate) dir_stream: Option<Mutex<DirStream>>,
pub(crate) dir: Option<Mutex<Dir>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a reference that BSD nixes require the client to do the caching?

@kubkon
Copy link
Member Author

kubkon commented Nov 24, 2019

I'm gonna go ahead and merge this so that I don't necessarily block other avenues including #520. Feel free to reopen or submit an issue if you feel something is amiss though!

@kubkon kubkon merged commit c45f709 into bytecodealliance:master Nov 24, 2019
@kubkon kubkon deleted the fd_readdir_bsd branch November 24, 2019 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wasi:impl Issues pertaining to WASI implementation in Wasmtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants