Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Virtual file support #701

Merged
merged 29 commits into from
Mar 6, 2020
Merged

Virtual file support #701

merged 29 commits into from
Mar 6, 2020

Conversation

iximeow
Copy link
Contributor

@iximeow iximeow commented Dec 12, 2019

This PR adds support for virtual files (so, file-like objects not necessarily backed by bytes on a local disk) for most WASI APIs. It also includes an implementation of virtual files (InMemoryFile) and directories (VirtualDir). Functionally this is transparent to users of wasi-common.

The intent here is to support operations on entities that are logically files, even if they may not be what the host OS calls "a file". These might be entirely artificial (like /dev/random, though __wasi_random_get() already exists), static data acquired over the network at open rather than at read, or some other file-like semantic that shouldn't be obligated to go through a filesystem.

For tests that have preopened directories, this duplicates the tests for VirtualDir/InMemoryFile-backed testing. Actual test runner changes are pretty small as well: https://github.com/bytecodealliance/wasmtime/pull/701/files#diff-6bb07f6d2e62ac9c41406cd4d888db1aR29-R41. Additionally, I have a small branch against lucet, bytecodealliance/lucet#375, to see this in action on our test programs there.

(edit: updated description to reflect that this is no longer a draft PR)

@iximeow iximeow changed the title Virtual file supporrt Virtual file support Dec 12, 2019
Copy link
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this is a nice feature.

My main concern here is that it will be tempting to use this mechanism to create Linux-/proc-like APIs, because, well, Linux does it, but also, it's a pretty simple way to expose new APIs to applications that doesn't require adding any new "syscalls" or dealing with libc, toolchain, WASI Subgroup, or other entities. However, /proc-style APIs have several downsides:

  • It ties API functionality to the concept of a filesystem, which some users of Wasmtime don't want to expose at all, making the functionality awkward for them to work with.
  • Modules don't declare imports for these APIs, so there's no way to know statically which APIs a module uses.
  • They don't take advantage of wasm's type system. Files just provide byte sequences and applications need to parse them.

In general, what I'd like to encourage is for embedders to use regular APIs with functions that can be imported. The current infrastructure for doing that isn't as convenient as it could be, but we'll be in better shape if we improve it rather than work around it.

That said, there are still uses for virtual filesystems, so I'm mainly looking for ideas for how we can encourage people to use it for exposing things which are inherently filesystem-like, and to avoid using it for general-purpose APIs.

@@ -3,7 +3,7 @@ use std::ops::{Deref, DerefMut};
use std::os::unix::prelude::{AsRawFd, RawFd};

#[derive(Debug)]
pub(crate) struct OsFile(fs::File);
pub(crate) struct OsFile(pub(crate) fs::File);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OsFile's contents are encapsulated so that we can hide some differences between platforms. Is it possible to organize things such that we don't need to make this pub(crate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recorganizing to keep OsFile contents non-public really just needed an applicable FdEntry constructor - I think this makes sense, but I'm not sure what might use FdEntry::from outside wasi-common. Matching this change is pretty mechanical for callers, at least: callers would wrap the file in File::OsHandle(OsHandle::from(the_original_argument))

crates/wasi-common/src/fdentry.rs Outdated Show resolved Hide resolved
crates/wasi-common/src/ctx.rs Outdated Show resolved Hide resolved
@iximeow
Copy link
Contributor Author

iximeow commented Dec 19, 2019

Pardon the rebase. Particularly if you think virtual files and directories are fair to live in wasi-common, I'm hopeful that not needing to fiddle with trait VirtualFile will help avoid procfs-style usage later on.

There are still several unimplemented!(), and I realized today that I've not adjusted any non-linux cfg branches for these changes either, so that's what I'll be doing here next! 😅

@iximeow iximeow force-pushed the virtfs branch 2 times, most recently from 983d015 to 7f2bb9b Compare February 1, 2020 02:37
@iximeow iximeow marked this pull request as ready for review February 10, 2020 16:32
@iximeow
Copy link
Contributor Author

iximeow commented Feb 10, 2020

Okay! It works! I've made the test-programs build.rs generate copies of those tests that use VirtualFile instead of a real file system and that caught a few cross-platform bugs where VirtualFile had inconsistencies against a real file system. It's green now, and ready for review!

I have a few bugfixes bundled in here, I'll point them out in comments and if anyone would rather see them parceled out into their own PRs I'm happy to do so.

FileType::Symlink
} else {
FileType::Unknown
match sflags {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a workaround for a yanix bug: SFlag should not be a bit field! It corresponds to stat.st_mode file types, and use as a bitfield results in situations such as a symlink (0o12) being also reported as a regular file (0o10) and a character device (0o02). Because of the lookup order here, when I adjusted path_get to use wasi file types, symlinks file types were being changed to character devices :( There will be a followup PR to fix this in yanix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting one, thanks for catching it! I think though that SFlag is correctly a bit field as prescribed by POSIX (see inode(7)). However, is it that we were using it wrong all along? POSIX seems to prescribe that we should do bit masking with SFlag::S_IFMT to get the filetype.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the mask is to pick the four bits that comprise the enumeration - note that the example is if ((sb.st_mode & S_IFMT) == S_IFREG) { rather than if ((sb.st_mode & S_IFREG) == S_IFREG) {.

Also, if it's a bit mask, I don't think there would be a way to disambiguate

           S_IFSOCK   0140000   socket
           S_IFLNK    0120000   symbolic link

in the face of S_IFREG, S_IFDIR, and S_IFCHR overlapping with one of those bits?

I think SFlag ended up not used much in wasi-common, I didn't see other use sites with a quick look when I was changing path_get, so maybe no one was "using" it wrong in the first place? 😁

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, exactly, the S_IFMT is to pick the bits that comprise the filetype enumeration in st_mode while ignoring the mode and permission bits. I guess the naming is off here. st_mode is the bit field, whereas we should have used an enumeration type like #[repr(u16)] SFiletype or whatnot to represent only the filetype info, remembering though not to include S_IFMT as part of the enumeration, but rather use it to construct the values. Would you agree here? The interesting thing though is that the same inaccuracy is present in nix-rust and I wonder if it's actually used as intended there 🤔.

Copy link
Member

@kubkon kubkon Feb 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really glad you found this bug actually, since as it turns out, when converting between SFlags and host::FileType (in wasi-common), we don't handle the case of the (named) pipe (S_IFIFO).

I've now drafted out a rewrite of this and would really appreciate your feedback. The PR: #945. I've also cc'd you in the PR itself. Also, it'd be great if you could verify that the proposed PR fixes the bug you've discovered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've given it a quick read and it looks good! I'm pretty sure it fixes this, though I'll take a closer look Monday or Tuesday.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agh, I neglected to update here - yes, this is fixed with #945!

}
})
_ => {
unreachable!("streams do not have paths and should not be accessible via PathGet");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this remain true, generally, as wasi-common grows an idea of streams outside stdin/stdout/stderr?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent question, and while I don't possess a clear-cut answer for this, I think it will not remain true. But then again, I think that introducing streams into wasi-common will require at least some rewriting of some syscalls. I don't have a problem leaving it as unreachable. Alternatively, I guess we could change it to unimplemented but I agree with your train of thoughts here, only if we're certain this will apply to streams as well. All in all, I'm voting for leaving that in as-is.

pub(crate) fn as_handle<'descriptor>(&'descriptor self) -> Handle<'descriptor> {
match self {
Self::VirtualFile(virt) => Handle::VirtualFile(virt.as_ref()),
Self::OsHandle(handle) => Handle::OsHandle(handle),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a workaround for another subtle bug/point of confusion: descriptor_as_oshandle is lossy on BSDs. BSD OsHandle maintains wasi-common-side state to handle readdir subtleties, but descriptor_as_oshandle constructs an OsHandleRef by making a new File from the shared fd and putting that in a new OsHandle it holds with ManuallyDrop. This discards wasi-common-side state for the file and results in incorrect readdir behavior.

I think generally we probably don't want OsHandleRef while also having state on OsHandle, and my first draft of this PR in fact removed the one and only place OsHandleRef was used, but as of last week I ran into some issues here. So I'm noting this here and if I've been convincing I'll copy the above into an issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's some great observation, thanks for pointing it out! We indeed seem to be discarding the state of OsHandle on BSD which contains a DIR* stream essentially allowing us for correctly handling seek, tell and rewinds in fd_readdir call. I wonder if that's a problem for OsHandleRef since I always assumed that would point at a stream-like file descriptor for which fd_readdir wouldn't really make much sense in its current form? Just thinking out loud here, it'd probably be best if @sunfishcode chipped in here as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, OsHandleRef was motivated by the need to abstract over files and stdin/stdout/stderr, so it's aimed at stream-like use cases, and not fd_readdir. However, I agree that this is a really subtle issue that would be good to think through.

The new Handle type introduced here is a lot like Descriptor. Would it be feasible to just pass around Descriptor references, rather than create Handle/HandleMut objects to pass around? Similarly, it may make sense to pass around Descriptor references instead of OsHandleRef too, and only convert to File or similar at the point of calling into yanix/winx/libc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a commit removing Handle and HandleMut: 0c571c6. I originally added them for a similar reason as OsHandleRef - to abstract over the different Stream variants. It turns out that there really isn't much benefit of that though, either I'd call as_os_handle() anyway, or having any non-File-like is an error.

It would be nice if we could rely on the type system to enforce that anything in sys (or later, calling yanix and friends) is a non-virtual handle, a nice benefit of OsHandle/OsHandleRef. I imagine wasi-common streams will be a specific set of functionality still on top of an OsHandle so I think we could get away with passing around OsHandle internally, rather than full descriptors.

kubkon pushed a commit to kubkon/wasmtime that referenced this pull request Feb 15, 2020
This commit does a bit of everything: refactors bits here and there,
fixes a bug discovered in another bytecodealliance#701, and combines all structs that
we used in `yanix` and `wasi-common` crates to represent file types
on *nix into one struct, `yanix::file::FileType`.

Up until now, in `yanix`, we've had two separate structs used to
represent file types on the host: `yanix::dir::FileType` and
`yanix::file::SFlags` (well, not quite, but that was its main use).
They both were used in different context (the former when parsing
`dirent` struct, and the latter when parsing `stat` struct), they
were C-compatible (as far as their representation goes), and as it
turns out, they shared possible enumeration values. This commit
combines them both into an idiomatic Rust enum with the caveat that
it is now *not* C-compatible, however, I couldn't find a single use
where that would actually matter, and even if it does in the future,
we can simply add appropriate impl methods.

The combine `yanix::file::FileType` struct can be constructed in two
ways: 1) either from `stat.st_mode` value (and while we're here,
now it's done correctly according to POSIX which fixes the bug mentioned
in VFS impl PR bytecodealliance#701), or 2) from `dirent.d_type` value. Also, since we now
have one struct for representing both contexts, this cleans up nicely
a lot of duplicated code in `host` module.
Comment on lines +238 to +253
// TODO: virtfs files cannot be poll_oneoff'd yet
"poll_oneoff_virtualfs" => true,
// TODO: virtfs does not support filetimes yet.
"path_filestat_virtualfs" |
"fd_filestat_set_virtualfs" => true,
// TODO: virtfs does not support symlinks yet.
"nofollow_errors_virtualfs" |
"path_link_virtualfs" |
"readlink_virtualfs" |
"readlink_no_buffer_virtualfs" |
"dangling_symlink_virtualfs" |
"symlink_loop_virtualfs" |
"path_symlink_trailing_slashes_virtualfs" => true,
// TODO: virtfs does not support rename yet.
"path_rename_trailing_slashes_virtualfs" |
"path_rename_virtualfs" => true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this looks like we're redoing every VirtualFs test on every host. Am I correct in assuming so, and do we actually need to do that given that our de facto host is "virtual"? How much effort do you think it would take to have VirtualFs tests behind a feature flag (treating it as a separate "host" altogether), and do you reckon it would actually make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So.. my thinking here is that because Descriptor involves non-virtual variants, in the strictest sense we ought to run on all hosts. Slightly less interesting, we'd need to pick some host to run *_virtualfs tests on, unless we codify the idea of a "virtual" host to the point of having it define "OsHandle" for tests too.

Realistically, your recommendation to adjust dispatch to VirtualFs to consistently be before getting reaching host-specific impls would help here. At that point I think running these on only one host (linux, because I'm biased?) is less risky .. when doing full CI runs, anyway.

I think the last reason to run VirtualFs tests for each host is local development; Running on every host is helpful for local platform-specific development, since you'll only run one host's view. Do you think it makes sense to make it conditional on tests are running in CI?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I see what you mean by having a mix of the two: a host plus some virtual fs on top of it. The reason I mentioned it here is since there seems to be a lot of code duplication so I was trying to figure out if we could somehow reduce it especially since it seems to be very much copy-paste reuse. 🤔

I think the last reason to run VirtualFs tests for each host is local development; Running on every host is helpful for local platform-specific development, since you'll only run one host's view. Do you think it makes sense to make it conditional on tests are running in CI?

I think I'd actually be more comfortable always having it on so that we make sure we don't break it with any changes to wasi-common in the future. In fact, I think that full test coverage that's always on trumps some code duplication here and there. But it would be nice to have the first without the second ;-)

@kubkon
Copy link
Member

kubkon commented Feb 18, 2020

This is really great, thanks for working on this! I've left a few comments, some of them smaller, some bigger. I hope that most of them actually makes sense!

@sunfishcode
Copy link
Member

sunfishcode commented Feb 19, 2020

I don't have a full-fledged idea here, but I'm curious if you could comment on whether you need the full VirtualFile trait for your use case, or could you get by with a smaller trait that just has read/write/etc. stream-like methods but not path-oriented operations?

@iximeow
Copy link
Contributor Author

iximeow commented Feb 19, 2020

I do not imagine a need for the full VirtualFile trait. In fact, I rather dislike exposing all of VirtualFile because a correct (in terms of errors, flags, etc) implementation is a bit burdensome. It wasn't clear at first how we could have a more minimal public interface but maybe a trait exposing only read/write/pread/pwrite, which wasi-common uses to build a full VirtualFile could do?

@iximeow
Copy link
Contributor Author

iximeow commented Feb 28, 2020

I've now merged #945, so if you could rebase to master, would be great, thanks!

done!

I'm curious if you could comment on whether you need the full VirtualFile trait for your use case,

this question got me thinking about how to scope down VirtualFile to a smaller public part and I've done just that in cc8ee2a. There is now a FileContents trait that a wasi context can be provided. I'm fairly happy with this, but re-reading it this morning, I'll going to nix VirtualDirEntry in favor of a variants of VirtualDir::add_file like VirtualDir::build_file that can take a Box<dyn FileContents>. Happy to hear thoughts, but I'll still be tweaking just a little bit more.

edit: @sunfishcode I'm not sure how closely you're watching this PR, so I'm not sure if I ought to re-request review from you too, but I'll @ you here just because I'd like your thoughts on this FileContents approach.

... and looking back at the commit history, when this all looks good I'll give it a squash and a good commit message :)

this mirrors the POSIX error mode from readv/writev:
> EINVAL The sum of the iov_len values overflows an ssize_t value.
@iximeow iximeow requested a review from kubkon February 28, 2020 22:42
Copy link
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, thanks for making the new and more focused FileContents trait.

Also, as we discussed offline, I see the future direction here is to follow @lukewagner's idea and de-emphasize custom traits like this, and do more with interposition libraries linked in between an application and the host WASI exports. But that will depend on more infrastructure to support it, so the PR here looks good for now.

Copy link
Member

@kubkon kubkon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! If you could just check one final tiny nit, and fix formatting, I reckon we're good to go!

crates/wasi-common/src/sys/unix/hostcalls_impl/fs.rs Outdated Show resolved Hide resolved
@iximeow
Copy link
Contributor Author

iximeow commented Mar 6, 2020

Given the approvals and my one last commit being removal of an unused lifetime and some whitespace, I'm gonna press the button :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants