-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Virtual file support #701
Virtual file support #701
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this is a nice feature.
My main concern here is that it will be tempting to use this mechanism to create Linux-/proc
-like APIs, because, well, Linux does it, but also, it's a pretty simple way to expose new APIs to applications that doesn't require adding any new "syscalls" or dealing with libc, toolchain, WASI Subgroup, or other entities. However, /proc
-style APIs have several downsides:
- It ties API functionality to the concept of a filesystem, which some users of Wasmtime don't want to expose at all, making the functionality awkward for them to work with.
- Modules don't declare imports for these APIs, so there's no way to know statically which APIs a module uses.
- They don't take advantage of wasm's type system. Files just provide byte sequences and applications need to parse them.
In general, what I'd like to encourage is for embedders to use regular APIs with functions that can be imported. The current infrastructure for doing that isn't as convenient as it could be, but we'll be in better shape if we improve it rather than work around it.
That said, there are still uses for virtual filesystems, so I'm mainly looking for ideas for how we can encourage people to use it for exposing things which are inherently filesystem-like, and to avoid using it for general-purpose APIs.
@@ -3,7 +3,7 @@ use std::ops::{Deref, DerefMut}; | |||
use std::os::unix::prelude::{AsRawFd, RawFd}; | |||
|
|||
#[derive(Debug)] | |||
pub(crate) struct OsFile(fs::File); | |||
pub(crate) struct OsFile(pub(crate) fs::File); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OsFile
's contents are encapsulated so that we can hide some differences between platforms. Is it possible to organize things such that we don't need to make this pub(crate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recorganizing to keep OsFile
contents non-public really just needed an applicable FdEntry
constructor - I think this makes sense, but I'm not sure what might use FdEntry::from
outside wasi-common. Matching this change is pretty mechanical for callers, at least: callers would wrap the file in File::OsHandle(OsHandle::from(the_original_argument))
Pardon the rebase. Particularly if you think virtual files and directories are fair to live in wasi-common, I'm hopeful that not needing to fiddle with There are still several |
983d015
to
7f2bb9b
Compare
Okay! It works! I've made the I have a few bugfixes bundled in here, I'll point them out in comments and if anyone would rather see them parceled out into their own PRs I'm happy to do so. |
FileType::Symlink | ||
} else { | ||
FileType::Unknown | ||
match sflags { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a workaround for a yanix
bug: SFlag
should not be a bit field! It corresponds to stat.st_mode
file types, and use as a bitfield results in situations such as a symlink (0o12
) being also reported as a regular file (0o10
) and a character device (0o02
). Because of the lookup order here, when I adjusted path_get
to use wasi file types, symlinks file types were being changed to character devices :( There will be a followup PR to fix this in yanix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting one, thanks for catching it! I think though that SFlag
is correctly a bit field as prescribed by POSIX (see inode(7)). However, is it that we were using it wrong all along? POSIX seems to prescribe that we should do bit masking with SFlag::S_IFMT
to get the filetype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the mask is to pick the four bits that comprise the enumeration - note that the example is if ((sb.st_mode & S_IFMT) == S_IFREG) {
rather than if ((sb.st_mode & S_IFREG) == S_IFREG) {
.
Also, if it's a bit mask, I don't think there would be a way to disambiguate
S_IFSOCK 0140000 socket
S_IFLNK 0120000 symbolic link
in the face of S_IFREG
, S_IFDIR
, and S_IFCHR
overlapping with one of those bits?
I think SFlag
ended up not used much in wasi-common, I didn't see other use sites with a quick look when I was changing path_get
, so maybe no one was "using" it wrong in the first place? 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, exactly, the S_IFMT
is to pick the bits that comprise the filetype enumeration in st_mode
while ignoring the mode and permission bits. I guess the naming is off here. st_mode
is the bit field, whereas we should have used an enumeration type like #[repr(u16)] SFiletype
or whatnot to represent only the filetype info, remembering though not to include S_IFMT
as part of the enumeration, but rather use it to construct the values. Would you agree here? The interesting thing though is that the same inaccuracy is present in nix-rust and I wonder if it's actually used as intended there 🤔.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really glad you found this bug actually, since as it turns out, when converting between SFlags
and host::FileType
(in wasi-common), we don't handle the case of the (named) pipe (S_IFIFO
).
I've now drafted out a rewrite of this and would really appreciate your feedback. The PR: #945. I've also cc'd you in the PR itself. Also, it'd be great if you could verify that the proposed PR fixes the bug you've discovered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've given it a quick read and it looks good! I'm pretty sure it fixes this, though I'll take a closer look Monday or Tuesday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agh, I neglected to update here - yes, this is fixed with #945!
} | ||
}) | ||
_ => { | ||
unreachable!("streams do not have paths and should not be accessible via PathGet"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this remain true, generally, as wasi-common grows an idea of streams outside stdin/stdout/stderr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent question, and while I don't possess a clear-cut answer for this, I think it will not remain true. But then again, I think that introducing streams into wasi-common
will require at least some rewriting of some syscalls. I don't have a problem leaving it as unreachable
. Alternatively, I guess we could change it to unimplemented
but I agree with your train of thoughts here, only if we're certain this will apply to streams as well. All in all, I'm voting for leaving that in as-is.
crates/wasi-common/src/fdentry.rs
Outdated
pub(crate) fn as_handle<'descriptor>(&'descriptor self) -> Handle<'descriptor> { | ||
match self { | ||
Self::VirtualFile(virt) => Handle::VirtualFile(virt.as_ref()), | ||
Self::OsHandle(handle) => Handle::OsHandle(handle), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a workaround for another subtle bug/point of confusion: descriptor_as_oshandle
is lossy on BSDs. BSD OsHandle
maintains wasi-common-side state to handle readdir
subtleties, but descriptor_as_oshandle
constructs an OsHandleRef
by making a new File
from the shared fd and putting that in a new OsHandle
it holds with ManuallyDrop
. This discards wasi-common-side state for the file and results in incorrect readdir
behavior.
I think generally we probably don't want OsHandleRef
while also having state on OsHandle
, and my first draft of this PR in fact removed the one and only place OsHandleRef
was used, but as of last week I ran into some issues here. So I'm noting this here and if I've been convincing I'll copy the above into an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's some great observation, thanks for pointing it out! We indeed seem to be discarding the state of OsHandle
on BSD which contains a DIR*
stream essentially allowing us for correctly handling seek, tell and rewinds in fd_readdir
call. I wonder if that's a problem for OsHandleRef
since I always assumed that would point at a stream-like file descriptor for which fd_readdir
wouldn't really make much sense in its current form? Just thinking out loud here, it'd probably be best if @sunfishcode chipped in here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, OsHandleRef
was motivated by the need to abstract over files and stdin/stdout/stderr, so it's aimed at stream-like use cases, and not fd_readdir
. However, I agree that this is a really subtle issue that would be good to think through.
The new Handle
type introduced here is a lot like Descriptor
. Would it be feasible to just pass around Descriptor
references, rather than create Handle
/HandleMut
objects to pass around? Similarly, it may make sense to pass around Descriptor
references instead of OsHandleRef
too, and only convert to File
or similar at the point of calling into yanix/winx/libc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a commit removing Handle
and HandleMut
: 0c571c6. I originally added them for a similar reason as OsHandleRef
- to abstract over the different Stream
variants. It turns out that there really isn't much benefit of that though, either I'd call as_os_handle()
anyway, or having any non-File
-like is an error.
It would be nice if we could rely on the type system to enforce that anything in sys
(or later, calling yanix
and friends) is a non-virtual handle, a nice benefit of OsHandle
/OsHandleRef
. I imagine wasi-common streams will be a specific set of functionality still on top of an OsHandle
so I think we could get away with passing around OsHandle
internally, rather than full descriptors.
This commit does a bit of everything: refactors bits here and there, fixes a bug discovered in another bytecodealliance#701, and combines all structs that we used in `yanix` and `wasi-common` crates to represent file types on *nix into one struct, `yanix::file::FileType`. Up until now, in `yanix`, we've had two separate structs used to represent file types on the host: `yanix::dir::FileType` and `yanix::file::SFlags` (well, not quite, but that was its main use). They both were used in different context (the former when parsing `dirent` struct, and the latter when parsing `stat` struct), they were C-compatible (as far as their representation goes), and as it turns out, they shared possible enumeration values. This commit combines them both into an idiomatic Rust enum with the caveat that it is now *not* C-compatible, however, I couldn't find a single use where that would actually matter, and even if it does in the future, we can simply add appropriate impl methods. The combine `yanix::file::FileType` struct can be constructed in two ways: 1) either from `stat.st_mode` value (and while we're here, now it's done correctly according to POSIX which fixes the bug mentioned in VFS impl PR bytecodealliance#701), or 2) from `dirent.d_type` value. Also, since we now have one struct for representing both contexts, this cleans up nicely a lot of duplicated code in `host` module.
// TODO: virtfs files cannot be poll_oneoff'd yet | ||
"poll_oneoff_virtualfs" => true, | ||
// TODO: virtfs does not support filetimes yet. | ||
"path_filestat_virtualfs" | | ||
"fd_filestat_set_virtualfs" => true, | ||
// TODO: virtfs does not support symlinks yet. | ||
"nofollow_errors_virtualfs" | | ||
"path_link_virtualfs" | | ||
"readlink_virtualfs" | | ||
"readlink_no_buffer_virtualfs" | | ||
"dangling_symlink_virtualfs" | | ||
"symlink_loop_virtualfs" | | ||
"path_symlink_trailing_slashes_virtualfs" => true, | ||
// TODO: virtfs does not support rename yet. | ||
"path_rename_trailing_slashes_virtualfs" | | ||
"path_rename_virtualfs" => true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this looks like we're redoing every VirtualFs
test on every host. Am I correct in assuming so, and do we actually need to do that given that our de facto host is "virtual"? How much effort do you think it would take to have VirtualFs
tests behind a feature flag (treating it as a separate "host" altogether), and do you reckon it would actually make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So.. my thinking here is that because Descriptor
involves non-virtual variants, in the strictest sense we ought to run on all hosts. Slightly less interesting, we'd need to pick some host to run *_virtualfs
tests on, unless we codify the idea of a "virtual" host to the point of having it define "OsHandle" for tests too.
Realistically, your recommendation to adjust dispatch to VirtualFs
to consistently be before getting reaching host-specific impls would help here. At that point I think running these on only one host (linux, because I'm biased?) is less risky .. when doing full CI runs, anyway.
I think the last reason to run VirtualFs
tests for each host is local development; Running on every host is helpful for local platform-specific development, since you'll only run one host's view. Do you think it makes sense to make it conditional on tests are running in CI
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I see what you mean by having a mix of the two: a host plus some virtual fs on top of it. The reason I mentioned it here is since there seems to be a lot of code duplication so I was trying to figure out if we could somehow reduce it especially since it seems to be very much copy-paste reuse. 🤔
I think the last reason to run VirtualFs tests for each host is local development; Running on every host is helpful for local platform-specific development, since you'll only run one host's view. Do you think it makes sense to make it conditional on tests are running in CI?
I think I'd actually be more comfortable always having it on so that we make sure we don't break it with any changes to wasi-common
in the future. In fact, I think that full test coverage that's always on trumps some code duplication here and there. But it would be nice to have the first without the second ;-)
This is really great, thanks for working on this! I've left a few comments, some of them smaller, some bigger. I hope that most of them actually makes sense! |
I don't have a full-fledged idea here, but I'm curious if you could comment on whether you need the full |
I do not imagine a need for the full |
this fixes an issue where setting fd flags would not get propagated to all open InMemoryFile, but instead only affect the InMemoryFile handle that was set
also check against VirtualFile getting into sys hostcall_impl::path_rename
also remove some println that never should have made it in
done!
this question got me thinking about how to scope down edit: @sunfishcode I'm not sure how closely you're watching this PR, so I'm not sure if I ought to re-request review from you too, but I'll @ you here just because I'd like your thoughts on this ... and looking back at the commit history, when this all looks good I'll give it a squash and a good commit message :) |
this mirrors the POSIX error mode from readv/writev: > EINVAL The sum of the iov_len values overflows an ssize_t value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense, thanks for making the new and more focused FileContents
trait.
Also, as we discussed offline, I see the future direction here is to follow @lukewagner's idea and de-emphasize custom traits like this, and do more with interposition libraries linked in between an application and the host WASI exports. But that will depend on more infrastructure to support it, so the PR here looks good for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! If you could just check one final tiny nit, and fix formatting, I reckon we're good to go!
Given the approvals and my one last commit being removal of an unused lifetime and some whitespace, I'm gonna press the button :) |
This PR adds support for virtual files (so, file-like objects not necessarily backed by bytes on a local disk) for most WASI APIs. It also includes an implementation of virtual files (
InMemoryFile
) and directories (VirtualDir
). Functionally this is transparent to users of wasi-common.The intent here is to support operations on entities that are logically files, even if they may not be what the host OS calls "a file". These might be entirely artificial (like
/dev/random
, though__wasi_random_get()
already exists), static data acquired over the network at open rather than at read, or some other file-like semantic that shouldn't be obligated to go through a filesystem.For tests that have preopened directories, this duplicates the tests for
VirtualDir
/InMemoryFile
-backed testing. Actual test runner changes are pretty small as well: https://github.com/bytecodealliance/wasmtime/pull/701/files#diff-6bb07f6d2e62ac9c41406cd4d888db1aR29-R41. Additionally, I have a small branch againstlucet
, bytecodealliance/lucet#375, to see this in action on our test programs there.(edit: updated description to reflect that this is no longer a draft PR)