-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
It's wiggle time! #1202
It's wiggle time! #1202
Conversation
* Bump version to 0.48.0 * Re-enable `byteorder`'s default features. The code uses `WriteBytesExt` which depends on the `std` feature being enabled. So for now, just enable `std`.
Subscribe to Label ActionThis issue or pull request has been labeled: "w", "a", "s", "i" To subscribe or unsubscribe from this label, edit the |
1963b36
to
52a9467
Compare
self.0 | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you just use the Into<u32>
implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it'd be good to have one way to do this, rather than two. Could I suggest preferring .inner()
over .into()
? Most of the code in wasi-common should treat these as opaque, so it'd be nice for the places where we do need to peek inside to stand out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this must be a remnant of my hack to not stifle progress with wasi-common
while some required changes to wiggle
were in progress. This change is no longer part of this PR, so please ignore it.
OK, I'm marking this as ready for review. Here's the summary of changes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very exciting! Here's a first round of comments:
crates/wasi-common/src/ctx.rs
Outdated
fd_pool: FdPool, | ||
entries: HashMap<wasi::__wasi_fd_t, Entry>, | ||
fd_pool: RefCell<FdPool>, | ||
entries: RefCell<HashMap<types::Fd, Entry>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fd_pool
and entries
need to stay in sync with each other. How awkward would it be to have a single RefCell
that holds a struct containing both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, excellent observation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've now added an aux struct called EntryTable
which holds both fd_pool
and entries
, and inside WasiCtx
it's behind a single RefCell
thus both working in sync.
return Err(Errno::Badf); | ||
} | ||
Ok(Ref::map(self.entries.borrow(), |entries| { | ||
entries.get(&fd).unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't strictly incorrect (at least while we only have one thread), but it feels awkward. Would it work to do something like this? A little verbose still, but it only does one borrow and one entries lookup.
let maybe = Ref::map(self.entries.borrow(), |entries| {
entries.get(&fd)
});
if maybe.is_none() {
return Err(Errno::BadF);
}
Ref::map(maybe, |maybe_entry| maybe_entry.unwrap())
}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, this is exactly what I wanted to do and couldn't figure out how. Thanks! :-D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this proves to be more difficult than I originally envisioned. If I follow your suggestion, than we end up with a lifetime mismatch since the first maybe
will contain weirdness of the like &Option<&Entry>
, and the compiler cannot assign approriate lifetimes to the final Ref<'_, Entry>
. Perhaps when we revisit the use of RefCell
in wasi-common
altogether, this will no longer be an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, if it gets awkward, then I agree it makes sense to leave the code as-is for now.
} | ||
Ok(RefMut::map(self.entries.borrow_mut(), |entries| { | ||
entries.get_mut(&fd).unwrap() | ||
})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And similar here? If so, we may not even need the contains_key
method, which would be nice because it can be tricky to use without look-before-you-leap hazards.
@@ -122,7 +124,7 @@ impl<'ctx> Dir<'ctx> { | |||
/// [`std::fs::File::create`]: https://doc.rust-lang.org/std/fs/struct.File.html#method.create | |||
pub fn create_file<P: AsRef<Path>>(&mut self, path: P) -> io::Result<File> { | |||
let path = path.as_ref(); | |||
let mut fd = 0; | |||
let mut fd = types::Fd::from(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of from
, what would you think of naming this method from_raw
, to emphasize that this call is not something one should do often.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I've actually tweaked this only so that it builds. I was going to cc you about the fs
module and how we'd want to handle things here :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pchickey What do you think of having a from_raw
method for handles in wiggle
?
mut argv: GuestPtr<'b, GuestPtr<'b, u8>>, | ||
mut argv_buf: GuestPtr<'b, u8>, | ||
) -> Result<()> { | ||
trace!("args_get(argv_ptr={:?}, argv_buf={:?})", argv, argv_buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a path to where these trace!
calls could be auto-generated by wiggle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent question! Thanks for bringing it up! @pchickey and I had some thoughts about this, and as far as I can tell, he would rather avoid adding logging to wiggle
(which is where we'd want this to end up actually). I haven't put that much thought to it just yet, hence why I left the traces in wasi-common
, but I'm open to suggestions here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could there be a log
feature of wiggle
which generates log statements? That way wasmtime could turn it on but if it's not needed in lucet it could be disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I believe what @alexcrichton suggests sounds like the solution to this problem. @pchickey @sunfishcode what do you guys reckon? Oh, and until that lands in wiggle
, my suggestion would be to leave the trace
s as-is in wasi-common
. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and I think we should do the same for wasi-common
as well. We've got a couple log
calls here and there for debugging and what not, and I believe that @pchickey would like to have a version stripped off all altogether for lucet/xqd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not suggesting we do it immediately in this PR, but we should have it in mind. :-)
use std::ops::DerefMut; | ||
use wiggle_runtime::{GuestBorrows, GuestPtr}; | ||
|
||
impl<'a> WasiSnapshotPreview1 for WasiCtx { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementing this as a trait for WasiCtx
is an interesting idea, and it is pretty fun to do ctx.fd_close
as the code does above, but does this make it harder to share a WasiCtx
between snapshot versions in the same wasm module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent question! So I think it won't be that difficult. However, I suggest to leave that for a subsequent PR. I'm purposely migrated all impls into one module and directly into the trait impl so that we can have a better idea how to tackle the problem of multiple snapshots/polyfill. Does it make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd actually imagine that it makes it easier to share a WasiCtx
between versions because we could implement both traits for one struct!
); | ||
|
||
// Extract path as &str. | ||
let mut bc = GuestBorrows::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does bc
stand for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You got me, I was just following @alexcrichton's example in wiggle
's tests :-} I'm happy to change it to something more meaningful, say borrows
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its short for "borrow checker".
|
||
// Extract path as &str. | ||
let mut bc = GuestBorrows::new(); | ||
let path = unsafe { &*path.as_raw(&mut bc)? }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you say more about the unsafe
here? I assume wiggle has done the bounds checking before we get here, is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, as_raw
does the bounds checking. We need unsafe
here since we're dereferencing a raw *mut _
pointer, and in this particular case, we're obtaining &str
. The signature of this function here is:
fn as_raw(...) -> Result<*mut _, GuestError>
This is unsafe since we cannot really track the references here other than by using GuestBorrows
. More on that here.
@pchickey I've now re-used |
// FIXME | ||
// Much like in the case with `path_symlink`, we need to account for the | ||
// fact where both `old_path` and `new_path` point the same memory location. | ||
// Hence, separate scopes and `GuestBorrows` instances. | ||
let resolved_old = { | ||
let mut bc = GuestBorrows::new(); | ||
let old_as_str = old_path.as_raw(&mut bc)?; | ||
let old_path = unsafe { &*old_as_str }; | ||
|
||
trace!(" | (old_path_ptr,old_path_len)='{}'", old_path); | ||
|
||
let old_entry = unsafe { self.get_entry(old_fd)? }; | ||
path::get( | ||
&old_entry, | ||
types::Rights::PATH_LINK_SOURCE, | ||
types::Rights::empty(), | ||
types::Lookupflags::empty(), | ||
old_path, | ||
false, | ||
)? | ||
}; | ||
let resolved_new = { | ||
let mut bc = GuestBorrows::new(); | ||
let new_as_str = new_path.as_raw(&mut bc)?; | ||
let new_path = unsafe { &*new_as_str }; | ||
|
||
trace!(" | (new_path_ptr,new_path_len)='{}'", new_path); | ||
|
||
let new_entry = unsafe { self.get_entry(new_fd)? }; | ||
path::get( | ||
&new_entry, | ||
types::Rights::PATH_LINK_TARGET, | ||
types::Rights::empty(), | ||
types::Lookupflags::empty(), | ||
new_path, | ||
false, | ||
)? | ||
}; | ||
path::link(resolved_old, resolved_new) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to beat everyone to the question here. I'm not sure how to handle this better with the current state of wiggle
(which is not to say we can't modify wiggle
to accommodate this case). So, I was forced to create two instances of GuestBorrows
since it is possible for the two path
s that are handed to us from guest to point at the same memory location (one use case being for instance symlink loops, etc.).
I was thinking, we could potentially check for that by comparing the pointers (if they implemented PartialEq
), but it still feels clunky. Perhaps we should revisit the concept of immutability in wiggle
? That is, differentiate immutable vs mutable borrows? In this case, I reckon it should be fine to borrow the same location immutably multiple times. Thoughts @sunfishcode @pchickey? @alexcrichton what do you reckon?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great question. One alternative to creating two different borrowed path strings is to clone two path strings from the wasm memory into Rust owned String
s. That can be done safely without any borrow checking, since only one clone can be happening at a time. I expect, since paths shouldn't be too big, that is the right design choice here, except for the possibility of a DoS attack by creating two 4gb Strings here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Differentiating between mutable and immutable borrows sounds like a good way to go.
For example, POSIX doesn't say anything about link
's arguments aliasing or not, but most functions with output buffers, such as readlink are defined with restrict
qualifiers, meaning the output buffer is expected to not overlap with any of the input buffers. This also echos the Rust's borrow checker: you can have many aliasing immutable references, but a mutable reference must be unaliased.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've thought about this and I'm fairly convinced the right thing, for now, is to keep this implementation and leave a nice comment that outlines the design decisions as you just did in this thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't actually think that GuestBorrows
is really needed here at all because there are no long-lived pointers, I think it's fine to do what this is currently doing, but I might recommend making it a bit more concise with the comment I made above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I don't think I understand what you mean here by GuestBorrows
not needed here at all. I'm probably missing something but let me outline the things I do believe:
I agree that in this case, the borrow checking is vacuous. But there isn't presently another method besides GuestPtr::as_raw
that allows us to get a &str
out of this GuestPtr<str>
. We could add a method to GuestPtr that returns an owned String
, but I fear that is a DoS vector because the guest will control the size of that String, and therefore gets to allocate in Rust's heap. We haven't yet rigorously audited this codebase for DoS vectors, of course, but I'm wary of adding something I know I'll have to forbid actually using in production.
I do not think we should create a shortcut variant of GuestPtr::as_raw
to omit borrow checking which is only safe if the user ensures borrows are always short-lived, because I could see that leading to bugs when someone less familiar with the safety properties updates the code. Maybe that is too paranoid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly as @pchickey explained, obtaining a reference to a slice-type object in Rust from wiggle
is currently only possible through as_raw
method which requires a GuestBorrows
object in scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pchickey er sorry I don't think I was being very clear! I don't think we should remove the GuestBorrows
argument or use owned strings here, I mean that each of these is resolved independently and there's never any overlapping borrows so documenting fixmes and/or hacks isn't necessary because each path is getting sucked into an owned object anyway so borrows never last long enough for runtime borrow-checking to be that relevant.
use wiggle_runtime::{GuestBorrows, GuestPtr}; | ||
|
||
impl<'a> WasiSnapshotPreview1 for WasiCtx { | ||
fn args_get<'b>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sunfishcode Note that we don't have unsafe
's in methods' signatures. I was somewhat worried about this since we're still dealing with file descriptors and other OS handles, but perhaps the entry points to WASI syscalls are OK being safe especially given that WASI fd is now a handle type? What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Crazy idea that I haven't thought all the way through yet: Would it work if we made Fd
work like std::fs::File
?
- Make
Fd::from_raw
unsafe? , similar tofrom_raw_fd
being unsafe. - Make
Fd
not implementCopy
. ChangeFd
arguments to&Fd
. - Make
Fd
implementDrop
and have it automatically close. This might be hard though. Is there a way in Rust to makedrop
private, so that you have to pass it back to the context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On further consideration, the Drop
part makes this unworkable. I'm going to investigate other approaches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea but yeah, Drop
will not work like this, unless we made the context object static/global. Having said that your suggestion aligns with my mental model of Fd
100%, so perhaps we could come up with something close to it but not necessarily one-to-one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thinking this through more carefully, maybe we can do this without the Drop
part. So make handle
types not implement Copy
, pass handle
arguments by reference rather than by value, and make the raw constructor unsafe
. That's not the same as std::fs::File
, but it still has some nice properties. We just need to be careful when calling the raw constructor, but there aren't many places where we need to do that outside of the generated wrappers. Does that sound feasible?
If you'd like to defer this to a later PR, that's fine too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, and yeah, if we could defer to a subsequent PR, that would be perfect IMHO. There's already a substantial amount of tweaks I have to apply and I'm worried more and more will slip away. Is that OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great @kubkon!
My main interest here is to reduce the amount of unsafe
as much as possible. I left a lot of comments along those lines, but let me know if anything stands out!
// FIXME | ||
// Much like in the case with `path_symlink`, we need to account for the | ||
// fact where both `old_path` and `new_path` point the same memory location. | ||
// Hence, separate scopes and `GuestBorrows` instances. | ||
let resolved_old = { | ||
let mut bc = GuestBorrows::new(); | ||
let old_as_str = old_path.as_raw(&mut bc)?; | ||
let old_path = unsafe { &*old_as_str }; | ||
|
||
trace!(" | (old_path_ptr,old_path_len)='{}'", old_path); | ||
|
||
let old_entry = unsafe { self.get_entry(old_fd)? }; | ||
path::get( | ||
&old_entry, | ||
types::Rights::PATH_LINK_SOURCE, | ||
types::Rights::empty(), | ||
types::Lookupflags::empty(), | ||
old_path, | ||
false, | ||
)? | ||
}; | ||
let resolved_new = { | ||
let mut bc = GuestBorrows::new(); | ||
let new_as_str = new_path.as_raw(&mut bc)?; | ||
let new_path = unsafe { &*new_as_str }; | ||
|
||
trace!(" | (new_path_ptr,new_path_len)='{}'", new_path); | ||
|
||
let new_entry = unsafe { self.get_entry(new_fd)? }; | ||
path::get( | ||
&new_entry, | ||
types::Rights::PATH_LINK_TARGET, | ||
types::Rights::empty(), | ||
types::Lookupflags::empty(), | ||
new_path, | ||
false, | ||
)? | ||
}; | ||
path::link(resolved_old, resolved_new) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't actually think that GuestBorrows
is really needed here at all because there are no long-lived pointers, I think it's fine to do what this is currently doing, but I might recommend making it a bit more concise with the comment I made above
Just a quick note -- there are a bunch of references to |
Thanks for the heads up! I have to admit, I've not paid too much attention to the docs yet. |
I'm curious what the story is for the old snapshot? It's already been somewhat painful when the two snapshots diverge in how they're defined idiomatically. Is the intention to migrate the old snapshot? Implement the old snapshot trait for the same |
So the current idea is to only migrate latest to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a big milestone, thanks for all your hard work here! This looks ready to land to me, and we can follow up as discussed in separate PRs.
return Err(Errno::Badf); | ||
} | ||
Ok(Ref::map(self.entries.borrow(), |entries| { | ||
entries.get(&fd).unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, if it gets awkward, then I agree it makes sense to leave the code as-is for now.
This is a rather massive commit that introduces `wiggle` into the picture. We still use `wig`'s macro in `old` snapshot and to generate `wasmtime-wasi` glue, but everything else is now autogenerated by `wiggle`. In summary, thanks to `wiggle`, we no longer need to worry about serialising and deserialising to and from the guest memory, and all guest (WASI) types are now proper idiomatic Rust types. While we're here, in preparation for the ephemeral snapshot, I went ahead and reorganised the internal structure of the crate. Instead of modules like `hostcalls_impl` or `hostcalls_impl::fs`, the structure now resembles that in ephemeral with modules like `path`, `fd`, etc. Now, I'm not requiring we leave it like this, but I reckon it looks cleaner this way after all.
I've left the implementation of VirtualFs pretty much untouched as I don't feel that comfortable in changing the API too much. Having said that, I reckon `pread` and `pwrite` could be refactored out, and `preadv` and `pwritev` could be entirely rewritten using `seek` and `read_vectored` and `write_vectored`.
This commit adds aux struct `EntryTable` which is private to `WasiCtx` and is basically responsible for `Fd` alloc/dealloc as well as storing matching `Entry`s. This struct is entirely private to `WasiCtx` and as such as should remain transparent to `WasiCtx` users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @kubkon for all the outstanding work here and to @sunfishcode @alexcrichton for all of the excellent review feedback!
It's
wiggle
time!This PR is the beginning of a journey which aims at replacing
wig
crate withwiggle
. @pchickey and myself have been working onwiggle
for over a month now, and we feel we got to the point where it could successfully be used inwasi-common
,wasmtime
andlucet
. We also agreed that if any functionality is still missing, we should iterate in-tree if possible. Anyhow, this is a draft PR so things will invariably change (a lot!), but I wanted to get it out there sooner rather than later so that I can get some meaningful feedback early on and work on addressing that when things are still plastic.