-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new_file #3
base: main-test-workflow
Are you sure you want to change the base?
Add new_file #3
Conversation
In order to migrate branches to ISLE, we define a second entry point `lower_branch` which gets the list of branch targets as additional argument. This requires a small change to `lower_common`: the `isle_lower` callback argument is changed from a function pointer to a closure. This allows passing the extra argument via a closure. Traps make use of the recently added facility to emit safepoints from ISLE, but are otherwise straightforward.
…alliance#3739) This commit updates the allocation of a `VMExternRefActivationsTable` structure to perform zero malloc memory allocations. Previously it would allocate a page-size of `chunk` plus some space in hash sets for future insertions. The main trick here implemented is that after the first gc during the slow path the fast chunk allocation is allocated and configured. The motivation for this PR is that given our recent work to further refine and optimize the instantiation process this allocation started to show up in a nontrivial fashion. Most modules today never touch this table anyway as almost none of them use reference types, so the time spent allocation and deallocating the table per-store was largely wasted time. Concretely on a microbenchmark this PR speeds up instantiation of a module with one function by 30%, decreasing the instantiation cost from 1.8us to 1.2us. Overall a pretty minor win but when the instantiation times we're measuring start being in the single-digit microseconds this win ends up getting magnified!
…lliance#3741) * Don't copy `VMBuiltinFunctionsArray` into each `VMContext` This is another PR along the lines of "let's squeeze all possible performance we can out of instantiation". Before this PR we would copy, by value, the contents of `VMBuiltinFunctionsArray` into each `VMContext` allocated. This array of function pointers is modestly-sized but growing over time as we add various intrinsics. Additionally it's the exact same for all `VMContext` allocations. This PR attempts to speed up instantiation slightly by instead storing an indirection to the function array. This means that calling a builtin intrinsic is a tad bit slower since it requires two loads instead of one (one to get the base pointer, another to get the actual address). Otherwise though `VMContext` initialization is now simply setting one pointer instead of doing a `memcpy` from one location to another. With some macro-magic this commit also replaces the previous implementation with one that's more `const`-friendly which also gets us compile-time type-checks of libcalls as well as compile-time verification that all libcalls are defined. Overall, as with bytecodealliance#3739, the win is very modest here. Locally I measured a speedup from 1.9us to 1.7us taken to instantiate an empty module with one function. While small at these scales it's still a 10% improvement! * Review comments
/bench_x64 |
b91d0dc
to
8ff5de0
Compare
bench |
/binch_x64 |
/bench_x64 |
8ff5de0
to
2cc4b39
Compare
X86_blah |
/x86_64 |
/Bench_x64 |
2cc4b39
to
3090ec0
Compare
/bench_x64
|
Performance results based on clockticks comparison with main HEAD (higher %change shows improvement):
|
3090ec0
to
9ea1089
Compare
/Bench_x64 |
/bench_x64
|
/bench_x64 Performance results based on clockticks comparison with main HEAD (higher %change shows improvement):
|
As first suggested by Jan on the Zulip here [1], a cheap and effective way to obtain copy-on-write semantics of a "backing image" for a Wasm memory is to mmap a file with `MAP_PRIVATE`. The `memfd` mechanism provided by the Linux kernel allows us to create anonymous, in-memory-only files that we can use for this mapping, so we can construct the image contents on-the-fly then effectively create a CoW overlay. Furthermore, and importantly, `madvise(MADV_DONTNEED, ...)` will discard the CoW overlay, returning the mapping to its original state. By itself this is almost enough for a very fast instantiation-termination loop of the same image over and over, without changing the address space mapping at all (which is expensive). The only missing bit is how to implement heap *growth*. But here memfds can help us again: if we create another anonymous file and map it where the extended parts of the heap would go, we can take advantage of the fact that a `mmap()` mapping can be *larger than the file itself*, with accesses beyond the end generating a `SIGBUS`, and the fact that we can cheaply resize the file with `ftruncate`, even after a mapping exists. So we can map the "heap extension" file once with the maximum memory-slot size and grow the memfd itself as `memory.grow` operations occur. The above CoW technique and heap-growth technique together allow us a fastpath of `madvise()` and `ftruncate()` only when we re-instantiate the same module over and over, as long as we can reuse the same slot. This fastpath avoids all whole-process address-space locks in the Linux kernel, which should mean it is highly scalable. It also avoids the cost of copying data on read, as the `uffd` heap backend does when servicing pagefaults; the kernel's own optimized CoW logic (same as used by all file mmaps) is used instead. [1] https://bytecodealliance.zulipchat.com/#narrow/stream/206238-general/topic/Copy.20on.20write.20based.20instance.20reuse/near/266657772
Testing so far with recent Wasmtime has not been able to show the need for avoiding the process-wide mmap lock in real-world use-cases. As such, the technique of using an anonymous file and ftruncate() to extend it seems unnecessary; instead, memfd can always use anonymous zeroed memory for heap backing where the CoW image is not present, and mprotect() to extend the heap limit by changing page protections.
…nchtrap s390x: Migrate branches and traps to ISLE
Even though the implementation of emit and emit_safepoint may be platform-specific, the interface ought to be common so that other code in prelude.isle may safely call these constructors. This patch moves the definition of emit (from all platforms) and emit_safepoint (s390x only) to prelude.isle. This required adding an emit_safepoint implementation to aarch64 and x64 as well - the latter is still a stub as special move mitosis handling will be required.
Move emit and emit_safepoint to prelude.isle
With the addition of `sock_accept()` in `wasi-0.11.0`, wasmtime can now implement basic networking for pre-opened sockets. For Windows `AsHandle` was replaced with `AsRawHandleOrSocket` to cope with the duality of Handles and Sockets. For Unix a `wasi_cap_std_sync::net::Socket` enum was created to handle the {Tcp,Unix}{Listener,Stream} more efficiently in `WasiCtxBuilder::preopened_socket()`. The addition of that many `WasiFile` implementors was mainly necessary, because of the difference in the `num_ready_bytes()` function. A known issue is Windows now busy polling on sockets, because except for `stdin`, nothing is querying the status of windows handles/sockets. Another know issue on Windows, is that there is no crate providing support for `fcntl(fd, F_GETFL, 0)` on a socket. Signed-off-by: Harald Hoyer <harald@profian.com>
(This was not a correctness bug, but is an obvious performance bug...)
afb0d6e
to
4e5c126
Compare
/bench_x64 |
16 similar comments
/bench_x64 |
/bench_x64 |
/bench_x64 |
/bench_x64 |
/bench_x64 |
/bench_x64 |
/bench_x64 |
/bench_x64 |
/bench_x64 |
/bench_x64 |
/bench_x64 |
/bench_x64 |
/bench_x64 |
/bench_x64 |
/bench_x64 |
/bench_x64 |
Requested from pull request comment. Shows clockticks reduced. 1-Patch/Main (positive pct is better)
|
/bench_x64 |
1 similar comment
/bench_x64 |
Requested from pull request comment. Shows clockticks reduced. 1-Patch/Main (positive pct is better)
|
Requested from pull request comment. Shows clockticks reduced. 1-Patch/Main (positive pct is better)
|
/bench_x64 |
Requested from pull request comment. Shows clockticks reduced. 1-Patch/Main (positive pct is better)
|
* Integrate experimental HTTP into wasmtime. * Reset Cargo.lock * Switch to bail!, plumb options partially. * Implement timeouts. * Remove generated files & wasm, add Makefile * Remove generated code textfile * Update crates/wasi-http/Cargo.toml Co-authored-by: Eduardo de Moura Rodrigues <16357187+eduardomourar@users.noreply.github.com> * Update crates/wasi-http/Cargo.toml Co-authored-by: Eduardo de Moura Rodrigues <16357187+eduardomourar@users.noreply.github.com> * Extract streams from request/response. * Fix read for len < buffer length. * Formatting. * types impl: swap todos for traps * streams_impl: idioms, and swap todos for traps * component impl: idioms, swap all unwraps for traps, swap all todos for traps * http impl: idiom * Remove an unnecessary mut. * Remove an unsupported function. * Switch to the tokio runtime for the HTTP request. * Add a rust example. * Update to latest wit definition * Remove example code. * wip: start writing a http test... * finish writing the outbound request example havent executed it yet * better debug output * wasi-http: some stubs required for rust rewrite of the example * add wasi_http tests to test-programs * CI: run the http tests * Fix some warnings. * bump new deps to latest releases (#3) * Add tests for wasi-http to test-programs (#2) * wip: start writing a http test... * finish writing the outbound request example havent executed it yet * better debug output * wasi-http: some stubs required for rust rewrite of the example * add wasi_http tests to test-programs * CI: run the http tests * bump new deps to latest releases h2 0.3.16 http 0.2.9 mio 0.8.6 openssl 0.10.48 openssl-sys 0.9.83 tokio 1.26.0 --------- Co-authored-by: Brendan Burns <bburns@microsoft.com> * Update crates/test-programs/tests/http_tests/runtime/wasi_http_tests.rs * Update crates/test-programs/tests/http_tests/runtime/wasi_http_tests.rs * Update crates/test-programs/tests/http_tests/runtime/wasi_http_tests.rs * wasi-http: fix cargo.toml file and publish script to work together (#4) unfortunately, the publish script doesn't use a proper toml parser (in order to not have any dependencies), so the whitespace has to be the trivial expected case. then, add wasi-http to the list of crates to publish. * Update crates/test-programs/build.rs * Switch to rustls * Cleanups. * Merge switch to rustls. * Formatting * Remove libssl install * Fix tests. * Rename wasi-http -> wasmtime-wasi-http * prtest:full Conditionalize TLS on riscv64gc. * prtest:full Fix formatting, also disable tls on s390x * prtest:full Add a path parameter to wit-bindgen, remove symlink. * prtest:full Fix tests for places where SSL isn't supported. * Update crates/wasi-http/Cargo.toml --------- Co-authored-by: Eduardo de Moura Rodrigues <16357187+eduardomourar@users.noreply.github.com> Co-authored-by: Pat Hickey <phickey@fastly.com> Co-authored-by: Pat Hickey <pat@moreproductive.org>
…dealliance#7029) * Rename `Host*` things to avoid name conflicts with bindings. * Update to the latest resource-enabled wit files. * Adapting the code to the new bindings. * Update wasi-http to the resource-enabled wit deps. * Start adapting the wasi-http code to the new bindings. * Make `get_directories` always return new owned handles. * Simplify the `poll_one` implementation. * Update the wasi-preview1-component-adapter. FIXME: temporarily disable wasi-http tests. Add logging to the cli world, since stderr is now a reseource that can only be claimed once. * Work around a bug hit by poll-list, fix a bug in poll-one. * Comment out `test_fd_readwrite_invalid_fd`, which panics now. * Fix a few FIXMEs. * Use `.as_ref().trapping_unwrap()` instead of `TrappingUnwrapRef`. * Use `drop_in_place`. * Remove `State::with_mut`. * Remove the `RefCell` around the `State`. * Update to wit-bindgen 0.12. * Update wasi-http to use resources for poll and I/O. This required making incoming-body and outgoing-body resourrces too, to work with `push_input_stream_child` and `push_output_stream_child`. * Re-enable disabled tests, remove logging from the worlds. * Remove the `poll_list` workarounds that are no longer needed. * Remove logging from the adapter. That said, there is no replacement yet, so add a FIXME comment. * Reenable a test that now passes. * Remove `.descriptors_mut` and use `with_descriptors_mut` instead. Replace `.descriptors()` and `.descriptors_mut()` with functions that take closures, which limits their scope, to prevent them from invalid aliasing. * Implement dynamic borrow checking for descriptors. * Add a cargo-vet audit for wasmtime-wmemcheck. * Update cargo vet for wit-bindgen 0.12. * Cut down on duplicate sync/async resource types (#1) * Allow calling `get-directories` more than once (#2) For now `Clone` the directories into new descriptor slots as needed. * Start to lift restriction of stdio only once (#3) * Start to lift restriction of stdio only once This commit adds new `{Stdin,Stdout}Stream` traits which take over the job of the stdio streams in `WasiCtxBuilder` and `WasiCtx`. These traits bake in the ability to create a stream at any time to satisfy the API of `wasi:cli`. The TTY functionality is folded into them as while I was at it. The implementation for stdin is relatively trivial since the stdin implementation already handles multiple streams reading it. Built-in impls of the `StdinStream` trait are also provided for helper types in `preview2::pipe` which resulted in the implementation of `MemoryInputPipe` being updated to support `Clone` where all clones read the same original data. * Get tests building * Un-ignore now-passing test * Remove unneeded argument from `WasiCtxBuilder::build` * Fix tests * Remove some workarounds Stdio functions can now be called multiple times. * If `poll_oneoff` fails part-way through, clean up properly. Fix the `Drop` implementation for pollables to only drop the pollables that have been successfully added to the list. This fixes the poll_oneoff_files failure and removes a FIXME. --------- Co-authored-by: Alex Crichton <alex@alexcrichton.com>
No description provided.