Map WASM memory allocation APIs to IREE's HAL #5137
Comments
I spent some time evaluating wasm3 and Wasmtime's APIs from this perspective:
Both seem architecturally more flexible when it comes to memory allocation and memory space management, but I don't yet see a way to satisfy our requirements. There is more discussion on Discord about this here (wasm3) and here (wasmtime). Generally, we want to allocate a block of memory to be managed by an IREE "device" and shared between wasm modules (IREE "executables"). I think this can be solved using memory exports and imports (it's basically how SharedArrayBuffer is used on the web?), though I'm still searching for concrete examples and documentation from these runtimes. We're also wondering about thread safety (Discord discussion): taking a lock to allocate or to load/unload a module is fine, but we shouldn't need to take a lock to safely call stateless functions, for example.
Quick question: would the WebAssembly multi-memory proposal be useful for this? To avoid the external call cost, the device/HAL allocator logic could be baked into each module but operate on a single buffer shared between all modules, with other buffers allocated per module for isolated memory space as needed.
@mykmartin it may be (my hope is that it is :) Our usage really needs a way to allocate a growable block of memory independent of any wasm module that we can then provide to each wasm module as we load it. The default memory of each wasm module is where stacks would live, while the bulk data we'd work with would come from the shared memory. It's hard for me to tell from the spec whether this is something the spec even cares about or whether it's purely a matter for the engines. From what we looked into, most engines assume that they create all the memory for the loaded modules instead of allowing imports.

The proposal spec looks like it would be very compatible with this approach, as we could do this with just two data segments and we can easily assign the pointer address spaces in LLVM that end up as the data segment identifiers on instructions in our generated code. Then we just need the engines to have an "allocate a growable memory instance" and an "import this memory instance during module instantiation" operation.

We could also get by without multi-memory support if we could do the same "allocate a growable memory instance" and "use this memory instance during module instantiation": we'd then assign the stack offsets ourselves and have all modules share the exact same memory. In browser land this would be like having a SharedArrayBuffer that all loaded wasm modules used, which would be useful (in my previous life I worked on Google Maps and wanted the same feature there for multithreaded decoding into staging buffers for GPU upload). (I posted some more details about what we are doing here: #2863 (comment), which shows where multi-memory may help.)
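To make the two requested engine operations concrete, here is a minimal host-side sketch in plain C. It does not use any real runtime API: `shared_memory_t`, `module_instance_t`, and every function below are hypothetical stand-ins that only model the ownership and addressing rules being asked for. The memory is created before any module exists, each module borrows it at instantiation, and modules address it by offset so that growth (which may move the base pointer) does not invalidate anything.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define WASM_PAGE_SIZE 65536u /* WebAssembly page size: 64 KiB */

/* Hypothetical stand-in for a growable wasm memory owned by the device. */
typedef struct {
  uint8_t *base;      /* may move when the memory grows */
  uint32_t pages;     /* current size in pages */
  uint32_t max_pages; /* growth limit */
} shared_memory_t;

/* Hypothetical stand-in for an instantiated module (IREE "executable"). */
typedef struct {
  shared_memory_t *imported; /* borrowed; the device owns the memory */
} module_instance_t;

/* "Allocate a growable memory instance", independent of any module. */
static int shared_memory_create(shared_memory_t *mem, uint32_t initial_pages,
                                uint32_t max_pages) {
  assert(initial_pages > 0 && initial_pages <= max_pages);
  mem->base = calloc(initial_pages, WASM_PAGE_SIZE);
  if (!mem->base) return -1;
  mem->pages = initial_pages;
  mem->max_pages = max_pages;
  return 0;
}

/* Grows by `delta` pages. Returns the previous page count, or -1 on
 * failure, mirroring the result convention of wasm's memory.grow. */
static int64_t shared_memory_grow(shared_memory_t *mem, uint32_t delta) {
  if (delta > mem->max_pages - mem->pages) return -1;
  uint32_t old_pages = mem->pages;
  size_t old_size = (size_t)old_pages * WASM_PAGE_SIZE;
  size_t new_size = old_size + (size_t)delta * WASM_PAGE_SIZE;
  uint8_t *moved = realloc(mem->base, new_size);
  if (!moved) return -1;
  memset(moved + old_size, 0, new_size - old_size); /* new pages are zeroed */
  mem->base = moved;
  mem->pages = old_pages + delta;
  return old_pages;
}

static void shared_memory_destroy(shared_memory_t *mem) {
  free(mem->base);
  mem->base = NULL;
  mem->pages = 0;
}

/* "Import this memory instance during module instantiation". */
static module_instance_t module_instantiate(shared_memory_t *mem) {
  module_instance_t instance = {mem};
  return instance;
}

/* Bounds-checked access by offset, as a module would address the memory. */
static int module_store_u32(module_instance_t *m, uint32_t offset,
                            uint32_t value) {
  uint64_t size = (uint64_t)m->imported->pages * WASM_PAGE_SIZE;
  if ((uint64_t)offset + sizeof(value) > size) return -1;
  memcpy(m->imported->base + offset, &value, sizeof(value));
  return 0;
}

static int module_load_u32(const module_instance_t *m, uint32_t offset,
                           uint32_t *out) {
  uint64_t size = (uint64_t)m->imported->pages * WASM_PAGE_SIZE;
  if ((uint64_t)offset + sizeof(*out) > size) return -1;
  memcpy(out, m->imported->base + offset, sizeof(*out));
  return 0;
}
```

In this model, a value stored by one module at offset 16 is visible to a second module that imported the same memory, and it stays readable at the same offset after a grow. Whether a given engine exposes equivalent operations is exactly the open question in this thread.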
The other thing multi-memory may allow in the future is importing large read-only constant buffers. In ML inference these would be things like your model weights (which can be 10-100MB, or much larger, such as 2GB). Today we would need to copy those into the wasm-accessible memory, which is similar to what we need to do for GPUs with discrete memory, but it'd be nice not to have to, given that the bytes already exist. If we could create a wasm_memory_t with existing read-only contents, then we could import that without the full alloc+copy.
fyi the wasmtime team have just added the multi-memory option in the C API: bytecodealliance/wasmtime#3066 - would that have an impact on the analysis in Scott's comment?
Nice! That would require a small bit of compiler work to enable (just tagging LLVM pointers with the right address space, I believe) but nothing major - then we could evaluate an allocator that sliced from a wasm memory block and import that into each executable in the prototype wasmtime implementation Scott has.
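As a rough illustration of "an allocator that sliced from a wasm memory block", here is a minimal bump allocator sketch in C. It is not taken from IREE or wasmtime; `slice_allocator_t` and its functions are hypothetical names. It returns offsets rather than pointers, on the assumption that every executable importing the same memory would interpret an offset identically.

```c
#include <assert.h>
#include <stdint.h>

#define SLICE_INVALID UINT32_MAX /* sentinel for "out of space" */

/* Hypothetical device-level allocator that hands out offset ranges
 * from one shared memory block. */
typedef struct {
  uint32_t capacity; /* size of the shared block in bytes */
  uint32_t next;     /* first unallocated offset */
} slice_allocator_t;

/* `reserved` bytes at the start are skipped, e.g. for stacks or globals
 * if the modules also keep those in the shared block. */
static void slice_allocator_init(slice_allocator_t *a, uint32_t capacity,
                                 uint32_t reserved) {
  assert(reserved <= capacity);
  a->capacity = capacity;
  a->next = reserved;
}

/* Returns the offset of a `size`-byte slice aligned to `alignment`
 * (a power of two), or SLICE_INVALID if the block is exhausted.
 * A failed allocation leaves the allocator unchanged. */
static uint32_t slice_alloc(slice_allocator_t *a, uint32_t size,
                            uint32_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  uint64_t mask = (uint64_t)alignment - 1;
  uint64_t start = ((uint64_t)a->next + mask) & ~mask; /* round up */
  uint64_t end = start + size;
  if (end > a->capacity) return SLICE_INVALID;
  a->next = (uint32_t)end;
  return (uint32_t)start;
}
```

A bump allocator never frees individual slices, so a real HAL allocator would need something more capable. The sketch also takes no lock; per the thread-safety note earlier in the thread, locking around allocation would be acceptable.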
Ho there! This bug hasn't been updated in a long time. Good intentions and all, but we're moving this to the backlog. Feel free to bring it back if you think there's a reasonable chance it'll get worked on in the next 6mo!
Splitting this off from #5096 and a discussion on Discord here.
TL;DR: WAMR allocates memory per module (executable). IREE wants to define an allocator up a level, shared across executables. What should we do?
WASM runtimes limit what memory WASM modules can access to a single contiguous memory address range that the module can suballocate within. Applications can typically create this block of memory, resize it, offer it to instantiated modules, etc. See this article for a pretty good overview.
IREE follows several APIs (like Vulkan) in using a hierarchical setup going from application contexts down to executables:
While implementing a WASM HAL driver using WAMR in #5096, we found that WAMR has a different memory allocation architecture in its "iwasm" VM core:
So,
At face value, an "IREE WAMR device" would need to either limit itself to one executable, or it would need multiple allocators.
WAMR has a WAMR_BUILD_SHARED_MEMORY CMake option / WASM_ENABLE_SHARED_MEMORY C define that could help, but we still want isolation between drivers/devices. One of the main reasons we'd be using WASM would be for the memory sandbox.
Notably, the WASM C API, which WAMR partially implements, has a different model:
that seems to map more directly to IREE's architecture (device or driver would have wasm_engine_t, device would have wasm_store_t and wasm_memory_t, executable would have wasm_module_t).
Here are a few of our options, none of which seem too favorable:
(A) Wait for WAMR to implement the latest WASM C API and use that, instead of their "iwasm" VM core
(B) Continue using WAMR's "iwasm" VM core, finding a workaround using shared memory
(C) Restrict devices in IREE's WASM HAL to one executable
(D) Externalize memory using native read/write functions loading/storing from our own heap, making each wasm environment use no local heap and contain compute-only code. This would require some further ahead-of-time compilation work on our end (turning loads/stores into calls, ensuring these loads/stores are handled in pages since they would be slow individually).
(E) Use a different WASM runtime (see this list). We picked WAMR as an initial target for its low footprint, portability, performance, and ease of integration (C/C++ and CMake with few dependencies). If we want an IREE WASM HAL to serve as a flexible deployment path, we can't really compromise on those points (e.g. by taking a Rust dependency or using a runtime that can't run on embedded systems).
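To give a feel for what option (D) would involve, here is a minimal sketch in C of page-granular host read/write functions backing a compute-only kernel. Everything here is hypothetical: `host_heap_t`, `host_read_page`, `host_write_page`, the 256-byte transfer size, and the kernel itself are illustrative names and values, not WAMR or IREE APIs. The point it shows is that the kernel makes one host call per page in each direction instead of one call per load or store.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HOST_HEAP_SIZE 4096u
#define XFER_PAGE_SIZE 256u /* hypothetical transfer granularity */

/* Heap owned by the host (the IREE device), outside any wasm memory. */
typedef struct {
  uint8_t bytes[HOST_HEAP_SIZE];
} host_heap_t;

/* Hypothetical native import: copy one page out of the host heap into a
 * scratch buffer that lives in the module's own (stack-only) memory. */
static int host_read_page(const host_heap_t *heap, uint32_t page_index,
                          uint8_t *scratch) {
  uint64_t offset = (uint64_t)page_index * XFER_PAGE_SIZE;
  if (offset + XFER_PAGE_SIZE > HOST_HEAP_SIZE) return -1;
  memcpy(scratch, heap->bytes + offset, XFER_PAGE_SIZE);
  return 0;
}

/* Hypothetical native import: copy one page back into the host heap. */
static int host_write_page(host_heap_t *heap, uint32_t page_index,
                           const uint8_t *scratch) {
  uint64_t offset = (uint64_t)page_index * XFER_PAGE_SIZE;
  if (offset + XFER_PAGE_SIZE > HOST_HEAP_SIZE) return -1;
  memcpy(heap->bytes + offset, scratch, XFER_PAGE_SIZE);
  return 0;
}

/* Compute-only kernel: adds `addend` to every byte of one page, using a
 * single read call and a single write call rather than a call per byte. */
static int kernel_add_to_page(host_heap_t *heap, uint32_t page_index,
                              uint8_t addend) {
  uint8_t scratch[XFER_PAGE_SIZE];
  if (host_read_page(heap, page_index, scratch) != 0) return -1;
  for (uint32_t i = 0; i < XFER_PAGE_SIZE; ++i) {
    scratch[i] = (uint8_t)(scratch[i] + addend);
  }
  return host_write_page(heap, page_index, scratch);
}
```

The extra copy in and out of the scratch buffer is the cost of this design, and the ahead-of-time compilation work mentioned in (D) would amount to generating code shaped like `kernel_add_to_page` from ordinary loads and stores.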