Zero-copy pass ArrayBuffer from JS-land to WebAssembly-land #1162
Comments
You're right that many Web APIs now incur extra copying overhead that we should work to remove. The problem with what you've proposed is that WebAssembly engines often want special representations (extra guard pages, page-aligned allocation) for … I think a better way to achieve zero-copy is if, instead of allocating new
Hmm, I'm not even sure this could be done in all cases. You don't know the size of a message up front, and this sounds a lot like solving the zero-copy problem by enforcing a completely new way of dealing with (in this case) networking. It is like killing a fly with a bazooka. I don't really see this as a solution. For one, it would require quite an extensive overhaul of many Web APIs; secondly, as mentioned, I think this is not even possible in all cases. It seems super strange that something so essential to data management in JS, ArrayBuffer, would be so foreign to WebAssembly.
In the case of WebSockets you could maybe just add a new binaryType called "memory":
This way all data passed to onmessage would be a WebAssembly.Memory.
For streaming cases like reading data from a socket, I think the place to do this is the Streams API. As Streams become more prevalent, more APIs would automatically become wasm-friendly. The other big interesting case, which isn't covered by streams, is canvas/audio/video. I think these would/could know the size ahead of time. Note that these changes would also be an improvement for JS: by allowing the caller to reuse an ArrayBuffer, they would produce far less garbage.
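Until Web APIs can write directly into linear memory, consuming a stream into WebAssembly typically costs one copy per chunk. The sketch below is minimal and assumes two things: the chunks arrive as `Uint8Array`s (for example from a `ReadableStream` reader), and everything from `start` onward is free for the module's use. `appendChunks` is a hypothetical helper, not part of any existing API.

```typescript
const WASM_PAGE_SIZE = 65536; // bytes per WebAssembly memory page

// Hypothetical helper: copies each chunk into linear memory back to back,
// starting at `start`, and returns the offset just past the last byte written.
// Each chunk is still copied once; this only avoids an intermediate JS buffer.
function appendChunks(
  memory: WebAssembly.Memory,
  start: number,
  chunks: Iterable<Uint8Array>,
): number {
  let offset = start;
  for (const chunk of chunks) {
    const needed = offset + chunk.byteLength;
    const available = memory.buffer.byteLength;
    if (needed > available) {
      // grow() takes a page count and throws a RangeError past the maximum.
      memory.grow(Math.ceil((needed - available) / WASM_PAGE_SIZE));
    }
    // Re-read memory.buffer here: growing detaches the previous ArrayBuffer.
    new Uint8Array(memory.buffer, offset, chunk.byteLength).set(chunk);
    offset = needed;
  }
  return offset;
}
```

The returned offset tells the module where the data ends. Re-reading `memory.buffer` after a possible `grow()` avoids holding a view over a detached buffer.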
I agree with @alexhultman; I think this is something we should fix in WebAssembly rather than requesting various Web APIs to provide a view. Another problem with that solution is that even if such an API is provided, it will likely have to copy out anyway, since it can't trust the lifetime of the data in the view. Providing a mechanism for WebAssembly to access data from an arbitrary ArrayBuffer is a much more powerful and (IMO) useful feature. It seems as though this is very naturally represented with multiple memories, which we've been planning from the start. This would mean that this … There are definitely some complications: the memory may no longer be page-sized, you'll likely want to unbind and rebind different memories, you can't use
If the API is currently producing an
While there may be special cases where only a single … Other problems with using an
However, there is another persistent idea I should've mentioned above that could actually fit in as part of Host Bindings (if we go the
Yes, I suppose so. Though it's possible that the API is providing direct access to its underlying data, where that can't be true with views. But you're right, maybe it's just changing who does the copy.
I was thinking you could rebind the memory in that case, where the compiled code wouldn't bake in the address or size. Rebinding could be a similar operation to
True, but perhaps it wouldn't be too much work to modify libjpeg to use the memory region pointers here instead. I don't know much about this C++ extension though (can't remember the actual name of it either).
I like this idea, but it does seem more complicated than using static memory indices.
The one example I can think of currently is where we create an
I think 'rebinding' would need to be a new fundamental semantic operation, then? It's certainly implementable, but I think it's a bit weird spec-wise, given that imports/definitions aren't defined to be mutable locations (that's what we have globals for; you could imagine storing a reference to a
It's possible, but I think it's a fairly non-trivial change to make to an existing codebase in general. You have to put a special
Yeah, both more expressive and more complicated. If we're adding
If the solution is a significant overhaul of the entire (JS) Web API, then why not cut to the chase, define a standard C Web API, and cut JS out of the picture completely? Then you could reach and control the browser without intermediate JS wrappers acting as inefficient delegates.
Right, unless the ArrayBuffer was detached, to represent transfer. But no Web APIs do that aside from postMessage, I suppose.
Yeah, I'm not entirely certain how it would work. But if we only allow it for linear memory, and a Memory object is just a pointer to its data and its length, then I think it has similar behavior to growing memory -- the data pointer and length are different. Unbinding would be like detaching the buffer; set data pointer to null and length to 0. Ah,
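The analogy with growing memory can be observed directly through the JS API. After `grow()`, the `ArrayBuffer` previously returned by `memory.buffer` is detached: its `byteLength` becomes 0, and a new, larger buffer takes its place. The small sketch below shows that behavior for a non-shared memory. `growAndCompare` is an illustrative name only.

```typescript
// Shows that growing a (non-shared) WebAssembly.Memory detaches the old buffer,
// much like the "unbind" step discussed above would leave old views unusable.
function growAndCompare(memory: WebAssembly.Memory, pages: number) {
  const before = memory.buffer; // ArrayBuffer for the current linear memory
  const previousPages = memory.grow(pages); // returns the old size in pages
  const after = memory.buffer; // a fresh ArrayBuffer covering the larger memory
  return {
    previousPages,
    oldByteLength: before.byteLength, // 0 once the old buffer is detached
    newByteLength: after.byteLength,
    sameObject: before === after,
  };
}
```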
I'm not sure how it solves the C++/rust problem though. You'll still need special objects to access this memory.
I assume you mean C bindings to existing Web APIs? I like this idea, and I believe we've already talked about something like this for APIs like WebGL, where there is a well-known underlying C API. I'm not certain this can work for all Web APIs, though, as they often rely on JS-specific features that would be difficult (or maybe impossible) to provide from wasm.
Right, that's what I said in my initial comment on
I don't think "overhaul" is quite right. We're talking about adding overloads to some existing methods, in a way that can be done incrementally, for the hottest methods first.
Sorry, misread that comment.
We should definitely do this, and I agree that it shouldn't be too much of a burden for most APIs. Like you say, they're probably doing a copy anyway. My concern is mostly for cases where we aren't just giving data back to a Web API, such as decompressors and decoders. We can use typed arrays over WebAssembly memory for this too, but it's pretty unsatisfying to have to manage the lifetime of that data all the way through to the wasm module.
I may not understand your meaning here but, except for cases that detach [1][2], Web APIs that want to use a view's data after the call returns need to make a synchronous copy, so it doesn't seem like lifetime would be an issue here.
Here's an example of what I was thinking. Imagine you are using a zlib wasm module library. You don't know what the user of the library is going to do with the decompressed data, so you want to hand back an
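The lifetime concern can be made concrete. A typed array created over `memory.buffer` aliases the module's memory. If the module later frees or reuses that region, the caller sees the change; if the module grows memory, the view goes empty. By contrast, `slice()` copies the bytes out into an independent buffer. In the sketch below, `ptr` and `len` stand in for whatever a hypothetical decompressor export would return, and both function names are illustrative.

```typescript
// Zero-copy: valid only while the module keeps [ptr, ptr + len) alive
// and memory is not grown (growth detaches the underlying buffer).
function borrowedView(memory: WebAssembly.Memory, ptr: number, len: number): Uint8Array {
  return new Uint8Array(memory.buffer, ptr, len);
}

// One copy: afterwards the module may free or reuse the region safely.
function ownedCopy(memory: WebAssembly.Memory, ptr: number, len: number): Uint8Array {
  return new Uint8Array(memory.buffer, ptr, len).slice();
}
```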
Aha, I see the direction you're talking about now. Yeah, I totally agree we should try to avoid such opportunities for wasm-level "leaks" and "use after free". So for compression, encryption, and any large data processing, it seems like the best interface would be for the compressor to take a stream in and a stream out. This has parallelism and composability wins. I think we should strive, at both the toolchain and host-bindings-feature level, to make it easy and efficient to work with streams.
My vote goes to this feature, if I understand it correctly. For general interest: there is a '.set' function for typed arrays, which seems perfect for filling this hole.
Sorry, I'm a newbie to WebAssembly, so I'm still learning and may not fully understand the suggestions here. That said, I'm not sure streams are sufficient for solving the problem. They may make sense in the case of compression, encryption, etc. However, things like FFTs generally require the full dataset to operate on, or are at least difficult to implement when only chunks of the data are supplied. So a stream won't cut it there. If the data is sufficiently large, copying it will (if you are lucky) just be slow or (if not) crash the browser. What would you propose in this case?
The comments before the streaming comment discuss extending Web APIs to, instead of returning data in new
Views sound like a good idea. This is very similar to what people do to solve this problem in Python (e.g. the Python Buffer Protocol). Would it be possible to make read-only views? How would returning arrays work?
So far, JS/WebIDL doesn't have read-only typed array views. For "returning arrays", the general idea is to replace an array return value with a mutable view parameter that is written into. The hard part is picking the size of that mutable view argument: I think there are multiple options here, and different APIs would probably call for different ones.
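One existing Web API already follows the "write into a caller-supplied view" shape: `TextEncoder.prototype.encodeInto`. It encodes a string into a given `Uint8Array` and reports how many UTF-16 code units it read and how many bytes it wrote. Because the destination can be a view over `memory.buffer`, the UTF-8 bytes land directly in linear memory with no intermediate ArrayBuffer. It also illustrates one answer to the sizing question: if the view is too small, the call writes what fits, and `read` comes back smaller than the string length. The `ptr`/`capacity` region here is an assumption; in practice it would come from the module's own allocator. `encodeStringInto` is an illustrative name.

```typescript
// Encode a JS string straight into WebAssembly linear memory.
// Returns the number of bytes written, or -1 if the region was too small.
function encodeStringInto(
  memory: WebAssembly.Memory,
  ptr: number,
  capacity: number,
  s: string,
): number {
  const dest = new Uint8Array(memory.buffer, ptr, capacity);
  const result = new TextEncoder().encodeInto(s, dest);
  // `read` counts UTF-16 code units consumed; fewer than s.length means truncation.
  // (`?? 0` only satisfies TypeScript's optional typing of the result fields.)
  return result.read === s.length ? result.written ?? 0 : -1;
}
```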
@frank-dspeed By cross origin, do you mean between different WebAssembly modules running in different iframes or windows? I would expect that kind of integration to work by something like a
To provide a different viewpoint/use case on this: given a … Since browsers are already free to cache/mmap/manage … Streaming them is definitely NOT an option; the added latency quickly makes any meaningful queries (which often transitively load more datasets referenced in previous datasets) infeasible. It would be a huge missed opportunity not to get this right, given that … The use of intrusive memory management techniques that preclude raw byte ranges for linear memory seems like a costly decision that prevents WASM from covering many real-world use cases.
I don't see this mentioned explicitly, so I'll add a common use case (at least for me) where this would be very helpful. To use multithreading, I spawn several WebWorkers that need to communicate with each other (e.g. to work concurrently on the same, possibly huge, dataset). For that to work, I create a SharedArrayBuffer and pass it around. Each WASM module wraps a safe endpoint implementation around that buffer (e.g. https://docs.rs/wasm-rs-shared-channel/0.1.0/wasm_rs_shared_channel/). If I understand correctly, sending a chunk of bytes through such a channel involves two copies (one on each end). With that shared memory mapped into the WASM address space (as a separate memory), this would be zero-copy. I also suspect that the coordination via atomic primitives involves a lot of overhead right now. But I have to admit that I didn't find out how … I know that in some cases I could combine all binaries into a single memory and run them on a shared memory to get the same effect. But the last time I tried this, it was a very fragile setup with several downsides (Rust panics not working, memory not resizable, nightly Rust compiler required, no way to share memory with external wasm modules, etc.).
@kawogi Is that use case not covered by shared WebAssembly.Memory objects and multi-memory support?
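For the worker scenario, a shared `WebAssembly.Memory` is created with `shared: true`, which also requires a `maximum`. Its `buffer` is a `SharedArrayBuffer`, so every agent holding the memory sees the same bytes without copying. In browsers this additionally requires a cross-origin-isolated page. The sketch below covers only the JS side; `createSharedLinearMemory` is an illustrative name, and the page counts are arbitrary.

```typescript
// Create linear memory that several agents (e.g. workers) can use at the same time.
function createSharedLinearMemory(initialPages: number, maximumPages: number): WebAssembly.Memory {
  // Shared memories must declare a maximum size.
  return new WebAssembly.Memory({ initial: initialPages, maximum: maximumPages, shared: true });
}
```

In a browser, the memory object would then be sent with `worker.postMessage(memory)`, and each worker would pass it as the memory import when instantiating its module. Coordination still goes through `Atomics` on views of `memory.buffer`.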
At least in the case of Rust (or other LLVM-based languages), multiple memories don't appear to be supported. See this Zulip thread: https://bytecodealliance.zulipchat.com/#narrow/channel/223391-wasm/topic/Multi-memory.20.2F.20shared.20memory/near/249000208
I've been exploring building a new file analyzer web app that uses WASM, and I'd like to be able to zero-copy a File's … over to WASM. I can't seem to find a good way to do it. I've looked at a few potential solutions that failed for different reasons:
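Without a zero-copy path, a File (which is a Blob) can at least be moved into linear memory in bounded pieces with `Blob.slice()` and `arrayBuffer()`, so the full contents never exist twice at once. Each byte is still copied once. The sketch assumes the module has already reserved `[ptr, ptr + blob.size)`. The function name and the 1 MiB default chunk size are arbitrary choices.

```typescript
// Copy a Blob/File into WebAssembly linear memory chunk by chunk.
async function copyBlobIntoMemory(
  blob: Blob,
  memory: WebAssembly.Memory,
  ptr: number,
  chunkSize = 1 << 20,
): Promise<number> {
  if (ptr + blob.size > memory.buffer.byteLength) {
    throw new RangeError("destination region does not fit in linear memory");
  }
  for (let pos = 0; pos < blob.size; pos += chunkSize) {
    const end = Math.min(pos + chunkSize, blob.size);
    // Only this chunk is materialized as a separate ArrayBuffer.
    const bytes = new Uint8Array(await blob.slice(pos, end).arrayBuffer());
    // Re-read memory.buffer each time in case memory was grown meanwhile.
    new Uint8Array(memory.buffer, ptr + pos, bytes.byteLength).set(bytes);
  }
  return blob.size;
}
```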
Yes, it is. I somehow assumed that making a MemoryBuffer available to WASM (this topic) was the same as creating a SharedMemory. Where can I read about the state of multi-memory support? I wasn't able to derive this from https://caniuse.com/?search=wasm or https://github.com/WebAssembly/multi-memory/blob/main/proposals/multi-memory/Overview.md Thanks @a10y for the info and links!
Would the proposed JavaScript Immutable ArrayBuffers help? Would they conflict?
Hey,
From reading about WebAssembly.Memory, there really is no way to pass an ArrayBuffer zero-copy to C++ from JS.
Example:
Let's say a WebSocket in the browser triggers onmessage with an ArrayBuffer of the message data, and I want to read this data in C++. From all the solutions I've seen, you really need to:
Imagine having a constructor like new WebAssembly.Memory(ArrayBuffer) so that you could skip all steps except for step 3.
Did I get something wrong or is this not supported?
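The individual steps from the original issue are not preserved here. A common copy-based workaround looks like the sketch below, and it is not necessarily the author's exact list:

1. Ask the module for space.
2. Copy the message bytes into linear memory with `Uint8Array.prototype.set`.
3. Call the compiled C++ with a pointer and length.
4. Free the space.

`ModuleExports`, `malloc`, `free`, `onMessageData` and `deliverMessage` are hypothetical names for whatever the toolchain actually exports.

```typescript
// Hypothetical shape of the exports of a C/C++ module compiled to wasm.
interface ModuleExports {
  memory: WebAssembly.Memory;
  malloc(size: number): number; // pointer into linear memory, 0 on failure
  free(ptr: number): void;
  onMessageData(ptr: number, len: number): number; // the C++ handler
}

// Copy-based path for handing a WebSocket ArrayBuffer to the wasm module.
function deliverMessage(mod: ModuleExports, data: ArrayBuffer): number {
  const len = data.byteLength;
  const ptr = mod.malloc(len);
  if (ptr === 0) throw new Error("wasm allocation failed");
  try {
    // The copy that a zero-copy path would avoid: JS ArrayBuffer -> linear memory.
    // memory.buffer is read after malloc, which may have grown memory.
    new Uint8Array(mod.memory.buffer, ptr, len).set(new Uint8Array(data));
    return mod.onMessageData(ptr, len);
  } finally {
    mod.free(ptr);
  }
}
```

In the browser this would be wired up with `ws.binaryType = "arraybuffer"` and `ws.onmessage = (e) => deliverMessage(mod, e.data as ArrayBuffer)`. The proposed `new WebAssembly.Memory(ArrayBuffer)` aims to remove the allocation, copy, and free around the handler call.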