Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose memory from generated JavaScript #2102

Closed
RReverser opened this issue Apr 25, 2020 · 22 comments
Closed

Expose memory from generated JavaScript #2102

RReverser opened this issue Apr 25, 2020 · 22 comments

Comments

@RReverser
Copy link
Member

Motivation

Currently, Wasm memory is accessible only from inside the Wasm module (Rust code) via bindings but not exposed to JavaScript.

This means that examples like raytrace-parallel have to wait for instantiation to finish and for corresponding code to be reached, before sending the module + memory pair to another Worker.

Module is the easy bit, as it can be constructed by the user via manual WebAssembly.compileStreaming API, but constructing the memory requires knowing the limits defined by the module.

At the same time, memory object as generated by wasm-bindgen for the threaded case, doesn't require full module instantiation, and so exposing it from JS could allow application to instantiate Wasm over the same SAB in all modules in parallel, without waiting for the main thread to finish.

Proposed Solution

Export memory factory function from the generated JavaScript code, so that user would be able to construct and pass it manually via existing 2nd param to init.

@Pauan
Copy link
Contributor

Pauan commented Apr 25, 2020

This means that examples like raytrace-parallel have to wait for instantiation to finish and for corresponding code to be reached, before sending the module + memory pair to another Worker.

I think the current behavior is good, because the first time it's instantiated it will be the "main thread", whereas the other instantiations are the "threads".

If you send the module + memory to the other workers, you can get a race condition where sometimes the worker will finish first (and thus it will be the "main thread"), and sometimes it'll finish after (and thus it will be a "thread").

So even if you were able to spawn everything in parallel, you'd still have to guarantee that the main thread is instantiated and run first (before instantiating any of the workers).

And this does actually matter, because it affects the behavior of static, and in the future we will have different behavior for the main thread and spawned threads (e.g. the memory allocator will have different implementations).

Incidentally, this behavior also matches real OS processes, where you run your main process and it then spawns sub-processes (or threads).

@RReverser
Copy link
Member Author

RReverser commented Apr 25, 2020

If you send the module + memory to the other workers, you can get a race condition where sometimes the worker will finish first (and thus it will be the "main thread")

Huh, wait, is main thread determined purely by order of instantiation?

Incidentally, this behavior also matches real OS processes, where you run your main process and it then spawns sub-processes (or threads).

Sure, but OS threads have different costs of instantiation. On the Web, being able to pre-instantiate Wasm worker pool without actually running any useful code could be more benefitial, and can still be made transparently to the user.

@Pauan
Copy link
Contributor

Pauan commented Apr 25, 2020

Huh, wait, is main thread determined purely by order of instantiation?

Yes.

Let's say that hypothetically it was determined by a flag or something, static variables would be initialized in the main thread, but if a worker finishes before the main thread, then the worker could read an uninitialized static variable (which is undefined behavior). So that doesn't work.

It's pretty much unavoidable that the main thread must be initialized first:

This synchronization may involve waiting, so in a web context the runtime must
ensure that the module is instantiated either first on the browser's main thread
without racing with worker threads or not at all on the browser's main thread.

The only exception to this is if you're running the Wasm entirely in workers (no main thread at all), in which case it would be possible to initialize all the workers in parallel.

On the Web, being able to pre-instantiate Wasm worker pool without actually running any useful code could be more benefitial, and can still be made transparently to the user.

  1. Spawn a worker pool.

  2. Instantiate the Wasm on the main thread (this can be done in parallel with step 1).

  3. postMessage the module + memory to the workers.

  4. When receiving the module + memory the workers then instantiate the Wasm.

This can be done today, without any changes to wasm-bindgen.

Note that when you use postMessage with a Module and SharedArrayBuffer, the browser will share them, so it won't recompile the Module on each worker, instead it only needs to compile it once.

@RReverser
Copy link
Member Author

then the worker could read an uninitialized static variable (which is undefined behavior)

Which code are you referring to? AFAIK most of Wasm out there doesn't use start section (which would be the only way to run any Rust code and access such static variable), but instead relies on custom functions like _start (WASI) or __wbindgen_start (wasm-bindgen).

As long as you're only instantiating modules in parallel, and not invoking that function yet, any code wouldn't execute yet and no UB would occur.

This can be done today, without any changes to wasm-bindgen.

Unless I'm missing something new about the mechanism you're proposing, that's how existing examples (including the raytrace-parallel mentioned in the description) already work.

What I'm proposing is building on top of that and providing further optimisation by ensuring that the JS (Wasm) engine can actually instantiate the module as soon as possible and not wait for the main thread to finish initialisation.

@RReverser
Copy link
Member Author

Note that this also doesn't requite any substantial changes to wasm-bindgen or how it operates; all I'm asking for is to export the (already existing) memory construction to JavaScript so that app authors could use them if they wish to.

@Pauan
Copy link
Contributor

Pauan commented Apr 25, 2020

As long as you're only instantiating modules in parallel, and not invoking that function yet, any code wouldn't execute yet and no UB would occur.

The #[wasm_bindgen(start)] runs as soon as the module has been instantiated. Also you might run some .js code in the worker which calls a Rust function after the Wasm has been instantiated.

It has nothing to do with the start section. If you're instantiating the worker and the main thread in parallel, it's possible for the worker to be fully instantiated before the main thread has been instantiated. That is the race condition that I'm talking about.

that's how existing examples (including the raytrace-parallel mentioned in the description) already work.

No, because the raytrace example only spawns the worker pool after the main thread is instantiated, whereas the strategy I posted allows you to spawn the worker pool in parallel with the instantiation.

What I'm proposing is building on top of that and providing further optimisation by ensuring that the JS (Wasm) engine can actually instantiate the module as soon as possible and not wait for the main thread to finish initialisation.

As I explained, that's only possible if all of your Wasm is running in workers (no main thread).

Note that this also doesn't requite any substantial changes to wasm-bindgen or how it operates; all I'm asking for is to export the (already existing) memory construction to JavaScript so that app authors could use them if they wish to.

It has nothing to do with how substantial the changes are, what you are asking for is just not compatible with how the main thread works.

The only way your proposal could work is if we added a new --target worker, which would allow you to instantiate the Wasm on workers (but not the main thread).

@RReverser
Copy link
Member Author

Also you might run some .js code in the worker which calls a Rust function after the Wasm has been instantiated.

Well, it's up to me as an app developer not to do that.

I feel you're conflating module instantiation in JS API sense (literally just WebAssembly.instantiate) with the surrounding JavaScript glue code generated by wasm-bindgen.

I want to parallelise the former, which wouldn't cause any of the mentioned issues (as long as start section is not involved, which is the reason I mentioned it).

The JavaScript, and, correspondingly, Rust code would still execute in the same order as it does now and thus such change would be completely unobservable, while improving startup time for additional threads.

@Pauan
Copy link
Contributor

Pauan commented Apr 26, 2020

Well, it's up to me as an app developer not to do that.

To be clear, the __wasm_init_memory function is used as the Wasm start function, so it will always run even if you don't call any Rust functions. So no matter what you have a race condition.

In addition, you seem to have ignored what I showed earlier:

This synchronization may involve waiting, so in a web context the runtime must
ensure that the module is instantiated either first on the browser's main thread
without racing with worker threads or not at all on the browser's main thread.

If there is a race condition, the main thread will throw an exception, because the main thread cannot block. This happens even if you don't call any of the Rust functions.

It is simply not possible to initialize the main thread and worker threads in parallel, because of these race conditions.

I feel you're conflating module instantiation in JS API sense (literally just WebAssembly.instantiate) with the surrounding JavaScript glue code generated by wasm-bindgen.

No, I have always been referring to WebAssembly instantiation. Even just calling WebAssembly.instantiate causes race conditions. That's what I have been trying to explain to you. You cannot "pre-initialize" the Wasm. You will get race conditions, and you will get errors.

@RReverser
Copy link
Member Author

Note that __wasm_init_memory uses a fence and is non-destructive. It doesn't matter which thread invokes it, there will always be only one thread that calls it and ends up initialising the memory, but on its own that function doesn't determine which thread is "main".

Overall, I'm somewhat surprised by this attitude and responses so far to be honest :/

I'm fully in control of my app and I know what I'm doing there. All I'm asking is to expose an already existing primitive that would help me and possibly other use-cases. Moreover, as I said, doesn't require any changes to how wasm-bindgen works and requires explicit usage / opt-in by developer (sort of like Rust's opt-in).

I've described my motivation in the issue description just to provide one of use-cases where it was handy, but instead we're now discussing particular app architecture and WebAssembly in general in length. I'm always happy to do that sometime at a conference afterparty (if we still get those someday 😅), but it doesn't seem right for a Github issue.

Note that if this primitive is not provided, it's not that the usages will stop, it's just that now developers will have to resort to guessing or hardcoding the limits when manually constructing new SharedArrayBuffer(...) and passing it to the init function. Now, this would be quite error-prone and undesirable - that's why I'm asking to expose something wasm-bindgen already has.

In the future situation might change when js-types API is widely implemented (for now we have it only under a flag in V8), and users will be able to access any level of detailed information about module imports/exports directly, but meanwhile exposing it from wasm-bindgen would make task much easier and more robust than any hardcoded numbers.

@Pauan
Copy link
Contributor

Pauan commented Apr 27, 2020

It doesn't matter which thread invokes it, there will always be only one thread that calls it and ends up initialising the memory, but on its own that function doesn't determine which thread is "main".

Yes, there will only be one thread that initializes the memory, but that's not the problem.

I'm fully in control of my app and I know what I'm doing there.

And my point is that it's impossible to avoid race conditions and use it correctly. It doesn't matter how much control you have, or how skilled you are, it is impossible to avoid the race condition.

Overall, I'm somewhat surprised by this attitude and responses so far to be honest :/

Of course I would have no problem with exposing the shared memory... if it actually worked. But it doesn't.

but instead we're now discussing particular app architecture and WebAssembly in general in length

I don't know why you got that impression, everything I have said is about this specific proposal and why it doesn't work.

Note that if this primitive is not provided, it's not that the usages will stop, it's just that now developers will have to resort to guessing or hardcoding the limits when manually constructing new SharedArrayBuffer(...) and passing it to the init function.

If somebody does that, they will have the exact same race condition, and they will get errors.


You seem to be misunderstanding how this all works. So let's suppose we made the changes you want, here is what would happen:

  1. LLVM defines a __wasm_init_memory function. Rust uses LLVM, so therefore Rust uses this function when compiling to Wasm.

  2. The Wasm module has a start section which calls __wasm_init_memory. This start section is the same for the main thread and the workers.

  3. You send the Module + SharedArrayBuffer to the workers.

  4. You instantiate the Module + SharedArrayBuffer in parallel with the main thread and the workers.

  5. During instantiation it automatically calls the start section, which in turn calls __wasm_init_memory.

  6. If the main thread gets instantiated first, then everything is okay.

  7. But let's suppose that the worker gets instantiated first. In that case the __wasm_init_memory function will run on the worker.

  8. The __wasm_init_memory function uses a Wasm global to handle synchronization. This Wasm global starts at 0.

    The __wasm_init_memory function checks if it is the first instantiation (i.e. if the global is 0). If so, it then sets the global to 1.

    Since the worker is being instantiated first, it sets the global to 1 and then does the memory initialization.

  9. While this is happening, the main thread now starts to instantiate. It also checks the global, but the global is now 1, which means it does an atomic wait.

    This atomic wait is necessary because it has to wait for the worker to finish initializing the memory.

    However, this atomic wait is blocking, and blocking atomic wait throws an error on the main thread (because blocking is not allowed on the main thread).

    It cannot use async atomic wait because the start section runs synchronously and the memory must be synchronously initialized.

You'll note that this situation will happen even if you don't call any of the exported functions, because the problem is caused by __wasm_init_memory, which is in the start section, which is automatically run during instantiation.

It is simply not safe to instantiate workers in parallel with the main thread, period. You will cause a race condition, and it will end up sporadically causing errors.

No matter what you do, you cannot stop this. If you instantiate in parallel and haven't run into any errors yet, that's just pure luck, because this race condition always exists.

The only solution is to instantiate the module on the main thread first, before instantiating it on the workers.

Any other solution will require changes in LLVM, because Rust uses LLVM for its multi-threading support, and it's LLVM that defines the __wasm_init_memory function.

@RReverser
Copy link
Member Author

I don't know why you got that impression, everything I have said is about this specific proposal

It still sounds more like discussing the specific usecase I provided, rather than proposal in general (exposing memory).

But okay, reasonable enough, thanks for bearing with me and your explanations. I just didn't expect the dicussion to go that deep :)


Let's backtrack a bit and look at a simpler usecase / architecture that hopefully won't be controversial. I'll take yours as a base:

  1. Spawn a worker pool.
  2. Instantiate the Wasm on the main thread (this can be done in parallel with step 1).
  3. postMessage the module + memory to the workers.
  4. When receiving the module + memory the workers then instantiate the Wasm.

Let's say I want to perform step (3) here. How do I do that?

Well, turns out, the only way is, again, to do this only from the WebAssembly side because it's the only one that has bindings to memory, even though actual memory lives on JS side.

So if I want to postMessage module and memory from my JavaScript, I still have to export some function from WebAssembly that would either call postMessage or at least return wasm_bindgen::memory as simply a return value, that will go via wasm-bindgen back to JS, add memory to reference table, Wasm returns, JS wrapper extracts memory back from reference table, and finally returns to me.

This is quite a bit of unnecessary indirection to return JS value to JS code.

Moreover, in this case we do have to execute not only instantiate[Streaming], but also run JS / Rust initialisation on the main thread before getting that memory object, because the wasm-bindgen wrappers above most likely depend on it being already finished and all variables initialised and so on.

@Pauan
Copy link
Contributor

Pauan commented Apr 27, 2020

It still sounds more like discussing the specific usecase I provided, rather than proposal in general (exposing memory).

Because wasm-bindgen doesn't expose the memory before instantiation, this forces you to instantiate the module before instantiating it on the workers.

The only reason to expose the memory is in order to instantiate it in parallel with the workers, which is not safe. So I think it is relevant to the proposal in general.

Well, turns out, the only way is, again, to do this only from the WebAssembly side because it's the only one that has bindings to memory, even though actual memory lives on JS side.

I don't think it's correct to think of it as "JS side" or "WebAssembly side". The memory is created and owned by wasm-bindgen. Whether it's a part of the "JS side" or "WebAssembly side" is just an implementation detail which can change. And in the future it likely will change.

I still have to export some function from WebAssembly that would either call postMessage or at least return wasm_bindgen::memory as simply a return value

You can actually access the memory by accessing the __wbindgen_export_0 property:

const wasm = await init();
const memory = wasm.__wbindgen_export_0;

That's an implementation detail though, so I really don't recommend doing that. Instead the best option is indeed to just create an export in Rust:

#[wasm_bindgen]
pub fn shared_memory() -> JsValue {
    wasm_bindgen::memory()
}

If you're suggesting that wasm-bindgen should make it easier to access the memory from an instance without needing a Rust export, I agree.

However we'll need to be careful about name collisions. Right now wasm-bindgen is actually quite bad about name collisions (in general). I have some ideas about that, but it would be a breaking change.

This is quite a bit of unnecessary indirection to return JS value to JS code.

I suppose, though it's really just a couple nanoseconds, the overhead of passing is extremely small.

but also run JS / Rust initialisation on the main thread before getting that memory object, because the wasm-bindgen wrappers above most likely depend on it being already finished and all variables initialised and so on.

I'm not sure what JS / Rust initialization you're referring to: the JS glue code just calls instantiateStreaming, initializes the memory, and sets a couple variables.

It really doesn't do anything (unless you use #[wasm_bindgen(start)] of course). So there's basically no overhead compared to just running instantiateStreaming.

@RReverser
Copy link
Member Author

If you're suggesting that wasm-bindgen should make it easier to access the memory from an instance without needing a Rust export, I agree.

Yeah, that's what I'm suggesting. I think same would apply to few other things (e.g. module or some implicit imports), but these are out of scope of this issue.

You can actually access the memory by accessing the __wbindgen_export_0 property:

Huh, I thought in case of imported memory it's not exported. This could help, but yeah, I'm not comfortable relying on it in any way.

However we'll need to be careful about name collisions. Right now wasm-bindgen is actually quite bad about name collisions (in general).

I wonder if simply changing the export name to "memory" in that case would be sufficient for now?

That's the export name LLVM already uses by default (any non-threaded outputs), so presumably there is very low risk of it conflicting with anything else.

@Pauan
Copy link
Contributor

Pauan commented Apr 27, 2020

Yeah, that's what I'm suggesting.

Okay, though that won't fix your original request.

I wonder if simply changing the export name to "memory" in that case would be sufficient for now?

It already is called memory, but when multi-threading is enabled it then gets renamed to __wbindgen_export_0 (for some reason).

So yeah, hardcoding it as memory seems reasonable to me.

@alexcrichton
Copy link
Contributor

Sorry I've only skimmed the discussion here a bit, but the original issue should be solved by importing the *_bg.wasm file which should have the memory export. Currently memory is intentionally not exported from the JS shim because that's largely what interface types is intended for ("shared-nothing linking", in that implementation details are all internal).

Other than that I think @Pauan already covered this but you probably don't want to instantiate modules in parallel across workers and the main thread because the main thread cannot block.

In general though I don't think this is necessarily something that wasm-bindgen itself would handle (if you'd like to do this by instantiating in parallel across only workers). It's pretty easy for a wasm tool to switch the memory export to a memory import and then you'd know the limits to create in JS and then pass as imports everywhere.

@RReverser
Copy link
Member Author

Sorry I've only skimmed the discussion here a bit, but the original issue should be solved by importing the *_bg.wasm file which should have the memory export.

This would be sufficient, but it seems like right now it's renamed to __wbindgen_export_0 in threaded build. If it remained exported as memory, that would be enough to solve the issue.

@alexcrichton
Copy link
Contributor

Ah ok makes sense! That sounds like a bug in one of the possible passes along the way, likely something that's de-exporting it and then re-exporting it later, forgetting the original export name along the way.

@RReverser
Copy link
Member Author

Okay, though that won't fix your original request.

@Pauan I didn't immediately realised what you meant by this. It gets me much closer, but yeah, doesn't resolve the original request.

To recheck my understanding on previous comments - are you saying that using memory in the way I want to would be fine in the following scenario:

As I explained, that's only possible if all of your Wasm is running in workers (no main thread).

?

If so, then it's perfectly sensible for me (I'm already Worker for the "main" thread for other reasons), but yeah, still requires exposing Wasm memory limits from JS code somehow.

@Pauan
Copy link
Contributor

Pauan commented Apr 29, 2020

are you saying that using memory in the way I want to would be fine in the following scenario: if all of your Wasm is running in workers (no main thread).

Yes, at least for now. In the future there will be other differences between the main thread and workers, but for now it should be safe to run only on workers[1].

That's why earlier I suggested a --target worker, which would expose the memory and could only be run on workers. We've needed a --target worker for other things as well, so I think we should consider it.

  • [1]: Whichever worker finishes first will initialize the memory, but since they're all workers, that doesn't really matter.

@RReverser
Copy link
Member Author

Sounds good; do you think it's worth closing this issue in favour of two separate issues for --target worker and __wbindgen_export_0 fix? Seems like they're two very separate things, and someone coming across this discussion probably won't be able to extract useful pieces at this point.

@Pauan
Copy link
Contributor

Pauan commented Apr 29, 2020

That sounds good to me.

@RReverser
Copy link
Member Author

@Pauan I've submited an issue for the memory export above, but I wonder if --target worker proposal is something you'd prefer to write up? I can create a basic issue, but from your comment it sounds like you've already gave this more thought and might have a more fleshed-out proposal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants