-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent multiple fetch + initialize for side modules #14820
Comments
Adding this check Edit: Looks like I've run into issues when calling functions on other threads... as expected I suppose. So this cannot work for me as is, because 42mb -> 1.3gb is just not very worthwhile just to use shared modules. I'm actually mostly interested in runtime dynamic loading game modules, rather than engine modules, and the initial 140+ engine modules could be compiled as one MAIN_MODULE, so another way I've thought about doing this is to just build all the engine modules as a single module with MAIN_MODULE=2 and then at runtime I can load additional modules using I guess this next question should probably have it's own thread and probably better suited to llvm, but would it not be possible to split __wasm_apply_data_relocs and __wasm_call_ctors into multiple functions if they are too large? Is there any way this is possible today, if not, what would have to change to make this possible? |
What do you mean by the function being too large? Do you get an error? (I would expect a function so large to not be fully optimized, but I'd hope it still runs.) |
Combining all those engines modules has caused the (__wasm_apply_data_relocs)
method to grow to 8mb with full optimizations, Oz and O3 are similar.
The error Chrome gives is this
```ModuleError: CompileError: WebAssembly.instantiate(): size 8101201 >
maximum function size 7654321```
…On Mon, Aug 9, 2021, 5:45 PM Alon Zakai ***@***.***> wrote:
MAIN_MODULE would be the standard way to do this, yes, and to load
libraries in side modules.
What do you mean by the function being too large? Do you get an error? (I
would expect a function so large to not be fully optimized, but I'd hope it
still runs.)
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#14820 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACBQKLZVX2CLJOQYQ6S3SFDT4BSBXANCNFSM5BXPSXWA>
.
|
Interesting, yes, that is actually hitting the VM limit then... As this is emitted by LLVM, I think the fix should be there. Specifically, I assume |
I wonder if we can instead move the relocations into data. Then the code could be tiny (and potentially not linker-generated). It sounds like you are passing all the side modules on the command line (so that MAIN_MODULE=2 works correctly?) but that you don't actually want to load them on startup? Is that right? If that is correct then we the |
Also, dynamic linking + pthreads is experimental right now, and one of the limitation is that runtime loading (loading after startup) basically doesn't work. |
Essentially the engine I'm using has hot reloading for modules, note I'm not talking about emscripten or wasm modules, just code that can be linked statically or as a shared library; on windows this works fine, on the web compiling them all statically works fine without MAIN_MODULE but so far I've not had success compiling them as shared libraries. Originally, as in my first post, I tried compiling all modules as separate shared libraries (SIDE_MODULE) except for the launch module (MAIN_MODULE with the side modules listed). However it wants to load the side modules on each thread, that's 140x32 times the modules need to load, blowing out the memory on the devices I want to use this on. Even using -s AUTOLOAD_DYLIBS=0 does not avoid the problem, it just allows me to load some libraries later via loadDynamicLibrary but they all need to get instantiated on each thread. I imagine this would be easy to fix once interface types are ready. For now, I've decided to tackle something slightly easier, instead of compiling all the engine modules as shared, I compile them all statically as one MAIN_MODULE, no shared modules yet - the reason I even added MAIN_MODULE is so that I can at least load up new modules dynamically and hopefully this avoids needing to instantiate the module multiple times. This would allow me to code new parts of the engine or the game in the browser. |
Also, are the relocations even necessary (in the MAIN_MODULE) if the only need is to load libraries at runtime via dlopen? If not, would it be possible to have MAIN_MODULE=3 that only works with runtime / dlopen'ing libraries? |
In theory MAIN_MODULE could/should be built without relocations... but last time I tried to do that I hit a roadblock. The problem is that side modules are loaded before the main module which means that static data region is not fixed in size.. which means addresses are not known at link time. One solution to that problem would be to load the main module first and somehow allocate the space for the side modules using the start of the heap (basically dump sbrk)... but that would also mean that calls from the main module into the side modules are forced to go via JS (as opposed to today where they can be direct but calls in the other direction are via JS). |
Also, even if we make MAIN_MODULE non-relocatable is doesn't solve the general problem of |
@sbc100 I'm guessing a fix for this would probably be dependent on #18376? |
Re: Re: Using postMessage, this could work, but even with #18376 it can only work in the dlopen case if workers are returning the event loop (which, in general, they don't). It should always work for preloadDylibs though, so we should probably do that. |
Fixing #18552 will help some use cases like this. |
So I have two builds of my program, a game engine, one where it's a single 37mb wasm file (5mb br compressed).
With my game engine, I use all threads available on my machine via Navigator.hardwareConcurrency (32). This works fine, as only the .js files are duplicated and my game starts up on my machine within 1-2 seconds with ~700ms of that been fetching and initializing the .js and wasm file and the remaining been engine + game startup time.
Now, I've started experimenting with hot reloading parts of my engine, to do this I've decided to use shared libraries and this has produced about 120 libraries around a few hundred KB and a couple libraries that are 11MB, 17MB. This has made the total size a little larger, around 42mb (6mb compressed), the size is not the issue though.
Now that the code is split among many wasm modules, emscripten is loading up each library (125 modules or 42mb) * threads (32), resulting in 1.3gb of code been downloaded and instantiated.
Is there currently a way to do what I'm trying to achieve in an efficient way, or would I have to modify emscripten for this purpose?
Also, slightly related, I can understand the worker.js files needing to match the threads, but is there not a way around the fetching of the extra non worker .js files?
Thanks
The text was updated successfully, but these errors were encountered: