Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent multiple fetch + initialize for side modules #14820

Open
warvstar opened this issue Aug 7, 2021 · 13 comments
Open

Prevent multiple fetch + initialize for side modules #14820

warvstar opened this issue Aug 7, 2021 · 13 comments

Comments

@warvstar
Copy link
Contributor

warvstar commented Aug 7, 2021

So I have two builds of my program, a game engine, one where it's a single 37mb wasm file (5mb br compressed).

With my game engine, I use all threads available on my machine via Navigator.hardwareConcurrency (32). This works fine, as only the .js files are duplicated and my game starts up on my machine within 1-2 seconds with ~700ms of that been fetching and initializing the .js and wasm file and the remaining been engine + game startup time.

Now, I've started experimenting with hot reloading parts of my engine, to do this I've decided to use shared libraries and this has produced about 120 libraries around a few hundred KB and a couple libraries that are 11MB, 17MB. This has made the total size a little larger, around 42mb (6mb compressed), the size is not the issue though.

Now that the code is split among many wasm modules, emscripten is loading up each library (125 modules or 42mb) * threads (32), resulting in 1.3gb of code been downloaded and instantiated.

Is there currently a way to do what I'm trying to achieve in an efficient way, or would I have to modify emscripten for this purpose?

Also, slightly related, I can understand the worker.js files needing to match the threads, but is there not a way around the fetching of the extra non worker .js files?

Thanks

@warvstar
Copy link
Contributor Author

warvstar commented Aug 7, 2021

Adding this check if (!ENVIRONMENT_IS_PTHREAD) before preloadDylibs(); inside run() has appeared to have solved the issue for me? Now they are only loaded once and I've tested calling between libraries and main and everything looks to be working fine.

Edit: Looks like I've run into issues when calling functions on other threads... as expected I suppose.

So this cannot work for me as is, because 42mb -> 1.3gb is just not very worthwhile just to use shared modules.

I'm actually mostly interested in runtime dynamic loading game modules, rather than engine modules, and the initial 140+ engine modules could be compiled as one MAIN_MODULE, so another way I've thought about doing this is to just build all the engine modules as a single module with MAIN_MODULE=2 and then at runtime I can load additional modules using emscripten_dlopen();. This looks like it should work, although it adds an 8+mb function (__wasm_apply_data_relocs) to the 40mb main module, this just happens to be too large of a function for Chrome and probably any other browser.

I guess this next question should probably have it's own thread and probably better suited to llvm, but would it not be possible to split __wasm_apply_data_relocs and __wasm_call_ctors into multiple functions if they are too large?

Is there any way this is possible today, if not, what would have to change to make this possible?

@kripken
Copy link
Member

kripken commented Aug 9, 2021

MAIN_MODULE would be the standard way to do this, yes, and to load libraries in side modules.

What do you mean by the function being too large? Do you get an error? (I would expect a function so large to not be fully optimized, but I'd hope it still runs.)

@warvstar
Copy link
Contributor Author

warvstar commented Aug 10, 2021 via email

@kripken
Copy link
Member

kripken commented Aug 10, 2021

Interesting, yes, that is actually hitting the VM limit then...

As this is emitted by LLVM, I think the fix should be there. Specifically, I assume __wasm_apply_data_relocs is created in wasm-ld, so perhaps it could create sub-functions when it's close to that limit, what do you think @sbc100 ?

@sbc100
Copy link
Collaborator

sbc100 commented Aug 10, 2021

I wonder if we can instead move the relocations into data. Then the code could be tiny (and potentially not linker-generated).

It sounds like you are passing all the side modules on the command line (so that MAIN_MODULE=2 works correctly?) but that you don't actually want to load them on startup? Is that right? If that is correct then we the -sAUTOLOAD_DYLIBS=0 might be what you want?

@sbc100
Copy link
Collaborator

sbc100 commented Aug 10, 2021

Also, dynamic linking + pthreads is experimental right now, and one of the limitation is that runtime loading (loading after startup) basically doesn't work.

@warvstar
Copy link
Contributor Author

warvstar commented Aug 10, 2021

It sounds like you are passing all the side modules on the command line (so that MAIN_MODULE=2 works correctly?) but that you don't actually want to load them on startup? Is that right? If that is correct then we the -sAUTOLOAD_DYLIBS=0 might be what you want?

Essentially the engine I'm using has hot reloading for modules, note I'm not talking about emscripten or wasm modules, just code that can be linked statically or as a shared library; on windows this works fine, on the web compiling them all statically works fine without MAIN_MODULE but so far I've not had success compiling them as shared libraries.

Originally, as in my first post, I tried compiling all modules as separate shared libraries (SIDE_MODULE) except for the launch module (MAIN_MODULE with the side modules listed). However it wants to load the side modules on each thread, that's 140x32 times the modules need to load, blowing out the memory on the devices I want to use this on. Even using -s AUTOLOAD_DYLIBS=0 does not avoid the problem, it just allows me to load some libraries later via loadDynamicLibrary but they all need to get instantiated on each thread. I imagine this would be easy to fix once interface types are ready.

For now, I've decided to tackle something slightly easier, instead of compiling all the engine modules as shared, I compile them all statically as one MAIN_MODULE, no shared modules yet - the reason I even added MAIN_MODULE is so that I can at least load up new modules dynamically and hopefully this avoids needing to instantiate the module multiple times.

This would allow me to code new parts of the engine or the game in the browser.

@warvstar
Copy link
Contributor Author

warvstar commented Aug 10, 2021

Also, are the relocations even necessary (in the MAIN_MODULE) if the only need is to load libraries at runtime via dlopen? If not, would it be possible to have MAIN_MODULE=3 that only works with runtime / dlopen'ing libraries?

@sbc100
Copy link
Collaborator

sbc100 commented Aug 10, 2021

In theory MAIN_MODULE could/should be built without relocations... but last time I tried to do that I hit a roadblock. The problem is that side modules are loaded before the main module which means that static data region is not fixed in size.. which means addresses are not known at link time. One solution to that problem would be to load the main module first and somehow allocate the space for the side modules using the start of the heap (basically dump sbrk)... but that would also mean that calls from the main module into the side modules are forced to go via JS (as opposed to today where they can be direct but calls in the other direction are via JS).

@sbc100
Copy link
Collaborator

sbc100 commented Aug 10, 2021

Also, even if we make MAIN_MODULE non-relocatable is doesn't solve the general problem of __wasm_apply_data_relocs being large which can also effect SIDE_MODULEs.

@dschuff
Copy link
Member

dschuff commented Jan 13, 2023

@sbc100 I'm guessing a fix for this would probably be dependent on #18376?
Also IIUC there are really 2 separate problems here: one is the title of the issue (redundant fetching/compiling of all modules across the threads, where we should be just postMessage-ing the modules), and the other is that __wasm_apply_data_relocs is too large; and I guess this is particular to use of dynamic modules because a statically linked module wouldn't need the data relocs at all...

@sbc100
Copy link
Collaborator

sbc100 commented Jan 13, 2023

Re: __wasm_apply_data_relocs, yes this only exists when in MAIN_MODULE and SIDE_MODULE binaries.

Re: Using postMessage, this could work, but even with #18376 it can only work in the dlopen case if workers are returning the event loop (which, in general, they don't). It should always work for preloadDylibs though, so we should probably do that.

@sbc100
Copy link
Collaborator

sbc100 commented Mar 9, 2023

Fixing #18552 will help some use cases like this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants