Add support for user JS library code to be able to depend on C functions #15982

juj · 2022-01-11T23:58:21Z

There is a long standing issue that user JS libraries cannot depend back on C functions without a) either modifying their Emscripten installation's src/deps_info.py file, or b) manually annotating EXPORTED_FUNCTIONS with all such C functions that the JS code will call back to.

Neither of these methods is tenable, since they both make it hard to share code between developers. (This limitation could be considered outright a bug)

This PR aims to add support for user JS library code to be able to depend on C functions, via a custom
'foo__wasm_deps: ['wasmFunc1', 'wasmFunc2', ...] directive.

This way JS functions can declare which C/Wasm functions they depend on, and they only need to know whether the depended function resides in Wasm or JS land, and either use foo__deps: ['jsFunc'] or foo__wasm_deps: ['wasmFunc'] to declare the dependency.

This is a barebones (but working) implementation, wanted to open this up for comments at this point.

The way this works is that JS libraries are parsed an additional time in the beginning to find the JS->wasm deps from them.

This will slow down builds, but slower is better than broken. However I appreciate that might create contention, so if there is pushback from someone not wanting/needing this machinery, I would be open to making this an optional cmdline flag that users have to pass to activate it (e.g. a -sSCAN_WASM_DEPS=1 or pile on a new mode on the existing -sREVERSE_DEPS flag?)

I think in the long term it would be better to move to an architecture like this in the system JS libraries, and nuke all the registrations from deps_info.py altogether. That way we can get closer to "user JS libraries" being at the same power level as "system JS libraries" are.

Thoughts? @kripken ?

…ons, via a custom 'foo__wasm_deps: ['wasmFunc1', 'wasmFunc2', ...] directive.

sbc100 · 2022-01-12T02:01:13Z

I'm definitely behind the core idea here.

I don't know if you know but I've already added an extra pass over the libraries as part of (optional) more accurate undefined symbol reporting:

https://github.com/emscripten-core/emscripten/blob/main/emcc.py#L440-L464

Here I use emscripten.compile_settings() to extract just a list of function names. If we are going to do a pre-pass of the libraries we should at least combine them so we don't ever do two.

Also, does this handle indirect reverse deps? e.g the case described in deps_info.py where we have user_code => other_libc_func => js_func => libc_func (llvm-nm on the object code only sees sees other_libc_func here)

sbc100 · 2022-01-12T02:12:05Z

What I've been really dreaming about is finding a way to have wasm-ld to all of the symbol resolution for us. The closest I've come to thinking up something that might work is.

For each library_foo.js create some kind of dummy libfoo.so linker input.
The dummy libfoo.so is fed to the linker but doesn't actually contain any code, just declaration of things that are external/undefined/JS.
The dummy libfoo.so would also need to encode a list of exports that would be needed if an import is linked (i.e.g the __deps info.

The flow would look something like this:

$ ./tools/extract_syms_with_deps mylib.js -o libmylib.so
$ cat libmylib.so
# Text format understood by wasm-ld
# Here we are declaring the foo is available as an import.  But it required that bar be exported in return.
foo: bar
$ wasm-ld object.o libmylib.so -o out.wasm

One of the primary advantage of doing all the symbol resolution in wasm-ld is that we get much nicer error message, for example we can report which object file contains the references to an undefined symbol.

juj · 2022-01-12T14:11:56Z

I don't know if you know but I've already added an extra pass over the libraries as part of (optional) more accurate undefined symbol reporting:

Thanks, I was not aware of this.

Also, does this handle indirect reverse deps? e.g the case described in deps_info.py where we have user_code => other_libc_func => js_func => libc_func (llvm-nm on the object code only sees sees other_libc_func here)

In this example, should user_code be a JS function or a wasm function?

If it is a JS function, then it is possible that the dual cycle js->wasm->js->wasm is not seen. Not sure how rare/common these are in general, I think there may have been such a scenario once.

Having LLVM do the full JS + Wasm side link would certainly be the ultimate solution.

kripken · 2022-01-12T18:43:17Z

Interesting...

Another way to get JS->wasm dep handling might be to use EM_JS instead of a JS library. Normally EM_JS is just another way to write a JS library - it basically compiles into the same thing - but it has the benefit of being in a file that LLVM actually sees. An example of what I mean:

#include <emscripten.h>

EM_JS_DEP(malloc); // This is a novel thing that would be needed

EM_JS(void, do_something, () {
  var buffer = _malloc(100);
  // ...
});

Imagine that EM_JS_DEP adds an export to that compilation unit. As a result, if lld ends up linking in this file it will export malloc. We could then write JS libraries in EM_JS in this way, annotating their deps in the C/C++ source files. So it's still manual annotation, but it would then all be left to lld to do. Thoughts?

sbc100 · 2022-01-12T18:50:10Z

Interesting...

Another way to get JS->wasm dep handling might be to use EM_JS instead of a JS library. Normally EM_JS is just another way to write a JS library - it basically compiles into the same thing - but it has the benefit of being in a file that LLVM actually sees. An example of what I mean:
#include <emscripten.h>

EM_JS_DEP(malloc); // This is a novel thing that would be needed

EM_JS(void, do_something, () {
  var buffer = _malloc(100);
  // ...
});
Imagine that EM_JS_DEP adds an export to that compilation unit. As a result, if lld ends up linking in this file it will export malloc. We could then write JS libraries in EM_JS in this way, annotating their deps in the C/C++ source files. So it's still manual annotation, but it would then all be left to lld to do. Thoughts?

That seems like a great idea yes. I see this this very similar to what I proposed above with with one major downside and one major upside:

downside: we would need to rewrite all our library functions
upside: no custom linking input format would be needed.

I think maybe we can combine the two in order to produce a solution without this downside: We could internally convert all JS libraryes into the above .cpp form and compile them and link them as .a files. This would simplify the post-link JS compiler phase.

The downside is this approach would be would be doing a bunch of .cpp codegen can compilation during the link phase before we run wasm-ld.

juj · 2022-01-12T19:15:53Z

EM_JS is currently not good for production due to a few reasons. The major reason is that it does not allow preprocessing with any of the current -s settings, unlike in JS library files.

Also the {{{ }}} machinery is not available in EM_JS files, so can't do things like

    {{{ makeDynCall('viiii', 'callback') }}}(device, errorCode, errorMessage, userData);
```js
or `{{{ C_STRUCTS.foo.bar }}}` or `{{{ cDefine('ENOSYS') }}}` etc.

It is also not possible to define foo__postset in EM_JS, nor JS variables (with initializers) (foo: '[1,2,3,4]'). Also __proxy is not available there, and none of the other annotations (e.g. the current __deps).

But maybe if the EM_JS mechanism is possible to be expanded to include all of these necessities, then it can work out.

sbc100 · 2022-01-12T21:50:06Z

EM_JS is currently not good for production due to a few reasons. The major reason is that it does not allow preprocessing with any of the current -s settings, unlike in JS library files.

Also the {{{ }}} machinery is not available in EM_JS files, so can't do things like
    {{{ makeDynCall('viiii', 'callback') }}}(device, errorCode, errorMessage, userData);
```js
or `{{{ C_STRUCTS.foo.bar }}}` or `{{{ cDefine('ENOSYS') }}}` etc.
It is also not possible to define foo__postset in EM_JS, nor JS variables (with initializers) (foo: '[1,2,3,4]'). Also __proxy is not available there, and none of the other annotations (e.g. the current __deps).

But maybe if the EM_JS mechanism is possible to be expanded to include all of these necessities, then it can work out.

I agree those are all limitations of hard-written EM_JS, but if we were to generate EM_JS source files from library JS files then the preprocessor issues would be solved because we would be outputting the EM_JS code after we do all the processing and substitutions.

One issue with doing the library processing early is that some functions that we use during JS library pre-processing don't make sense before linking (e.g. hasExportedFunction).

kripken · 2022-01-12T22:58:56Z

@juj

Ah, yes, those EM_JS limitations are an issue, good points. I wonder if there isn't a natural way to use EM_JS code for things that interact with C, and that EM_JS can call JS library code for things that need macros etc. I'd need to try to rewrite something to see if that makes sense, very possibly it does not...

@sbc100

if we were to generate EM_JS source files from library JS files

Hmm, do you mean at link time? Sorry if that's obvious, I think I missed it before. So there would be two phases, first JS, then wasm-ld? I'm a little unclear on how the first JS phase would know what to emit (without wasm-ld first telling it what symbols remain unresoled). Or would all possible JS library functions be emitted at each link, dynamically?

sbc100 · 2022-01-12T23:13:54Z

@juj

Ah, yes, those EM_JS limitations are an issue, good points. I wonder if there isn't a natural way to use EM_JS code for things that interact with C, and that EM_JS can call JS library code for things that need macros etc. I'd need to try to rewrite something to see if that makes sense, very possibly it does not...

@sbc100

if we were to generate EM_JS source files from library JS files

Hmm, do you mean at link time? Sorry if that's obvious, I think I missed it before. So there would be two phases, first JS, then wasm-ld? I'm a little unclear on how the first JS phase would know what to emit (without wasm-ld first telling it what symbols remain unresoled). Or would all possible JS library functions be emitted at each link, dynamically?

I was thinking something like this: For each library_foo.js file create a library_foo.cpp containing all the available JS functions in the library, but as EM_JS functions. Compiler library_foo.cpp and create library_foo.a from it, and then pass all of the library_foo.a archives to wasm-ld.

This whole process would need to be done each time emcc was used a linker because that contents of library_foo.cpp would depend on the settings used at link time (i.e. the pre-processor options could change). And each of the library_foo.cpp would need to be compiled with clang and linked with emar all under the hood... and compiling stuff at link time is something I'd like to less of rather than mode.

It would be a bit like how you need to compile libsdl.a the first time you like with -sUSE_SDL=2 but it would happen for every link. I guess we could be clever and try to cache the results based hash(settings + library content)?

sbc100 · 2022-01-12T23:14:47Z

Obviously such a process could be greatly sped up if we could make AUTO_JS_LIBRARIES the default.

sbc100 · 2022-01-13T19:04:07Z

I started a separate discussion about my idea which it to extend LLD_REPORT_UNDEFINED to include reverse dependencies i the JS symbol list we pass to the linker: #16010

This makes undefined symbol errors more precise by including the name of the object that references the undefined symbol. Its also paves the way (in my mind anyway) for finally fixing reverse dependencies in a salable way. See #15982. That PR uses an alternative script for the pre-processing of dependencies but also fundamentally relies on processing JS libraries both before and after linking. The cost is about 300ms per link operation due to double processing of the JS libraries. This cost is fixed for most projects (since most project don't add a lot JS libraries over time in the way that they add native code object). I imagine even in the most pathological cases JS libraries usage will be dwarfed by native object file usage so even in those cases the native linking will likely always dominate the link time. If the 300ms extra link time causes issues, for example with cmake or autoconf, that do a lot linking of small programs, we could consider hashing the config setting and caching the result of the processing based on them.

This makes undefined symbol errors more precise by including the name of the object that references the undefined symbol. Its also paves the way (in my mind anyway) for finally fixing reverse dependencies in a salable way. See #15982. That PR uses an alternative script for the pre-processing of dependencies but also fundamentally relies on processing JS libraries both before and after linking. The cost is about 300ms per link operation due to double processing of the JS libraries, but results are cached so in practice this only happens the first time a given link command is run (see #18326).

Now that LLD_REPORT_UNDEFINED is always on, we can depend on JS library processing before link time. Extend the existing symbol list to be a list of symbols + native dependencies. This way we can begin to move reverse dependencies out of deps_info.py and potentially one day completely remove that file. For now it is still needed because indirect dependencies can't be specified in the JS library code yet. Replaces (and inspired by): #15982

sbc100 · 2023-03-15T18:13:42Z

I think this can be closed now that #18849 has landed.

…ipten-core#18849) Now that LLD_REPORT_UNDEFINED is always on, we can depend on JS library processing before link time. Extend the existing symbol list to be a list of symbols + native dependencies. This way we can begin to move reverse dependencies out of deps_info.py and potentially one day completely remove that file. For now it is still needed because indirect dependencies can't be specified in the JS library code yet. Replaces (and inspired by): emscripten-core#15982

sbc100 · 2023-05-09T16:47:10Z

Fixed in #18849

Add support for user JS library code to be able to depend on C functi…

e4037df

…ons, via a custom 'foo__wasm_deps: ['wasmFunc1', 'wasmFunc2', ...] directive.

sbc100 mentioned this pull request Feb 22, 2022

Add helper function for C++ exception formatting #16343

Merged

3 tasks

sbc100 mentioned this pull request Dec 4, 2022

Enable LLD_REPORT_UNDEFINED by default #16003

Merged

sbc100 mentioned this pull request Feb 24, 2023

Remove LLD_REPORT_UNDEFINED setting #18342

Merged

sbc100 mentioned this pull request Feb 25, 2023

Allow JS -> native dependencies to be specified in __deps. NFC #18849

Merged

sbc100 closed this May 9, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for user JS library code to be able to depend on C functions #15982

Add support for user JS library code to be able to depend on C functions #15982

juj commented Jan 11, 2022 •

edited

Loading

sbc100 commented Jan 12, 2022

sbc100 commented Jan 12, 2022 •

edited

Loading

juj commented Jan 12, 2022

kripken commented Jan 12, 2022

sbc100 commented Jan 12, 2022 •

edited

Loading

juj commented Jan 12, 2022 •

edited

Loading

sbc100 commented Jan 12, 2022

kripken commented Jan 12, 2022

sbc100 commented Jan 12, 2022 •

edited

Loading

sbc100 commented Jan 12, 2022

sbc100 commented Jan 13, 2022 •

edited

Loading

sbc100 commented Mar 15, 2023

sbc100 commented May 9, 2023

Add support for user JS library code to be able to depend on C functions #15982

Add support for user JS library code to be able to depend on C functions #15982

Conversation

juj commented Jan 11, 2022 • edited Loading

sbc100 commented Jan 12, 2022

sbc100 commented Jan 12, 2022 • edited Loading

juj commented Jan 12, 2022

kripken commented Jan 12, 2022

sbc100 commented Jan 12, 2022 • edited Loading

juj commented Jan 12, 2022 • edited Loading

sbc100 commented Jan 12, 2022

kripken commented Jan 12, 2022

sbc100 commented Jan 12, 2022 • edited Loading

sbc100 commented Jan 12, 2022

sbc100 commented Jan 13, 2022 • edited Loading

sbc100 commented Mar 15, 2023

sbc100 commented May 9, 2023

juj commented Jan 11, 2022 •

edited

Loading

sbc100 commented Jan 12, 2022 •

edited

Loading

sbc100 commented Jan 12, 2022 •

edited

Loading

juj commented Jan 12, 2022 •

edited

Loading

sbc100 commented Jan 12, 2022 •

edited

Loading

sbc100 commented Jan 13, 2022 •

edited

Loading