Add support for user JS library code to be able to depend on C functions #15982
…ons, via a custom 'foo__wasm_deps: ['wasmFunc1', 'wasmFunc2', ...] directive.
I'm definitely behind the core idea here. I don't know if you know, but I've already added an extra pass over the libraries as part of (optional) more accurate undefined symbol reporting: https://github.com/emscripten-core/emscripten/blob/main/emcc.py#L440-L464 Here I use Also, does this handle indirect reverse deps? e.g. the case described in deps_info.py where we have |
What I've been really dreaming about is finding a way to have
The flow would look something like this:
One of the primary advantages of doing all the symbol resolution in wasm-ld is that we get much nicer error messages; for example, we can report which object file contains the references to an undefined symbol. |
Thanks, I was not aware of this.
In this example, should If it is a JS function, then it is possible that the dual cycle js->wasm->js->wasm is not seen. Not sure how rare/common these are in general, I think there may have been such a scenario once. Having LLVM do the full JS + Wasm side link would certainly be the ultimate solution. |
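To make the cycle concern concrete, here is a minimal sketch with a made-up dependency graph (the symbol names and the `js:`/`wasm:` prefixes are purely illustrative, not anything Emscripten uses). It compares a scan that only follows edges declared on the JS side with a full closure over both sides:

```js
// Hypothetical mixed graph: keys are "js:<name>" or "wasm:<name>".
const graph = {
  'js:a': ['wasm:f'],
  'wasm:f': ['js:b'],
  'js:b': ['wasm:g'],
  'wasm:g': [],
};

// Follow edges until no new symbol appears (worklist closure).
function closure(edges, roots) {
  const seen = new Set();
  const pending = [...roots];
  while (pending.length > 0) {
    const node = pending.pop();
    if (seen.has(node)) continue; // already visited, also guards against cycles
    seen.add(node);
    pending.push(...(edges[node] || []));
  }
  return [...seen].sort();
}

// A scan that only knows the edges declared by JS symbols.
function jsOnlyScan(edges, roots) {
  const jsEdgesOnly = Object.fromEntries(
    Object.entries(edges).filter(([node]) => node.startsWith('js:'))
  );
  return closure(jsEdgesOnly, roots);
}

console.log(jsOnlyScan(graph, ['js:a'])); // [ 'js:a', 'wasm:f' ]
console.log(closure(graph, ['js:a']));    // [ 'js:a', 'js:b', 'wasm:f', 'wasm:g' ]
```

The JS-only scan stops at `wasm:f`, so `js:b` and `wasm:g` are never discovered. That is the js->wasm->js->wasm case in miniature.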
Interesting... Another way to get JS->wasm dep handling might be to use

```c
#include <emscripten.h>

EM_JS_DEP(malloc); // This is a novel thing that would be needed
EM_JS(void, do_something, (), {
  var buffer = _malloc(100);
  // ...
});
```

Imagine that |
That seems like a great idea, yes. I see this as very similar to what I proposed above, with one major downside and one major upside:
I think maybe we can combine the two in order to produce a solution without this downside: We could internally convert all JS libraries into the above .cpp form and compile them and link them as The downside of this approach is that we would be doing a bunch of .cpp codegen and compilation during the link phase before we run wasm-ld. |
EM_JS is currently not good for production due to a few reasons. The major reason is that it does not allow preprocessing with any of the current -s settings, unlike in JS library files. Also the `{{{ makeDynCall('viiii', 'callback') }}}(device, errorCode, errorMessage, userData);` or `{{{ C_STRUCTS.foo.bar }}}` or `{{{ cDefine('ENOSYS') }}}` etc. It is also not possible to define But maybe if the EM_JS mechanism is possible to be expanded to include all of these necessities, then it can work out. |
I agree those are all limitations of hand-written EM_JS, but if we were to generate EM_JS source files from library JS files then the preprocessor issues would be solved, because we would be outputting the EM_JS code after we do all the processing and substitutions. One issue with doing the library processing early is that some functions that we use during JS library pre-processing don't make sense before linking (e.g. |
Ah, yes, those EM_JS limitations are an issue, good points. I wonder if there isn't a natural way to use EM_JS code for things that interact with C, and that EM_JS can call JS library code for things that need macros etc. I'd need to try to rewrite something to see if that makes sense, very possibly it does not...
Hmm, do you mean at link time? Sorry if that's obvious, I think I missed it before. So there would be two phases, first JS, then wasm-ld? I'm a little unclear on how the first JS phase would know what to emit (without wasm-ld first telling it what symbols remain unresolved). Or would all possible JS library functions be emitted at each link, dynamically? |
I was thinking something like this: For each This whole process would need to be done each time It would be a bit like how you need to compile |
Obviously such a process could be greatly sped up if we could make |
I started a separate discussion about my idea, which is to extend LLD_REPORT_UNDEFINED to include reverse dependencies in the JS symbol list we pass to the linker: #16010 |
This makes undefined symbol errors more precise by including the name of the object that references the undefined symbol. It also paves the way (in my mind anyway) for finally fixing reverse dependencies in a scalable way. See #15982. That PR uses an alternative script for the pre-processing of dependencies but also fundamentally relies on processing JS libraries both before and after linking. The cost is about 300ms per link operation due to double processing of the JS libraries. This cost is fixed for most projects (since most projects don't add a lot of JS libraries over time in the way that they add native code objects). I imagine even in the most pathological cases JS library usage will be dwarfed by native object file usage, so even in those cases the native linking will likely always dominate the link time. If the 300ms extra link time causes issues, for example with cmake or autoconf, which do a lot of linking of small programs, we could consider hashing the config settings and caching the result of the processing based on them.
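The caching idea in the last sentence could look roughly like the sketch below. Everything here is an assumption for illustration: `settingsKey`, `cachedProcess` and the `processLibraries` callback are hypothetical names, the cache is an in-memory `Map` rather than anything on disk, and the two settings are only sample inputs.

```js
const { createHash } = require('node:crypto');

// Stable cache key: hash the settings with keys sorted, so the order in
// which the settings were given does not change the key.
function settingsKey(settings) {
  const sorted = Object.keys(settings).sort().map((k) => [k, settings[k]]);
  return createHash('sha256').update(JSON.stringify(sorted)).digest('hex');
}

const cache = new Map(); // in-memory stand-in for a persistent cache

// `processLibraries` is a hypothetical callback standing in for the
// expensive JS library pre-processing step.
function cachedProcess(settings, processLibraries) {
  const key = settingsKey(settings);
  if (!cache.has(key)) cache.set(key, processLibraries(settings));
  return cache.get(key);
}

let runs = 0;
const work = () => { runs++; return ['malloc']; };
cachedProcess({ ASSERTIONS: 1, WASM_BIGINT: 0 }, work);
cachedProcess({ WASM_BIGINT: 0, ASSERTIONS: 1 }, work); // same settings, reordered
console.log(runs); // 1
```

A real key would presumably also have to cover the contents of the JS library files themselves, since editing a library changes the result even when the settings do not.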
This makes undefined symbol errors more precise by including the name of the object that references the undefined symbol. It also paves the way (in my mind anyway) for finally fixing reverse dependencies in a scalable way. See #15982. That PR uses an alternative script for the pre-processing of dependencies but also fundamentally relies on processing JS libraries both before and after linking. The cost is about 300ms per link operation due to double processing of the JS libraries, but results are cached, so in practice this only happens the first time a given link command is run (see #18326).
Now that LLD_REPORT_UNDEFINED is always on, we can depend on JS library processing before link time. Extend the existing symbol list to be a list of symbols + native dependencies. This way we can begin to move reverse dependencies out of deps_info.py and potentially one day completely remove that file. For now it is still needed because indirect dependencies can't be specified in the JS library code yet. Replaces (and inspired by): #15982
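As a rough picture of what "a list of symbols + native dependencies" might contain, here is a sketch. It assumes, purely for illustration, that any entry in a `__deps` list that is not itself defined in the JS library counts as a native symbol; the function name and the output shape are hypothetical and not taken from the actual change.

```js
// Assumption for illustration: a dependency that is not defined in the JS
// library itself is treated as a native (Wasm) symbol.
function symbolsWithNativeDeps(library) {
  const has = (key) => Object.prototype.hasOwnProperty.call(library, key);
  const jsSymbols = Object.keys(library).filter((key) => !key.endsWith('__deps'));
  return jsSymbols.map((symbol) => ({
    symbol,
    nativeDeps: (library[symbol + '__deps'] || []).filter((dep) => !has(dep)),
  }));
}

const lib = {
  helper: function () {},
  make_buffer__deps: ['helper', 'malloc'], // `helper` is JS, `malloc` is not
  make_buffer: function () {},
};

console.log(JSON.stringify(symbolsWithNativeDeps(lib)));
// [{"symbol":"helper","nativeDeps":[]},{"symbol":"make_buffer","nativeDeps":["malloc"]}]
```

Only direct dependencies are listed per symbol here, which matches the note that indirect dependencies cannot be expressed in the JS library code yet.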
I think this can be closed now that #18849 has landed. |
Fixed in #18849 |
There is a long-standing issue that user JS libraries cannot depend back on C functions without either a) modifying their Emscripten installation's src/deps_info.py file, or b) manually annotating EXPORTED_FUNCTIONS with all such C functions that the JS code will call back to.
Neither of these methods is tenable, since they both make it hard to share code between developers. (This limitation could outright be considered a bug.)
This PR aims to add support for user JS library code to be able to depend on C functions, via a custom `foo__wasm_deps: ['wasmFunc1', 'wasmFunc2', ...]` directive.
This way JS functions can declare which C/Wasm functions they depend on, and they only need to know whether the depended function resides in Wasm or JS land, and either use `foo__deps: ['jsFunc']` or `foo__wasm_deps: ['wasmFunc']` to declare the dependency.
This is a barebones (but working) implementation, wanted to open this up for comments at this point.
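For illustration, a user library using the proposed directive might look something like this. The function names `my_log` and `my_alloc_buffer` are hypothetical, and the object is kept as a plain literal so the snippet runs on its own:

```js
// Sketch of a user JS library using the proposed `__wasm_deps` directive.
const userLibrary = {
  my_log: function (ptr) {
    console.log('buffer at', ptr);
  },

  // JS-side dependency: another JS library function.
  my_alloc_buffer__deps: ['my_log'],
  // Wasm-side dependency (the directive proposed here): C functions that
  // the JS code calls back into.
  my_alloc_buffer__wasm_deps: ['malloc'],
  my_alloc_buffer: function (size) {
    var ptr = _malloc(size); // only resolvable once malloc is kept in the build
    _my_log(ptr);
    return ptr;
  },
};

// In a real library file this object would be registered with
// mergeInto(LibraryManager.library, userLibrary);
```

The library author only has to know on which side each dependency lives, not how the linker is told about it.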
The way this works is that JS libraries are parsed an additional time in the beginning to find the JS->wasm deps from them.
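A minimal sketch of what such an extra pass could compute is shown below. This is not the code from this PR: `collectWasmDeps` is a hypothetical helper, and it assumes `__deps` entries are plain symbol-name strings.

```js
// Hypothetical pre-link scan: collect every `foo__wasm_deps` entry reachable
// from the JS symbols the program needs, following `foo__deps` transitively.
function collectWasmDeps(library, roots) {
  const seen = new Set();
  const wasmDeps = new Set();
  const pending = [...roots];
  while (pending.length > 0) {
    const name = pending.pop();
    if (seen.has(name)) continue; // already visited, also guards against cycles
    seen.add(name);
    for (const dep of library[name + '__wasm_deps'] || []) wasmDeps.add(dep);
    for (const dep of library[name + '__deps'] || []) pending.push(dep);
  }
  return [...wasmDeps].sort();
}

const library = {
  a__deps: ['b'],
  a: function () {},
  b__deps: ['a'], // cycle: a -> b -> a
  b__wasm_deps: ['malloc'],
  b: function () {},
  c__wasm_deps: ['free'],
  c: function () {},
};

console.log(collectWasmDeps(library, ['a'])); // [ 'malloc' ]
```

Starting from `a`, the scan reaches `b` and picks up `malloc`, while `free` is left out because `c` is never reached.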
This will slow down builds, but slower is better than broken. However, I appreciate that this might create contention, so if there is pushback from someone not wanting/needing this machinery, I would be open to making this an optional cmdline flag that users have to pass to activate it (e.g. a `-sSCAN_WASM_DEPS=1`, or pile on a new mode on the existing `-sREVERSE_DEPS` flag?)
I think in the long term it would be better to move to an architecture like this in the system JS libraries, and nuke all the registrations from deps_info.py altogether. That way we can get closer to "user JS libraries" being at the same power level as "system JS libraries" are.
Thoughts? @kripken ?