Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for node.js workers #6567

Closed
dschuff opened this issue May 22, 2018 · 28 comments
Closed

Support for node.js workers #6567

dschuff opened this issue May 22, 2018 · 28 comments

Comments

@dschuff
Copy link
Member

dschuff commented May 22, 2018

Work has been started on workers in node.js (nodejs/node#20876). The idea looks great, and we should make sure our pthreads code works there too. The API (which presumably isn't final) looks more or less like web workers, so it shouldn't be too hard to make it work.

@dschuff
Copy link
Member Author

dschuff commented May 22, 2018

I had forgotten that someone had already filed a bug about this previously: #6332 In any case, we should make this work for node.js as well as browsers.

@kripken
Copy link
Member

kripken commented May 12, 2019

Experimenting with this, it is blocked on nodejs/node#25630 - looks like in node a blocking main does not send postMessages to workers (or perhaps blocks all logging from them).

@bnoordhuis
Copy link
Contributor

or perhaps blocks all logging from them

Just to confirm: that's what happens.

You can postMessage from main to worker, no problem, but console.log() in a worker sends the string to the main thread, to print it on the worker's behalf.

The main thread won't process that string until control returns to the event loop and fixing that is not trivial.

@kripken
Copy link
Member

kripken commented May 13, 2019

I see, thanks @bnoordhuis!

Maybe we can work around it by using stdout/stderr directly? Or would that not work either?

@bnoordhuis
Copy link
Contributor

No, unfortunately. All stdio is mediated in Node's workers.

@kripken
Copy link
Member

kripken commented May 14, 2019

I see, thanks. Ok, sadly I think that means we can't use node workers in a simple way here.

@walkingeyerobot
Copy link
Collaborator

I believe bleeding edge Node should have all the support necessary.

@kripken
Copy link
Member

kripken commented Aug 26, 2019

Latest node may be enough, yeah. Last I tried it seemed possible, but getting logging working from a pthread (while the main is busy) was hard, so it was difficult to make progress. I have a wip branch here:

https://github.com/emscripten-core/emscripten/tree/node-pthreads

I may not have time for this myself in the near future. If someone has time to look at that that would be great, it might be a matter of finding some hack for debugging (like maybe printing to a file?), then fixing some minor issues.

@kripken
Copy link
Member

kripken commented Sep 24, 2019

I looked at this a little more now, there is some progress in the branch. Overall I keep hitting the many small differences between a Web Worker and a Node Worker.

For example the current blocker is that on the web we use importScripts to load the main x.js from the x.worker.js. Node doesn't have importScripts. The closest I can find is to load the code and do a global eval, but when doing that, require() is not defined, and we need that to be able to read files etc. Any ideas?

@addaleax
Copy link

@kripken Maybe I’m misunderstanding, but what’s a “global eval” in that context? eval('require') should give the require function as long as the outer context already has it.

@kripken
Copy link
Member

kripken commented Sep 24, 2019

@addaleax

It's something like eval.call(null, '...'), which can add stuff to the global scope (and not just the current function scope if there is one), which we need to match importScripts. The emscripten code is here and here's an example of why it's needed:

function setup() {
  eval('function foo() {}'); // normal eval
  console.log(typeof foo); // ok
  eval.call(null, 'function baz() {}'); // global eval
  console.log(typeof baz); // ok
}
setup();
console.log(typeof foo); // bad!
console.log(typeof baz); // ok

Turns out in node a normal eval does see require, but a global one doesn't :( But it does work in V8 and SpiderMonkey...

Maybe there's a better way I'm missing?

@addaleax
Copy link

Okay, I see – thanks for the explanation.

In that case, maybe doing something like we do for the REPL in Node.js is a good approach? Basically, we copy/provide Node’s special per-module variables, require, module, exports, require, __dirname and __filename to the global object, and then run eval() on the script.

I think that’s something that makes sense when your goal is to have everything in the global scope, as it would be in a Web Worker?

@kripken
Copy link
Member

kripken commented Sep 25, 2019

Interesting, thanks - so there's a way I can copy require to the global scope, and global eval would see it? How would I do that copy?

@addaleax
Copy link

So, just to avoid any confusion, I’m basically talking about doing this inside the Worker:

global.require = require;
eval.call(null, 'require("console").log("hello, world!")');

The catch is that this picks one require function out of possibly many, namely the one for the main Worker script, but I think that that’s what you might want?

@kripken
Copy link
Member

kripken commented Sep 26, 2019

Thanks @addaleax! That works great.

However, we do have more things than require that we'll need (I hit this on Module now), and some of them must be written to (like noExitRuntime, buffer, etc.). Here's a testcase:

var x = 0;
eval('x = x + 1');
console.log(x);
global.x = x; // must be commented out if not in in node
eval.call(null, 'x = x + 1');
console.log(x);

That prints 1 and then 2 in non-node environments, but in node it prints 1 twice. Is there a way to make that work?

@addaleax
Copy link

@kripken I think that’s because var in Node.js does not create global variables, because the individual scripts are run inside function wrappers; Replacing the first line with global.x = 0 seems to make this work?

@kripken
Copy link
Member

kripken commented Sep 27, 2019

@addaleax Yeah, replacing all those could work. We do have a bunch of them, though, so it's not obvious what to do here, and I worry about making the code less maintainable while still supporting both the web and node.

I'll think some more on this and try to get back to it when I have time, but if someone else has a good idea of how to proceed that would be great too.

@addaleax
Copy link

@kripken If you do want everything to work like in the browser scope-wise and hav, you could probably just wrap it all in a global eval (or Node’s vm.runInThisContext() which does a similar thing) except for a small intro bit that copies some variables to the global scope?

(Maybe I’m having a simplistic view of things here, but ultimately I’m invested in making this work, too)

@cambeaney
Copy link

cambeaney commented Sep 30, 2019

I think in nodejs module scope is different to global scope (I am not a nodejs programmer to be clear).
I got this working by assigning process, __filename etc like this in globalEval

function globalEval(x) {
    this.exports = exports;
    this.require = require;
    this.module = module;
    this.__filename = __filename;
    this.__dirname = __dirname;
    eval.call(null, x);
  }

And then just changing all of the variable definitions in worker.js to global variables, instead of using var. Not a particularly elegant solution, but it does work (tested on a fairly complicated threaded application).

@kripken
Copy link
Member

kripken commented Oct 1, 2019

@addaleax @cambeaney

A problem is that we want our builds to run on both the Web and Node.js. But it seems like in worker.js we need var threadInfoStruct = 0; on the Web but global.threadInfoStruct = 0; on Node?

If there's no other option we can force such builds to be either Web or Node but not both, but this is the first time we've run into such a situation, so it would be something we need to consider carefully.

@addaleax
Copy link

addaleax commented Oct 1, 2019

@kripken A var inside a global eval does result in a global variable, both in Node.js and in the Web; so the main options for consistency between the Web and Node.js seem to be either a) grabbing the global object and then always setting properties on that, or b) wrapping most of the worker.js code itself in a global eval and using var declarations, e.g. along the lines of @cambeaney’s suggestion?

@kripken
Copy link
Member

kripken commented Oct 3, 2019

Oh thanks @addaleax, I missed the part about running worker.js itself in a global eval. Yes, that seems like it might help!

Meanwhile I am refactoring this code in #9569 which will remove some of those globals, and may remove almost all the ones we need to share between the files. After that lands I'll get back to this and see how things look.

kripken added a commit that referenced this issue Oct 16, 2019
Turns out much of the pthreads runtime was not closure-safe. This PR fixes that,
using quoted strings as closure expects.

While doing that I noticed that some of the things in place for pthreads +
MODULARIZE would also help closure. In both cases the key thing is that
worker.js, that loads the main JS file, is separate from it, so it needs proper
interfaces and can't just rely on global variables (which in closure are
minified in the main JS, and in MODULARIZE are in another scope). So I removed
things like makeAsmExportAccessInPthread, makeAsmGlobalAccessInPthread,
makeAsmExportAndGlobalAssignTargetInPthread which makes the code simpler.

Because of that I was also able to remove a bunch of globals from worker.js
which further simplifies things, and should also help node.js workers (where the
 global scope is very different, #6567).

Most of this PR is straightforward according to the above notes. One slightly
tricky thing is the stack setup, which I refactored to use a new
Module.applyStackValues method that worker.js calls at the right time (as the
stack is set up after the JS loads).

Code size effects:

Before this PR: 69,567 bytes.
After this PR: 69,644.
After this PR, plus closure: 34,909.

So this adds only 77 bytes, while also making the code simpler + making it
possible to run closure and decrease size by 50%. In addition, I have followup
work to further reduce the non-closure code size too, so this regression will
be fixed (and more).
@VirtualTim
Copy link
Collaborator

I just had a quick look, and one other issue I ran into was that node.js workers don't support addEventListener. They have a bug open to add support: nodejs/node#26856.

@addaleax
Copy link

@VirtualTim Node.js currently doesn’t support EventTarget for any built-in, and I wouldn’t count on that changing soon (unless you’re willing to put work into that yourself 🙂). The only thing we do support is .onmessage for MessagePort instances, for alignment with the web.

@kripken
Copy link
Member

kripken commented Oct 24, 2019

Ok, after merging in that PR there is good news and bad news. The good news is that it looks like a simple program can work! It creates 3 pthreads and they each do some work and print out their progress :)

The bad news is that it looks like a pthread exiting shuts down the whole node.js instance. Shutdown is done by throwing an exception, which on the web stays in the worker (it's logged to the console, but that's it), while in node it seems to halt the entire application,

a.out.js:154
      throw ex;
      ^

Error [ERR_UNHANDLED_ERROR]: Unhandled error. (55)
    at Worker.emit (events.js:198:17)
    at Worker.[kOnErrorMessage] (internal/worker.js:176:10)
    at Worker.[kOnMessage] (internal/worker.js:186:37)
    at MessagePort.<anonymous> (internal/worker.js:118:57)
    at MessagePort.emit (events.js:209:13)
    at MessagePort.onmessage (internal/worker/io.js:70:8) {
  context: 55
}

and the node.js process exits with code 1. This is a problem since threads should call pthread_exit(), and that method must throw (it's marked as noexit in C, so clang emits an unreachable at the end, which throws; we normally don't reach it and throw a JS exception, which is on line 154 there; removing that, it just prints that the unreachable is thrown). We should be able to handle pthreads exiting and new ones being started etc., so it's not immediately obvious to me what to do.

@addaleax
Copy link

@kripken Would trying to call process.exit() in Node.js Workers an option (possibly guarded by a check for the presence of it)? It throws an un-catchable JS exception that terminates the Worker, so it should do exactly what you want.

Otherwise, listening to the 'error' event on the Worker instance should be the way to go…

@kripken
Copy link
Member

kripken commented Oct 30, 2019

Excellent, thanks @addaleax! That works :)

@kripken
Copy link
Member

kripken commented Oct 31, 2019

PR is finally up, with a first passing test! #9745 Thanks again for all the help here :)

kripken added a commit that referenced this issue Nov 1, 2019
This ended up tricky because of various node/web differences, like the global
scope, the event loop, APIs, etc. With help on #6567 I think this PR finally
manages to make it work, at least for an initial "hello world" type test.
Thanks to everyone on that issue and in particular @addaleax!

This moves some code from shell.js into side .js files, so that we can use it in
multiple places.

Fixes #6567 but as we add more tests I'm sure we'll find more issues, this just
shows basic functionality so far. We may not be able to easily reuse the
existing browser tests as they have browser-specific things in them, but we
can try.
belraquib pushed a commit to belraquib/emscripten that referenced this issue Dec 23, 2020
Turns out much of the pthreads runtime was not closure-safe. This PR fixes that,
using quoted strings as closure expects.

While doing that I noticed that some of the things in place for pthreads +
MODULARIZE would also help closure. In both cases the key thing is that
worker.js, that loads the main JS file, is separate from it, so it needs proper
interfaces and can't just rely on global variables (which in closure are
minified in the main JS, and in MODULARIZE are in another scope). So I removed
things like makeAsmExportAccessInPthread, makeAsmGlobalAccessInPthread,
makeAsmExportAndGlobalAssignTargetInPthread which makes the code simpler.

Because of that I was also able to remove a bunch of globals from worker.js
which further simplifies things, and should also help node.js workers (where the
 global scope is very different, emscripten-core#6567).

Most of this PR is straightforward according to the above notes. One slightly
tricky thing is the stack setup, which I refactored to use a new
Module.applyStackValues method that worker.js calls at the right time (as the
stack is set up after the JS loads).

Code size effects:

Before this PR: 69,567 bytes.
After this PR: 69,644.
After this PR, plus closure: 34,909.

So this adds only 77 bytes, while also making the code simpler + making it
possible to run closure and decrease size by 50%. In addition, I have followup
work to further reduce the non-closure code size too, so this regression will
be fixed (and more).
belraquib pushed a commit to belraquib/emscripten that referenced this issue Dec 23, 2020
This ended up tricky because of various node/web differences, like the global
scope, the event loop, APIs, etc. With help on emscripten-core#6567 I think this PR finally
manages to make it work, at least for an initial "hello world" type test.
Thanks to everyone on that issue and in particular @addaleax!

This moves some code from shell.js into side .js files, so that we can use it in
multiple places.

Fixes emscripten-core#6567 but as we add more tests I'm sure we'll find more issues, this just
shows basic functionality so far. We may not be able to easily reuse the
existing browser tests as they have browser-specific things in them, but we
can try.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants