Feedback about Workers! #6
Woah, I missed this post! That's great! Here's my feedback. I'm the author of microjob, a tiny lib built on top of Node.js worker threads. AFAIK, on Node.js things work differently. I found that there are two ways to pass something to worker threads: via message passing (strings, postMessage) or via SharedArrayBuffer. Both approaches work for data, and that's great. So, I would ask for an easy way to pass and execute runtime context (maybe in conjunction with vm.runInContext?), as I do with GoLang. Another thing I'd like to ask: microjob implements the thread-pool pattern, and I think it would be cool to have it built in. Thanks again and keep going this way! 🔝 💪 👏 |
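For context, a minimal sketch of the two approaches mentioned here; the worker filename and payloads are illustrative only:

```js
// 1. Message passing: the value is copied via the structured clone algorithm.
const { Worker } = require('worker_threads');

const worker = new Worker('./my-worker.js'); // hypothetical worker script
worker.postMessage({ user: 'foo', items: [1, 2, 3] });

// 2. SharedArrayBuffer: the same memory is visible from both threads,
//    but it can only hold raw binary data accessed through typed arrays.
const shared = new SharedArrayBuffer(4);
const view = new Int32Array(shared);
worker.postMessage({ shared });  // the buffer is shared, not copied
Atomics.store(view, 0, 42);      // the worker observes this write too
```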
Edit: Foiled by daylight savings.
1. Read this
2. Then this
I've recently been streaming myself working on a system where one Node.js process coordinates with a second one via message passing. Start at 3m35s; you should get the main gist by 10m. |
microjob uses a worker pool to avoid the performance cost of spawning a new thread for each execution. Furthermore, passing class instances to workerData does not solve the problem, because it uses the same V8 algorithm to serialize/deserialize data. Quoting from the docs:
|
You don't need to spawn a new process per job. Spawn the worker, then feed it work via a queue like RabbitMQ. Most queuing systems should have a way to control concurrency, so you can limit each worker to X concurrent jobs. |
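A hedged sketch of that setup, assuming RabbitMQ via the amqplib client; the queue name and job handler are made up for illustration:

```js
const amqp = require('amqplib');

// placeholder for the actual CPU- or IO-heavy work
async function handleJob(job) {
  console.log('processing', job);
}

async function main() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('jobs');
  ch.prefetch(4); // at most 4 unacknowledged jobs in this worker at a time

  ch.consume('jobs', async (msg) => {
    const job = JSON.parse(msg.content.toString());
    await handleJob(job);
    ch.ack(msg);
  });
}

main().catch(console.error);
```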
I was talking about spawning new threads, not new processes. Maybe the following example can explain the situation better:

```js
class Person {
  constructor(name) {
    this.name = name
  }

  hello() {
    console.log(`hello from ${this.name}`)
  }
}

const foo = new Person('foo')

// passing it to an existing worker thread
job(() => foo.hello())
```

Now, I don't want the user to define and instantiate Person inside the job closure. For instance:

```js
job(() => foo.hello(), {ctx: {foo}})
```

I think that the same serialization/deserialization problem applies to your case (feeding via a queue). |
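To spell out the serialization problem being described, a small sketch (file names are hypothetical): structured cloning, which both postMessage and workerData use, copies plain data but drops the prototype and its methods.

```js
// main.js (hypothetical file names)
const { Worker } = require('worker_threads');

class Person {
  constructor(name) { this.name = name; }
  hello() { console.log(`hello from ${this.name}`); }
}

// workerData is copied with the structured clone algorithm
const worker = new Worker('./worker.js', { workerData: new Person('foo') });

// worker.js
const { workerData } = require('worker_threads');
console.log(workerData.name);         // 'foo'       – plain data survives
console.log(typeof workerData.hello); // 'undefined' – the method does not
```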
Another option might be to send the program to the data, instead of the other way around. This means:
Now you can work on objects remotely. |
@wilk Ok, let's use your example:

```js
var data = {
  class_path: "./class/person.js",
  init_args: [ "foo" ],
  action: "get-foo"
};

job( data, ( err, foo ) => {
  // use result
});
```
|
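A possible worker-side counterpart for this example, as a sketch only; it assumes class_path exports the class via module.exports and that action maps to a method name on the instance:

```js
// worker.js (hypothetical): the worker owns the instance, so only plain
// messages ever cross the thread boundary.
const { parentPort } = require('worker_threads');

parentPort.on('message', (data) => {
  const Cls = require(data.class_path);        // e.g. ./class/person.js
  const instance = new Cls(...data.init_args); // lives only in this thread
  const method = instance[data.action];
  const result = typeof method === 'function' ? method.call(instance) : null;
  parentPort.postMessage({ result });
});
```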
The internal state of a class instance may vary over time and does not depend only on the initial args. |
@wilk we are entirely in agreement here 👍
Things to note about the demo
|
@wilk It isn't possible to actually share the runtime between two workers. JS's prototype design makes it pretty much untenable to share any object between two threads (with the exception of SharedArrayBuffer, which works because we can perform atomic operations on it) |
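A tiny sketch of that SharedArrayBuffer exception, assuming a single self-spawning script: both threads see the same memory and coordinate through Atomics, while everything else is copied rather than shared.

```js
const { Worker, isMainThread, workerData } = require('worker_threads');

if (isMainThread) {
  const shared = new SharedArrayBuffer(4);
  const counter = new Int32Array(shared);
  new Worker(__filename, { workerData: shared });
  Atomics.add(counter, 0, 1); // safe even if the worker increments concurrently
} else {
  const counter = new Int32Array(workerData);
  Atomics.add(counter, 0, 1);
  console.log('counter is now at least', Atomics.load(counter, 0));
}
```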
@Akamaozu @devsnek |
@wilk the problem isn't that no one has bothered to specify how objects would be shared; it's that there is no sane way to share objects besides stopping each thread while the other is running, in which case there's no point in threading and you can put all your code in the same file. |
@devsnek The feature introduced by microjob is handy, especially if you want to declare some parts of your program to be executed in the background on another thread (for instance, a heavy math calculation). |
So … on the general issue of sharing JS objects between threads: yes, that's not something Node.js can do on its own; it needs JS engine support, and there's a good chance JS engines aren't willing to do this without some sort of standardization (there's also a good chance that that's not the case, I guess). People have put work into modifying V8 to allow some sort of shared heap, though. But, again, there's not much that we can do directly as Node.js on our own. |
Although sharing data objects between processes/threads is not possible with JavaScript's memory model, this does not prohibit implementing Extended Memory Semantics that make atomic operations on persistent, shared data objects possible in JS (and Python and other languages). EMS is a native module that implements shared data objects (JSON only) for Node; it is agnostic to the source of parallelism, so it works with Cluster and OS processes, but like most native add-ons it is not compatible with Workers' implementation. EMS predates other parallel programming models for JS, so it has built-in support for Bulk Synchronous Parallelism, Fork-Join parallelism, and loop-level parallelism. (Disclaimer: I am the author of EMS) |
Hi @addaleax , |
@pioardi Hi, and thanks for the feedback! Sorry I haven’t gotten around to replying yet.
I agree – that’s overdue. I’ve opened nodejs/node#31601 to hopefully address that.
I have, but I feel like there’s a few tradeoffs to be made there, and I’m not sure that Node.js should be picking a default. For example, you could have a very generic Worker pool, where you would post code to be run, or highly specialized Workers that only handle a single type of task. Also, I’m a fan of implementing features that don’t need to be in Node.js core in userland in general. But that’s just my personal opinion. |
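As an illustration of the userland-pool idea (not an API proposal), a minimal fixed-size pool might look like the sketch below; it assumes a hypothetical worker script that replies with exactly one message per posted task.

```js
// A minimal userland pool: a fixed set of workers plus a FIFO of pending tasks.
const { Worker } = require('worker_threads');

class FixedPool {
  constructor(script, size) {
    this.idle = Array.from({ length: size }, () => new Worker(script));
    this.pending = [];
  }

  run(task) {
    return new Promise((resolve) => {
      const dispatch = (worker) => {
        worker.once('message', (result) => {
          // pass the worker to the next waiter, or park it as idle again
          const next = this.pending.shift();
          if (next) next(worker); else this.idle.push(worker);
          resolve(result);
        });
        worker.postMessage(task);
      };
      const worker = this.idle.pop();
      if (worker) dispatch(worker); else this.pending.push(dispatch);
    });
  }
}

// usage: const pool = new FixedPool('./task-worker.js', 4);
//        pool.run({ n: 42 }).then(console.log);
```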
Hi @addaleax ,
I understand your opinion. It could be interesting to have different types of pools in Node.js, plus a decision tree that guides developers toward the most suitable one. I am trying to do that, so take a look if you want, and let me know if you would consider including this in Node core in the future. Thanks for your help @addaleax |
I've been following various solutions for multi-processing in Node.js for many years now, since some of my use-cases for Node.js have been quite processing-intensive, ranging from web scraping, to some very complex rule evaluations in an interactive context. Thus, I'm very happy that we nowadays have a standardized worker threads implementation in Node.js core. That said, I've been having a hard time utilizing worker threads for very many practical use-cases, due to the very restricted data sharing between threads, and copying data between threads being relatively slow. I have recently finished some benchmarks transferring different kinds of data between Node.js threads. The throughput is not admirable, but what I'm happy about is how the use of worker threads can help keep the main loop event processing responsive at all times. See here https://www.jakso.me/blog/nodejs-14-worker-threads-benchmarks |
Something that could truly enable a lot more use cases to benefit from worker threads would be if we could "give away" all kinds of (at least JSON-safe) data to another thread, similar to the current |
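For comparison, the existing "give away" mechanism for binary data is transferring an ArrayBuffer, which moves ownership instead of copying; a sketch (worker filename hypothetical):

```js
const { Worker } = require('worker_threads');

const worker = new Worker('./worker.js'); // hypothetical worker script
const buf = new ArrayBuffer(1024 * 1024);

worker.postMessage({ buf }, [buf]); // second argument is the transfer list
console.log(buf.byteLength);        // 0 – the buffer is detached here now
```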
@ollisal What
I think this sentence from your blog post puts it quite well – communication using I also agree with all of the takeaways from your post, btw. |
@ollisal perhaps the solution is for JS engines to run |
@stevenvachon Right, so some kind of asynchronous JSON.stringify & parse variants could be added, with the work done in a native worker thread? Right, if that could be done it would help quite a few common use cases and also enable utilizing worker threads for other more complex tasks because you could use them to communicate more complex data with your workers more efficiently. For JSON.stringify that probably has the same issue/danger as what I suggested about transferring ownership of entire object trees to another thread - there can still be references to parts of that object tree in the main thread. And it could be changed through those references while it's being JSON.stringified or otherwise used in the other thread, which would lead to unpredictable results. Of course, it could be documented that you shouldn't access any parts of the object tree reachable through a transferred root object after the The Another problem I guess is that technically there are separate copies of the Array, Object etc standard prototype classes in the JS context of worker threads, and things like |
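A userland sketch of offloading JSON parsing to a worker thread, roughly along the lines discussed; the helper name is made up, and note that the parsed result still has to be structured-cloned back to the main thread, so this is not automatically a win.

```js
const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  // hypothetical helper: spawn a worker, send it the JSON text,
  // and resolve with the parsed result
  function parseJSONAsync(text) {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename);
      worker.once('message', (result) => { worker.terminate(); resolve(result); });
      worker.once('error', reject);
      worker.postMessage(text);
    });
  }

  parseJSONAsync('{"hello":"world"}').then((obj) => console.log(obj.hello));
} else {
  parentPort.on('message', (text) => {
    parentPort.postMessage(JSON.parse(text)); // cloned back to the main thread
  });
}
```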
@ollisal @stevenvachon I don't think asynchronous serialization is possible here, unfortunately:

```js
const a = startAsyncJSONStringify(obj);
otherObj.property = 42;
```

The JS engine can't know in advance whether the second line mutates part of the object graph that is still being serialized. Asynchronous parsing is another story, and quite an interesting idea. I don't see any reason why this should be impossible, although it would likely require a decent amount of JS engine modifications. There are some solutions for this problem, though. Microsoft and Alibaba have experimented in the past with shared heaps, i.e. actually making JS objects accessible in multiple threads – that's not trivial, and in particular not something Node.js can do on its own, because it also requires extensive JS engine modifications. The other approach is sharing complex objects through |
@addaleax How would it be different if |
@stevenvachon If the serialization happens on the main thread, that's not really an issue, true – but then I'm not sure if there's any advantage over doing it synchronously. If serialization and JS mutations happen in parallel, I think it's too easy to end up with inconsistent results, unfortunately…
If we could take a snapshot of an object tree, I don’t think we’d need the serialization step at all :) |
We still need serialization if we want to use postMessage. I guess it'd need to be implemented and benchmarked to see which is faster (sync serialization or async threaded serialization with snapshot). |
@stevenvachon I mean, I guess that depends on what you’re referring to when you say “serialization”… |
Just wandering by to share this research paper with techniques for fast JSON parsing. Mison: A Fast JSON Parser for Data Analytics by Li et al. |
@davisjam That paper certainly looks interesting :) I’m having a hard time thinking of a way to apply the idea to Node.js core, though – on the one hand, we can’t just start using JSON because JSON and HTML structured cloning serialize objects differently, and on the other hand, we don’t just want raw C++ representations of objects as in Mison, but need them in a format that V8 understands. It’s definitely a good idea for people who are okay with doing their own serialization/deserialization and just posting strings in that case. |
Yep, that was my intent. Probably not a good candidate for Node.js core :-). |
Thanks everyone! Future feedback can go to https://github.com/nodejs/node/ directly. :) |
Hi everyone! Something that starts coming up more and more frequently is moving Workers out of experimental status; we’re tracking that in nodejs/node#22940.
Something that would be very helpful is having feedback from people who use Workers, build libraries on top of it, or can share experiences with the API. Please use this thread for that! Whether your feedback is positive, negative, or anything else, it’s always going to be helpful as long as it’s on-topic.
If you have requests about specific changes you’d like to see, other than moving out of experimental status, feel free to comment here or open an issue at https://github.com/nodejs/node.