IPC can freeze the process #7657
I just tested your example unmodified on a recent 7.0.0-pre build on Windows 7 x64 with 4 physical hyperthreaded cores for around 8 minutes, handling around 2.2 million messages with no issues. Maybe it's already fixed in the repo, just not in the 6.3.0 release? |
@httpdigest Great if it's fixed on the V7 branch, but since V6 is the next LTS from October, I assume this kind of bug should be backported! |
I confirmed that the provided repro works under Windows 10 with node v0.10.38 and v0.10.46, but stops immediately with v6.3.0. Tested master under Windows 2012 and it stops after printing the "still alive" message once or twice. I will add this to my backlog and investigate when possible. cc @nodejs/platform-windows |
I'm able to reproduce the hang on Windows using the scripts provided by @cvillemure . In a debugger I found the main thread of the parent process was stuck waiting for a pipe write to complete. Normally writes to an IPC pipe complete immediately, but when the pipe gets full because messages are written into it faster than they are read out on the other side then the writes are blocked until space is available. While I can't figure out how to debug the child processes, I assume they are similarly blocked trying to write to their IPC pipe, so that the parent and child processes are waiting on each other. According to the documentation, the correct behavior should be for the send() method to return false instead of waiting indefinitely on a blocked pipe:
A possibly related issue is that as of node v4.0, process.send() operations are asynchronous on unix, but they appear to still be synchronous on Windows. There is a comment in the Windows pipe code specifically mentioning that IPC writes are intentionally blocking, which I don't understand. I don't see a v7.0.0 branch, but I'm using binaries built from the latest sources on the master branch. (On Windows 10 x64.) |
We are seeing a similar issue in VS Code where the application just hangs after a sufficient amount of data is sent between a node process and its forked child. I would appreciate it if someone could enlighten me about the
In my reproducible case I see |
Our understanding now is to use
Can someone confirm the following pseudo code?

```js
var buffer = [];

function send(msg) {
  if (buffer.length > 0) {
    buffer.push(msg);
    return; // wait for the pending process.send to finish before sending
  }

  var res = process.send(msg, () => {
    // send buffer now that we are good again to send
    var bufferCopy = buffer.slice(0);
    buffer = [];
    bufferCopy.forEach(b => send(b));
  });

  if (!res) {
    buffer.push(msg); // start adding the message to the buffer if send failed
  }
}
```

What worries me is that here I am assuming that the callback is a good place to continue sending data to the process but according to the docs this is not clear to me:
Basically I am missing a way to find out when it is a good time to start sending messages again after receiving |
@bpasero The return value indicates whether node.js was able to send the message right away (true) or had to buffer it (false). You can keep sending messages and node.js will dutifully buffer them, but that may result in unbounded memory growth, so as a rule of thumb, when |
@bnoordhuis thanks for the explanation, this should probably go into the docs of process.send(). I will try to follow that approach; however, I would be surprised if it fixes the node-process deadlock. It may just make it less likely to happen. |
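Applied to the pseudo code above, a minimal sketch consistent with that explanation (the names `queued` and `waitingForDrain` are illustrative, not from the thread): a message whose send() returned false must not be re-queued, because node has already buffered it; the sender should merely stop handing over new messages until a completion callback fires.

```js
// Runs inside a forked child (process.send is only defined there).
var queued = [];            // messages we have not yet handed to process.send()
var waitingForDrain = false;

function send(msg) {
  if (waitingForDrain) {
    queued.push(msg);       // channel is backed up; hold the message ourselves
    return;
  }
  var flushedImmediately = process.send(msg, function (err) {
    if (err) return console.error("IPC send failed:", err);
    // An earlier message has been written out; hand over queued messages again.
    waitingForDrain = false;
    var copy = queued;
    queued = [];
    copy.forEach(function (m) { send(m); });
  });
  if (!flushedImmediately) {
    // The message was still accepted (node buffered it internally), so it must
    // NOT be pushed back onto the queue; just pause new sends for now.
    waitingForDrain = true;
  }
}
```

Since writes complete in order, a callback from an earlier message may resume sends slightly before the backlog is fully drained; in that case the next send() simply returns false again and the pause resumes.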
Super simple repro for me:

index.js

```js
var cp = require("child_process");

var res = cp.fork("./fork.js");

var largeObj = {};
for (var i = 0; i < 10000; i++) {
  largeObj[i] = "foo bar";
}

setInterval(function () {
  console.log("PING (main side)");
}, 1000);

for (var i = 0; i < 2; i++) {
  var result = res.send(JSON.stringify(largeObj), function () {
    console.log("Done sending from main side");
  });
  console.log("Result from sending: " + result);
}
```

fork.js

```js
setInterval(function () {
  console.log("PING (fork side)");
}, 1000);

var largeObj = {};
for (var i = 0; i < 10000; i++) {
  largeObj[i] = "foo bar";
}

for (var i = 0; i < 2; i++) {
  process.send(JSON.stringify(largeObj));
}
```

All it needs is data large enough that the process.send call returns false, and both sides sending at the same time. |
As expected, using |
@bpasero I've filed libuv/libuv#1099. It's on my radar but I'm not much of a Windows programmer. If you want a speedy resolution, maybe you can have one of your programmers look at it. |
Thanks. A workaround that seems to prevent this issue for us is to always send messages in sequence, each one from the callback of the |
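A rough sketch of that workaround (the helper names `sendInSequence` and `drain` are made up for illustration): only one process.send() is ever in flight, and the next write is only started from the previous call's completion callback.

```js
var queue = [];
var sending = false;

function sendInSequence(msg) {
  queue.push(msg);
  drain();
}

function drain() {
  if (sending || queue.length === 0) return;
  sending = true;
  process.send(queue.shift(), function (err) {
    sending = false;
    if (err) console.error("IPC send failed:", err);
    drain(); // hand over the next message only after this one completed
  });
}
```

This trades throughput for safety: the sender never has more than one outstanding application-level message, which keeps it from flooding the kernel pipe buffer.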
@bnoordhuis so Node.JS takes care of resending the messages? |
It isn't "resending" it, because it didn't "fail to get sent". It just queued. The return code is for flow-control, so you know you are sending faster than data can be written out the socket, it isn't an indication that data was dropped. |
@sam-github thanks for clarifying that |
1) Introduce a timeout so the process shuts down even though it hasn't sent all messages to the parent. 2) Don't make the sentMessages bump dependent on whether or not the send method returned true or false; Mocha sometimes sends too many events at once, so the buffer "exceeds a threshold that makes it unwise to send more" (https://nodejs.org/api/child_process.html#child_process_child_send_message_sendhandle_options_callback). These messages didn't fail to get sent (see nodejs/node#7657 (comment))
Should this remain open? |
Since this is a libuv issue, I'll take the liberty of closing this. Libuv PRs welcome. |
Even though this isn't resolved yet, thank you so much for documenting this! I was losing my mind when my root node process was freezing up. Glad to know it's Windows-specific, not a general Node.js/IPC issue. Double thanks to @cvillemure for packaging a test case. It helped me quickly confirm that the issue I'm experiencing is the same, and not unique to my app. |
This fixes a bug where IPC pipe communication would deadlock when both ends of the pipe are written to simultaneously, and the kernel pipe buffer has already been filled up by earlier writes. The root cause of the deadlock is that, while writes to an IPC pipe are generally asynchronous, the IPC frame header is written synchronously. So when both ends of the pipe are sending a frame header at the same time, neither will read data off the pipe, causing both header writes to block indefinitely. Additionally, this patch somewhat reduces the spaghetti level in win/pipe.c.

Fixes: #1099
Refs: nodejs/node#7657
Refs: electron/electron#10107
Refs: parcel-bundler/parcel#637
Refs: parcel-bundler/parcel#900
Refs: parcel-bundler/parcel#1137
PR-URL: #1843
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Bartosz Sosnowski <bartosz@janeasystems.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
The node documentation fails to correctly document that when the backlog of unsent messages exceeds a certain threshold, the function will return false. This does not mean it will refuse to send, simply that it will take time. Issue in point: nodejs/node#7657 (comment)
How about re-opening this issue until the fix in libuv lands in Node.js? As it is, I had to run the test script to check whether it already works (it doesn't). |
Notable changes:
- Building via cmake is now supported. PR-URL: libuv/libuv#1850
- Stricter checks have been added to prevent watching the same file descriptor multiple times. PR-URL: libuv/libuv#1851 Refs: #3604
- An IPC deadlock on Windows has been fixed. PR-URL: libuv/libuv#1843 Fixes: #9706 Fixes: #7657
- uv_fs_lchown() has been added. PR-URL: libuv/libuv#1826 Refs: #19868
- uv_fs_copyfile() sets errno on error. PR-URL: libuv/libuv#1881 Fixes: #21329
- uv_fs_fchmod() supports -A files on Windows. PR-URL: libuv/libuv#1819 Refs: #12803

PR-URL: #21466
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
- Added a QUEUE to the IPC communication because of the synchronous-send bug on Windows, which freezes the screen (nodejs/node#7657)
- Optimized the FastScan code; it now writes a temporary file to hand the task over to WorkerUpload
- Other optimizations
Since we upgraded our app from 0.10.38 to v6, we have experienced a lot of problems with IPC messaging.
Essentially, we have a web application with a few workers to handle all the requests. The workers contain caches to speed up the requests, and these caches are synchronized between processes with IPC messaging. We were also using log4js as the logging library, with the clustered appender that uses IPC to send all child logs back to the master so that a single process handles the logs.
All was working fine under 0.10.38, but when we upgraded to 6.0.0 (and then 6.2.0) our app kept crashing under various circumstances.
We soon realized that sending too much data (or sending it too fast) through IPC was freezing our application.
We began refactoring our entire app to cut IPC usage down to the strict minimum.
All those changes are good for our application, since they reduced dependencies between master and workers and gave a better separation of responsibilities, but I still see this as a flaw in Node.js: IPC is a fairly simple communication mechanism for exchanging information between workers, but it now seems so fragile that we are afraid of using it.
I attached a simple script that reproduces the problem. It is not a real scenario, just a test case I created to reproduce the application becoming unresponsive.
ipc_test_scripts.zip
On my laptop, the app crashes at startup (or before the first log) with 5 forks (maybe because I have 4 physical cores).
At first I tested with 3 workers and it froze after 5-10 minutes (all processes' CPU drops to 0 and there is no more log output).
If I remove the "bacon ipsum" from the worker message, it works (might freeze after a while)
If I increase the message interval from 1ms to 10ms, it works (might freeze after a while)
If I spawn only 4 workers it works (will probably freeze after 5-10 minutes)
If I execute it with 0.10.38 it works (as long as I ran it)
So if you play with the timings, size of messages and/or number of forks, you should be able to reproduce the problem.
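The attached zip isn't reproduced in the thread, but based on the description above, a sketch along these lines exercises the same pattern (the filler payload, the 1 ms interval, the worker count, and the master echoing data back to the workers are all assumptions, not the contents of the actual attached scripts):

```js
// master_worker_test.js -- hypothetical reconstruction, not the attached script
var cluster = require("cluster");

var NUM_WORKERS = 5; // 5 forks froze at startup on a 4-core laptop
var FILLER = new Array(50).join("Bacon ipsum dolor amet "); // large-ish payload (assumed)

if (cluster.isMaster) {
  var received = 0;
  for (var i = 0; i < NUM_WORKERS; i++) {
    (function (worker) {
      worker.on("message", function () { received++; });
      // The master also pushes data to every worker (assumed cache-sync traffic).
      setInterval(function () { worker.send({ sync: FILLER }); }, 1);
    })(cluster.fork());
  }
  setInterval(function () {
    console.log("master still alive, received " + received + " messages");
  }, 1000);
} else {
  // Every worker floods the master with a message each millisecond (log traffic).
  setInterval(function () {
    process.send({ log: FILLER });
  }, 1);
}
```

On an affected node version on Windows, both directions writing concurrently like this eventually fills the kernel pipe buffers and the processes stop logging.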
One thing I observed is that IPC messaging performance seems to have improved a lot from 0.10 to 6. If I run the test with 3 workers for 10 seconds on 0.10.38, the master handles only 1902 messages; in comparison, on 6.3.0 the master handles 25514 messages in the same 10 seconds.
I also tested with 4.4.7 and it freezes at startup with 5 forks, and after 4 minutes with 4 forks.
My specs :
NodeJS Windows 6.3.0 64 bits (bug)
NodeJS Windows 6.2.0 64 bits (bug)
NodeJS Windows 4.4.7 64 bits (bug)
NodeJS Windows 0.10.38 64 bits (OK)