-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate flaky test-worker-syntax-error-file #22762
Comments
I have a hard time attempting to reproduce this… do people on @nodejs/platform-windows have more luck on Windows? |
Cannot reproduce on Win10 |
Not exactly sure but this seems to be related:
Example
|
Refs: nodejs#20750 Refs: nodejs#22327 Refs: nodejs#22762 Refs: nodejs/reliability#16 PR-URL: nodejs#22941 Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Refael Ackermann <refack@gmail.com>
Refs: #20750 Refs: #22327 Refs: #22762 Refs: nodejs/reliability#16 PR-URL: #22941 Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Refael Ackermann <refack@gmail.com>
https://ci.nodejs.org/job/node-test-commit-linux/22621/nodes=centos7-64-gcc6/console 22:48:52 not ok 2016 parallel/test-worker-syntax-error-file
22:48:52 ---
22:48:52 duration_ms: 0.343
22:48:52 severity: fail
22:48:52 exitcode: 1
22:48:52 stack: |-
22:48:52 Mismatched <anonymous> function calls. Expected exactly 1, actual 0.
22:48:52 at Object.mustCall (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos7-64-gcc6/test/common/index.js:306:10)
22:48:52 at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos7-64-gcc6/test/parallel/test-worker-syntax-error-file.js:13:24)
22:48:52 at Module._compile (internal/modules/cjs/loader.js:707:30)
22:48:52 at Object.Module._extensions..js (internal/modules/cjs/loader.js:718:10)
22:48:52 at Module.load (internal/modules/cjs/loader.js:605:32)
22:48:52 at tryModuleLoad (internal/modules/cjs/loader.js:544:12)
22:48:52 at Function.Module._load (internal/modules/cjs/loader.js:536:3)
22:48:52 at Function.Module.runMain (internal/modules/cjs/loader.js:760:12)
22:48:52 at startup (internal/bootstrap/node.js:304:19)
22:48:52 ... |
AIX Mismatched <anonymous> function calls. Expected exactly 1, actual 0.
at Object.mustCall (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/common/index.js:312:10)
at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-worker-syntax-error.js:12:24)
at Module._compile (internal/modules/cjs/loader.js:722:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:733:10)
at Module.load (internal/modules/cjs/loader.js:620:32)
at tryModuleLoad (internal/modules/cjs/loader.js:560:12)
at Function.Module._load (internal/modules/cjs/loader.js:552:3)
at Function.Module.runMain (internal/modules/cjs/loader.js:775:12)
at startup (internal/bootstrap/node.js:300:19) |
Couldn't reproduce this in 10000 runs on either x64 Linux or x64 Windows. |
|
It's back. https://ci.nodejs.org/job/node-test-commit-linux/27284/nodes=centos7-64-gcc6/console test-rackspace-centos7-x64-1 00:13:00 not ok 2142 parallel/test-worker-syntax-error-file
00:13:00 ---
00:13:00 duration_ms: 0.487
00:13:00 severity: fail
00:13:00 exitcode: 1
00:13:00 stack: |-
00:13:00 Mismatched <anonymous> function calls. Expected exactly 1, actual 0.
00:13:00 at Object.mustCall (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos7-64-gcc6/test/common/index.js:339:10)
00:13:00 at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos7-64-gcc6/test/parallel/test-worker-syntax-error-file.js:12:24)
00:13:00 at Module._compile (internal/modules/cjs/loader.js:759:30)
00:13:00 at Object.Module._extensions..js (internal/modules/cjs/loader.js:770:10)
00:13:00 at Module.load (internal/modules/cjs/loader.js:628:32)
00:13:00 at Function.Module._load (internal/modules/cjs/loader.js:555:12)
00:13:00 at Function.Module.runMain (internal/modules/cjs/loader.js:824:10)
00:13:00 at internal/main/run_main_module.js:17:11
00:13:00 ... |
@Trott I don’t think this can be fixed without somebody logging into the CI machines and figuring out where one can get a reasonable reproduction rate, and then work on that machine (Debug logs might also be helpful, esp. |
Was able to replicate it on test-rackspace-centos7-x64-1 with Running it prefaced with === release test-worker-syntax-error-file ===
Path: parallel/test-worker-syntax-error-file
Mismatched <anonymous> function calls. Expected exactly 1, actual 0.
at Object.mustCall (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos7-64-gcc6/test/common/index.js:339:10)
at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos7-64-gcc6/test/parallel/test-worker-syntax-error-file.js:12:24)
at Module._compile (internal/modules/cjs/loader.js:759:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:770:10)
at Module.load (internal/modules/cjs/loader.js:628:32)
at Function.Module._load (internal/modules/cjs/loader.js:555:12)
at Function.Module.runMain (internal/modules/cjs/loader.js:824:10)
at internal/main/run_main_module.js:17:11
Worker (0:2) Creating new worker instance with thread id 1
MessagePort (0:3) Created message port
Worker (0:2) Preparation for worker 1 finished
MessagePort (0:3) Start receiving messages
MessagePort (0:3) Start receiving messages
MessagePort (0:6) Created message port
MessagePort (0:7) Created message port
MessagePort (0:6) Start receiving messages
MessagePort (0:7) Closing message port, data set = 1
MessagePort (0:6) Start receiving messages
MessagePort (0:7) MessagePort::OnClose()
Worker (0:2) Creating isolate for worker with id 1
Worker (0:2) Starting worker with id 1
Worker (0:2) Created Environment for worker with id 1
MessagePort (1:2) Created message port
Worker (0:2) Created message port for worker 1
MessagePort (1:2) Start receiving messages
MessagePort (1:2) Start receiving messages
Worker (0:2) Loaded environment for worker 1
MessagePort (1:2) Running MessagePort::OnMessage()
MessagePort (1:2) MessagePort has message, receiving = 1
MessagePort (1:3) Created message port
MessagePort (0:3) Adding message to incoming queue
MessagePort (0:3) Running MessagePort::OnMessage()
MessagePort (0:3) MessagePort has message, receiving = 1
MessagePort (0:3) MessagePort has message, receiving = 1
MessagePort (0:3) Adding message to incoming queue
Worker (0:2) Worker 1 called Exit(1)
Worker (0:2) Exiting thread for worker 1 with exit code 1
MessagePort (1:2) Closing message port, data set = 1
MessagePort (1:2) Closing message port, data set = 1
MessagePort (1:3) Closing message port, data set = 1
MessagePort (1:3) MessagePort::OnClose()
MessagePort (1:2) MessagePort::OnClose()
MessagePort (0:3) Closing message port, data set = 1
MessagePort (0:6) Running MessagePort::OnMessage()
MessagePort (0:6) MessagePort has message, receiving = 1
MessagePort (0:6) Closing message port, data set = 1
MessagePort (0:6) MessagePort::OnClose()
MessagePort (0:3) MessagePort::OnClose()
Worker (0:2) Worker 1 thread stops
Worker (0:2) Worker 1 dispose isolate
Worker (0:2) Worker 1 destroyed
Command: out/Release/node /home/iojs/build/workspace/node-test-commit-linux/nodes/centos7-64-gcc6/test/parallel/test-worker-syntax-error-file.js |
@Trott Thanks, that’s definitely helpful.
For some reasons, it seems the internal message port on the parent side closes prematurely, before the @Trott Do you think you could run this with this extra debugging patch + post failure output again? I’m not sure if it helps, but it could confirm that a) we close the message port with a message still pending and b) help figure out why the message port is being closed in this way to begin with. diff --git a/src/node_messaging.cc b/src/node_messaging.cc
index 77ed959477f2..37f0080f9c46 100644
--- a/src/node_messaging.cc
+++ b/src/node_messaging.cc
@@ -516,6 +516,7 @@ void MessagePort::TriggerAsync() {
void MessagePort::Close(v8::Local<v8::Value> close_callback) {
Debug(this, "Closing message port, data set = %d", static_cast<int>(!!data_));
+ DumpBacktrace(stderr);
if (data_) {
// Wrap this call with accessing the mutex, so that TriggerAsync()
@@ -640,7 +641,8 @@ bool MessagePort::IsSiblingClosed() const {
}
void MessagePort::OnClose() {
- Debug(this, "MessagePort::OnClose()");
+ Debug(this, "MessagePort::OnClose() with %zd pending messages",
+ (ssize_t)(data_ ? data_->incoming_messages_.size() : -1));
if (data_) {
data_->owner_ = nullptr;
data_->Disentangle(); |
@addaleax Here's the output with your patch: === release test-worker-syntax-error-file ===
Path: parallel/test-worker-syntax-error-file
Mismatched <anonymous> function calls. Expected exactly 1, actual 0.
at Object.mustCall (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos7-64-gcc6/test/common/index.js:339:10)
at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos7-64-gcc6/test/parallel/test-worker-syntax-error-file.js:12:24)
at Module._compile (internal/modules/cjs/loader.js:759:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:770:10)
at Module.load (internal/modules/cjs/loader.js:628:32)
at Function.Module._load (internal/modules/cjs/loader.js:555:12)
at Function.Module.runMain (internal/modules/cjs/loader.js:824:10)
at internal/main/run_main_module.js:17:11
Worker (0:2) Creating new worker instance with thread id 1
MessagePort (0:3) Created message port
Worker (0:2) Preparation for worker 1 finished
MessagePort (0:3) Start receiving messages
MessagePort (0:3) Start receiving messages
MessagePort (0:6) Created message port
MessagePort (0:7) Created message port
MessagePort (0:6) Start receiving messages
MessagePort (0:7) Closing message port, data set = 1
1: 0x9d1c42 node::worker::Message::Serialize(node::Environment*, v8::Local<v8::Context>, v8::Local<v8::Value>, v8::Local<v8::Value>, v8::Local<v8::Object>) [out/Release/node]
2: 0x9d2037 node::worker::MessagePort::PostMessage(node::Environment*, v8::Local<v8::Value>, v8::Local<v8::Value>) [out/Release/node]
3: 0x9d252c node::worker::MessagePort::PostMessage(v8::FunctionCallbackInfo<v8::Value> const&) [out/Release/node]
4: 0xb8c926 [out/Release/node]
5: 0xb8e839 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [out/Release/node]
6: 0x1a84022 [out/Release/node]
MessagePort (0:6) Start receiving messages
MessagePort (0:7) MessagePort::OnClose() with -1 pending messages
Worker (0:2) Creating isolate for worker with id 1
Worker (0:2) Starting worker with id 1
Worker (0:2) Created Environment for worker with id 1
MessagePort (1:2) Created message port
Worker (0:2) Created message port for worker 1
MessagePort (1:2) Start receiving messages
MessagePort (1:2) Start receiving messages
Worker (0:2) Loaded environment for worker 1
MessagePort (1:2) Running MessagePort::OnMessage()
MessagePort (1:2) MessagePort has message, receiving = 1
MessagePort (1:3) Created message port
MessagePort (0:3) Adding message to incoming queue
MessagePort (0:3) Running MessagePort::OnMessage()
MessagePort (0:3) MessagePort has message, receiving = 1
MessagePort (0:3) MessagePort has message, receiving = 1
MessagePort (0:3) Adding message to incoming queue
Worker (0:2) Worker 1 called Exit(1)
Worker (0:2) Exiting thread for worker 1 with exit code 1
MessagePort (1:2) Closing message port, data set = 1
1: 0x9d0d4e node::worker::MessagePort::Close(v8::Local<v8::Value>) [out/Release/node]
2: 0xa19338 [out/Release/node]
3: 0xa1dc36 node::worker::Worker::Run() [out/Release/node]
4: 0xa1e19c [out/Release/node]
5: 0x7fa13e1c3dd5 [/lib64/libpthread.so.0]
6: 0x7fa13deecead clone [/lib64/libc.so.6]
MessagePort (1:2) Closing message port, data set = 1
1: 0x9d0d4e node::worker::MessagePort::Close(v8::Local<v8::Value>) [out/Release/node]
2: 0x932db5 node::Environment::CleanupHandles() [out/Release/node]
3: 0x93304b node::Environment::RunCleanup() [out/Release/node]
4: 0xa1938f [out/Release/node]
5: 0xa1dc36 node::worker::Worker::Run() [out/Release/node]
6: 0xa1e19c [out/Release/node]
7: 0x7fa13e1c3dd5 [/lib64/libpthread.so.0]
8: 0x7fa13deecead clone [/lib64/libc.so.6]
MessagePort (1:3) Closing message port, data set = 1
1: 0x9d0d4e node::worker::MessagePort::Close(v8::Local<v8::Value>) [out/Release/node]
2: 0x932db5 node::Environment::CleanupHandles() [out/Release/node]
3: 0x93304b node::Environment::RunCleanup() [out/Release/node]
4: 0xa1938f [out/Release/node]
5: 0xa1dc36 node::worker::Worker::Run() [out/Release/node]
6: 0xa1e19c [out/Release/node]
7: 0x7fa13e1c3dd5 [/lib64/libpthread.so.0]
8: 0x7fa13deecead clone [/lib64/libc.so.6]
MessagePort (1:3) MessagePort::OnClose() with 0 pending messages
MessagePort (1:2) MessagePort::OnClose() with 2 pending messages
Worker (0:2) Worker 1 thread stops
Worker (0:2) Worker 1 dispose isolate
MessagePort (0:3) Closing message port, data set = 1
1: 0x9d392b node::worker::MessagePort::OnMessage() [out/Release/node]
2: 0xae597f [out/Release/node]
3: 0xaf7228 [out/Release/node]
4: 0xae62bb uv_run [out/Release/node]
5: 0x9cdabc node::NodeMainInstance::Run() [out/Release/node]
6: 0x964041 node::Start(int, char**) [out/Release/node]
7: 0x7fa13de113d5 __libc_start_main [/lib64/libc.so.6]
8: 0x902655 [out/Release/node]
MessagePort (0:6) Running MessagePort::OnMessage()
MessagePort (0:6) MessagePort has message, receiving = 1
MessagePort (0:6) Closing message port, data set = 1
1: 0x9d392b node::worker::MessagePort::OnMessage() [out/Release/node]
2: 0xae597f [out/Release/node]
3: 0xaf7228 [out/Release/node]
4: 0xae62bb uv_run [out/Release/node]
5: 0x9cdabc node::NodeMainInstance::Run() [out/Release/node]
6: 0x964041 node::Start(int, char**) [out/Release/node]
7: 0x7fa13de113d5 __libc_start_main [/lib64/libc.so.6]
8: 0x902655 [out/Release/node]
MessagePort (0:6) MessagePort::OnClose() with 0 pending messages
MessagePort (0:3) MessagePort::OnClose() with 1 pending messages
Worker (0:2) Worker 1 destroyed |
When a `MessagePort` connected to another `MessagePort` closes, the latter `MessagePort` will be closed as well. Until now, this is done by testing whether the ports are still entangled after processing messages. This leaves open a race condition window in which messages sent just before the closure can be lost when timing is unfortunate. (A description of the timing is in the test file.) This can be addressed by using a special message instead, which is the last message received by a `MessagePort`. This way, all previously sent messages are processed first. Fixes: nodejs#22762
When a `MessagePort` connected to another `MessagePort` closes, the latter `MessagePort` will be closed as well. Until now, this is done by testing whether the ports are still entangled after processing messages. This leaves open a race condition window in which messages sent just before the closure can be lost when timing is unfortunate. (A description of the timing is in the test file.) This can be addressed by using a special message instead, which is the last message received by a `MessagePort`. This way, all previously sent messages are processed first. Fixes: #22762 PR-URL: #27705 Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
https://ci.nodejs.org/job/node-test-binary-windows/19875/COMPILED_BY=vs2017,RUNNER=win10,RUN_SUBSET=3/console
@nodejs/workers
The text was updated successfully, but these errors were encountered: