Telemetry server randomly crashes with segmentation fault #51
Today it has been difficult to reproduce the issue.
Recurrent error
Every time it occurred, the terminal error log was the same as the one in the description.
Node.js exited with error 139. As per the exit codes in https://nodejs.org/api/process.html#exit-codes, the exit code is 128 + 11, so the actual signal number is 11 (SIGSEGV), which matches the segmentation fault we see in the terminal log.
Which process died?
That segmentation fault occurred in the Node.js main process running the script
npm log content
The generated
The stack log isn't deep enough as it starts at the callback |
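The exit-code arithmetic above (139 = 128 + 11) can be checked with a small Node.js sketch. The helper name `decodeExitStatus` is hypothetical and only illustrates the "128 + signal number" convention; it is not part of the server code.

```javascript
const os = require('os');

// Hypothetical helper: decode a shell-style exit status.
// By convention, a status above 128 means "terminated by signal (status - 128)".
function decodeExitStatus(status) {
  if (status <= 128) {
    return { signalNumber: null, signalName: null };
  }
  const signalNumber = status - 128;
  // os.constants.signals maps signal names (e.g. 'SIGSEGV') to their numbers.
  const entry = Object.entries(os.constants.signals)
    .find(([, num]) => num === signalNumber);
  return { signalNumber, signalName: entry ? entry[0] : null };
}

console.log(decodeExitStatus(139));
```

On Linux and macOS, signal number 11 resolves to SIGSEGV, which is consistent with the segmentation fault reported in the terminal log.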
I wasn't able to break at the segmentation fault using a breakpoint in Here is the breakpoint location w.r.t. the function call tree we expect when the segmentation fault occurs:
Another way is to add a listener to the signal
...it's worth trying. There is actually a package that kind of does the same thing:
then add the following lines to the very beginning of the main application script:
const SegfaultHandler = require('segfault-handler');
SegfaultHandler.registerHandler('crash.log');
The module listens for any SIGSEGV signal and reports the detailed stack trace that caused it before the process shuts down. |
Segfault Handler output
After the segmentation fault occurs, we get the following output log:
So, the issue occurs in More precisely, the issue occurs in:
|
I realised that the problem might be related to a mismatch between the YARP version used during the server installation and the one used to run the server.
I'm doing some further checks:
(*) Using https://www.npmjs.com/package/segfault-handler (based on the project https://github.com/JochenKalmbach/StackWalker) and |
Not sure if it can be useful to try to run the server under |
I'm doing in parallel:
|
WIP: Analyse YarpJS_BufferedPort_Bottle::_callback_onRead
Disassemble
=> address 0x120c0, Demangled format...
Don't know yet how to generate the source code to better understand the failure... |
First memory leak analysis using Valgrind or DrMemory
It was easier to go for Valgrind. It seems that the latest official Valgrind release does not yet support macOS Catalina (10.15). But there is a modified fork fixing that problem (https://github.com/LouisBrunner/valgrind-macos), which can be installed either from source, by downloading the repository and building it, or via Homebrew.
Installation from source:
Installation via homebrew:
Using Valgrind requires installing the target modules (YARP) with debugging symbols. So, after building the whole superbuild in Release mode, we build YARP in Debug mode...
Valgrind output
I have to rule out false errors, so for now the output doesn't seem very helpful. |
ping @traversaro @S-Dafarra |
If we are blocked on this, could it make sense to work on robotology/yarp.js#19 and hope that the situation changes/improves with that? Not really a rational move, but sometimes working on something else leads to fixes to other issues (so-called "serendipity"). |
Yes, good point. I actually prepared for that as I duplicated my setups, having one for resuming the other development/fixes, and one for running the memory tests on the background. |
I'm currently configuring a CLion setup for debugging and memory leak analysis. This tool has approximately the same workflow as WebStorm and already integrates Valgrind, so it's quite promising, also for any future issues. Refer to issue robotology/yarp.js#31 and particularly robotology/yarp.js#31 (comment). |
Valgrind Memcheck Analysis of YarpJS Target
Configuration
We set up a Run/Debug configuration of type "CMake Application" with the following settings:
Run
Run the YarpJS target with Valgrind Memcheck. After a couple of hours running, no segmentation fault occurred, so I stopped the program manually and took a look at the warnings in the Valgrind pane. There are warnings in the "InvalidRead" and "Leak_DefinitelyLost" sections. Both the memory leak and the invalid access occur in the function call |
I think that a memory leak does not explain a segmentation fault (unless you run out of memory I suppose). What other warnings are there? Anything related to uninitialized memory for example? Edit: The "InvalidRead" seems more juicy instead. Also it might be worth compiling in |
Yes, the invalid read was the warning that worried me most; I just hadn't completed the description in the above comment yet. Anyway, the point is that the issue occurs in that function call sequence.
Indeed, but I had already reproduced the segfault in the past with the debug build. I'll try anyway with |
I was able to reproduce the segmentation fault after loading the YARP project under Xcode and attaching to the Node.js process. The debugger broke on the line causing the segmentation fault, which occurs after the call sequence:
Analysing... |
I will open an issue later describing the procedure to set up Xcode. |
Quick initial analysis... In line 322 we are dereferencing an invalid address since
By the way, in line 318
|
This is correct, at least as long as |
I think there is something fishy. You wrote in #51 (comment) that the size is 0, but you still get into the for loop, and you shouldn't. In fact |
Yes, that was what struck me most.
I have the full function call sequence before the segfault in Xcode environment, so we can check all that. |
Just a clarification with respect to a previous discussion we had during the last software engineering update: the problem is not related to |
I was observing a bottle We can indeed see garbage in the But this was a nested print
I wonder why Analyzing now what |
Note
I'm reproducing the issue with minimal JavaScript code executed under the Node.js command-line interface:
var yarp = require('./yarp')
var port = yarp.portHandler.open('/yarpjs/left_leg/stateExt:i')
port.onRead(function (bottle){console.log(bottle.toArray())})
yarp.Network.connect('/icubSim/left_leg/stateExt:o','/yarpjs/left_leg/stateExt:i') |
This became irrelevant as we can have corrupted data at the very beginning of the |
Moving the issue to robotology/yarp.js#32 since we can reproduce it without running yarp-openmct server. |
Error message log: