[bug] Application stops responding with CPU stuck at 100% #174
Comments
@AlCalzone I see tons of:
This happens for multiple nodes. What could it be? |
Once we've sorted out the logging issue, I'll need a zwave-js log to see exactly what is going on. |
What make/model is node 54? Those three values are typical of an energy report being sent. |
It's a Qubino ZMNHBD2 Flush 2 Relays with temperature probe. Firmware 1.02. Is there anything I can do to get more informative logs? |
@AlCalzone Separate from this specific issue... Obviously the controller can’t do anything about a device flooding the mesh itself, but is it feasible to throttle misbehaving devices and drop excessive messages in some manner? Or to drop meter-type reports altogether if the system gets under too heavy of a load? The idea being that the mesh may go to hell but at least the broker stays running and accessible to reconfigure the device/debug. |
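For illustration only, the throttling idea above could look roughly like this fixed-window limiter keyed by node ID. `NodeThrottle`, `accept`, and the injected clock are hypothetical names made up for this sketch; none of them are zwave-js or zwavejs2mqtt APIs, and the real driver may need to hook in at a different layer.

```javascript
// Hypothetical per-node throttle; not part of zwave-js or zwavejs2mqtt.
class NodeThrottle {
  // maxPerWindow: messages allowed per node in each window
  // windowMs: window length in milliseconds
  // now: clock function, injectable so the behaviour can be tested
  constructor(maxPerWindow, windowMs, now = () => Date.now()) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
    this.now = now;
    this.windows = new Map(); // nodeId -> { start, count }
  }

  // Returns true if the message should be processed, false if it should be dropped.
  accept(nodeId) {
    const t = this.now();
    let w = this.windows.get(nodeId);
    // Start a fresh window for unseen nodes or once the old window has expired.
    if (!w || t - w.start >= this.windowMs) {
      w = { start: t, count: 0 };
      this.windows.set(nodeId, w);
    }
    w.count += 1;
    return w.count <= this.maxPerWindow;
  }
}
```

A caller would check `throttle.accept(nodeId)` before handling each incoming report and skip the ones that return `false`. Dropping only low-priority report types (such as meter reports) would need an extra check that is not shown here.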
Wait for us to fix logging :) |
@blhoward2 good idea ==> zwave-js/node-zwave-js#1318 |
@kaaelhaa Is this a new device? Was it working previously with another platform (without any config changes)? |
@blhoward2 The device has been in the network for at least a few years. It has always been very chatty though. The same goes for node 36, which is another type of Qubino device (ZMNHAD1 Flush 1 Relay). Both have been in use on Z-way, HASS and qt-openzwave previously without issues. |
So, I profiled the application, processed the profiler output and put it in this Gist: https://gist.github.com/kaaelhaa/4ce04c21554d69082cbb1a48aaa248d3 Do note that the processor logged several errors, like:
I haven't investigated those yet, so I don't know how much they impact the result. But from the output of the processor, it seems to be stuck somewhere in https://github.com/zwave-js/zwavejs2mqtt/blob/master/lib/Gateway.js#L1242. This makes some sense, as Node 48 is a climate device (Eurotronic Comet Z). Nodes 51 and 83 are similar devices. |
My guess is on this loop: There's probably a more performant way than looping through all values (which can be many) every few ms. |
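As a rough sketch of the "more performant way" mentioned above: if values are looked up by an identifier, a `Map` built once avoids scanning the whole list on every update. The `{ id, value }` shape, the ID strings, and `buildValueIndex` are assumptions made for this example; they are not the actual structures used in `Gateway.js`.

```javascript
// Hypothetical value shape { id, value }; not the actual zwavejs2mqtt data model.
function buildValueIndex(values) {
  const index = new Map();
  for (const v of values) {
    index.set(v.id, v); // one pass to build, then O(1) lookups
  }
  return index;
}

// Made-up sample values for a single node.
const values = [
  { id: '48-67-1-1', value: 21.5 },
  { id: '48-64-1-0', value: 1 },
];

const index = buildValueIndex(values);
const setpoint = index.get('48-67-1-1'); // direct lookup instead of a loop
```

The index has to be kept in sync whenever values are added or removed, which is the usual trade-off of this approach.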
@AlCalzone I tried adding some extra logging to that loop, and the 20 node values for node 48 are looped through in an instant. |
My testing confirms this loop is infinite: That's what causes everything to freeze. Is it necessary to have this stanza as it seems #68 improved the logic quite a bit? |
Let me check this. |
@kaaelhaa Really bad, thanks for the issue report! Fixed on master now! |
Version
Build/Run method
zwavejs2mqtt version: 1.0.0-alpha.2
Describe the bug
Some time after starting the application (typically 10-30 minutes), it stops responding and pegs one CPU thread at 100%.
This always happens when zwavejs2mqtt has logged the following:
I have uploaded an export of node 48 and log files from zwavejs2mqtt and zwavejs to this Gist: https://gist.github.com/kaaelhaa/6ea4695ed2b5b556b70c0968a59fb49c
To Reproduce
I can reproduce this reliably. It always happens after the log lines for Node 48 have been output.
Replicated on latest stable and dev versions.
Expected behavior
Application should be responsive.
Additional context
N/A.