Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bluepy-helper locking cpu on debian 9 when in a vm on proxmox #239

Open
dgonneau opened this issue Nov 23, 2017 · 17 comments
Open

bluepy-helper locking cpu on debian 9 when in a vm on proxmox #239

dgonneau opened this issue Nov 23, 2017 · 17 comments

Comments

@dgonneau
Copy link

Hi,

Running debian9 in a vm, we are several people from the jeedom community having a problem with our vm crashing/locking randomly after a certain amount of time (seems random).

The VM suddenly have one of its core used at 100% by a process, then seems to die.

Looking into the the kern.log, we see that the culprit seems to be pluepy-helper locking the cpu.

Nov 23 03:13:16 jeedom-debian9-bluetooth kernel: [13228.136257] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [bluepy-helper:5618]
Nov 23 03:13:16 jeedom-debian9-bluetooth kernel: [13228.136294] CPU: 2 PID: 5618 Comm: bluepy-helper Tainted: G L 4.9.0-4-amd64 #1 Debian 4.9.51-1

I got 3 hours of this then nothing. Myu guess is that after 3 hours the VM died.

Strangely enough, this does not happen with debian 8.

@alexsarmiento
Copy link

alexsarmiento commented Jan 15, 2018

Same issue under ubuntu 16.04 and docker python:3.6 (Debian 8), for home-assistant and magicblue plugin . The bluepy-helper gets stuck at %100 CPU usage. As a workaround till this gets fixed, the helper can be killed with the monit service using this configuration:

check process bluepy-helper
	matching "bluepy-helper"
	stop program = "/usr/bin/killall bluepy-helper"
        if cpu usage > 15% for 1 cycles then restart

The parent python process will restart it .
Here's an excerpt from the monit.log


[EST Jan 14 11:04:26] error    : 'bluepy-helper' cpu usage of 84.1% matches resource limit [cpu usage>15.0%]
[EST Jan 14 11:04:26] info     : 'bluepy-helper' trying to restart
[EST Jan 14 11:04:26] info     : 'bluepy-helper' stop: /usr/bin/killall
[EST Jan 14 11:06:27] info     : 'bluepy-helper' cpu usage check succeeded [current cpu usage=0.0%]
[EST Jan 14 13:18:32] error    : 'bluepy-helper' cpu usage of 21.4% matches resource limit [cpu usage>15.0%]
[EST Jan 14 13:18:32] info     : 'bluepy-helper' trying to restart
[EST Jan 14 13:18:32] info     : 'bluepy-helper' stop: /usr/bin/killall
[EST Jan 14 13:20:32] info     : 'bluepy-helper' cpu usage check succeeded [current cpu usage=0.0%]
[EST Jan 14 13:56:33] error    : 'bluepy-helper' cpu usage of 36.7% matches resource limit [cpu usage>15.0%]
[EST Jan 14 13:56:33] info     : 'bluepy-helper' trying to restart
[EST Jan 14 13:56:33] info     : 'bluepy-helper' stop: /usr/bin/killall
[EST Jan 14 13:58:34] info     : 'bluepy-helper' cpu usage check succeeded [current cpu usage=0.0%]
[EST Jan 14 17:34:45] error    : 'bluepy-helper' cpu usage of 50.7% matches resource limit [cpu usage>15.0%]
[EST Jan 14 17:34:45] info     : 'bluepy-helper' trying to restart
[EST Jan 14 17:34:45] info     : 'bluepy-helper' stop: /usr/bin/killall
[EST Jan 14 17:36:46] info     : 'bluepy-helper' cpu usage check succeeded [current cpu usage=0.0%]
[EST Jan 14 18:22:49] error    : 'bluepy-helper' cpu usage of 99.9% matches resource limit [cpu usage>15.0%]
[EST Jan 14 18:22:49] info     : 'bluepy-helper' trying to restart
[EST Jan 14 18:22:49] info     : 'bluepy-helper' stop: /usr/bin/killall
[EST Jan 14 18:24:49] info     : 'bluepy-helper' cpu usage check succeeded [current cpu usage=0.0%]

As you can see the thing hangs randomly every few hours.

@jimreich
Copy link

jimreich commented Apr 28, 2018

I'm getting this same bug (or something that very much looks like it) on debian stretch on a Raspberry pi. Bluepy1.1.4. No virtualization. Happens maybe once per day.

I'm pretty sure I saw the same issue back on Jessie. I used to solve it by automatically bouncing my whole app and letting upstart bring me back. But this seems at least 50% less icky, while still being a hideous solution.

@jimreich
Copy link

Except that it didn't quite work longer-term -- around 12 hours later, I get the error "An error occurred: Error from Bluetooth stack (badstate)" after a while, even after killing and restarting bluepy-helper. And this error didn't have the telltale 100% CPU on bluepy-helper.

@Overbull
Copy link

Same problem on Debian 9.4 and ESXi 6.5 U2.

@alexsarmiento in my case monit didn't works.

@pinootto
Copy link

I have the same problem on raspberry pi. After a while (around 1 day) the system has a lot of bluepy-helper processes in state D (uninterruptible sleep), which I cannot kill:

root 32545 0.0 0.0 3644 696 ? D 18:11 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper
root 32553 0.0 0.0 3644 720 ? D 18:11 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper
root 32568 0.0 0.0 3644 640 ? D 18:12 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper
root 32576 0.0 0.0 3644 668 ? D 18:12 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper
root 32591 0.0 0.0 3644 668 ? D 18:13 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper
root 32599 0.0 0.0 3644 648 ? D 18:13 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper
root 32614 0.0 0.0 3644 712 ? D 18:14 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper
root 32622 0.0 0.0 3644 624 ? D 18:14 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper
root 32647 0.0 0.0 3644 664 ? D 18:15 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper
root 32656 0.0 0.0 3644 648 ? D 18:15 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper
root 32671 0.0 0.0 3644 624 ? D 18:16 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper
root 32679 0.0 0.0 3644 588 ? D 18:16 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper
root 32708 0.0 0.0 3644 656 ? D 18:17 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper
root 32716 0.0 0.0 3644 616 ? D 18:17 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper
root 32731 0.0 0.0 3644 704 ? D 18:18 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper
root 32739 0.0 0.0 3644 668 ? D 18:18 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper
root 32754 0.0 0.0 3644 696 ? D 18:19 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper
root 32762 0.0 0.0 3644 668 ? D 18:19 0:00 /usr/local/lib/python2.7/dist-packages/bluepy/bluepy-helper

The only way I found is to reboot the raspberry pi.

Is there a fix for this issue?

@irielord
Copy link

Same problem 100% CPU on bluepy-helper on debian9 4.9.0-6-amd64 ... unusable for me
Still no fix ?
**

@pbac
Copy link

pbac commented Jul 21, 2018

Same problem here, on debian9 and ESXi 6.7
I've conducted many test :

  • debian 8 + ESXi : Stable
  • debian 9 + ESXI with 1 vCPU : Stable
  • debian 9 + ESXi with more than 1 vCPU : crash after 1-2 hours.

@IanHarvey
Copy link
Owner

I have just made a 1.2.0 release of Bluepy which has updated the underlying Bluez sources to 5.47, which has a lot of bugs fixed since the original 5.29.

It would be useful to know if this changes the behaviour in any way.

@jbfuzier
Copy link

jbfuzier commented Aug 8, 2018

Thanks, the issue is still there with v1.2.0

@jimreich
Copy link

jimreich commented Aug 8, 2018

I still have the bluepy-helper randomly just die midstream with 1.2.0/5.47, and it happens considerably more frequently than it used to. I don't see anything obvious in daemon.log-- is there anywhere else I should look?

I haven't seen the case where it goes to 100% CPU yet, though, but it's only been a day.

Bluez is up to 5.50 now, although I don't see anything that looks like a related bug.

Any ideas on how to debug, or what would make bluepy-helper unexpectedly disappear?

@DylanSale
Copy link

I'm getting this on Ubuntu 18.04 with a UD100 device running in Docker.
I started the server with:
docker run -d --restart always --name server --net host --cap-add NET_ADMIN --device /dev/bus/usb server:1.0

and it works fine for a few days, then after that it stops responding (I have a lock so only one thread can run bluetooth commands at once, which is causing a deadlock).
Top shows:
27541 root 20 0 12192 1128 1020 R 100.0 0.0 222:49.27 bluepy-helper
The last things to happen on the device was that it was connected to a bluetooth device (iPhone) and was sending a GATT characteristic to it when it stopped responding.

running strace on the bluepy_helper process shows it seems to be stuck in an infinite loop running poll forever.

poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=0}], 4, -1) = 1 ([{fd=5, revents=POLLNVAL}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=0}], 4, -1) = 1 ([{fd=5, revents=POLLNVAL}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=0}], 4, -1) = 1 ([{fd=5, revents=POLLNVAL}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=0}], 4, -1) = 1 ([{fd=5, revents=POLLNVAL}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=0}], 4, -1) = 1 ([{fd=5, revents=POLLNVAL}])

Any help would be appreciated.

@DylanSale
Copy link

I discovered that the issue for me was the peripheral application was throwing an exception and failing to return a response for the write (it has withResponse=True set). btmon showed that the connection was eventually being terminated (likely the cause of the socket being closed in the strace output?).

I think this is a bug in bluepy_helper - if the connection is closed on a failed write with response set, the helper should exit and an error should be returned, rather than seemingly retrying forever.

@nemccarthy
Copy link

+1 seeing this error running on rpi 4 with home assistant in a virtual env. Have the latest versions of bluez and bluepy. See similar behaviour as @DylanSale

@xsun1213
Copy link

xsun1213 commented Nov 2, 2019

Bugfix of high CPU usage

After the BLE Connection is established, a L2CAP socket will be added the GLib
Poll Set via "g_io_add_watch()". But unfortunately the "G_IO_NVAL" is missing
in the Condition Parameter of the "g_io_add_watch()".
Therefore if Disconnection happens (no matter it's triggered by the BluePy side,
or another side), the L2CAP Socket will be closed by the bluez low-level ballbacks,
e.g. 'bluez-5.29/src/shared/att.c -> timeout_cb()'. Namely the 'EBADF'
will happen immediately at the next poll.
The missing condition "G_IO_NVAL" causes that the cleanup callback "channel_watcher"
will not be called. In this case, the GIO Event Source will not be removed from the GLib
Poll Set properly. And the following infinite Loop happens and raises CPU Usage to 100%.

poll([{fd=0, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=0}], 3, -1) = 2 ([{fd=4, revents=POLLIN}, {fd=5, revents=POLLNVAL}])
// cleanup should happen, but not
poll([{fd=0, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=0}], 3, -1) = 1 ([{fd=5, revents=POLLNVAL}])
poll([{fd=0, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=0}], 3, -1) = 1 ([{fd=5, revents=POLLNVAL}])
poll([{fd=0, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=0}], 3, -1) = 1 ([{fd=5, revents=POLLNVAL}])
poll([{fd=0, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=0}], 3, -1) = 1 ([{fd=5, revents=POLLNVAL}])

hereby is the patch to fix

0001-Bugfix-of-high-CPU-usage.patch.txt

@michapr
Copy link

michapr commented Mar 26, 2020

@xsun1213 it seems it was not merged until today, right?

Thanks,
Michael

@sneko
Copy link

sneko commented Jul 7, 2020

@IanHarvey it seems the patch above has been "approved" by multiple people. Do you think it could benefit from a new release :) ? There is no breaking change as far as I know.

It would be great! Thank you for your work,

@8none1
Copy link

8none1 commented Sep 16, 2021

Looks like the patch was merged in 8818eb2
and this issue can now be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests