Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DBus backend high CPU usage and it grows over time #500

Closed
devbis opened this issue Apr 1, 2021 · 16 comments · Fixed by #902
Closed

DBus backend high CPU usage and it grows over time #500

devbis opened this issue Apr 1, 2021 · 16 comments · Fixed by #902
Labels
Backend: BlueZ Issues and PRs relating to the BlueZ backend bug Something isn't working

Comments

@devbis
Copy link
Contributor

devbis commented Apr 1, 2021

  • bleak version: 0.11.0
  • Python version: 3.7.9
  • Operating System: OpenWrt 19.07
  • BlueZ version (bluetoothctl -v) in case of Linux: 5.50

Description

I'm using bleak to run scanning devices and connect to some other devices.
When I start my application it consumes high CPU percent but it is affordable.
About 30-60% of the 800MHz device.
And my application is responsive at the start so it is affordable.

After working for 10-20 hours CPU usage raising and it could take 95-100% all the time.
The problem It that it leads to timeout exceptions while tasks don't get their CPU time.

What I Did

  1. I restarted bluetoothd to flush caches or so, it didn't help
  2. I tried to investigate the source and monitored the number of tasks that are running. But the number of tasks is constant in general. (Some tasks appeared for connected devices and then they are stopped).

Actually, I'm empty of ideas how I can stop this growing CPU usage. What else can I check to find out the root of the problem?

@devbis devbis changed the title DBus backend high CPU usage and grows over time DBus backend high CPU usage and it grows over time Apr 1, 2021
@dlech dlech added Backend: BlueZ Issues and PRs relating to the BlueZ backend bug Something isn't working labels Apr 1, 2021
@dlech
Copy link
Collaborator

dlech commented Apr 1, 2021

Can you share the full program?

Does the high CPU usage still happen when there are no devices to connect to? (This could narrow it down to the scanner code vs. the client code.)

@devbis
Copy link
Contributor Author

devbis commented Apr 1, 2021

Yes, sure

https://github.com/devbis/ble2mqtt/

Does the high CPU usage still happen when there are no devices to connect to? (This could narrow it down to the scanner code vs. the client code.)

That's a good idea, I'll run my code for several days with passive devices only to check the scanner code and post the results later.

@devbis
Copy link
Contributor Author

devbis commented Apr 2, 2021

Ran my code for a while in scanning-only mode.
It scans for 3 seconds, then sleeps for 1 sec.
The scanning time CPU usage raised from 30% in the beginning up to 80-95% after 10 hours.
But during sleep time it drops to 3-5% as in the beginning.

I'll try to simplify the code to make a minimal program that just scans devices without complex handling to reduce the impact of my code.

@devbis
Copy link
Contributor Author

devbis commented Apr 2, 2021

Снимок экрана 2021-04-02 в 16 15 34

I checked on the minimal example:
import asyncio as aio
from bleak.backends.device import BLEDevice
from bleak import BleakScanner


def device_detection_callback(device: BLEDevice, advertisement_data):
    print(device, advertisement_data)


async def start():
    while True:
        async with BleakScanner(scanning_mode='passive') as scanner:
            scanner.register_detection_callback(
                device_detection_callback,
            )
            await aio.sleep(3)
            devices = await scanner.get_discovered_devices()
            print('New devices: ', devices)
        await aio.sleep(1)


def main():
    loop = aio.get_event_loop()
    try:
        loop.run_until_complete(start())
    finally:
        loop.close()


if __name__ == '__main__':
    main()

After working for 12 hours it consumes 95% CPU while scanning and then drops to minimal.

@dlech
Copy link
Collaborator

dlech commented Apr 2, 2021

What happens if you rewrite the loop like this?

async def start():
    async with BleakScanner(scanning_mode="passive") as scanner:
        scanner.register_detection_callback(
            device_detection_callback,
        )
        while True:
            await aio.sleep(3)
            await scanner.stop()
            print("New devices: ", scanner.discovered_devices)
            await aio.sleep(1)
            await scanner.start()

@devbis
Copy link
Contributor Author

devbis commented Apr 2, 2021

Started the new code. Will post the details in 10 hours or so.

@devbis
Copy link
Contributor Author

devbis commented Apr 3, 2021

Снимок экрана 2021-04-03 в 09 46 48

After 8 hours of working with updated code peak usage is again about 95%. On the start this value was 25-30%

@dlech
Copy link
Collaborator

dlech commented Apr 3, 2021

Did you run top to confirm that it is, in fact` the Python process that is using most of the CPU?

I've been running the test program for about 24 hours now on my Linux desktop and it is just now sometimes jumping from 0% to 1% of CPU.

Are there any exceptions that are being ignored? Have you run with debug logging enabled?

You could also run sudo dbus-monitor --system if that is available to see if there is an increase in D-Bus traffic between starting the program and some time later.

@devbis
Copy link
Contributor Author

devbis commented Apr 4, 2021

In fact, I'm running the script on Freescale i.MX6ULL CP 528 MHz.It is a single-core ARM7 CPU. I suppose it is not as fast as your Linux desktop. :)

Yes, I'm running htop to check the current CPU usage of processes.

Are there any exceptions that are being ignored?
I run the script as is, no exceptions are ignored

Have you run with debug logging enabled?
No, the sample script doesn't include any logging configuration.

I ran dbus-monitor --system after a day running for 1 minute and dumped it to a file:
https://gist.github.com/devbis/225fcac9f28fd9a3d6aa26c075f329c3

Then I restarted the python script and check CPU usage. It didn't change much.
I restarted bluetoothd and CPU usage reduced.

Then I ran dbus-monitor --system for 1 minute again:
https://gist.github.com/devbis/79dedeb920808ed406e58f5c4d6c9e06

(I wonder can anyone find any clues in the files)

The result file has decreased from 5.8 MB to 700 kb.
I suppose the culprit of the problem is growing traffic from bluetoothd and inefficient processing of DBUS traffic in python.

@dlech
Copy link
Collaborator

dlech commented Apr 5, 2021

Thanks for all of the logs. I have a new theory about what is going on. BlueZ caches information on each device that it sees where a device is identified by its Bluetooth address. However, many Bluetooth Low Energy devices use random addresses that change frequently. So a huge cache of devices builds up in BlueZ, even if there is only one physical device.

Each time we start and stop scanning or connect to a client, we create an new D-Bus connection and call org.freedesktop.DBus.ObjectManager.GetManagedObjects. In your 2nd log (the one with lower CPU usage), the response to this is 1158 lines long. Compared to the 1st log where the response is 53170 lines long, it is easy to see why it would be using significantly more CPU (it is processing >45x the amount of data).

So, I think the solution will be to maintain a single D-Bus bus connection for monitoring D-Bus objects for the lifetime of a Bleak program. However, BleakClient will still need to have it's own bus connection for calling the Connect method to avoid #472 (comment).

I think it would also be worthwhile raising an issue with BlueZ to see if there is a way for it to not cache devices with random addresses.

@devbis
Copy link
Contributor Author

devbis commented Apr 5, 2021

Thanks for looking into it.
I want to try to fork the dbus-next and rewrite the unmarshaller part to plain C for speed. I expect that it can significantly improve the performance of parsing data. This part is not asyncio-based and the code is of a good quality.

It is not related to bleak directly, but I home it will help me solve the problem with the cpu.

@dlech
Copy link
Collaborator

dlech commented Apr 5, 2021

https://github.com/igo95862/python-sdbus/ looks interesting as well.

@rbuffat
Copy link

rbuffat commented Jul 8, 2021

I suspect I observe the same issue on a raspberry pi 4. Is there a workaround available until a permanent solution is implemented?

Bluez 5.50, bleak 0.12.0

scanner

Drop in CPU load is after restarting the container / pi.

import asyncio
import logging
import os
import sys

import aiohttp
from bleak import BleakScanner

root = logging.getLogger()
root.setLevel(logging.INFO)

handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
handler.setFormatter(formatter)
root.addHandler(handler)

TEMPERATURE = "temperature"
HUMIDITY = "humidity"
VDD = "vdd"
PRESSURE = "pressure"
TIMESTAMP = "ts"
BROADCAST_ID = "bid"
DEVICE_ID = "uid"

URL = os.environ["MIDDLEWARE_URL"]
PORT = os.environ["MIDDLEWARE_PORT"]

record_url = f"http://{URL}:{PORT}/recordEvent"
logging.info(f"Send events to {record_url}")


async def send_data(data, session):
    mes_id = int.from_bytes(data[0:2], byteorder="big", signed=False)
    temp_k = int.from_bytes(data[2:4], byteorder="big", signed=False)
    rel_hum_raw = int.from_bytes(data[4:6], byteorder="big", signed=False)
    vdd = int.from_bytes(data[6:8], byteorder="big", signed=False)
    uid = int.from_bytes(data[8:], byteorder="big", signed=False)

    temp = float(temp_k - 27315) / 100.0
    rel_hum = float(rel_hum_raw) / 1000.0

    event = {}
    event[TEMPERATURE] = str(temp)
    event[HUMIDITY] = str(rel_hum)
    event[VDD] = str(vdd)
    event[DEVICE_ID] = str(uid)
    event[BROADCAST_ID] = str(mes_id)

    try:
        async with session.get(record_url, params=event) as _:
            pass
    except Exception as e:
        logging.exception(f"Could not send event to middleware: {e}")


async def run():
    logging.info("Start scanner")
    async with aiohttp.ClientSession() as session:
        while True:
            logging.info("Start scan")
            async with BleakScanner() as scanner:
                await asyncio.sleep(4.0)
                devices = scanner.discovered_devices
            for device in devices:
                if device.name == "SNS":
                    logging.info(f"Discovered [{device.address}] {device.name}")
                    await send_data(
                        device.metadata["manufacturer_data"][65535], session
                    )
            await asyncio.sleep(1.0)


loop = asyncio.get_event_loop()
loop.run_until_complete(run())

FROM python:3.9

COPY . /app
WORKDIR /app

RUN apt-get update && apt-get install -y bluez bluez-tools bluetooth
RUN pip install -r requirements.txt

ENV PYTHONUNBUFFERED 1

ENTRYPOINT sh entrypoint.sh

@hbldh
Copy link
Owner

hbldh commented Jul 8, 2021

@rbuffat Could you run the same code outside of a Docker container, on the same system, and confirm that the problem exists there as well?

@devbis
Copy link
Contributor Author

devbis commented Jul 9, 2021

I periodically restart the bluetoothd with cron several times a day. It helps me to keep the CPU usage at an acceptable level.
But the solution is fixing dbus-next unmarshalling (e.g. rewrite that file in pure C) or write another backend (I tried to use bluepy as the custom backend and it worked for passive scan consuming 1-5% CPU for my hardware but I don't like it's approach)

dlech added a commit that referenced this issue Jul 26, 2022
This fixes a number of bugs by moving the `BleakClient` state management
to the new global BlueZ manager object.

- Calling "GetManagedObjects" each time we connected caused performance
  issues. Fixes #500.
- Calling "ConnectDevice" didn't work as expected and has been removed/
  Fixes #806.
- BleakClient didn't handle "InterfacesRemoved" which resulted in an
  invalid service dictionary in some cases. Fixes #882.
@dlech
Copy link
Collaborator

dlech commented Jul 26, 2022

We finally the the changes in place to fix this. Can you please try #902?

dlech added a commit that referenced this issue Jul 29, 2022
Added
-----

* Added new ``assigned_numbers`` module and ``AdvertisementDataType`` enum.
* Added new ``bluez`` kwarg to ``BleakScanner`` in BlueZ backend.
* Added support for passive scanning in the BlueZ backend. Fixes #606.
* Added option to use cached services, characteristics and descriptors in WinRT backend. Fixes #686.
* Added ``PendingDeprecationWarning`` to use of ``address_type`` as keyword argument. It will be moved into the
  ``winrt`` keyword instead according to #623.
* Added better error message when adapter is not present in BlueZ backend. Fixes #889.

Changed
-------

* Add ``py.typed`` file so mypy discovers Bleak's type annotations.
* UUID descriptions updated to 2022-03-16 assigned numbers document.
* Replace use of deprecated ``asyncio.get_event_loop()`` in Android backend.
* Adjust default timeout for ``read_gatt_char()`` with CoreBluetooth to 10s. Merged #891.
* ``BleakScanner()`` args ``detection_callback`` and ``service_uuids`` are no longer keyword-only.
* ``BleakScanner()`` arg ``scanning_mode`` is no longer Windows-only and is no longer keyword-only.
* All ``BleakScanner()`` instances in BlueZ backend now use common D-Bus object manager.
* Deprecated ``filters`` kwarg in ``BleakScanner`` in BlueZ backend.
* BlueZ version is now checked on first connection instead of import to avoid import side effects. Merged #907.

Fixed
-----

* Documentation fixes.
* On empty characteristic description from WinRT, use the lookup table instead of returning empty string.
* Fixed detection of first advertisement in BlueZ backend. Merged #903.
* Fixed performance issues in BlueZ backend caused by calling "GetManagedObjects" each time a
  ``BleakScanner`` scans or ``BleakClient`` is connected. Fixes #500.
* Fixed not handling "InterfacesRemoved" in ``BleakClient`` in BlueZ backend. Fixes #882.
* Fixed leaking D-Bus socket file descriptors in BlueZ backend. Fixes #805.
@dlech dlech mentioned this issue Jul 29, 2022
dlech added a commit that referenced this issue Jul 29, 2022
Added
-----

* Added new ``assigned_numbers`` module and ``AdvertisementDataType`` enum.
* Added new ``bluez`` kwarg to ``BleakScanner`` in BlueZ backend.
* Added support for passive scanning in the BlueZ backend. Fixes #606.
* Added option to use cached services, characteristics and descriptors in WinRT backend. Fixes #686.
* Added ``PendingDeprecationWarning`` to use of ``address_type`` as keyword argument. It will be moved into the
  ``winrt`` keyword instead according to #623.
* Added better error message when adapter is not present in BlueZ backend. Fixes #889.

Changed
-------

* Add ``py.typed`` file so mypy discovers Bleak's type annotations.
* UUID descriptions updated to 2022-03-16 assigned numbers document.
* Replace use of deprecated ``asyncio.get_event_loop()`` in Android backend.
* Adjust default timeout for ``read_gatt_char()`` with CoreBluetooth to 10s. Merged #891.
* ``BleakScanner()`` args ``detection_callback`` and ``service_uuids`` are no longer keyword-only.
* ``BleakScanner()`` arg ``scanning_mode`` is no longer Windows-only and is no longer keyword-only.
* All ``BleakScanner()`` instances in BlueZ backend now use common D-Bus object manager.
* Deprecated ``filters`` kwarg in ``BleakScanner`` in BlueZ backend.
* BlueZ version is now checked on first connection instead of import to avoid import side effects. Merged #907.

Fixed
-----

* Documentation fixes.
* On empty characteristic description from WinRT, use the lookup table instead of returning empty string.
* Fixed detection of first advertisement in BlueZ backend. Merged #903.
* Fixed performance issues in BlueZ backend caused by calling "GetManagedObjects" each time a
  ``BleakScanner`` scans or ``BleakClient`` is connected. Fixes #500.
* Fixed not handling "InterfacesRemoved" in ``BleakClient`` in BlueZ backend. Fixes #882.
* Fixed leaking D-Bus socket file descriptors in BlueZ backend. Fixes #805.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Backend: BlueZ Issues and PRs relating to the BlueZ backend bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants