Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluer causes bluetoothd to crash on armv7 #48

Closed
alexmoon opened this issue Sep 17, 2022 · 16 comments
Closed

Bluer causes bluetoothd to crash on armv7 #48

alexmoon opened this issue Sep 17, 2022 · 16 comments
Labels
external caused by bluetoothd or kernel

Comments

@alexmoon
Copy link
Contributor

On a Raspberry Pi 4 running 32-bit Pi OS, the following program will cause bluetoothd to crash when the program exits:

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    {
        let session = bluer::Session::new().await?;
        let adapter = session.default_adapter().await?;
        let _scan = adapter.discover_devices().await?;
    }

    // This delay is just to demonstrate that the crash occurs on program exit, not dropping of bluer objects
    // The crash still occurs with no delay
    tokio::time::sleep(std::time::Duration::from_secs(5)).await;
    Ok(())
}

Running this program causes bluetoothd to crash with an error similar to the following:

Sep 17 17:54:53 raspberrypi bluetoothd[2535]: realloc(): invalid next size

I believe this is the same issue that was previously reported in #9. However, the bluetoothctl program does not cause bluetoothd when performing scans, so there must be something different with bluer's interactions. It is also interesting to note that the crash does not occur until the program exits, even though the bluer objects are already dropped. Also, unlike #9, this crash is occurring on normal program exit, not Ctrl+C.

One thing I note in the DBus logs is that on initialization, a number of DBus matches are added:

method call time=1663451688.404791 sender=:1.206 -> destination=org.freedesktop.DBus serial=2 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=AddMatch
   string "type='signal',sender='org.bluez',interface='org.freedesktop.DBus.ObjectManager',member='InterfacesAdded'"
method return time=1663451688.404874 sender=org.freedesktop.DBus -> destination=:1.206 serial=3 reply_serial=2
method call time=1663451688.406506 sender=:1.206 -> destination=org.freedesktop.DBus serial=3 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=AddMatch
   string "type='signal',sender='org.bluez',interface='org.freedesktop.DBus.ObjectManager',member='InterfacesRemoved'"
method return time=1663451688.406605 sender=org.freedesktop.DBus -> destination=:1.206 serial=4 reply_serial=3
method call time=1663451688.408146 sender=:1.206 -> destination=org.freedesktop.DBus serial=4 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=AddMatch
   string "type='signal',sender='org.bluez',interface='org.freedesktop.DBus.Properties',member='PropertiesChanged'"
method return time=1663451688.408211 sender=org.freedesktop.DBus -> destination=:1.206 serial=5 reply_serial=4
method call time=1663451688.409329 sender=:1.206 -> destination=org.freedesktop.DBus serial=5 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=AddMatch
   string "type='method_call'"
method return time=1663451688.409430 sender=org.freedesktop.DBus -> destination=:1.206 serial=6 reply_serial=5

However, when the objects are dropped, only one of those matches is removed:

method call time=1663451688.577775 sender=:1.206 -> destination=org.freedesktop.DBus serial=11 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=RemoveMatch
   string "type='signal',sender='org.bluez',interface='org.freedesktop.DBus.ObjectManager',member='InterfacesAdded'"
method return time=1663451688.577808 sender=org.freedesktop.DBus -> destination=:1.206 serial=7 reply_serial=11

I don't know enough about DBus or bluez to know if that could cause the problem or if it's just a red herring, but it's the only unusual thing in the DBus logs that jumped out to me.

Versions:

  • bluer: 0.15.0
  • bluez: 5.55-3.1+rpt1
  • bluez-firmware: 1.2-4+rpt8
@alexmoon
Copy link
Contributor Author

A couple of additional notes.

  • The crash still occurs with bluez 5.64.
  • Commenting out the call to "SetDiscoveryFilter" in src/adapter.rs prevents the crash.

@surban
Copy link
Collaborator

surban commented Sep 19, 2022

Please open a bug with BlueZ, i.e. bluetoothd, since it should not crash under any circumstances.

@surban surban added the external caused by bluetoothd or kernel label Sep 19, 2022
@alexmoon
Copy link
Contributor Author

I believe the upstream issue is bluez/bluez#196. I agree that the root problem should be fixed there, but there doesn't seem to have been much movement on that issue since it was opened a year ago. And other clients seem to work fine on armv7, so I thought it might be worth investigating what it is about bluer's messages that cause the daemon to crash. If not, it might be worth documenting that armv7 targets are not supported by bluer until the upstream issue is resolved.

@surban
Copy link
Collaborator

surban commented Sep 19, 2022

I would suggest running bluetoothd under Memcheck and try to track down the bug that way. It should be easy to find that way. Working around the symptoms is usually not a good idea, especially in open source projects when the underlying cause can be fixed by submitting a patch.

@Vudentz
Copy link

Vudentz commented Sep 19, 2022

It is crashing inside libdbus though and I don't see how we could avoid that since it appears to be something internal when dealing with a DBusMessage.

otaviojr added a commit to otaviojr/bluer that referenced this issue May 10, 2023
# This is the 1st commit message:

BLE Passive Scanning

# This is the commit message #2:

monitor

# This is the commit message bluez#3:

monitor

# This is the commit message bluez#4:

monitor

# This is the commit message bluez#5:

monitor

# This is the commit message bluez#6:

monitor

# This is the commit message bluez#7:

monitor

# This is the commit message bluez#8:

monitor

# This is the commit message bluez#9:

monitor

# This is the commit message bluez#10:

monitor

# This is the commit message bluez#11:

monitor

# This is the commit message bluez#12:

monitor

# This is the commit message bluez#13:

monitor

# This is the commit message bluez#14:

monitor

# This is the commit message bluez#15:

monitor

# This is the commit message bluez#16:

monitor

# This is the commit message bluez#17:

monitor

# This is the commit message bluez#18:

monitor

# This is the commit message bluez#19:

monitor

# This is the commit message bluez#20:

monitor

# This is the commit message bluez#21:

monitor

# This is the commit message bluez#22:

monitor

# This is the commit message bluez#23:

monitor

# This is the commit message bluez#24:

monitor

# This is the commit message bluez#25:

monitor

# This is the commit message bluez#26:

monitor

# This is the commit message bluez#27:

monitor

# This is the commit message bluez#28:

monitor

# This is the commit message bluez#29:

monitor

# This is the commit message bluez#30:

monitor

# This is the commit message bluez#31:

monitor

# This is the commit message bluez#32:

monitor

# This is the commit message bluez#33:

monitor

# This is the commit message bluez#34:

monitor

# This is the commit message bluez#35:

monitor

# This is the commit message bluez#36:

monitor

# This is the commit message bluez#37:

monitor

# This is the commit message bluez#38:

monitor

# This is the commit message bluez#39:

monitor

# This is the commit message bluez#40:

monitor

# This is the commit message bluez#41:

monitor

# This is the commit message bluez#42:

monitor

# This is the commit message bluez#43:

monitor

# This is the commit message bluez#44:

monitor

# This is the commit message bluez#45:

monitor

# This is the commit message bluez#46:

monitor

# This is the commit message bluez#47:

monitor

# This is the commit message bluez#48:

monitor

# This is the commit message bluez#49:

monitor

# This is the commit message bluez#50:

monitor

# This is the commit message bluez#51:

monitor

# This is the commit message bluez#52:

monitor

# This is the commit message bluez#53:

monitor

# This is the commit message bluez#54:

monitor

# This is the commit message bluez#55:

monitor

# This is the commit message bluez#56:

monitor

# This is the commit message bluez#57:

monitor

# This is the commit message bluez#58:

monitor

# This is the commit message bluez#59:

monitor

# This is the commit message bluez#60:

monitor

# This is the commit message bluez#61:

monitor

# This is the commit message bluez#62:

monitor
@Mille2525
Copy link

Mille2525 commented Aug 23, 2023

I can confirm that this is still an issue.

After some digging, it seems like blueR does not exit cleanly in any circumstances.
Consider this example program:

fn main() {
    simple_logger::SimpleLogger::new()
        .with_level(log::LevelFilter::Trace)
        .init()
        .unwrap();

    let rt = tokio::runtime::Builder::new_current_thread()
        .enable_time()
        .enable_io()
        .build()
        .expect("failed to build tokio runtime");

    log::info!("Starting async tokio task...");

    let _ = rt.block_on(async {
        let session = bluer::Session::new().await?;
        let adapter = session.default_adapter().await?;
        adapter.set_powered(true).await?;

        log::info!("Starting discovery");
        let event_stream = adapter.discover_devices().await?;
        tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;
        log::info!("Stopping discovery");

        bluer::Result::Ok(())
    });

    //explicit drop and flush to make sure messages are not stuck somewhere
    drop(rt);
    log::logger().flush();
    std::io::stdout().flush().unwrap();
    std::io::stderr().flush().unwrap();
}

For me, this procudes the following log output:

2023-08-23T10:37:28.942Z INFO [us_service] Starting async tokio task...
2023-08-23T10:37:28.943Z TRACE [bluer::session] Connected to D-Bus with unique name :1.28
2023-08-23T10:37:28.944Z TRACE [bluer::session] Starting event loop for :1.28
2023-08-23T10:37:28.945Z TRACE [bluer::adapter] /org/bluez/hci0: org.bluez.Adapter1.Powered := true
2023-08-23T10:37:28.945Z INFO [us_service] Starting discovery
2023-08-23T10:37:28.945Z TRACE [bluer::session] Starting new single session for /org/bluez/hci0
2023-08-23T10:37:28.945Z TRACE [bluer::adapter] /org/bluez/hci0: org.bluez.Adapter1.SetDiscoveryFilter ({"UUIDs": Variant([]), "DuplicateData": Variant(true), "Transport": Variant("auto"), "Discoverable": Variant(false)},)
2023-08-23T10:37:28.946Z TRACE [bluer::adapter] /org/bluez/hci0: org.bluez.Adapter1.SetDiscoveryFilter (...) -> Ok(())
2023-08-23T10:37:28.946Z TRACE [bluer::adapter] /org/bluez/hci0: org.bluez.Adapter1.StartDiscovery ()
2023-08-23T10:37:28.954Z TRACE [bluer::adapter] /org/bluez/hci0: org.bluez.Adapter1.StartDiscovery (...) -> Ok(())
2023-08-23T10:37:28.954Z TRACE [bluer::session] Adding event subscription for /org/bluez/hci0 with child_objects=true
2023-08-23T10:37:28.988Z TRACE [bluer::session] Event: ObjectAdded { object: Path("/org/bluez/hci0/dev_A8_51_AB_12_52_9C\0"), interfaces: {"org.freedesktop.DBus.Properties", "org.freedesktop.DBus.Introspectable", "org.bluez.Device1"} }
2023-08-23T10:37:29.070Z TRACE [bluer::session] Event: ObjectAdded { object: Path("/org/bluez/hci0/dev_B8_BC_5B_71_CA_07\0"), interfaces: {"org.bluez.Device1", "org.freedesktop.DBus.Introspectable", "org.freedesktop.DBus.Properties"} }
2023-08-23T10:37:29.747Z TRACE [bluer::session] Event: ObjectRemoved { object: Path("/org/bluez/hci0/dev_38_18_4C_BB_F1_E0\0"), interfaces: {"org.freedesktop.DBus.Introspectable", "org.bluez.Device1", "org.freedesktop.DBus.Properties"} }
2023-08-23T10:37:29.747Z TRACE [bluer::session] Event: ObjectAdded { object: Path("/org/bluez/hci0/dev_54_15_89_48_F1_45\0"), interfaces: {"org.freedesktop.DBus.Introspectable", "org.bluez.Device1", "org.freedesktop.DBus.Properties"} }
2023-08-23T10:37:29.956Z INFO [us_service] Stopping discovery

When setting the log level to trace, one would expect at least a "StopDiscovery" message on the dbus, which doesnt happen.
This seems to be a new issue because earlier posts of @surban showed log output including cleanup messages.

I tried to dig even deeper into it, and it seems like the whole cleanup routine of "SessionInner" is never executed. From my understanding, at least this code should be executed on program exit:

        tokio::spawn(async move {
            let _ = term_rx.await;
            log::trace!("Terminating single session for {}", &path);
            stop_fn.await;
            let _ = termed_tx.send(());
            log::trace!("Terminated single session for {}", &path);
        });

However, the drop trait of "SessionInner" only calls ".abort" on the tokio Task.

I also noticed that by not assigning the event stream, the cleanup DOES happen. Most likely because in this case the compiler removes the "start_discovery" call alltogether. Maybe this calls causes some circular referencing issue, or the event stream uses a reference for too long. I also tried dropping the event stream manually via "drop" before destroying the adapter/session, without effect.

This seems like quite an important issue. It could also explain other posts, e.g. the "busy" issue in BlueZ after termination.

Edit: When dropping the "event_stream" and waiting for some time (e.g. 1 second) before leaving the scope, the cleanup does happen. Seems like it needs some time to send those messages. Seems weird. What would be the proper way of handling the wait time? Hardcoding a certain time period is not a good solution.

However, even with successfull clean up, the crash on armv7 still happens. This is strange, because the bluetoothctl tool works without issues. For me it still seems like maybe something isnt cleaned up properly by blueR.

@fujexo
Copy link

fujexo commented Aug 24, 2023

On my Raspi, bluetoothd started crashing in the last few days, probably some os package updates introduced that, don't know. After updating all dependencies, the issue was still there. I've now added a DiscoveryFilter which keeps bluetoothd happy so far.

    let filter = DiscoveryFilter {
        transport: DiscoveryTransport::Le,
        ..Default::default()
    };
    adapter.set_discovery_filter(filter).await?;

fujexo/rs-terrarium@f00721c#diff-ce3ebd3348cca1aec72ae2cb137e89396eb56453317c7c8df472f9bec70f7d3bR67-R72

@surban
Copy link
Collaborator

surban commented Aug 28, 2023

I can confirm that this is still an issue.

After some digging, it seems like blueR does not exit cleanly in any circumstances. Consider this example program:

fn main() {
    simple_logger::SimpleLogger::new()
        .with_level(log::LevelFilter::Trace)
        .init()
        .unwrap();

    let rt = tokio::runtime::Builder::new_current_thread()
        .enable_time()
        .enable_io()
        .build()
        .expect("failed to build tokio runtime");

    log::info!("Starting async tokio task...");

    let _ = rt.block_on(async {
        let session = bluer::Session::new().await?;
        let adapter = session.default_adapter().await?;
        adapter.set_powered(true).await?;

        log::info!("Starting discovery");
        let event_stream = adapter.discover_devices().await?;
        tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;
        log::info!("Stopping discovery");

        bluer::Result::Ok(())
    });

    //explicit drop and flush to make sure messages are not stuck somewhere
    drop(rt);
    log::logger().flush();
    std::io::stdout().flush().unwrap();
    std::io::stderr().flush().unwrap();
}

From the Tokio docs:

Shutting down the runtime is done by dropping the value, or calling shutdown_background or shutdown_timeout.

Tasks spawned through Runtime::spawn keep running until they yield. Then they are dropped. They are not guaranteed to run to completion, but might do so if they do not yield until completion.

Dropping the Tokio runtime does not guarantee that all background tasks are run to completion, thus StopDiscovery is never called. Until Rust gets async drop there is no proper way to fix this. A possible workaround would be to add an explicit stop() method on the discovery event stream.

Nevertheless, bluetoothd as a system service should not crash, even if an application exits with a discovery session in progress. Imagine what would happen if your OS kernel crashes, just because a program exited without closing a file descriptor or TCP socket. Thus the bug lies within bluetoothd or libdbus and should be fixed there.

@Mille2525
Copy link

Mille2525 commented Nov 14, 2023

Hey everyone and @surban ,

I have an update on the issue. I spend some days investigating further and found the cause of the problem.
It is indeed an error in the official blueZ implementation.

In blueZ, the scan filter gets parsed when receiving the dbus message. During deserialization, blueZ has to deserialize two booleans (the flags for discoverable and duplicate data in the scan filter). The error here was that dbus always requires booleans to be 4 bytes, but blueZ uses stdbool which on many platforms only has one byte. This leads to a heap corruption of the blueZ service, causing a random crash shortly later.

I have submitted a patch for blueZ here: https://patchwork.kernel.org/project/bluetooth/patch/20231107103507.505581-1-lukas.funke-oss@weidmueller.com/
With the patch, the issue is fixed for me and i can use blueR with blueZ on ARMv7 just fine.

So why does this only happen on ARM? My best guess here is that on x86 the filter struct has more padding bytes after the two boolean flags (which come last in the C struct), so its actually pure luck that it doesnt crash on x86. The code is still wrong on x86, too.

Here is another interesting find:

The crash does not happen when using bluetoothctl. So why is that? After some more digging it turns out that bluetoothctl behaves different from blueR when it comes to setting the scan filter. It does NOT set the two boolean flags in the scan filter if they are set to false, which is the default. The behavior of bluetoothctl seems to be that, if a flag has its default value, its not send at all. However, blueR sends the two flags with a "false" value, causing the deserialization and heap corruption in blueZ.

@surban Even though this issue is theoretically fixed now, I think it might still be worth investigating a "hack fix" for blueR here. Almost all users of blueZ will experience this crash otherwise and I can not tell when the fix will be live. Maybe it would be a possibility to not send the discoverable and duplicate flags at all when they are set to false (just like bluetoothctl does)? Even though this is not blueRs fault, it would at least improve compatbility with blueZ vastly for allmost all users (think e.g. Raspberry Pi projects).
If thats not possible, maybe we can include a big warning for these two flags in regards to ARM / the blueZ verison crahses in the documentation + sources?

@surban
Copy link
Collaborator

surban commented Nov 14, 2023

Thanks for figuring that out @Mille2525!

Could you try with version 0.16.2-pre1 and check if it fixes the issue with an unpatched bluetoothd?

@Mille2525
Copy link

@surban I can check, but currently I only see changes to the release notes and cargo.toml file in that branch. Or am I mistaken? What changed in the code specifically?

@surban
Copy link
Collaborator

surban commented Nov 14, 2023

51a3c30

@Mille2525
Copy link

Mille2525 commented Nov 17, 2023

Hi @surban ,

thanks for the clarification! How did you determine the default values? I see that you assume true and false as defaults for duplicateData and discoverable, respectively.

However, to me it seems like both are "false" by default, at least if I interpret the source code of bluetoothctl correctly (See: https://github.com/bluez/bluez/blob/8b035b70f379e45a32c94579ec8d00e48070c21e/client/main.c#L1251 )

Also: I just noticed that the bluez heap crash patch was accepted into the master! :) So this issue should hopefully be gone in newer bluez versions.

@surban
Copy link
Collaborator

surban commented Nov 17, 2023

I have taken the default values from the D-Bus API documentation. However, from the code it seems that you are right and both default to false.

@surban
Copy link
Collaborator

surban commented Nov 17, 2023

I've changed the default assumption in version 0.17.0-pre1. Please test.

@surban
Copy link
Collaborator

surban commented Feb 1, 2024

Fixed in version 0.17.0.

@surban surban closed this as completed Feb 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
external caused by bluetoothd or kernel
Projects
None yet
Development

No branches or pull requests

5 participants