Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluetooth: bt_conn: Unable to allocate buffer within timeout #43718

Closed
aolowin opened this issue Mar 11, 2022 · 10 comments · Fixed by #46165
Closed

Bluetooth: bt_conn: Unable to allocate buffer within timeout #43718

aolowin opened this issue Mar 11, 2022 · 10 comments · Fixed by #46165
Assignees
Labels
area: Bluetooth Host area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@aolowin
Copy link

aolowin commented Mar 11, 2022

Describe the bug
A BLE peripheral can sometimes get into a state where it will no longer send notifications. It seems to occur if connections are repeatedly dropped due to range or antenna issues. It may happen if the connection is lost during discovery but that's a guess.

The following errors are generated:
[00:02:21.702,667] bt_conn: Disconnected while allocating context
[00:02:29.443,542] bt_conn: Unable to allocate buffer within timeout
[00:02:29.443,572] bt_l2cap: Unable to allocate buffer for op 0x12
[00:02:52.495,300] bt_conn: Unable to allocate buffer within timeout
[00:02:52.495,300] bt_att: Unable to allocate buffer for op 0x07

Once it gets into this state only a reboot will fix it. A reset on the central side has no effect.

To Reproduce
Use the central_hr and peripheral_hr sample apps on the nrf52840dk_nrf52840 boards
west build samples/bluetooth/peripheral_hr --build-dir=./build/peripheral_hr -b nrf52840dk_nrf52840
west build samples/bluetooth/central_hr --build-dir=./build/central_hr -b nrf52840dk_nrf52840

For ease of desktop testing it's convenient to use:
CONFIG_BT_CTLR_TX_PWR_MINUS_40=y
on the central_hr device. This allows a disconnect by moving the boards a short distance apart.

Move the boards closer and farther from each other to trigger disconnect/reconnect events and eventually generate the erroneous state.

Expected behavior
A peripheral can cleanly reconnect after a disconnect.

Impact
Serious impact since the peripheral will be unable to communicate without a reboot.

Logs and console output
central_hr:

<snip>
<this is what is normal>

Scanning successfully started
[DEVICE]: E6:56:45:48:5D:4A (random), AD evt type 0, AD data len 11, RSSI -41
[AD]: 1 data_len 1
[AD]: 3 data_len 6
Connected: E6:56:45:48:5D:4A (random)
[ATTRIBUTE] handle 25
[ATTRIBUTE] handle 26
[ATTRIBUTE] handle 28
[SUBSCRIBED]
[NOTIFICATION] data 0x20005b9d length 2
[NOTIFICATION] data 0x20005b9d length 2
[NOTIFICATION] data 0x20005b9d length 2
[NOTIFICATION] data 0x20005b9d length 2
[NOTIFICATION] data 0x20005b9d length 2
[NOTIFICATION] data 0x20005b9d length 2
[NOTIFICATION] data 0x20005b9d length 2
[UNSUBSCRIBED]
Disconnected: E6:56:45:48:5D:4A (random) (reason 0x08)
Scanning successfully started

<snip>
<this is what eventually happens>

[DEVICE]: E6:56:45:48:5D:4A (random), AD evt type 0, AD data len 11, RSSI -18
[AD]: 1 data_len 1
[AD]: 3 data_len 6
Connected: E6:56:45:48:5D:4A (random)
Discover complete
[00:04:51.470,947] <err> bt_att: ATT Timeout
Disconnected: E6:56:45:48:5D:4A (random) (reason 0x22)
Scanning successfully started

peripheral_hr:

*** Booting Zephyr OS build zephyr-v3.0.0-1041-g3ab10f2d52a5  ***
Bluetooth initialized
Advertising successfully started
[00:00:00.257,202] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.257,232] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.257,263] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 3.0 Build 99
[00:00:00.257,995] <inf> bt_hci_core: Identity: E6:56:45:48:5D:4A (random)
[00:00:00.258,026] <inf> bt_hci_core: HCI: version 5.3 (0x0c) revision 0x0000, manufacturer 0x05f1
[00:00:00.258,056] <inf> bt_hci_core: LMP: version 5.3 (0x0c) subver 0xffff
Connected
[00:00:03.341,705] <inf> hrs: HRS notifications enabled

<snip>

Disconnected (reason 0x08)
[00:02:10.680,572] <inf> hrs: HRS notifications disabled
Connected
[00:02:12.349,761] <inf> hrs: HRS notifications enabled
Disconnected (reason 0x08)
Connected
[00:02:21.700,195] <inf> hrs: HRS notifications disabled
[00:02:21.702,667] <wrn> bt_conn: Disconnected while allocating context
[00:02:29.443,542] <wrn> bt_conn: Unable to allocate buffer within timeout
[00:02:29.443,572] <err> bt_l2cap: Unable to allocate buffer for op 0x12
[00:02:52.495,300] <wrn> bt_conn: Unable to allocate buffer within timeout
[00:02:52.495,300] <err> bt_att: Unable to allocate buffer for op 0x07
Disconnected (reason 0x22)
Connected
[00:03:22.495,391] <wrn> bt_conn: Unable to allocate buffer within timeout
[00:03:22.495,422] <err> bt_att: Unable to allocate buffer for op 0x01
[00:03:29.817,321] <wrn> bt_conn: Unable to allocate buffer within timeout
[00:03:29.817,352] <err> bt_l2cap: Unable to allocate buffer for op 0x12
[00:03:52.869,079] <wrn> bt_conn: Unable to allocate buffer within timeout
[00:03:52.869,110] <err> bt_att: Unable to allocate buffer for op 0x07

Environment (please complete the following information):

  • OS: Linux
  • Toolchain: Zephyr SDK 13.2
  • Commit SHA or Version used Zephyr 3.0: 3ab10f2
@aolowin aolowin added the bug The issue is a bug, or the PR is fixing a bug label Mar 11, 2022
@mbolivar-nordic mbolivar-nordic added area: Bluetooth priority: medium Medium impact/importance bug labels Mar 15, 2022
@carlescufi carlescufi assigned jori-nordic and unassigned alwa-nordic Apr 7, 2022
jori-nordic added a commit to jori-nordic/zephyr that referenced this issue Apr 29, 2022
`att_send_process` doesn't pass the errors up the stack, so if a
disconnection happens during the allocation of a TX context, it will
leak buffers.

Fixes zephyrproject-rtos#43718

Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
@cvinayak
Copy link
Contributor

@jori-nordic I will assume you will resume discussion based on the work you are doing here jori-nordic@2215bf0

@aolowin
Copy link
Author

aolowin commented May 13, 2022

@jori-nordic Thank you for tracking this down. Is the fix in your branch likely to be the final solution or will there be other changes before it goes into the main branch?

@jori-nordic
Copy link
Contributor

jori-nordic commented May 23, 2022

I was on vacation, sorry for the delay.

@aolowin the fix I have might have to change a bit, because while the bug still exists in latest upstream, it has different symptoms due to the l2cap work that has been done.

@hermabe has a PR (https://github.com/zephyrproject-rtos/zephyr/pull/45682/files) to fix some issues that were introduced there, and I'll be trying to reproduce this particular bug today with his fixes applied locally.

EDIT: so I haven't been able to reproduce the issue with herman's PR applied locally. I think we can close this when that PR is merged. Could you try to reproduce it after that PR is merged @aolowin ?

@aolowin
Copy link
Author

aolowin commented May 23, 2022

@jori-nordic I'll try and reproduce it after that PR is in. Thanks.

@carlescufi
Copy link
Member

@jori-nordic I'll try and reproduce it after that PR is in. Thanks.

@aolowin would you mind trying to reproduce the issue with the PR applied before it's merged? This way we would know that this is indeed a fix for this issue.

You can check out this branch: https://github.com/hermabe/zephyr/tree/fix/meta_free from the PR.

@aolowin
Copy link
Author

aolowin commented May 26, 2022

I've tested #45682 and it does seem to fix the issue. By keeping the peripheral at the edge of the RF range I was able to force numerous disconnect/reconnect cycles and the peripheral was always able to recover.

There were a few warnings:

Connected
[00:09:34.331,970] <inf> hrs: HRS notifications enabled
Disconnected (reason 0x08)
[00:09:38.373,413] <wrn> bt_att: Unable to allocate ATT TX meta
[00:09:38.373,443] <wrn> bt_gatt: No buffer available to send notification
[00:09:38.482,421] <inf> hrs: HRS notifications disabled
Connected
[00:09:40.669,647] <inf> hrs: HRS notifications enabled

but no errors.

@carlescufi
Copy link
Member

I've tested #45682 and it does seem to fix the issue. By keeping the peripheral at the edge of the RF range I was able to force numerous disconnect/reconnect cycles and the peripheral was always able to recover.

There were a few warnings:

Connected
[00:09:34.331,970] <inf> hrs: HRS notifications enabled
Disconnected (reason 0x08)
[00:09:38.373,413] <wrn> bt_att: Unable to allocate ATT TX meta
[00:09:38.373,443] <wrn> bt_gatt: No buffer available to send notification
[00:09:38.482,421] <inf> hrs: HRS notifications disabled
Connected
[00:09:40.669,647] <inf> hrs: HRS notifications enabled

but no errors.

Thanks for testing. Could you make sure that these warnings do not prevent the stack from sending data once buffers are available again? i.e. this is a recoverable warning that doesn't require rebooting any of the Zephyr-based devices.

@hermabe
Copy link
Member

hermabe commented May 27, 2022

There were a few warnings

I think these warnings are fine. The buffer itself is freed after it is sent to the controller, but the metadata is not freed until the callbacks are called after receiving the num_complete event from the controller. The allocation of the buffer blocks, but the allocation of the metadata does not, so if no metadata could be allocated the warning is printed and -ENOMEM is returned by the GATT API functions.

@aolowin
Copy link
Author

aolowin commented May 27, 2022

I can confirm that the central_hr was able to receive notifications from the peripheral_hr once connected - regardless of whether the warnings occurred. No reboots required.

@danielstuart14
Copy link
Contributor

danielstuart14 commented Mar 15, 2023

@jori-nordic @hermabe any chance of the fixes for this issue being backported to 2.7?
#46165
#48395

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth Host area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants