mt76: fix system recovery routine for MT7915 #3436

Draft
wants to merge 1 commit into
base: main


blocktrron
Member

This is a draft PR to fix the long-standing "Message timed out" errors on MT7915-based radios.

I've had positive experience with this patchset over the last 3 weeks and would appreciate feedback from communities that frequently face this problem.

Sometimes the node itself might crash when attempting recovery. I'm not entirely sure where this happens, but it seems to be somewhere else in the mac80211 stack. The node comes back online by itself, so I still consider this an improvement over the current situation.

Log from a recovered node:

[1029591.814363] br-client: received packet on bat0 with own address as source address (addr:b8:ec:a3:e1:75:4f, vlan:0)
[1029592.855657] br-client: received packet on bat0 with own address as source address (addr:b8:ec:a3:e1:75:4f, vlan:0)
[1029592.856005] br-client: received packet on bat0 with own address as source address (addr:b8:ec:a3:e1:75:4f, vlan:0)
[1031792.254655] mt7915e 0000:02:00.0: Retry message 0000aded (seq 7)
[1031794.334830] mt7915e 0000:02:00.0: Message 0000aded (seq 7) timeout
[1031794.479827] mt7915e 0000:02:00.0: HW/SW Version: 0x8a108a10, Build Time: 20240429200716a
[1031794.479827] 
[1031794.495189] mt7915e 0000:02:00.0: WM Firmware Version: ____000000, Build Time: 20240429200752
[1031794.518127] mt7915e 0000:02:00.0: WA Firmware Version: DEV_000000, Build Time: 20240429200812
[1031800.085604] ieee80211 phy0: Hardware restart was requested
[1031800.085669] ieee80211 phy1: Hardware restart was requested
[1206189.188440] br-client: received packet on bat0 with own address as source address (addr:b8:ec:a3:e1:75:4f, vlan:0)
[1206189.188719] br-client: received packet on bat0 with own address as source address (addr:b8:ec:a3:e1:75:4f, vlan:0)
[1211397.565504] br-client: received packet on bat0 with own address as source address (addr:b8:ec:a3:e1:75:4f, vlan:0)
[1211397.565790] br-client: received packet on bat0 with own address as source address (addr:b8:ec:a3:e1:75:4f, vlan:0)
[1275209.524711] Ignoring NSS change in VHT Operating Mode Notification from 90:9c:4a:ba:e4:a4 with invalid nss 3
[1278619.260856] br-client: port 6(owe1) entered disabled state
[1278619.298741] mt7915e 0000:02:00.0 owe1 (unregistering): left allmulticast mode
[1278619.298791] mt7915e 0000:02:00.0 owe1 (unregistering): left promiscuous mode

@blocktrron blocktrron force-pushed the mt7915-20250113-restart branch 2 times, most recently from 1afebf7 to 7fbd7f8 Compare February 1, 2025 19:59
Contributor

Djfe commented Feb 2, 2025

Small note for everyone building this:
If you have an existing build folder and don't increment the Makefile variable PKG_RELEASE by one, mt76 won't be rebuilt with the new patches.
https://github.com/openwrt/openwrt/blob/openwrt-24.10/package/kernel/mt76/Makefile#L4
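
For anyone who wants to script that bump, here is a minimal Python sketch. It assumes PKG_RELEASE is assigned a plain integer on a single line; the function name `bump_pkg_release` is hypothetical and not part of OpenWrt or Gluon tooling.

```python
import re

# Assumption: the Makefile contains a line such as "PKG_RELEASE=1" or
# "PKG_RELEASE:=1" with a plain integer value.
_PKG_RELEASE_RE = re.compile(
    r"^(PKG_RELEASE[ \t]*:?=[ \t]*)(\d+)[ \t]*$", re.MULTILINE
)


def bump_pkg_release(makefile_text):
    """Return makefile_text with the first numeric PKG_RELEASE incremented by one."""

    def _increment(match):
        return f"{match.group(1)}{int(match.group(2)) + 1}"

    new_text, count = _PKG_RELEASE_RE.subn(_increment, makefile_text, count=1)
    if count == 0:
        raise ValueError("no numeric PKG_RELEASE assignment found")
    return new_text
```

To use it, read package/kernel/mt76/Makefile from the OpenWrt tree, pass the contents through the function, and write the result back before rebuilding.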

Contributor

Djfe commented Feb 2, 2025

On the D-Link DAP-X1860, COVR-X1860 and TP-Link Archer AX23 (mt7621) I now see
Sun Feb 2 07:46:59 2025 kern.err kernel: [ 24.279424] mt7915e 0000:02:00.0: Retry message 000021ed (seq 11)
early during boot (early as in: before NTP was able to get a sync). Is this relevant? Or is a single retry pretty much harmless, since it isn't acted on when it happens only once?

It doesn't show up on a Netgear WAX202, but that could also be due to a different wireless config.
It does show up on a Netgear WAX206 (mt7622).
It doesn't show up on filogic (Cudy WR3000, NWA50AX Pro, Netgear WAX220).

Member Author

blocktrron commented Feb 2, 2025

You can ignore retries without timeouts; they only mean the message-type-specific timeouts need adjusting.

it doesn't show up on filogic (cudy wr3000, nwa50ax pro, netgear wax220)

This PR is not aimed at those.
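
To tell harmless retries apart from retries that end in a timeout when reading logs, a small filter along these lines may help. It is a sketch only: the regular expression is modelled on the two mt7915e message lines quoted in this thread, and the function name `classify_retries` is hypothetical.

```python
import re

# Matches "Retry message XXXXXXXX (seq N)" and "Message XXXXXXXX (seq N) timeout"
# lines from mt7915e, in either dmesg or logread format.
_LINE_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+mt7915e\s+\S+:\s+"
    r"(?P<kind>Retry message|Message)\s+(?P<msg>[0-9a-f]{8})\s+"
    r"\(seq\s+(?P<seq>\d+)\)(?P<timeout>\s+timeout)?"
)


def classify_retries(lines):
    """Return (benign, timed_out) lists of (message id, seq) tuples.

    benign:    retries never followed by a timeout for the same message and seq
    timed_out: messages for which a timeout line was logged
    """
    pending = []    # retries not (yet) matched to a timeout
    timed_out = []
    for line in lines:
        match = _LINE_RE.search(line)
        if match is None:
            continue
        key = (match.group("msg"), int(match.group("seq")))
        if match.group("kind") == "Retry message":
            pending.append(key)
        elif match.group("timeout"):
            if key in pending:
                pending.remove(key)
            timed_out.append(key)
    return pending, timed_out
```

Fed the three mt7915e message lines quoted above, it would report message 000021ed (seq 11) as a retry without timeout and 0000aded (seq 7) as timed out.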

@blocktrron blocktrron force-pushed the mt7915-20250113-restart branch from 7fbd7f8 to 22c48f1 Compare February 6, 2025 23:17
@blocktrron blocktrron force-pushed the mt7915-20250113-restart branch from 22c48f1 to 94a7eb1 Compare February 8, 2025 13:08