ipq40xx: add MikroTik hAP ac2 support #3037

Closed

robimarko
Contributor

@robimarko robimarko commented May 22, 2020

This commit adds support for the MikroTik RouterBOARD RBD52G-5HacD2HnD-TC
(hAP ac²), an indoor dual band, dual-radio 802.11ac
wireless AP with integrated omnidirectional antennae, USB port and five
10/100/1000 Mbps Ethernet ports.

See https://mikrotik.com/product/hap_ac2 for more info.

Specifications:

  • SoC: Qualcomm Atheros IPQ4018
  • RAM: 128 MB
  • Storage: 16 MB NOR
  • Wireless:
    · Built-in IPQ4018 (SoC) 802.11b/g/n 2x2:2, 2.5 dBi antennae
    · Built-in IPQ4018 (SoC) 802.11a/n/ac 2x2:2, 2.5 dBi antennae
  • Ethernet: Built-in IPQ4018 (SoC, QCA8075), 5x 10/100/1000 Mbps ports, passive PoE in
  • 1x USB Type A port

Installation:
Boot the initramfs image via TFTP and then flash the sysupgrade
image using "sysupgrade -n"

Signed-off-by: Robert Marko <robimarko@gmail.com>

@adschm added the "core packages" and "target/ipq40xx" labels May 22, 2020
@robimarko
Contributor Author

@f00b4r0 Any remarks?

@f00b4r0
Contributor

f00b4r0 commented May 23, 2020

@robimarko besides the fact that the rbcfg changes will soon no longer be necessary if #3032 is accepted, I'll bite the elephant in the room:

Why do we need what looks like a nearly complete loader, most of which appears to be a straight copy-paste from U-Boot with a questionable copyright slapped onto unchanged source code?

I haven't yet taken a look at RouterOS image format, but from the quick look I took at the dump of a factory hap-ac2 flash content, it looks like yaffs (like for other routerboot-equipped devices)? If the matter is the kernel compression, why can't lzma-loader suffice (possibly with minor changes)?

If this new loader is pushed forward, I think it is going to require an in-depth explanation of why so much extra code is needed (and frankly at the moment I doubt it is), along with a careful cleanup of the source to restore upstream copyright where applicable.

Anyway, thanks a lot for your efforts, I will take a better look at all this ASAP. I also have a brand new hap-ac2 waiting for OpenWRT, so the incentive to get this clean and working is there 😉

@robimarko
Contributor Author

robimarko commented May 23, 2020

Yeah, I know that rbcfg won't be necessary.
It will be dropped if your PR gets merged first.

Well, I did not really dig into the reasons, as the bootloader predates my work on MikroTik IPQ40xx devices; it was originally made for the RB3011 (IPQ8064) to avoid having to use U-Boot.
I agree that lzma-loader would be great. For SPI-NOR-only devices they are using yaffs, while for NAND devices they are packing the kernel into UBIFS.
So, kernel2minor is already used to avoid yaffs.

@f00b4r0
Contributor

f00b4r0 commented May 28, 2020

Hi @robimarko, #3032 has been merged, as well as the caldata patch. I've been lagging on looking at this PR but I'll get to it over the weekend hopefully. Stay tuned :)

@robimarko
Contributor Author

Great, I will then update this to drop rbcfg

@robimarko
Contributor Author

@f00b4r0 I have rebased the PR to current master and dropped rbcfg.
BTW, what to do with 4K sectors?
I can enable it, but the NOR is too large for them to be used.

@f00b4r0
Contributor

f00b4r0 commented May 28, 2020

@robimarko 4K_SECTORS can be enabled with a sane 4K_SECTORS_LIMIT (4MB is the default IIRC). Then, if everything works fine and partial erase still works, you should see 64K EB on large partitions and 4K EB on the small ones. I've heard reports that this no longer works with kernels after 4.17 (which implemented a major rework of the mtd subsystem), but I haven't been able to test this myself yet.

@robimarko
Contributor Author

Yeah, I know about the size limit.
It appears that partial writes might actually work, I tried changing the boot protocol and committing that and it worked without destroying the soft_config.

@f00b4r0
Contributor

f00b4r0 commented May 29, 2020

Can you give the output of /proc/mtd? Thanks

@robimarko
Contributor Author

Sure,

root@OpenWrt:/# cat /proc/mtd 
dev:    size   erasesize  name
mtd0: 01000000 00001000 "partitions"
mtd1: 0000e000 00001000 "RouterBoot"
mtd2: 00001000 00001000 "hard_config"
mtd3: 00007bbc 00001000 "dtb_config"
mtd4: 00001000 00001000 "soft_config"
mtd5: 00f00000 00001000 "firmware"
mtd6: 00350000 00001000 "kernel"
mtd7: 00bb0000 00001000 "rootfs"
mtd8: 00833000 00001000 "rootfs_data"

@f00b4r0
Contributor

f00b4r0 commented May 29, 2020

OK. Did you change CONFIG_MTD_SPI_NOR_USE_4K_SECTORS_LIMIT ?

@robimarko
Contributor Author

robimarko commented May 29, 2020

No, it's still at 4096K

@f00b4r0
Contributor

f00b4r0 commented May 29, 2020

Then there's a bug. As you can see, all mtd partitions including the big ones have 4K EB, whereas with a 4MB LIMIT only the partitions smaller than 4MB should have 4K EB. Maybe @nbd168 can help us here?
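
The expectation above can be checked mechanically. The following is only a minimal sketch: it assumes /proc/mtd-formatted text on stdin and applies the limit per partition, as described in this comment. find_unexpected_4k is a hypothetical helper name, not something that exists in OpenWrt.

```shell
# Print the name of every MTD partition that is at or above the limit but
# still reports a 4 KiB erase size. Reads /proc/mtd-formatted text on stdin.
# $1 is the limit in KiB, mirroring CONFIG_MTD_SPI_NOR_USE_4K_SECTORS_LIMIT.
find_unexpected_4k() {
	limit_kib="${1:-4096}"
	while read -r dev size erasesize name; do
		case "$dev" in
		mtd*) ;;
		*) continue ;;	# skip the "dev: size erasesize name" header
		esac
		size_bytes=$((0x$size))
		erase_bytes=$((0x$erasesize))
		if [ "$erase_bytes" -eq 4096 ] && [ "$size_bytes" -ge $((limit_kib * 1024)) ]; then
			# strip the surrounding quotes from the name column
			name="${name#\"}"
			name="${name%\"}"
			echo "$name"
		fi
	done
}
```

On a device this would be run as `find_unexpected_4k 4096 < /proc/mtd`. Against the listing posted above it would flag partitions, firmware, rootfs and rootfs_data, but not kernel (0x350000 is below 4 MiB).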

@robimarko
Contributor Author

Hm, but then how is soft_config still working normally?
Wouldn't using anything larger than 4K on it wreck the config?

@f00b4r0
Contributor

f00b4r0 commented May 29, 2020

Hm, but then how is soft_config still working normally?
Wouldn't using anything larger than 4K on it wreck the config?

It's working because erasesize is 4K for soft_config (which is good). What's not good is that erasesize is 4K for all partitions. If you use larger than 4K, it won't work because the adjacent partition might be erased together with soft_config.
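
To illustrate why the erase size matters for such a small partition, here is a rough sketch that rounds a partition out to erase-block boundaries. The offset in the example is made up, since /proc/mtd does not show partition offsets, and erase_span is a hypothetical helper name.

```shell
# Print the start and end (exclusive) of the flash range that has to be
# erased in order to rewrite a partition, given its offset, its size and
# the erase block size. All three arguments may be given in hex (0x...).
erase_span() {
	offset=$(($1))
	size=$(($2))
	eb=$(($3))
	start=$((offset / eb * eb))				# round down
	end=$(((offset + size + eb - 1) / eb * eb))		# round up
	printf '0x%x 0x%x\n' "$start" "$end"
}
```

With a made-up offset of 0x17000, `erase_span 0x17000 0x1000 0x10000` prints `0x10000 0x20000`: a 64K erase would also wipe 60K of flash that belongs to neighbouring partitions. With a 4K erase block (`0x1000`) the span is exactly the partition itself.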

@robimarko
Contributor Author

robimarko commented May 29, 2020

Well, now it looks obvious.
4K on everything is not a good idea, especially for performance

@f00b4r0
Contributor

f00b4r0 commented Jun 3, 2020

Just a quick update to let you know that I'm currently investigating #3026 and it might turn out that the intermediary loader may be completely unnecessary after all. I'll report back here once we've clarified things on ath79 (granted, different archs but I suspect routerboot behaves the same regardless).

@robimarko
Contributor Author

That would be awesome, getting rid of a bunch of code

@rogerpueyo
Contributor

rogerpueyo commented Jun 12, 2020

@robimarko ,

Is sysupgrade actually working for you? I've added support for the SXTsq 5 ac, largely based on this PR, and on my device it does not actually write anything to the "firmware" partition (even if I can correctly write to it with mtd/dd).

I'm booting via TFTP and then performing sysupgrade. If I enter failsafe mode, sysupgrade works; if I go through the whole booting process and then perform sysupgrade, it fails (it does not even touch the flash).

@robimarko
Contributor Author

Yes, it's working fine.
Have you added your device to platform_pre_upgrade()?

Also, is your SPI NOR recognized properly?

@rogerpueyo
Contributor

rogerpueyo commented Jun 12, 2020

OK, good to know.

Yes, it's there, right after your device.

And yes, SPI NOR is properly recognized, because I can flash from failsafe. Also, files written to the flash overlay persist across reboots.

Cool then. Thanks, I don't want to hijack the PR :)

Edit: fixed, I removed one of the items in the DTS which caused sysupgrade to fail.

@f00b4r0
Contributor

f00b4r0 commented Jun 13, 2020

@robimarko do you have serial on your hapac2? If so, could you try tftp booting the bare ELF kernel that should be found in build_dir/target*/linux*_ipq40xx/vmlinux.elf ?
It will panic since it doesn't have a DTB or an initramfs, but if it boots up to the panic, that means we don't need the extra loader. Thanks!

@robimarko
Contributor Author

robimarko commented Jun 13, 2020

@f00b4r0 Yes, I have serial enabled.
I have tried booting vmlinux.elf and it actually seems to load fine. The board resets a couple of seconds after booting and there is no output, but I would expect that since no DTB is present.
I have also tried booting vmlinux-initramfs.elf, but it's too large and RouterBoot throws an out of range error.

Serial log:

RouterBOOT booter 6.46.6

RBD52G-5HacD2HnD

CPU frequency: 716 MHz
  Memory size: 128 MiB
 Storage size:  16 MiB

Press any key within 2 seconds to enter setup..
trying dhcp protocol.................. OK
resolved mac address 74:4D:28:87:ED:E1
Gateway: 192.168.2.1
transfer started ............................................................ transfer ok, time=3.32s
setting up elf image... OK
jumping to kernel code
Jumping to kernel

@f00b4r0
Contributor

f00b4r0 commented Jun 13, 2020

Thanks! Both datapoints are helpful.
I was faintly hoping that maybe RouterBoot would make some use of the embedded DTB on flash, but clearly it doesn't.

The ARM kernel doesn't support (yet) an appended DTB with an ELF kernel, and routerboot will only boot an ELF binary. And in a DTS world we need to embed the DTB.

Therefore it seems there are two options:

  • add support for ELF appended DTB, as in MIPS (haven't checked if there are already patches to do that or if that's already been NACK'd)
  • build upon lzma-loader/relocate code to pack a raw kernel+appended DTB and boot it (ideally I would aim to use a very simple ELF loader that would just bootstrap a regular self-decompressing kernel+initramfs, i.e. as simple as what relocate is on MIPS. This way all standard kernel compression schemes would be available and we introduce less opportunities for bugs by keeping interaction with the hardware until kernel code is started at an absolute bare minimum).

Either way, I think the current approach is clearly excessive: there is no need to handle DTB/FIT ourselves for instance. I don't think we're out to write a complete bootloader. What we want is to make an image that will work with RouterBoot, IMO.

@f00b4r0
Contributor

f00b4r0 commented Jun 13, 2020

I should add that in any case, if the limitations on the size of the bootable ELF binary are similar to those found on some MIPS routerboards, we will need to build a very small (as small as possible) install initramfs image that will not contain all the stuff found in the main sysupgrade image.

IIRC @jow- mentioned that currently the build system doesn't allow this, but I suspect it will be necessary for us (and maybe for other devices as the kernels keep growing).

@robimarko
Contributor Author

robimarko commented Sep 29, 2021

@f00b4r0 The only reason I can think of is that ath10k loads the BDF only once rather than once per radio.
It still supports loading board.bin, which is just a plain BDF in a file with no encapsulation, but that covers only one radio. So if the driver loads the BDF only once, it won't work, as we need to provide a BDF for each radio.

@robimarko
Contributor Author

@f00b4r0 Do you maybe have your old code for loading this during runtime somewhere?

@f00b4r0
Contributor

f00b4r0 commented Oct 9, 2021

@robimarko luckily I do: f00b4r0@bd48e8f

Note the commit message and the likely reason why this doesn't work. Considering the incoming ipq Mikrotik PRs (#4625 #4266 #4055 #3806) plus those which have already been merged, which are all likely to expose the same problem, I hope we can find a "fix".

HTH

@robimarko
Contributor Author

Great, I just remembered this as I finally completed the hAP ac3 support after a couple of months and made a PR.
I am sure that MikroTik is breaking the BDF for it at this moment.

But yeah, the commit message confirms what I was afraid of: it only loads the BDF once, and that won't work as we have 2 radios.

@f00b4r0
Contributor

f00b4r0 commented Oct 9, 2021

@robimarko my memory of my investigations back then is mostly gone, but IIRC the driver can only load 1 BDF from flat board.bin, and needs board2 to load multiple ones.

If that assumption is correct, maybe we can improve the loading script to dynamically provide a board2.bin binary stream instead of the flat board.bin? Maybe that could work?

Alternatively, maybe the driver can be fixed? I'm afraid I don't have much time on my hands right now to be more helpful, but I'll try to get another look at the code later.

@robimarko
Contributor Author

@f00b4r0 Hm, it looks like the driver actually does load everything twice, as it should, since each radio is a separate driver instance.
I removed board-2.bin and used request_firmware instead of the nowarn version:

[   16.601436] ath10k_ahb a000000.wifi: Direct firmware load for ath10k/fwcfg-ahb-a000000.wifi.txt failed with error -2
[   16.601493] ath10k_ahb a000000.wifi: Falling back to sysfs fallback for: ath10k/fwcfg-ahb-a000000.wifi.txt
[   16.806232] ath10k_ahb a000000.wifi: Direct firmware load for ath10k/pre-cal-ahb-a000000.wifi.bin failed with error -2
[   16.806371] ath10k_ahb a000000.wifi: Falling back to sysfs fallback for: ath10k/pre-cal-ahb-a000000.wifi.bin
[   16.947011] ath10k_ahb a000000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/ct-firmware-5.bin failed with error -2
[   16.947177] ath10k_ahb a000000.wifi: Falling back to sysfs fallback for: ath10k/QCA4019/hw1.0/ct-firmware-5.bin
[   17.087173] ath10k_ahb a000000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/ct-firmware-2.bin failed with error -2
[   17.087243] ath10k_ahb a000000.wifi: Falling back to sysfs fallback for: ath10k/QCA4019/hw1.0/ct-firmware-2.bin
[   17.218338] ath10k_ahb a000000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/firmware-6.bin failed with error -2
[   17.218398] ath10k_ahb a000000.wifi: Falling back to sysfs fallback for: ath10k/QCA4019/hw1.0/firmware-6.bin
[   17.906414] ath10k_ahb a000000.wifi: qca4019 hw1.0 target 0x01000000 chip_id 0x003b00ff sub 0000:0000
[   17.906473] ath10k_ahb a000000.wifi: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[   17.918396] ath10k_ahb a000000.wifi: firmware ver 10.4b-ct-4019-fW-13-5ae337bb1 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc3b
[   17.970065] ath10k_ahb a000000.wifi: Loading BDF type 0
[   17.970887] ath10k_ahb a000000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/board-2.bin failed with error -2
[   17.974118] ath10k_ahb a000000.wifi: Falling back to sysfs fallback for: ath10k/QCA4019/hw1.0/board-2.bin
[   18.044345] ath10k_ahb a000000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/board.bin failed with error -2
[   18.044405] ath10k_ahb a000000.wifi: Falling back to sysfs fallback for: ath10k/QCA4019/hw1.0/board.bin
[   18.173884] ath10k_ahb a000000.wifi: failed to fetch board-2.bin or board.bin from ath10k/QCA4019/hw1.0
[   18.173973] ath10k_ahb a000000.wifi: failed to fetch board file: -12
[   18.182466] ath10k_ahb a000000.wifi: could not probe fw (-12)
[   18.360087] ath10k_ahb a800000.wifi: Direct firmware load for ath10k/fwcfg-ahb-a800000.wifi.txt failed with error -2
[   18.360151] ath10k_ahb a800000.wifi: Falling back to sysfs fallback for: ath10k/fwcfg-ahb-a800000.wifi.txt
[   18.437701] ath10k_ahb a800000.wifi: Direct firmware load for ath10k/pre-cal-ahb-a800000.wifi.bin failed with error -2
[   18.437763] ath10k_ahb a800000.wifi: Falling back to sysfs fallback for: ath10k/pre-cal-ahb-a800000.wifi.bin
[   18.577747] ath10k_ahb a800000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/ct-firmware-5.bin failed with error -2
[   18.577861] ath10k_ahb a800000.wifi: Falling back to sysfs fallback for: ath10k/QCA4019/hw1.0/ct-firmware-5.bin
[   18.742292] ath10k_ahb a800000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/ct-firmware-2.bin failed with error -2
[   18.742362] ath10k_ahb a800000.wifi: Falling back to sysfs fallback for: ath10k/QCA4019/hw1.0/ct-firmware-2.bin
[   18.874999] ath10k_ahb a800000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/firmware-6.bin failed with error -2
[   18.875070] ath10k_ahb a800000.wifi: Falling back to sysfs fallback for: ath10k/QCA4019/hw1.0/firmware-6.bin
[   19.020800] ath10k_ahb a800000.wifi: qca4019 hw1.0 target 0x01000000 chip_id 0x003b00ff sub 0000:0000
[   19.020869] ath10k_ahb a800000.wifi: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[   19.034277] ath10k_ahb a800000.wifi: firmware ver 10.4b-ct-4019-fW-13-5ae337bb1 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc3b
[   19.084503] ath10k_ahb a800000.wifi: Loading BDF type 0
[   19.084639] ath10k_ahb a800000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/board-2.bin failed with error -2
[   19.088580] ath10k_ahb a800000.wifi: Falling back to sysfs fallback for: ath10k/QCA4019/hw1.0/board-2.bin
[   19.178353] ath10k_ahb a800000.wifi: Direct firmware load for ath10k/QCA4019/hw1.0/board.bin failed with error -2
[   19.178419] ath10k_ahb a800000.wifi: Falling back to sysfs fallback for: ath10k/QCA4019/hw1.0/board.bin
[   19.307588] ath10k_ahb a800000.wifi: failed to fetch board-2.bin or board.bin from ath10k/QCA4019/hw1.0
[   19.307658] ath10k_ahb a800000.wifi: failed to fetch board file: -12
[   19.316008] ath10k_ahb a800000.wifi: could not probe fw (-12)

So simply passing it via sysfs, as is done for caldata, should work just fine here as well.

@f00b4r0
Contributor

f00b4r0 commented Oct 9, 2021

@robimarko interesting. Maybe something changed then since my initial testing. You may want to thoroughly test this because I do remember it not working as intended. I wish I could remember more though, sorry :/

@robimarko
Contributor Author

robimarko commented Oct 9, 2021

@f00b4r0 I see the potential issue here: you were serving the same BDF data for both radios, as there is no way to tell which radio the sysfs request is meant for, since it simply requests the same board.bin filename.
Ideally, we would package the BDF-s in the API2 format (aka board-2.bin) on the fly beforehand, but that would require reimplementing the current Python script in C.

It looks like the kernel actually passes the device to the sysfs helper as well, so I've gotta find a way to utilize that.
This is gonna take some digging to figure out how the script even gets triggered.

@robimarko
Contributor Author

robimarko commented Oct 9, 2021

@f00b4r0 OK, so instead of going down this rabbit hole, I remembered that caldata is nicely formatted in the pre-cal-bus-device.bin format, so why not use that so that you can load API1 BDF-s that are radio specific? And it looks to work.

So, it now just looks for ath10k/QCA4019/hw1.0/board-ahb-a800000.wifi.bin, and we can pass that via sysfs just fine.
Now I gotta make this pretty, try sending it upstream to ath10k, and get some testing on hAP ac2 and ac3 devices.

@f00b4r0
Contributor

f00b4r0 commented Oct 9, 2021

@f00b4r0 I see the potential issue here: you were serving the same BDF data for both radios, as there is no way to tell which radio the sysfs request is meant for, since it simply requests the same board.bin filename.

Yes, that's it indeed. Now I remember :)

Ideally, we would package the BDF-s in the API2 format (aka board-2.bin) on the fly beforehand, but that would require reimplementing the current Python script in C.

I might be able to help with that if it turns out to be necessary.

It looks like the kernel actually passes the device to the sysfs helper as well, so I've gotta find a way to utilize that. This is gonna take some digging to figure out how the script even gets triggered.

See /sbin/hotplug-call. It receives the kernel hotplug event.

@robimarko
Contributor Author

robimarko commented Oct 9, 2021

@f00b4r0 Do you know how I can get the firmware hotplug script to print things?
It just won't print and it is annoying me really badly.

OK, I made it print into a file, and that works, weirdly.
And we may have a winner actually, as DEVPATH can tell us which radio the request comes from.
DEVPATH=/devices/platform/soc/a000000.wifi/firmware/ath10k!QCA4019!hw1.0!board.bin
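
A small sketch of how a hotplug script could pull the requesting device out of DEVPATH. It assumes the path always has the form .../<device>/firmware/<name>, as in the line above; radio_from_devpath is a hypothetical helper name.

```shell
# Print the device directory name (e.g. "a000000.wifi") that a firmware
# hotplug request came from, based on the DEVPATH passed in $1.
# Returns 1 if the path does not look like a firmware request.
radio_from_devpath() {
	case "$1" in
	*/firmware/*) ;;
	*) return 1 ;;
	esac
	dev_dir="${1%%/firmware/*}"	# drop "/firmware/<requested name>"
	echo "${dev_dir##*/}"		# keep only the last path component
}
```

A case statement in the hotplug script could then branch on the result, e.g. `case "$(radio_from_devpath "$DEVPATH")" in a000000.wifi) ... ;; a800000.wifi) ... ;; esac`.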

@f00b4r0
Contributor

f00b4r0 commented Oct 9, 2021

@f00b4r0 Do you know how I can get the firmware hotplug script to print things?
It just won't print and it is annoying me really badly.

Print to a file in /tmp

@robimarko
Contributor Author

robimarko commented Oct 9, 2021

Now, this thing is just annoying me, it fetches the same BDF for both radios despite the full DEVPATH being a condition.
Here is the "code" that I am trying to use:

"ath10k/QCA4019/hw1.0/board.bin")
        case "$board" in
        mikrotik,hap-ac2 |\
        mikrotik,hap-ac3)
                wlan_data="/sys/firmware/mikrotik/hard_config/wlan_data"
                case "$DEVPATH" in
                "/devices/platform/soc/a000000.wifi/firmware/ath10k!QCA4019!hw1.0!board.bin")
                        ( [ -f "$wlan_data" ] && caldata_sysfsload_from_file "$wlan_data" 0x2f20 0x2f20 ) || \
                        ( [ -d "$wlan_data" ] && caldata_sysfsload_from_file "$wlan_data/data_0" 0x2f20 0x2f20 )
                        echo "BDF for $DEVPATH" >> /tmp/hotplug_log
                        ;;
                "/devices/platform/soc/a800000.wifi/firmware/ath10k!QCA4019!hw1.0!board.bin")
                        ( [ -f "$wlan_data" ] && caldata_sysfsload_from_file "$wlan_data" 0xaf20 0x2f20 ) || \
                        ( [ -d "$wlan_data" ] && caldata_sysfsload_from_file "$wlan_data/data_2" 0x2f20 0x2f20 )
                        echo "BDF for $DEVPATH" >> /tmp/hotplug_log
                        ;;
                *)
                        echo "BDF for $DEVPATH" >> /tmp/hotplug_log
                        ;;
                esac
                ;;
        esac
        ;;

I think that driver modification will be required as I can see it requesting the BDF with the same DEVPATH twice, so no wonder it loads the same one.

root@OpenWrt:/# cat /tmp/hotplug_log 
BDF for /devices/platform/soc/a000000.wifi/firmware/ath10k!QCA4019!hw1.0!board.bin
BDF for /devices/platform/soc/a000000.wifi/firmware/ath10k!QCA4019!hw1.0!board.bin

@f00b4r0
Contributor

f00b4r0 commented Oct 9, 2021

That rings a bell. I kinda remember hitting this problem as well.
If you want to start digging in the ath10k code, IIRC start with ath10k_core_fetch_board_data_api_1 in core.c. That's the routine we're hitting when we want it to load flat 1.0 board.bin. Sorry I can't be of more help, I'll hopefully have more time in a week or so.

@robimarko
Contributor Author

robimarko commented Oct 9, 2021

Yeah, I already implemented that: ath10k_core_fetch_board_data_api_1 first tries to load ath10k/QCA4019/hw1.0/board-ahb-a800000.wifi.bin (named like the caldata), and if that fails, it falls back to the default board.bin.
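
The lookup order described here can be sketched as follows. This only illustrates the naming scheme in shell; it is not the actual driver change, which is C code in ath10k_core_fetch_board_data_api_1. board_file_candidates and its arguments are hypothetical.

```shell
# Print the board file names to try, in order: the per-device name first
# (following the pre-cal-<bus>-<device>.bin pattern used for caldata),
# then the generic board.bin as the fallback.
# $1 = firmware directory, $2 = bus name, $3 = device name
board_file_candidates() {
	fw_dir="$1"
	bus="$2"
	dev="$3"
	echo "$fw_dir/board-$bus-$dev.bin"
	echo "$fw_dir/board.bin"
}
```

For the second radio, `board_file_candidates ath10k/QCA4019/hw1.0 ahb a800000.wifi` lists ath10k/QCA4019/hw1.0/board-ahb-a800000.wifi.bin first and ath10k/QCA4019/hw1.0/board.bin second.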

I am just cleaning it up to send upstream as I can see that the CRC of the loaded BDF matches the one packaged in board-2.bin.

@robimarko
Contributor Author

robimarko commented Oct 9, 2021

Here is the current version
https://github.com/robimarko/openwrt/tree/mikrotik/hap-ac3-ubi-bdf

ath10k-ct probably needs further cleanup due to the stupid logic for copying the BDF name.
Upstream ath10k has none of the fwcfg stuff, so it's much easier to add it there.

@notr1ch Can you give it a go on your hAP ac2?

@notr1ch

notr1ch commented Oct 11, 2021

@robimarko Thank you very much for working on this, I'm happy to report it works great! Both interfaces are now showing other APs with signal strengths equal to RouterOS during a scan, and running as an AP the signal also looks great on my phone. I will leave it running for a while to check stability, but it certainly seems like this was the fix.

Kernel log if it's useful:
https://gist.githubusercontent.com/notr1ch/34b7766fe885bf2c1069f09644e14773/raw/bedd4cbf5698ae8a8592e471797c495fcc5e1ff0/dmesg.txt

If there's any other diagnostic info or commands that would help please don't hesitate to ping me.

For anyone else following along, here's a pre-built initramfs of robimarko's tree for testing (config from OpenWRT 21.02):
https://r-1.ch/openwrt-snapshot-r17730-7c9fa909dc-ipq40xx-mikrotik-mikrotik_hap-ac2-initramfs-kernel.bin

@robimarko
Contributor Author

robimarko commented Oct 11, 2021

@notr1ch Awesome, thanks for testing.
You can clearly see that both CRC32 values for BDF-s are different from the packaged board-2.bin

I have already sent the patch to linux-wireless; they deferred it for now (whatever that means).
I will make a PR in OpenWrt so we can get the discussion moving and get this sorted out, as this is only gonna become a bigger issue as more devices are added.

@f00b4r0
Contributor

f00b4r0 commented Oct 11, 2021

@robimarko deferred is ok. It means their current merge window is closed (likely for upcoming rc). It'll move again later.

@robimarko
Contributor Author

Great, I thought it was like that.
The upstream ath10k patch is really simple, ath10k-ct is way more messy as it also supports fwcfg and then preserves the BDF name for debugfs, all of which upstream doesn't have.

@notr1ch

notr1ch commented Oct 12, 2021

Unfortunately the stability testing didn't go as planned, as it randomly rebooted overnight. Could I use a USB serial cable to get console output, or would I have to solder to the board? Anything else I could do to get more information?

@robimarko
Contributor Author

robimarko commented Oct 12, 2021

You need to solder to the pads, there is a pinout somewhere on the forum.
Also, I doubt that the crashes had anything to do with the BDF-s; with them it's an either-they-are-right-or-they-are-not situation.

@notr1ch

notr1ch commented Oct 20, 2021

I was able to reproduce the crash, the device OOMs after enough traffic passed with both interfaces up. It seems like the config already uses ath10k-ct-smallbuffers, but perhaps something is wrong with my build. Is there a way to tell from kernel output if the smallbuffers firmware was loaded instead of regular ath10k-ct?

[112162.111664] kthreadd invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0
[112162.134104] CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.4.150 #0
[112162.144806] Hardware name: Generic DT based system
[112162.150793] Function entered at [<c030df50>] from [<c030a970>]
[112162.155383] Function entered at [<c030a970>] from [<c08a7e88>]
[112162.161285] Function entered at [<c08a7e88>] from [<c03e3a30>]
[112162.167187] Function entered at [<c03e3a30>] from [<c03e498c>]
[112162.173088] Function entered at [<c03e498c>] from [<c0421198>]
[112162.178993] Function entered at [<c0421198>] from [<c031e7e8>]
[112162.184894] Function entered at [<c031e7e8>] from [<c031ffd4>]
[112162.190796] Function entered at [<c031ffd4>] from [<c03202a4>]
[112162.196700] Function entered at [<c03202a4>] from [<c033fcb0>]
[112162.202603] Function entered at [<c033fcb0>] from [<c03010e8>]
[112162.208506] Exception stack(0xc783bfb0 to 0xc783bff8)
[112162.214411] bfa0:                                     00000000 00000000 00000000 00000000
[112162.219626] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[112162.227870] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[112162.236149] Mem-Info:
[112162.242999] active_anon:3154 inactive_anon:506 isolated_anon:0
[112162.242999]  active_file:0 inactive_file:0 isolated_file:0
[112162.242999]  unevictable:0 dirty:0 writeback:0 unstable:0
[112162.242999]  slab_reclaimable:575 slab_unreclaimable:2808
[112162.242999]  mapped:186 shmem:3602 pagetables:11 bounce:0
[112162.242999]  free:4726 free_pcp:101 free_cma:0
[112162.255487] Node 0 active_anon:12616kB inactive_anon:2024kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:744kB dirty:0kB writeback:0kB shmem:14408kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[112162.277801] Normal free:18904kB min:16384kB low:20480kB high:24576kB active_anon:12616kB inactive_anon:2024kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:129024kB managed:119832kB mlocked:0kB kernel_stack:544kB pagetables:44kB bounce:0kB free_pcp:404kB local_pcp:0kB free_cma:0kB
[112162.305033] lowmem_reserve[]: 0 0 0
[112162.327262] Normal: 326*4kB (UMH) 152*8kB (UMH) 82*16kB (UMH) 73*32kB (UM) 7*64kB (UM) 2*128kB (UH) 1*256kB (U) 1*512kB (H) 1*1024kB (H) 5*2048kB (UM) 0*4096kB = 18904kB
[112162.331090] 3602 total pagecache pages
[112162.346105] 0 pages in swap cache
[112162.349842] Swap cache stats: add 0, delete 0, find 0/0
[112162.353297] Free swap  = 0kB
[112162.358770] Total swap = 0kB
[112162.361624] 32256 pages RAM
[112162.364574] 0 pages HighMem/MovableOnly
[112162.367540] 2298 pages reserved
[112162.371435] Tasks state (memory values in pages):
[112162.374647] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[112162.379284] Out of memory and no killable processes...
[112162.387864] Kernel panic - not syncing: System is deadlocked on memory

@robimarko
Contributor Author

Maybe debugfs displays some limits from which the variant can be figured out, though I see no point, as smallbuffers is selected.
It's a build-time ath10k-ct option for lower-memory devices like this one, and it uses the same firmware.

@john-tho
Contributor

I was able to reproduce the crash, the device OOMs after enough traffic passed with both interfaces up. It seems like the config already uses ath10k-ct-smallbuffers, but perhaps something is wrong with my build. Is there a way to tell from kernel output if the smallbuffers firmware was loaded instead of regular ath10k-ct?

Nice catch.

There has very recently been an updated version of ath10k-ct added to OpenWrt, so we may as well retest with that: 1d2bc94

One quirk of the build system is that DEVICE_PACKAGES is not parsed for the initramfs image (squashfs is fine) for BUILDBOT settings: CONFIG_TARGET_MULTI_PROFILE & CONFIG_TARGET_PER_DEVICE_ROOTFS details here: 1984a6b. This means that the initramfs image for hap ac2 can use ath10k-ct (from the ipq40xx Makefile), rather than the smallbuffers variant. If you are compiling for a single device, this does not happen / matter.

The firmware is the same in either case:
ath10k-firmware-qca4019-ct

When I started testing the per-device-boarddata PR on my hapac2 through initramfs (it was using ath10k-ct for my build settings), I also saw the device crash a number of hours after the Wi-Fi was brought up (without the need for any traffic or clients connecting). But I had made a number of other changes, and I had not compared this to the same settings without the per-device-boarddata patches.

You can check the installed ath10k variant and firmware on the device with:
opkg list-installed | grep ath10k

You can compare the ath10k firmware crc32 reported in dmesg with the value that ath10k-fwencoder prints for the firmware file, either on the device in /lib/firmware… or on the build host under build_dir/

This example uses ath10k (not -ct), so the paths and crc32s differ from those of the ath10k-ct firmwares:

dmesg | grep firmware | grep ath10k
[    9.086102] ath10k_ahb a000000.wifi: firmware ver 10.4-3.6-00140 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps crc32 ba79b746
[   11.248299] ath10k_ahb a800000.wifi: firmware ver 10.4-3.6-00140 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps crc32 ba79b746
python2 /mnt/pool_ssd/code/qca-swiss-army-knife/tools/scripts/ath10k/ath10k-fwencoder --info \
/mnt/pool_ssd/code/openwrt/build_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/linux-firmware-20210511/ath10k/QCA4019/hw1.0/firmware-5.bin 
FileSize: 583344
FileCRC32: ba79b746
FileMD5: d56944163e5e257cd2b65739d36401f3
FirmwareVersion: 10.4-3.6-00140
Timestamp: 2018-04-17 10:43:38
Features: no-p2p,mfp-support,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps
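The comparison above can also be scripted. The following is a minimal sketch, assuming the dmesg and ath10k-fwencoder output formats shown in this comment; the function names dmesg_fw_crc32, fwencoder_crc32 and check_fw_crc32 are made up for illustration and are not part of any existing tool.

```shell
# Pull the crc32 value out of ath10k "firmware ver" lines (dmesg text on stdin).
dmesg_fw_crc32() {
    sed -n 's/.*ath10k.*firmware ver.* crc32 \([0-9a-f]\{8\}\).*/\1/p'
}

# Pull the FileCRC32 value out of `ath10k-fwencoder --info` output (on stdin).
fwencoder_crc32() {
    sed -n 's/^FileCRC32: *\([0-9a-f]\{8\}\).*/\1/p'
}

# Succeed only if every radio reports the expected crc32.
# $1: expected crc32 (hypothetical argument); dmesg text on stdin.
check_fw_crc32() {
    expected="$1"
    found=$(dmesg_fw_crc32 | sort -u)
    [ -n "$found" ] && [ "$found" = "$expected" ]
}
```

On the device this could be used as `dmesg | check_fw_crc32 ba79b746 && echo match`, with the expected value taken from `fwencoder_crc32` run over the `--info` output on the build host.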

@notr1ch

notr1ch commented Oct 21, 2021

One quirk of the build system is that DEVICE_PACKAGES is not parsed for the initramfs image (squashfs is fine) when the buildbot settings CONFIG_TARGET_MULTI_PROFILE and CONFIG_TARGET_PER_DEVICE_ROOTFS are enabled; details are in 1984a6b. This means that the initramfs image for the hAP ac2 can end up using ath10k-ct (from the ipq40xx Makefile) rather than the smallbuffers variant. If you are compiling for a single device, this does not happen.

Thanks for the detailed reply, this sounds like the problem. I had copied the build config from the 21.02 release, as I wasn't sure how to configure it from scratch, and that config has CONFIG_TARGET_PER_DEVICE_ROOTFS set (and I am also booting from the initramfs image).

I will try changing the ipq40xx Makefile to the smallbuffers variant and see how it goes.

@john-tho
Contributor

this sounds like the problem

I am more suspicious of a problem with mac80211, hostapd, or ath10k-ct, as my device crashed after a few hours with the wireless AP active and no stations / clients connected.

A build I installed which was based on b519997 + the per-device-boarddata PR ran fine overnight, and there have been more minor hostapd fixes committed since.

I had copied the build config from the 21.02 release as I wasn't sure how to configure it from scratch

For reference: remove any existing .config, then work through make menuconfig.
Or save the following (minimal, non-buildbot) as .config and run make defconfig:

CONFIG_TARGET_ipq40xx=y
CONFIG_TARGET_ipq40xx_mikrotik=y
CONFIG_TARGET_ipq40xx_mikrotik_DEVICE_mikrotik_hap-ac2=y

A minimal config like this can be generated with ./scripts/diffconfig.sh after you finish make menuconfig
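As a rough sketch of the second route, the helper below writes the three lines above as a seed .config. The function name write_seed_config and its directory argument are hypothetical, introduced only for this illustration; they are not part of the OpenWrt build system.

```shell
# Write the minimal hAP ac2 seed config into an OpenWrt source tree.
# $1: path to the tree (defaults to the current directory).
write_seed_config() {
    tree="${1:-.}"
    cat > "$tree/.config" <<'EOF'
CONFIG_TARGET_ipq40xx=y
CONFIG_TARGET_ipq40xx_mikrotik=y
CONFIG_TARGET_ipq40xx_mikrotik_DEVICE_mikrotik_hap-ac2=y
EOF
}
```

Calling `write_seed_config .` from the top of the source tree and then running `make defconfig` expands the seed into a full configuration. Because it overwrites any existing .config, back that file up first if it matters.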

@notr1ch

notr1ch commented Oct 21, 2021

Even after changing the Makefile, opkg reported that kmod-ath10k-ct was still installed in my initramfs. I removed it, installed kmod-ath10k-ct-smallbuffers instead, and reloaded the driver. So far I have pushed 100 GB over the 5 GHz AP without any issues (last time it died after around 10 GB). I'll keep an eye on memory use over the next few days in case there is also a leak over time somewhere else.
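One low-effort way to watch for a slow leak is to log the available memory at intervals. This is a minimal sketch, assuming the kernel reports a MemAvailable line in /proc/meminfo; mem_available_kb is a hypothetical helper name, not an existing command.

```shell
# Print the MemAvailable value in kB from /proc/meminfo-style text on stdin.
mem_available_kb() {
    awk '/^MemAvailable:/ { print $2 }'
}
```

A loop such as `while sleep 600; do echo "$(date) $(mem_available_kb < /proc/meminfo)"; done` then gives a simple time series; a value that keeps falling while the traffic pattern stays the same would point towards a leak.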

@adschm
Member

adschm commented Oct 23, 2021

initramfs does not account for DEVICE_PACKAGES AFAIK

@notr1ch

notr1ch commented Oct 26, 2021

Just wanted to give an update: the device has been running great since switching to kmod-ath10k-ct-smallbuffers. I also picked up a hAP ac3, which I will try this build on as well (hopefully without smallbuffers being necessary).

This pull request was closed.