v2023.x: FUTRO running with x86-generic image does not run wgpeerselector after some boots #3173

Open
maurerle opened this issue Jan 26, 2024 · 2 comments


maurerle commented Jan 26, 2024

While migrating our FFAC domain, we saw some Futros behave oddly.
Some did not come online, while most had no problems.
In total we have about 100 Futros, and more than 70 of them run the x86-generic image (even though the hardware would support x86-64, if I am not mistaken).

I could reproduce this issue on two FUJITSU SIEMENS FUTRO S550 devices. I suspect that a handful of our devices are offline because of this, and I don't know how many more would die after a reboot.

Problem:

After booting, wgpeerselector is not started, so something in this invocation did not work:
https://github.com/ffac/packages/blob/master/net/wgpeerselector/files/lib/netifd/proto/wgpeerselector.sh

Therefore, no mesh-vpn is established.
On some firmware builds this happens on the boot directly after a sysupgrade, while on others it happens after the first reboot.

I tried to run wgpeerselector -i wg_mesh --group gluon-mesh-vpn -v on a broken node, which did not help.

Am I affected?

You are affected if you can reach the node through its WAN IP or next-node address and see that the mesh-vpn section is missing, even though mesh-vpn is active:

image

If this issue does not occur after a sysupgrade or after a reboot, it should not appear at any other time.
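
For nodes where shell access is available, the same symptom can also be checked from the command line. The following is only a minimal sketch: it assumes the daemon appears in the process list under the name wgpeerselector, and the helper name peerselector_status is hypothetical and not part of Gluon.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gluon): reads a process listing on stdin
# and prints "running" if a wgpeerselector process appears in it,
# otherwise "missing".
peerselector_status() {
    # The [w] bracket keeps this grep from matching its own command line
    # when the listing comes from a live `ps`.
    if grep -q '[w]gpeerselector'; then
        echo "running"
    else
        echo "missing"
    fi
}

# Assumed usage on a node (not executed here):
#   ps w | peerselector_status
```

A result of "missing" on a node with mesh-vpn enabled would match the behaviour described in this issue; "running" alone does not prove that the tunnel is actually established.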

Fixes

Flash the x86-64 image for your device on top.

Use an image built from this commit or earlier: 13a6617

Affected versions

Probably all versions starting with 1b3b121.
I also tested v2023.1 (the tag), which was affected,
and 1b3b121 (nearly v2023.1.2), which was affected.
So the current v2023.1.x is probably affected as well.

Looks like v2023.2.x is not affected as my build from this version worked fine:
bfbefa4

logread of an affected device:
offloader-broken-v2023.1.0-3.txt


Looks like I already tested the two versions that are only one commit apart, and that commit seems to introduce this issue:
13a6617...1b3b121

which is this changeset (if I did not miss anything in my analysis):
openwrt/openwrt@05f7435...a08553b

@maurerle
Member Author

In FFAC we replaced all x86-generic images with an x86-64 image, which worked well, as the hardware is generally supported.
However, I would not recommend this step in general, as it will brick true i386 devices, if there are any.
We did not seem to have any.

This issue will no longer be relevant once v2023.1.x runs out of support (probably even if the v2023.1.x branch gets another kernel bump), and I don't mind closing it.

But if someone wants to keep this open, just reopen it.


maurerle commented Jan 29, 2024

While testing, I have just seen this issue also occur on a v2023.2.x firmware built from bfbefa4.

This commit is not far behind v2023.2.x:
bfbefa4...v2023.2.x

To reproduce, you need a Futro running a Gluon firmware from a community that uses wgpeerselector to connect the mesh VPN:

  1. run a sysupgrade - check if wgpeerselector works
  2. run a reboot - check if wgpeerselector works

If both work fine, you are probably not affected, though you cannot be too sure 🤷
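
When checking several devices, the outcome of the two steps above can be recorded in a uniform way. This is a minimal sketch under stated assumptions: the helper name repro_verdict and the "ok"/"fail" values are hypothetical, and how each check is performed on the node is left to the tester.

```shell
#!/bin/sh
# Hypothetical helper: combine the two observations from the steps above.
#   $1 = wgpeerselector state after the sysupgrade ("ok" or "fail")
#   $2 = wgpeerselector state after the reboot     ("ok" or "fail")
repro_verdict() {
    if [ "$1" = "ok" ] && [ "$2" = "ok" ]; then
        # Both boots came up with a working wgpeerselector.
        echo "probably not affected"
    else
        # At least one boot came up without wgpeerselector.
        echo "affected"
    fi
}

# Example: worked after the sysupgrade, failed after the reboot.
#   repro_verdict ok fail
```

The verdict deliberately says "probably" for the passing case, since two clean boots do not rule out the problem.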

@maurerle maurerle reopened this Jan 29, 2024
@maurerle maurerle changed the title v2023.1.x: FUTRO running with x86-generic image does not run wgpeerselector after some boots v2023.x: FUTRO running with x86-generic image does not run wgpeerselector after some boots Jan 29, 2024