systemd-networkd-wait-online fails with multiple ethernet where one or more is disconnected #2898

Closed
Tracked by #2909
bencorrado opened this issue Sep 26, 2024 · 7 comments · Fixed by kairos-io/packages#1081
Labels: bug, triage, unconfirmed

Comments

@bencorrado
Contributor

Kairos version:
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
KAIROS_BUG_REPORT_URL="https://github.com/kairos-io/kairos/issues"
KAIROS_GITHUB_REPO="kairos-io/kairos"
KAIROS_SOFTWARE_VERSION_PREFIX="k3s"
KAIROS_VERSION="v3.1.3-1-g2daaf78-dirty"
KAIROS_FLAVOR="ubuntu"
KAIROS_TARGETARCH="amd64"
KAIROS_PRETTY_NAME="kairos-standard-ubuntu-24.04 v3.1.3-1-g2daaf78-dirty"
KAIROS_FLAVOR_RELEASE="24.04"
KAIROS_ID="kairos"
KAIROS_ID_LIKE="kairos-standard-ubuntu-24.04"
KAIROS_VERSION_ID="v3.1.3-1-g2daaf78-dirty"
KAIROS_REGISTRY_AND_ORG="quay.io/kairos"
KAIROS_ARTIFACT="kairos-ubuntu-24.04-standard-amd64-generic-v3.1.3-1-g2daaf78-dirty"
KAIROS_VARIANT="standard"
KAIROS_RELEASE="v3.1.3-1-g2daaf78-dirty"
KAIROS_FAMILY="ubuntu"
KAIROS_MODEL="generic"
KAIROS_HOME_URL="https://github.com/kairos-io/kairos"
KAIROS_NAME="kairos-standard-ubuntu-24.04"
KAIROS_IMAGE_REPO="quay.io/kairos/ubuntu:24.04-standard-amd64-generic-v3.1.3-1-g2daaf78-dirty"
KAIROS_IMAGE_LABEL="24.04-standard-amd64-generic-v3.1.3-1-g2daaf78-dirty"

CPU architecture, OS, and Version:
Linux localhost 6.8.0-45-generic #45-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 30 12:02:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug
On a system with more than one Ethernet port, if not every port has a connection, Kairos waits for all of them even when one interface is already connected.

If any Ethernet interface that is automatically configured for DHCP fails to get a connection, the system does not complete boot successfully.
This is because systemd-networkd-wait-online spins, waiting for all interfaces to come up.

As a result, the interactive installer (and other services waiting on systemd-networkd-wait-online) fails to launch, because the state never becomes online.

This happens because of the * wildcard in /etc/systemd/network/20-dhcp.network:

[Match]
Name=en*
[Network]
DHCP=yes
[DHCP]
ClientIdentifier=mac

By default, systemd-networkd-wait-online waits for all of these interfaces to come online. I think we should require only one interface to be online, not all of them, so that the system can proceed to boot normally.
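To illustrate why the wildcard pulls in every wired port, here is a minimal sketch that applies the `en*` glob to a few interface names. The interface names are hypothetical, and Python's `fnmatchcase` is used only as an approximation of the shell-style glob matching that systemd-networkd applies to `Name=`.

```python
from fnmatch import fnmatchcase

# Hypothetical interface names, chosen only for illustration.
INTERFACES = ["enp1s0", "enp2s0", "eno1", "eth0", "wlan0", "lo"]


def matched_by(pattern, names):
    """Return the names matched by a shell-style glob such as Name=en*."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Every predictably named Ethernet port matches, connected or not.
print(matched_by("en*", INTERFACES))  # ['enp1s0', 'enp2s0', 'eno1']
```

Each matched interface becomes a managed link with DHCP enabled, so an unplugged port is still one that the wait-online check has to account for.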

To Reproduce
Attempt to boot a Kairos installer image that uses systemd-networkd on a machine with more than one network interface, where at least one of those interfaces has no DHCP server and is not otherwise configured through the cloud-init file.

Expected behavior
If the Kairos system is online with at least one network interface, it should proceed to boot normally. It should only wait on systemd-networkd-wait-online if there are no online interfaces.

Resolution

I was able to add the following to the end of my Dockerfile to patch systemd-networkd-wait-online. This override tells systemd-networkd-wait-online it can use any online interface and does not need to wait for all of them.

# Create an override so systemd-networkd-wait-online succeeds once any interface is online, instead of waiting for all of them.
# printf is used instead of `echo -e` because /bin/sh may be dash, whose echo does not support -e.
RUN mkdir -p /etc/systemd/system/systemd-networkd-wait-online.service.d/ \
  && printf '[Service]\nExecStart=\nExecStart=/usr/lib/systemd/systemd-networkd-wait-online --any --ipv4\n' \
  > /etc/systemd/system/systemd-networkd-wait-online.service.d/override.conf

Ultimately, this should probably be added as an overlay in packages.
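For anyone who wants to generate the same drop-in outside a Dockerfile, the following is a rough sketch in Python. The helper names `render_override` and `write_override` and the `root` parameter are assumptions made for illustration and are not part of Kairos. The empty `ExecStart=` line clears the command inherited from the packaged unit, so the new command replaces it rather than being added to it.

```python
import tempfile
from pathlib import Path

UNIT = "systemd-networkd-wait-online.service"
BINARY = "/usr/lib/systemd/systemd-networkd-wait-online"


def render_override(flags=("--any", "--ipv4")):
    """Build the drop-in text; the empty ExecStart= resets the original command."""
    command = " ".join([BINARY, *flags])
    return f"[Service]\nExecStart=\nExecStart={command}\n"


def write_override(root, flags=("--any", "--ipv4")):
    """Write override.conf below `root` (hypothetical image root) and return its path."""
    dropin_dir = Path(root) / "etc/systemd/system" / f"{UNIT}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    target = dropin_dir / "override.conf"
    target.write_text(render_override(flags))
    return target


if __name__ == "__main__":
    # Write into a throwaway directory so nothing on the host is touched.
    with tempfile.TemporaryDirectory() as root:
        print(write_override(root).read_text(), end="")
```

Writing into a temporary root keeps the sketch safe to run anywhere; on a real image the root would be `/` and the result can be inspected with `systemctl cat systemd-networkd-wait-online.service`.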

@bencorrado bencorrado added the bug, triage, and unconfirmed labels Sep 26, 2024
bencorrado added a commit to bencorrado/packages that referenced this issue Sep 28, 2024
For systems with more than one unconfigured NIC, this allows the system to see any one interface online as enough to proceed with online status.
Fixes kairos-io/kairos#2898

Signed-off-by: Ben Corrado <ben@nerdnode.io>
@Itxaka
Member

Itxaka commented Sep 28, 2024

This is a good one indeed. It feels a bit wrong on the systemd side, no? systemd-networkd-wait-online should succeed once at least one NIC is online, not wait for all of them. Either this is a systemd issue, or we are not understanding it correctly and it needs a different config so that it continues as soon as one interface is up.

@Itxaka
Member

Itxaka commented Sep 28, 2024

Ah yes, now I see that your PR does exactly that :D

@Itxaka
Member

Itxaka commented Sep 28, 2024

ahh interesting, it will wait for all ifaces to either fail or succeed.

The service systemd-networkd-wait-online.service invokes systemd-networkd-wait-online without any options. Thus, it waits for all managed interfaces to be configured or failed, and for at least one to be online.
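The difference between the default behaviour quoted above and `--any` can be sketched as a small decision function. This is a simplified model, not the actual systemd implementation: the `Link` type, the state names, and the interface names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    setup_state: str  # assumed values: "configuring", "configured", "failed"
    online: bool


def wait_online_done(links, any_mode=False):
    """Simplified model of when the wait would finish successfully."""
    at_least_one_online = any(link.online for link in links)
    if any_mode:
        # --any: one online interface is enough.
        return at_least_one_online
    # Default: every managed link must have settled, and one must be online.
    all_settled = all(
        link.setup_state in ("configured", "failed") for link in links
    )
    return all_settled and at_least_one_online


# One connected port, and one unplugged port that never leaves "configuring".
links = [
    Link("enp1s0", "configured", True),
    Link("enp2s0", "configuring", False),
]
print(wait_online_done(links))                 # False
print(wait_online_done(links, any_mode=True))  # True
```

Under this model, the unplugged port never reaches a settled state, which would explain why the default invocation keeps waiting until it times out.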

@Itxaka
Member

Itxaka commented Sep 28, 2024

Testing this on a VM with 2 NICs, one connected and one not, resulted in the service timing out after 2 minutes.

@jimmykarily jimmykarily moved this to In Progress 🏃 in 🧙Issue tracking board Sep 30, 2024
@jimmykarily jimmykarily moved this from In Progress 🏃 to Under review 🔍 in 🧙Issue tracking board Sep 30, 2024
@Itxaka Itxaka mentioned this issue Oct 3, 2024
42 tasks
@github-project-automation github-project-automation bot moved this from Under review 🔍 to Done ✅ in 🧙Issue tracking board Oct 3, 2024
@Itxaka Itxaka reopened this Oct 3, 2024
@github-project-automation github-project-automation bot moved this from Done ✅ to Under review 🔍 in 🧙Issue tracking board Oct 3, 2024
@Itxaka
Member

Itxaka commented Oct 3, 2024

Keeping this open until the framework lands in Kairos.

@Itxaka
Member

Itxaka commented Oct 4, 2024

This is now on master.

@Itxaka Itxaka closed this as completed Oct 4, 2024
@github-project-automation github-project-automation bot moved this from Under review 🔍 to Done ✅ in 🧙Issue tracking board Oct 4, 2024
@clyra

clyra commented Oct 17, 2024

Hi,

I also stumbled on this, but got it working by adding this to the user-data:

- path: /etc/systemd/system/systemd-networkd-wait-online.service.d/override.conf
  permissions: 0644
  content: |
    [Service]
    ExecStart=
    ExecStart=/usr/lib/systemd/systemd-networkd-wait-online --any

I didn't bother to report it because it seemed to be an Ubuntu issue, not a Kairos one. Is that the case, or is 20-dhcp.network indeed added by Kairos?
