NetworkManager's nm-online kills nixos-rebuild #180175
Comments
Possibly it relates to this? #178046 Edit: Nevermind, that commit is not in |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: |
This is related to udev not initializing devices. NetworkManager never completes startup because a WireGuard interface is never initialized by udev. A workaround is just putting the affected device into |
I'm wondering whether there is any solution to this, since it always triggers for me. So, I'm not even sure if my
As mentioned in the initial description of the issue:
This never seems to be the case, since the actual message is "Started Network Manager.". EDIT: no idea why, but it went away ... |
I think I'm running into this.
It only happens after I've connected my USB-C dock with an ethernet connection at least once after boot. (Note: I'm running tmpfs on root, so my system should "forget" everything about my dock on reboot.) I'm also running |
@Stale don't you dare! |
Attempt to fix nm-online-service from stalling on tailscale interface. See NixOS/nixpkgs#180175
@ncfavier Any chance you can help with this? |
I don't use NetworkManager so I wouldn't know, but in the case of systemd-networkd there are relevant options under
Warning about failed units is pretty much the last thing that the activation script does, so it's probably fine (but the failure should be fixed, of course). |
BTW I've "fixed" this by setting
# udev 250 doesn't reliably reinitialize devices after restart
systemd.services.systemd-udevd.restartIfChanged = false;
But this is really an upstream systemd bug. |
Temporarily fixed by disabling nm-wait-online NixOS/nixpkgs#180175
My Ubuntu has:
My latest NixOS (22.05) config has:
based on this definition:
systemd.services.NetworkManager-wait-online = {
  wantedBy = [ "network-online.target" ];
};
Where is this Also:
nixpkgs on fix/teamviewer-service-deps [$]
❯ rg 'nm-online'
[ nothing ] |
@blaggacao: The reference to |
I'd vote for disabling this service until we can make it reliable. It's doing no good currently. |
I've been tripping over this bug for quite some time now, and it is annoying for users. As mentioned above, the error can be worked around with:
systemd.services.NetworkManager-wait-online.enable = lib.mkForce false;
systemd.services.systemd-networkd-wait-online.enable = lib.mkForce false;
I was concerned that other dependencies or services might require these to be enabled, so I grepped through nixpkgs for both. These are the mentions:
NetworkManager-wait-online.service
systemd-networkd-wait-online.service
TL;DR: At first look, the usage of these two services seems minimal to me, and they are causing more problems than they solve. Agreeing with @domenkozar's proposal, I'd vote to disable them by default. If this is agreed upon, I can submit a PR |
I've been running with that service disabled for 6 months now and have not experienced a single issue. Don't count my voice too heavily, though 😉 ! 👍 |
If we're going to work around this I'd still prefer |
For some reason, that didn't work for me. On rebuild it said "not restarting service" |
Yep confirming what @pinpox said:
Still seeing the issue on HEAD. Disabling wait-online like mentioned previously fixes nixos-rebuild.
|
Is this only fixed when using tailscaled? For users not using tailscaled, which of the dozens of workarounds shared here should they apply? Is this really a good solution for new users? Shouldn't something also be added to NetworkManager's module? Quoting previous comments: |
@AkechiShiro have you observed this without using Tailscale? As far as I understand, the issue comes down to Tailscale starting before NetworkManager while depending on it being online, which essentially ends in a deadlock. |
The same issue happened to me with wireguard in the past. I'll try the suggested fix |
@supermarin I have yes and I believe I'm not the only one |
Mind posting your configuration.nix in a gist? We should reopen this issue then if this happens outside of just Tailscale & NM. EDIT: do you use WireGuard? Tailscale uses WireGuard, so that could be the lowest common denominator |
I do use WireGuard, yes, so it could be that I have the same issue as @pinpox. Regarding my configuration.nix, it is all over the place currently. A minimal flake would be great, but I can't provide my whole config at the moment; if I have time, I'll try in a VM with a single WireGuard interface set up. |
Can anyone confirm whether they still experience this issue without using tailscaled? |
As recently as two years ago, I had this issue without knowingly using tailscaled. (Posted to #59603.) The problem is perhaps that tailscaled is not the only service that activates
Perhaps this issue has since been fixed by an unrelated patch upstream (however that would happen), but I'm not optimistic. Multiple people (for example #180175 (comment), linking to #182449) have pointed out that this is probably an upstream bug in systemd/udev. Let me try to get rid of the workaround and see if it is still an issue. |
We did receive an answer from Poettering, but I don't know how to respond properly. Could anyone more experienced pitch in? systemd/systemd#34585 |
The wait will only be enabled on machines with NetworkManager enabled. Closes NixOS#180175 (cherry picked from commit 0d822cc)
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/rebuild-error-failed-to-start-network-manager-wait-online/41977/5 |
Just ran into this again with tailscaled. |
@Atemu ran into it as well, but already had a running system with tailscale up. Manually shut down tailscale, and rebuild succeeded (and tailscale was back up). Tested with several rebuilds and restarts, works ok so far |
An idea I had on this is that we could perhaps hack around this using a systemd unit that |
Maybe this would help? (I closed because earlier I thought it was obsolete) And I've made a small flake version for testing purpose: https://github.com/inmaldrerah/nixos-extensions At least last time I used this, I didn't have to |
I can try testing with it.
On the next rebuild it should get stuck. I'll try this later today and report back if it reproduces consistently. Note: the Tailscale fix was backported to 24.05, so you need to pin nixpkgs to a commit prior to that. |
Note that this issue occurs even with the supposed fix. |
….service timing out on generation activation
This is caused by nmcli waiting for the Mullvad WireGuard network device to come up, which never happens. The workaround is to add the device, wg0-mullvad, to the unmanaged devices list. Upstream issue is NixOS/nixpkgs#180175.
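The unmanaged-device workaround described in that commit message could look roughly like the NixOS fragment below. This is a sketch rather than the commit's actual change: it assumes the networking.networkmanager.unmanaged module option and the interface-name: device-spec syntax from NetworkManager.conf, and it reuses the wg0-mullvad device name quoted above.

```nix
{
  # Sketch only: tell NetworkManager to leave the WireGuard device alone so
  # that NetworkManager-wait-online does not block on it during activation.
  # "wg0-mullvad" is the device name quoted above; substitute your own.
  networking.networkmanager.unmanaged = [ "interface-name:wg0-mullvad" ];
}
```

Unlike disabling the wait-online service outright, this keeps the online check in place for the interfaces NetworkManager still manages.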
Also still experiencing this with tailscaled on nixpkgs unstable. If I stop the tailscaled unit, I can then rebuild OK, and the service comes back up again. |
Yeah just ran into it as well :( |
I'm still running into this problem. However, I'm not using NetworkManager, but systemd-networkd. Adding the following line to my config seemed to fix things: systemd.services.tailscaled.after = ["systemd-networkd-wait-online.service"]; Perhaps we should also add that line to the tailscale service definition? |
That flake can be used by making an overlay over nixpkgs, Now I haven't been using my flake for a while, but still keep Edit: I checked the code of |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/error-network-wait-online-service/57902/3 |
Describe the bug
The systemd service NetworkManager-wait-online.service can prevent nixos-rebuild from succeeding:
This service runs nm-online -s -q, and the nm-online man page says:
I am not familiar with this tool, but my experience is that after my laptop has been up for some time (e.g. days), nm-online will often return an error code rather than correctly determine the network is up, thus killing any future nixos-rebuild commands.
Steps To Reproduce
Steps to reproduce the behavior:
nm-online -s -q does not return success (not sure how to do this on demand).
nixos-rebuild failure.
Expected behavior
nixos-rebuild should not fail due to an erroneous network check.
Additional context
This is tricky as it is not a nix issue per se but rather an issue with a presumably flaky systemd service. It is easy enough to disable this service manually:
And perhaps this is the best solution. But a number of my coworkers all ran into this issue independently, so I thought it merited an issue for discoverability, if nothing else. My gut reaction is that a flaky check should probably not be required by default, but I don't know enough about this service's importance/fragility to say.
This issue was noticed only recently, both on nixos-unstable and nixos-22.05.
Notify maintainers
Metadata
Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.
Thanks!