Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[20.09] 3-minute delay at boot waiting for wpa_supplicant to restart; may also result in failure to configure the wireless interface #107341

Closed
mebubo opened this issue Dec 21, 2020 · 7 comments
Labels
0.kind: bug Something is broken 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS

Comments

@mebubo
Copy link
Contributor

mebubo commented Dec 21, 2020

Since #104722 (backported to 20.09 as 645b8a3), the boot process hangs for 3 minutes on this message:

[***   ] A start job is running for Wait for udev To Complete Device Initialization

Even a minimal configuration with just this:

  networking.wireless.networks.<ssid>.psk = <psk>;

as the only network-relevant config displays the same behavior.

I got the same delay + message while booting a recent nixos-minimal-20.09 iso.

In addition to the delay, with networking.useNetworkd = true;, the wireless interface is never configured at all it seems.

I haven't yet tried with nixos-unstable.

Reverting 645b8a3 fixes the issue.

Ping @rnhmjoj @doronbehar

@mebubo mebubo added the 0.kind: bug Something is broken label Dec 21, 2020
@mebubo
Copy link
Contributor Author

mebubo commented Dec 21, 2020

Relevant bits of the log

Dec 21 19:52:17 nixos systemd-udevd[521]: wlan0: Spawned process '/run/current-system/systemd/bin/systemctl restart wpa_supplicant.service' [693] is taking longer than 59s to complete
Dec 21 19:52:17 nixos systemd-udevd[497]: wlan0: Worker [521] processing SEQNUM=2598 is taking a long time
Dec 21 19:53:14 nixos systemd[1]: systemd-udev-settle.service: Main process exited, code=exited, status=1/FAILURE
Dec 21 19:53:14 nixos systemd[1]: systemd-udev-settle.service: Failed with result 'exit-code'.
Dec 21 19:53:14 nixos systemd[1]: Failed to start Wait for udev To Complete Device Initialization.
Dec 21 19:53:14 nixos systemd[1]: Dependency failed for Network Service.
Dec 21 19:53:14 nixos systemd[1]: Dependency failed for Wait for Network to be Configured.
Dec 21 19:53:14 nixos systemd[1]: systemd-networkd-wait-online.service: Job systemd-networkd-wait-online.service/start failed with result 'dependency'.
Dec 21 19:53:14 nixos systemd[1]: Dependency failed for Extra networking commands..

Dec 21 19:53:14 nixos systemd[1]: network-local-commands.service: Job network-local-commands.service/start failed with result 'dependency'.
Dec 21 19:53:14 nixos systemd[1]: systemd-networkd.service: Job systemd-networkd.service/start failed with result 'dependency'.
Dec 21 19:53:14 nixos systemd[1]: systemd-udev-settle.service: Consumed 7.418s CPU time, no IP traffic.

Dec 21 19:53:14 nixos systemd[1]: Starting Wait for udev To Complete Device Initialization...
Dec 21 19:53:14 nixos systemd[1]: Condition check resulted in Update is Completed being skipped.
Dec 21 19:53:14 nixos udevadm[699]: systemd-udev-settle.service is deprecated. Please fix systemd-networkd.service not to pull it in.
Dec 21 19:54:17 nixos systemd-udevd[521]: wlan0: Spawned process '/run/current-system/systemd/bin/systemctl restart wpa_supplicant.service' [693] timed out after 2min 59s, killing
Dec 21 19:54:17 nixos systemd-udevd[497]: wlan0: Worker [521] processing SEQNUM=2598 killed
Dec 21 19:54:17 nixos systemd-udevd[497]: Worker [521] terminated by signal 9 (KILL)
Dec 21 19:54:17 nixos systemd-udevd[497]: wlan0: Worker [521] failed
Dec 21 19:54:17 nixos systemd[1]: Finished Wait for udev To Complete Device Initialization.

@veprbl veprbl added the 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS label Dec 21, 2020
@rnhmjoj
Copy link
Contributor

rnhmjoj commented Dec 21, 2020

I think I have an idea but I don't know why it's hanging. Can you try setting this?

systemd.services.systemd-udev-settle.enable = false;

@rnhmjoj
Copy link
Contributor

rnhmjoj commented Dec 22, 2020

It looks like there is a deadlock: udev is waiting for the systemctl restart wpa_supplicant command to complete, but this one is waiting for udev itself to settle.

The most likely cause is dhcpcd wanting systemd-udev-settle.
This should really be removed because it's deprecated and always a troublemaker. For 20.09 I can only revert the commit, there is no other way around.

@rnhmjoj
Copy link
Contributor

rnhmjoj commented Dec 22, 2020

It's not related to wpa_supplicant in particular. It looks like having any service waiting for udev-settle will hang a systemctl restart indefinitely.

rnhmjoj added a commit that referenced this issue Dec 22, 2020
This reverts commit 8f17761.

Attempting to start any service from udev when systemd-udev-settle is
used at all hangs the boot for 2min. See issue #107341.
rnhmjoj added a commit that referenced this issue Dec 22, 2020
This reverts commit 645b8a3.

Attempting to start any service from udev when systemd-udev-settle is
used at all hangs the boot for 2min. See issue #107341.
@rnhmjoj
Copy link
Contributor

rnhmjoj commented Dec 22, 2020

I've reverted both master (9f52d1f) and release-20.09 (9d6d906) for now.

@rnhmjoj rnhmjoj closed this as completed Dec 31, 2020
rnhmjoj added a commit that referenced this issue Feb 20, 2021
systemd-udev-settle is a terrible hack[1] and should never[2] ever[3]
used, seriously it's very bad. It was used as a stop-gap solution for
issue #39069, but thanks to PR #79532 it can be removed now.

[1]: systemd/systemd#7293 (comment)
[2]: #73095
[3]: #107341
rnhmjoj added a commit to rnhmjoj/nixpkgs that referenced this issue Feb 21, 2021
It's a dull and boring day, it's cold outside and I'm stuck at home: let
me tell you the story of systemd-vconsole-setup.

In the beginnings of NixOS[1], systemd-vconsole-setup was a powerful
sysinit.target unit, installed and running at boot to set up fonts
keyboard layouts and even colors of the virtual consoles. If needed, the
service would also be restarted after a configuration change, consoles
were happy and everything was good, well, almost.

Since the service had no way to specify the dependency "ttys are ready",
modesetting could sometimes happen *after* systemd-vconsole-setup had
started, leaving the console in a broken state. So abbradar worked
around that by putting a systemd-udev-settle `After=`.

In the meanwhile, probably realizing their mistake, systemd added a
shiny udev rule to start the systemd-udev-settle at the right time[2].
However, the rule bypassed systemd by directly running the binary
`systemd-udev-settle`, and the service - though still installed - fell
into disuse.

Two years would pass before a good samaritan, seeing the poor jobless
systemd-udev-settle service, decided to give it the coup de grâs[3] by
unlisting it from the installed units.
This, combined with another bug, caused quite a commotion[4] in NixOS;
to see why remember the fact that `WantedBy=` in upstream units doesn't
work[5], so it had to be added manually in cc54211, but while systemd
removed it, the NixOS unit continued to install and restart the service,
making a lot of fuss when switching configuration.

After at least thee different tentative fixes, deedrah realised[6] what
the root cause was and fpletz put the final nail[7] in the coffin of
systemd-udev-settle. The service would never see the light of a boot
again, NixOS would not restart it all the time but thanks to udev
consoles would still get their pretty fonts and playful colors.

The En..

..no, wait! You should ask what came of systemd-udev-settle, first.
And why is the service even around if udev is doing all the work?

Udev-settle, like the deceitful snake that he is, laid hidden for years.
He looks innocuous doesn't it? A little hack. Only until it leaves his
den and a poor user[8] drops dead. Obviously, it serves no purpose, as
the service is not part of the boot process anymore, so let's remove it
for good!

About the service, it may not be useful at boot, but it can be started
to pick up changes in vconsole.conf and set the consoles accordingly.
But wait, this doesn't work anymore: the service is never started at
boot (remember f76d2aa), so switch-to-configuration.pl will not restart
it. Fortunately it can be repaired: here I install a new unit which
does *nothing* on start, but restarts the real service when reloaded.
This perfectly reproduces the original behavior, hopefully without the
original bugs too.

The End?

[1]: NixOS@cc54211
[2]: systemd/systemd@f6ba867#diff-84849fddcef81458f69725dc18c6614aade5c4f41a032b6908ebcf1ee6740636
[3]: systemd/systemd@8125e8d
[4]: https://web.archive.org/web/20180603130107/https://github.com/NixOS/nixpkgs/issues/22470
[5]: NixOS#81138
[6]: https://web.archive.org/web/20180603130107/https://github.com/NixOS/nixpkgs/issues/22470#issuecomment-330930456
[7]: NixOS@f76d2aa
[8]: NixOS#107341
@knuton
Copy link
Contributor

knuton commented Jun 16, 2021

Seeing that #107382 has been merged, can #104722 be applied again or is there a different plan now?

@rnhmjoj
Copy link
Contributor

rnhmjoj commented Jun 16, 2021

No, unless systemd-udev-settle is removed from every service, there is still the possibility of a deadlock like this.
There are a couple of services that still requires it: prominently gnome and zfs. There's nothing I can do besides waiting for upstream to fix the issue.i

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.kind: bug Something is broken 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS
Projects
None yet
Development

No branches or pull requests

4 participants