
nmcli logic in dracut-module-setup.sh fails after NetworkManager version update on specific systems #5

Open
aburmash opened this issue May 7, 2024 · 3 comments


aburmash commented May 7, 2024

When a system whose root partition ( / ) lives on an iSCSI network disk is upgraded, and the dnf transaction
contains packages that trigger a kdump initramfs rebuild AND a new NetworkManager version in the same transaction,
the kdump service fails and the kdump initramfs is not regenerated. This happens, for example, during the update from
NetworkManager 1.44 to 1.46 (the CentOS Stream case).

This is caused by a combination of expected NetworkManager behaviour and kdump using nmcli (the NetworkManager
command-line client for managing connections) to generate the kdump initramfs image with network support.

After the update, NetworkManager does NOT restart the daemon (this is expected and normal); it only reloads the
configuration. That leaves a mismatch between the running daemon and the installed libraries and clients (also expected).
Certain nmcli commands then fail, because a new NetworkManager option added in 1.46 is not supported by the running
1.44 daemon (also expected).

For example, this command fails: https://github.com/rhkdump/kdump-utils/blob/main/dracut-module-setup.sh#L262
The error in the kdump logs looks something like:

Apr 29 20:49:11 test-kexec kdumpctl[193959]: Warning: nmcli (1.46.0) and NetworkManager (1.44.0) versions don't match. Restarting NetworkManager is advised.
Apr 29 20:49:11 test-kexec kdumpctl[193959]: Error: Failed to add 'Wired Connection' connection: connection.autoconnect-ports: unknown property
Apr 29 20:49:11 test-kexec kdumpctl[193949]: dracut: Failed to clone 269caef6-3a85-419e-a645-483f25e94417
Apr 29 20:49:11 test-kexec dracut[189638]: Failed to clone 269caef6-3a85-419e-a645-483f25e94417
Apr 29 20:49:11 test-kexec kdumpctl[189607]: dracut: Failed to install the .nmconnection for ens300f0np0
Apr 29 20:49:11 test-kexec dracut[189638]: Failed to install the .nmconnection for ens300f0np0
Apr 29 20:49:11 test-kexec kdumpctl[189324]: kdump: mkdumprd: failed to make kdump initrd
Apr 29 20:49:11 test-kexec kdumpctl[189324]: kdump: Starting kdump: [FAILED]

This happens because the running 1.44 NM daemon knows nothing about the NEW option autoconnect-ports
(added in 1.46, absent from 1.44).

nmcli also emits a message suggesting a NetworkManager restart: "Warning: nmcli (1.46.0) and NetworkManager (1.44.0)
versions don't match. Restarting NetworkManager is advised." This message appears in the journalctl logs and the kdump service logs.
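
To illustrate, here is a minimal sketch of how that warning could be recognised in captured nmcli output or journal text. The function name nm_mismatch_versions is hypothetical and is not part of kdump-utils; the pattern is taken only from the wording of the warning quoted above, so it would break if nmcli ever changed that wording.

```shell
# Hypothetical helper, not part of kdump-utils.
# $1: text captured from nmcli output or from the journal.
# Prints "<client version> <daemon version>" and returns 0 when the
# mismatch warning is present; prints nothing and returns 1 otherwise.
nm_mismatch_versions() {
    _out=$(printf '%s\n' "$1" |
        sed -n "s/.*nmcli (\([^)]*\)) and NetworkManager (\([^)]*\)) versions don't match.*/\1 \2/p" |
        head -n 1)
    [ -n "$_out" ] && printf '%s\n' "$_out"
}
```

Fed the first log line above, this would print the client version followed by the daemon version, which is enough to tell a stale daemon apart from other nmcli failures.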

kdump cannot run those nmcli commands successfully, so the service fails. When a single update both pulls in a new NM and triggers a kdump rebuild, the user may end up with non-functional kdump until a reboot or a restart of NM followed by the kdump service.

I'm not sure what the proper fix should be.
From one point of view, "restart your system or restart NM + kdump" is a solution, but is there a better approach to look into?

coiby (Member) commented May 13, 2024

Hi @aburmash,

Thanks for reporting this issue! I'm curious why you consider it "expected and normal as well" that the NetworkManager daemon doesn't get restarted after the update. What's the risk of automatically restarting NM after updating it?

aburmash (Author) commented

Potential network loss. LAN connections are likely to persist through an NM restart, but some connections may be interrupted for a while during the restart, which is a particular problem in enterprise environments. Most (if not all) RH-based distros follow the RH logic and reload NM instead of restarting it.
The change in the NM package itself goes back to https://bugzilla.redhat.com/show_bug.cgi?id=811200

I believe the main problem is not just the requirement to restart NM + kdump or reboot the machine. The problem is that when a yum update pulls in an NM with a similar change AND also some package that triggers a kdump initramfs rebuild, the kdump service is triggered in the background (because monitored files changed), not as a post-script. So when the kdump service fails during or after the yum update, you have no idea that kdump is not OK unless you specifically decide to inspect journalctl / systemctl status kdump.
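
As a stopgap, that manual inspection could be scripted and run after every update. The sketch below is only an illustration: the function name kdump_health and the SYSTEMCTL override variable are hypothetical, and it assumes a failed rebuild leaves kdump.service in the failed state, as the "Starting kdump: [FAILED]" log line suggests.

```shell
# Hypothetical post-update check, not part of kdump-utils.
# SYSTEMCTL may be overridden (e.g. with a stub for testing);
# it defaults to the real systemctl.
kdump_health() {
    # "systemctl is-failed" exits 0 when the unit is in the failed state.
    if "${SYSTEMCTL:-systemctl}" is-failed --quiet kdump.service; then
        echo "kdump.service FAILED: inspect 'journalctl -u kdump.service'"
        return 1
    fi
    echo "kdump.service is not in a failed state"
}
```

Calling kdump_health right after the yum/dnf transaction would at least surface the failure instead of leaving it hidden in the journal.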

coiby (Member) commented Jun 3, 2024

Thanks for explaining the side effects of restarting NM!

I believe kdump.service gets started because kexec-tools (kdump-utils) itself gets updated and the following scriptlet is triggered:

%postun
%systemd_postun_with_restart kdump.service

Can you confirm that? If that's the case, maybe we can add an exception for the case where "Warning: nmcli (1.46.0) and NetworkManager (1.44.0) versions don't match. Restarting NetworkManager is advised" is detected.
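
A rough sketch of what such an exception might look like, purely as a starting point for discussion: the wrapper name run_nm_checked and the exit code 75 are hypothetical choices, and nothing here reflects how dracut-module-setup.sh is actually structured. It runs a command, and if the command fails while its stderr carries the mismatch warning, it returns a distinct code so the caller can handle that case separately (for example, by printing a clearer hint or retrying after NM is restarted).

```shell
# Hypothetical wrapper, not part of kdump-utils.
# Usage: run_nm_checked <command> [args...]
# Returns 0 on success, 75 (arbitrary) when the command failed and its
# stderr contains nmcli's version-mismatch warning, otherwise the
# command's own exit code. The command's stdout is discarded.
run_nm_checked() {
    _rc=0
    # Capture stderr only; stdout goes to /dev/null.
    _err=$("$@" 2>&1 >/dev/null) || _rc=$?
    [ "$_rc" -eq 0 ] && return 0
    # Pass the original error text through unchanged.
    printf '%s\n' "$_err" >&2
    case $_err in
        *"versions don't match"*) return 75 ;;
    esac
    return "$_rc"
}
```

The caller would wrap the failing nmcli invocation with run_nm_checked and branch on the 75 return value; what to do in that branch (skip, warn, or fail with a specific message) is the open design question.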
