trace: warning: mdadm: Neither MAILADDR nor PROGRAM has been set. This will cause the mdmon service to crash. #254807

Closed
Ashvith10 opened this issue Sep 12, 2023 · 39 comments · Fixed by #255426
Assignees
Labels
0.kind: bug Something is broken

Comments

@Ashvith10
Contributor

Describe the bug

After updating the unstable channel to the recent version, and trying to run nixos-rebuild switch, I get this warning.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Update nix-channel (unstable)
  2. Run nixos-rebuild switch
  3. Trace will show on the console

Expected behavior

-

Screenshots

-

Additional context

I am not sure how this trace could affect the working of my current instance.

Notify maintainers

@Ekleog
@arcnmx
@yu-re-ka

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.1.51, NixOS, 23.11 (Tapir), 23.11pre521611.e56990880811`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.17.0`
 - channels(ashvith): `""`
 - channels(root): `"nixos"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
@Ashvith10 Ashvith10 added the 0.kind: bug Something is broken label Sep 12, 2023
@yu-re-ka
Contributor

I also noticed this warning, but why do you think I am a maintainer of an affected module or package? :D

@yu-re-ka
Contributor

@ctheune @mweinelt

@Ashvith10
Contributor Author

@yu-re-ka oh, apologies. In the commit history, your name appeared in the second list, so I thought adding the recent contributors made sense.

@ctheune
Contributor

ctheune commented Sep 13, 2023

@Ashvith10 are you using software RAID? If so, does your /etc/mdadm.conf show either of the two options? Does the mdmonitor service run?

@ctheune ctheune self-assigned this Sep 13, 2023
@Ashvith10
Contributor Author

@ctheune I believe that I am not using RAID. My configuration file is close to the default GNOME desktop setup, with the exception of enabling the firewall and using GRUB. Also, /etc/mdadm.conf is empty.

@inmaldrerah

I am experiencing this as well. I think it is not GRUB-related, as switching to systemd-boot still triggers this warning.

@ctheune
Contributor

ctheune commented Sep 13, 2023

It's interesting that /etc/mdadm.conf does exist, which means something did activate the relevant code. Are you possibly using the installation-device profile?

@ctheune
Contributor

ctheune commented Sep 13, 2023

Alternatively, would you mind double checking your hardware-configuration.nix?

@yu-re-ka
Contributor

@ctheune the default value of boot.swraid.enable is true for stateVersions older than 23.11

@ctheune
Contributor

ctheune commented Sep 13, 2023

Hmm. Interesting. That's a weird combination we end up with ... :) I'll think about that.

@Ashvith10
Contributor Author

@yu-re-ka made the right observation. The options boot.swraid.mdadmConf and boot.swraid.enable aren't in 23.05, but in the unstable channel. But I've never touched these settings before.

@Ashvith10
Contributor Author

I was able to reproduce the error when this option was set to true. Perhaps boot.swraid.enable should be set to false by default.

@ctheune
Contributor

ctheune commented Sep 13, 2023

The issue is that it was previously enabled silently and by default (which is a bit weird, because that basically means there are lots and lots and lots of systems out there with a non-functional, broken unit and an "unclean" overall systemd state ... o_O)

@ctheune
Contributor

ctheune commented Sep 13, 2023

Ooooooooh. Maybe those systems aren't really running with broken systemd states. Would you guys mind checking? I think the complexity of the software raid module working on upstream systemd units and upstream udev rules that poke each other might mean that if it's enabled but not running then it won't ever trigger the monitor unit ... maybe ... Brrrrrrr. This thing is a bit of a nightmare.

@pmarreck

pmarreck commented Sep 13, 2023

Just noticed this error on a rebuild.

Not sure if related, but I run on ZFS root.

I see in the docs for this option https://search.nixos.org/options?channel=unstable&show=boot.swraid.enable&from=0&size=50&sort=relevance&type=packages&query=swraid that the logic for defaulting it to true is simply a check for whether your stateVersion is older than 23.11. Mine is 22.05, which I guess is why it's true, but I don't think it should be. Not sure why the logic for defaulting this to true has anything to do with the stateVersion (or more particularly I guess, why it ONLY has to do with the stateVersion, and not, say, whether or not I actually have a software RAID defined, or evidence of one...)

Part of the problem is that my (very slightly older) unstable version of NixOS didn't even understand that option and errored, so I basically have to update it and experience potential "unsoundness" just to get to the part where I can set it to false.

Not sure if the Nix language has something like an "only set this option if it's an understood/defined option" function.

@imincik
Contributor

imincik commented Sep 13, 2023

I get this error message when running nixosTest on a package where we definitely don't set up any RAID array.

@Ashvith10
Contributor Author

Maybe those systems aren't really running with broken systemd states. Would you guys mind checking?

Can you share how we can check this? What should we look for in the journalctl output?

@yu-re-ka
Contributor

I am quite sure this is what happened:

  • boot.swraid.enable used to default to true on all systems (no matter if they were actually using raid or not). This would just mean it supports booting from raid, but nothing would break if there was no raid configured.
  • People noticed that we don't need to ship these units on systems where they are never used/activated, and built a detection into nixos-generate-config to dynamically set boot.swraid.enable in hardware-configuration.nix when required.
  • The default was then changed to false for systems with stateVersion >= 23.11, because older systems that were installed with a version of nixos-generate-config that did not detect and enable the option might depend on it being implicitly enabled
  • The warning was added for all systems where the boot.swraid.enable is true, but additional config options are not set

My suggestion: Add a condition to only show the warning on systems with stateVersion >= 23.11, since we can't assume the users of older systems actually intended to use swraid functionality.
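
The default and the suggested warning condition can be sketched as plain Nix functions. This is only an illustrative sketch, not the actual nixpkgs module: the names `swraidDefault` and `shouldWarn` are hypothetical, and detecting "additional config options are not set" with a substring match on the mdadm.conf text is an assumption.

```nix
{ lib }:
{
  # Default described above: implicitly on for stateVersion older than 23.11.
  swraidDefault = stateVersion: lib.versionOlder stateVersion "23.11";

  # Suggested condition: warn only when swraid is enabled, the system is on
  # stateVersion >= 23.11, and mdadm.conf names neither MAILADDR nor PROGRAM.
  shouldWarn = { stateVersion, enable, mdadmConf }:
    enable
    && lib.versionAtLeast stateVersion "23.11"
    && !(lib.hasInfix "MAILADDR" mdadmConf || lib.hasInfix "PROGRAM" mdadmConf);
}
```

Under these definitions, a system at stateVersion "23.05" with an empty mdadm.conf would still get swraid enabled by default, but would no longer see the warning.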

@anund
Contributor

anund commented Sep 13, 2023

A link to this issue or a little more text around boot.swraid.enable would also be helpful. The current warning requires reading this issue to understand what the problem is when your system has no direct references to mdadm.

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/dict-offline-version/33004/2

mnacamura added a commit to m15a/etc-nixos that referenced this issue Sep 14, 2023
adammitha added a commit to adammitha/nixos-configuration that referenced this issue Sep 15, 2023
ctheune added a commit to ctheune/nixpkgs that referenced this issue Sep 16, 2023
The default just recently changed in 23.11. Users that had
swraid enabled implicitly by NixOS in previous releases got surprised
by warnings even though they do not actually use software RAID.

Fixes NixOS#254807
@r-vdp
Contributor

r-vdp commented Sep 16, 2023

Not sure if the Nix language has something like an "only set this option if it's an understood/defined option" function

@pmarreck it's possible, but it's not very pretty. You'd need to inspect the options argument that's passed to your module to see if the option is there, something like

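{ options, config, ... }: {
  # Define boot.swraid only when this nixpkgs revision declares the option;
  # a null attribute name makes Nix drop the definition entirely.
  config.boot.${if options.boot ? swraid then "swraid" else null} = { enable = false; };
}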

Where we rely on the fact that if the attribute name is null, nix ignores it (as documented in the manual).

yu-re-ka pushed a commit that referenced this issue Sep 16, 2023
The default just recently changed in 23.11. Users that had
swraid enabled implicitly by NixOS in previous releases got surprised
by warnings even though they do not actually use software RAID.

Fixes #254807
@Ashvith10
Contributor Author

@yu-re-ka can I remove the boot.swraid.enable option, now that it is disabled by default?

@yu-re-ka
Contributor

yu-re-ka commented Sep 16, 2023

@Ashvith10 the way you phrase it: no. you might break your system by doing that.

but also I might be missing context.

@Ashvith10
Contributor Author

Ashvith10 commented Sep 16, 2023

@yu-re-ka I'm still not sure if I understand the new changes well enough: Is boot.swraid.enable a compulsory option in the configuration.nix file, starting from 23.11? Because most general users don't want this enabled by default, right? So, naturally, wouldn't it make sense to have false as the default value, so as to not use RAID, and also avoid the warning?

@dixslyf
Member

dixslyf commented Sep 17, 2023

@yu-re-ka I'm still not sure if I understand the new changes well enough: Is boot.swraid.enable a compulsory option in the configuration.nix file, starting from 23.11? Because most general users don't want this enabled by default, right? So, naturally, wouldn't it make sense to have false as the default value, so as to not use RAID, and also avoid the warning?

No, it's not a compulsory option. It's just enabled by default if your system.stateVersion is older than 23.11. If it's 23.11 or higher, then it's disabled by default.

The PR that resolves this issue makes it so that the warning only shows if boot.swraid was enabled explicitly (and some minimal configuration for mdadm.conf is missing).

If your configuration's boot.swraid is enabled by default (because your system.stateVersion is older than 23.11), you can explicitly disable it if you are sure that your system doesn't require it.
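
A minimal configuration.nix fragment for that case, assuming the machine really has no software RAID arrays that it needs at boot:

```nix
{ ... }:
{
  # Opt out of the implicit default that applies when
  # system.stateVersion is older than 23.11.
  boot.swraid.enable = false;
}
```

Setting the option explicitly also keeps the behavior stable regardless of which stateVersion the default is derived from.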

@Izorkin
Contributor

Izorkin commented Oct 4, 2023

@ctheune is it possible to add a parameter that disables the monitor service?
I have mdmonitor disabled this way, but a warning pops up:

  systemd.units."mdmonitor.service".enable = false;

@ctheune
Contributor

ctheune commented Oct 5, 2023

The problem with that is that the services are enabled/disabled imperatively and dynamically by the udev rules. To clarify: do you have any swraid volumes in your system?

@Ashvith10
Contributor Author

Not sure if messing around with the stateVersion has screwed my instance of NixOS, but now I do not get any warning while rebuilding, even after reverting the stateVersion back to 23.05. But earlier, when my stateVersion was untouched at 23.05, I had to manually add the option to disable.

@ctheune
Contributor

ctheune commented Oct 5, 2023

We did change the situation so that systems that rely on implicit configuration from older stateVersions will not generate the warning any longer.

@Izorkin
Contributor

Izorkin commented Oct 5, 2023

To clarify: do you have any swraid volumes in your system?

Yes, there is one volume.

cat /proc/mdstat

Personalities : [raid1]
md127 : active raid1 sda2[0] sdc2[1]
      2097088 blocks [2/2] [UU]

unused devices: <none>

For a long time in boot logs I encountered an error when starting mdmonitor. In the end I just turned it off.

@ctheune
Contributor

ctheune commented Oct 5, 2023

In that case, I'd strongly recommend actually configuring it. It's fine to add a nonsensical email address (nobody@example.com). The dynamic tooling that mdadm ships (a whole suite of predefined systemd units and udev rules that interact dynamically) isn't really made to be overridden from a NixOS environment. As it's so dynamic, we also decided not to reinvent the wheel and play continuous catch-up with the upstream project by replicating their very intricate work.

@Izorkin
Contributor

Izorkin commented Oct 5, 2023

Ok. Just need to add an email address? Or should PROGRAM also be specified?

@ctheune
Contributor

ctheune commented Oct 5, 2023

One of either is sufficient. I'd use an email if you don't care.
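
As a sketch of that recommendation, using the boot.swraid.mdadmConf option mentioned earlier in the thread and the placeholder address suggested above (any ARRAY lines an existing setup relies on would need to stay alongside it):

```nix
{ ... }:
{
  boot.swraid = {
    enable = true;
    # Contents for /etc/mdadm.conf. Either MAILADDR or PROGRAM is enough
    # for the monitor; the address here is only a placeholder.
    mdadmConf = ''
      MAILADDR nobody@example.com
    '';
  };
}
```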

@Stunkymonkey
Contributor

I have a question about installation-device.nix: how would you suggest avoiding the warning every time?

Is this something we should fix? Is this intentional?

For me, having this enabled on an installation device makes sense, but having it set up to report errors does not.

@ctheune
Contributor

ctheune commented Dec 7, 2023

Yeah, I'd suggest adding a non-functional example email address to that config to silence the warning and get mdadm working in an installation environment.

@yu-re-ka
Contributor

yu-re-ka commented Dec 7, 2023

If you don't use raid, then just set boot.swraid.enable = false;

@Stunkymonkey
Contributor

@ctheune or is setting the PROGRAM to /dev/null a good idea?

@ctheune
Contributor

ctheune commented Dec 9, 2023

Not sure whether /dev/null will cause issues. Maybe /usr/bin/true would be more appropriate here.
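
One way this could look on NixOS, as an untested sketch: NixOS does not normally provide /usr/bin/true, so the example assumes the `true` binary from `pkgs.coreutils` is used instead. Whether mdadm is happy with a no-op PROGRAM is not confirmed in this thread.

```nix
{ pkgs, ... }:
{
  # Assumption: a PROGRAM that does nothing and exits 0 is enough to
  # satisfy the monitor; the store path replaces /usr/bin/true.
  boot.swraid.mdadmConf = ''
    PROGRAM ${pkgs.coreutils}/bin/true
  '';
}
```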

@Echaleon

Echaleon commented Oct 2, 2024

Not sure if I should open a new issue or not, but I found this issue while running into the same thing. That being said, instead of mdadm logging to an email address or a program, you can also make it log to syslog. Annoyingly though, it cannot be configured to do that from mdadm.conf. I had to override the mdmonitor systemd unit in a roundabout way: I couldn't directly override the ExecStart= command, because the unit file ended up with two of them, since mdadm packages its own service files instead of the Nix module creating them. There might still be a better way to do this, though:

  # Override mdmonitor to log to syslog instead of emailing or alerting
  systemd.services."mdmonitor".environment = {
    MDADM_MONITOR_ARGS = "--scan --syslog";
  };

mdmonitor.service works now, without having MAILADDR or PROGRAM specified, and logs to syslog as expected. Perhaps an option such as boot.swraid.syslog would be nice, to allow this behavior while also suppressing the warning? If I understand correctly, that would avoid redoing too much of the upstream mdadm tooling.
