Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for the systemd mount generator #9611

Closed
wants to merge 3 commits into from

Conversation

InsanePrawn
Copy link
Contributor

@InsanePrawn InsanePrawn commented Nov 22, 2019

Fixes a typo and two bigger problems with the systemd mount generator:

  • The key load units tried to call sh -c ... directly, which systemd on my Debian 9(!) doesn't allow. It wants an absolute path, so I replaced it with @bindir@/sh -c .... On my system, @bindir@ correctly resolves to /usr/bin, although /bin works just as well since it's a symlink to /usr/bin.
    Do we want @bindir@ or a hardcoded /bin?
  • Generation of load-key .service units for filesystems that don't automatically get mounted. The generator rightly skips generation of .mount units for filesystems that don't get auto mounted, but the old version then also skipped the generation of a load-key unit, even if the dataset is an encryption root.

Motivation and Context

Consider a pool with an unmountable (canmount=off) root dataset as the encryption root and mountable children. To mount the children, one has to load the key for the root dataset. The old generator implementation would not generate a load-key unit for the root dataset, as it has canmount=off set. Therefore, the user would never be prompted for the password (or whatever keysource they selected) on boot, the depending dataset mounts fail, boot continues but it's obviously not functional.

Description

Change summary:

  • Executing sh as @bindir@/sh.
  • Moved the key-load .service generation before the .mount generation and mount property parsing.
  • Applied minor comment fixes.

How Has This Been Tested?

Locally on my machine. Anybody else willing to write tests for this?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf
Copy link
Contributor

@aerusso would you mind reviewing this.

Do we want @bindir@ or a hardcoded /bin?

In this case a hardcoded /bin is probably best since @bindir@ could be specified as something else when compiling zfs.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Nov 22, 2019
@codecov
Copy link

codecov bot commented Nov 23, 2019

Codecov Report

Merging #9611 into master will increase coverage by 0.07%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9611      +/-   ##
==========================================
+ Coverage   79.19%   79.26%   +0.07%     
==========================================
  Files         418      418              
  Lines      123531   123531              
==========================================
+ Hits        97828    97923      +95     
+ Misses      25703    25608      -95
Flag Coverage Δ
#kernel 79.88% <ø> (+0.02%) ⬆️
#user 67.03% <ø> (+0.11%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0c46813...6f29f7f. Read the comment docs.

@InsanePrawn
Copy link
Contributor Author

InsanePrawn commented Nov 23, 2019

So I've started looking deeper into this generator and I've noticed it not only creates .mount units for all auto-mountable datasets (which is fine), it also symlinks them into local-fs.target.wants, which makes systemd try to mount them all and query you for all encryption roots' passphrases. I'm not sure if this is intended behaviour, I will admit it is similar to how zfs mount -al behaves, but I think that getting the system running with a minimum of mounts and keyloads coming from systemd itself and the rest through zfs mount -a might be more desirable, especially since the user can override this by
a) manually symlinking the desired .mount files into local-fs.target.wants
b) adding them as requires to the services that require the mount, if such a mapping can be made
Unproblematic datasets can be handled by a zfs mount -a or the service equivalent, zfs-mount.service

I've prepared such a commit in a different branch.
I've confirmed that with the modified generator from the branch, I'm only prompted for the password of one encryption root instead of all three on my pool and the filesystems mounted by systemd are only the handful required by my systemd service overrides, namely libvirtd and docker, instead of my entire pool including the media collections.
With this fix, enabling the generator alone is no longer a surefire way to fix mount races with services, unless they specify their requirement for the mount. I'm not sure if this change in behaviour is desirable to everyone.
Maybe we want a switch somewhere (/etc/default/zfs comes to mind) to control whether the generator does the symlinking by default?

Copy link
Member

@rlaager rlaager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR, as written, looks good to me. I haven't tested it, but the changes seem correct in principle.

I think that getting the system running with a minimum of mounts and keyloads coming from systemd itself and the rest through zfs mount -a might be more desirable

No, absolutely not. That defeats the entire purpose of the mount generator. systemd needs to be aware of the individual mounts, as this fixes/avoids all sorts of problems. If people do not want the generator, they don't have to use it. But when using the generator, it needs to make mount units for everything that is to be mounted.

Copy link
Member

@rlaager rlaager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I may have spoken too soon. These changes are fine for the use case presented, but what if there are encryption roots that are intentionally not being mounted? For example, imagine I had per-user encryption for each home directory. In that case, the system administrator is not going to be able to provide the passphrase for each user's home directory.

I think we want to generate the load-key unit for an encryption root if it is to be automounted or if one of its children is to be automounted. We probably need to do this by going through all the mountpoints and making a (uniqued) list of encryption roots, then have a second loop over those encryption roots to write the load-key units.

@rlaager
Copy link
Member

rlaager commented Nov 23, 2019

@aerusso Can you look at this too?

@aerusso
Copy link
Contributor

aerusso commented Nov 23, 2019

Ok, so I'm happy with the typo fix, and the change to the path for sh (which is a whole other can of worms---the documentation for systemd was pretty clear on this issue, but---hey---what works works).

I agree with @rlaager: the purpose of this generator is to fully emulate the behavior of having things in /etc/fstab. If you're interested in speeding up boot, the right way is not to special-case it for ZFS, it's to figure out how to make your dependencies not need local-fs.target, and only depend on the specific mountpoints they need. I understand this is a lot of work, but otherwise we (ZOL) are papering over an issue that needs to be fixed elsewhere (well, basically everywhere). Doing this right can have MAJOR payoff for startup speed, and there's no reason to restrict this to ZFS users.

There's also the whole zpool import zready infrastructure #4943 which needs love (but I digress).

This brings me to the core of this change: unconditionally creating the unit. As @rlaager also correctly points out, we don't want to accidentally start the zfs-load-key-.service. However, my quick re-read of the code is that nothing automatically creates a dependency on $keyloadunit. Can you, @InsanePrawn, please confirm, by testing, this behavior?

I'd say to squash sh business down to a single patch, and (if the above analysis is right) go forward with the merge.

Copy link
Member

@rlaager rlaager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so I'm happy with the typo fix, and the change to the path for sh (which is a whole other can of worms---the documentation for systemd was pretty clear on this issue, but---hey---what works works).

It's quite possible that this behavior changed over time (i.e. systemd versions).

This brings me to the core of this change: unconditionally creating the unit. As @rlaager also correctly points out, we don't want to accidentally start the zfs-load-key-.service.

I think you're giving me too much credit, and you have just mentioned a critical point! :) I was worried about creating the load-key unit, but you're right, it's only starting it that is a problem. This simplifies the problem space, as we can simply delegate this to systemd. Specifically, we can always create the load-key unit, but it will only get pulled in when one of the mount units Wants it. That appears to be what this change is doing. So yeah, this should be good.

However, my quick re-read of the code is that nothing automatically creates a dependency on $keyloadunit. Can you, @InsanePrawn, please confirm, by testing, this behavior?

It looks like the mount units will get a wants on it, as the wants variable is modified at line 126.

@aerusso
Copy link
Contributor

aerusso commented Nov 23, 2019

It looks like the mount units will get a wants on it, as the wants variable is modified at line 126.

Yeah, and grepping the source shows this is only ever mentioned in $keyloadunit, which is again only ever Wants= from the mount unit (as you notice). But, you know the famous Knuth quote:

Beware of bugs in the above code; I have only proved it correct, not tried it.

So, yeah, just double check it and squash the sh commits and then LGTM.

@InsanePrawn
Copy link
Contributor Author

As I tried to express earlier: Removing the ln -s line is enough to get the behaviour you guys describe: All mounts and key-load units get generated, both in the original version, this PR and the other branch. In the other branch, only the mounts explicitly required by other units will get actually mounted, and they'll require only their respective key-loads if any.

As I've said, I've tested it works correctly in my setup. Only the required minimum of mounts and their key-load units get started. I'll do some more testing in different setups in VMs.

You can manually blacklist individual key-load units from boot with systemctl mask, if any are getting depended on that you'd rather skip. The depending services will fail but can be started after [loading keys for and] mounting those datasets manually.

So the consensus is that I should pull in the commit from the other branch?

I'd still argue in favour of a 'symlink all the mounts like in older versions' switch somewhere, effectively making a systemd implementation of zfs mount -al for lazy admins and/or distros.

@rlaager
Copy link
Member

rlaager commented Nov 23, 2019

In the other branch, only the mounts explicitly required by other units will get actually mounted

As I said, that sounds wrong. If I create a dataset at /srv/backup and no systemd unit explicitly depends on it, what mounts it? What mounts it if the zfs-mount.service goes away in the future (or is masked)?

It is correct for all local mountpoints to be pulled in by local-fs. That is its point. This is consistent with what happens in /etc/fstab.

The load-key units should NOT be pulled-in by local-fs, but by the mount units. That is what this PR does, right?

So the consensus is that I should pull in the commit from the other branch?

No.

Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Systemd will ignore units that try to execute programs from non-absolute
paths. Use hardcoded /bin/sh instead.

Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
@InsanePrawn
Copy link
Contributor Author

I did some git wizardry and think/hope i got the path commit right. :)

If you're interested in speeding up boot, the right way is not to special-case it for ZFS, it's to figure out how to make your dependencies not need local-fs.target, and only depend on the specific mountpoints they need. I understand this is a lot of work, but otherwise we (ZOL) are papering over an issue that needs to be fixed elsewhere (well, basically everywhere). Doing this right can have MAJOR payoff for startup speed, and there's no reason to restrict this to ZFS users.

I've got my dependencies down, they're minimal and my system boots fine with my other branch's patch. The problem is that ALL MOUNTABLE ZFS DATASETS are a requirement for local-fs.target without said patch.

As I said, that sounds wrong. If I create a dataset at /srv/backup and no systemd unit explicitly depends on it, what mounts it? What mounts it if the zfs-mount.service goes away in the future (or is masked)?

  • A user can always create their own thing that ends up running zfs mount -a, run it manually, whatever. If we went with my idea to have a toggle for this in /etc/default/zfs or similar, we could even make the old 'symlink all the things to local-fs.target.d' behaviour the default, unless a certain value is found in the config. No config file found? Not the correct value? We fall back to symlinking everything into local-fs.target.d.
  • This behaviour would theoretically also enable you to use another systemd mechanism to mount /srv/backup, effectively supporting the administrator:
    use a srv-backup.automount unit with your generated srv-backup.mount

Also: What if your /srv/backup dataset is an encryption root with a passphrase? What if it's actually /home/tony and that damn tony guy doesn't give you his passphrase? Do you really want systemd to prompt you for all encryption root passphrases that are relevant to any dataset that can be mounted?
Note how zfs mount -a would just quietly skip those, unless you add -l, which is a behaviour i rely on. I don't want to load the key for zroot/abc at boot time, but after I zfs load-key zroot/abc, i expect zfs mount -a to also mount zroot/abc and children. Doesn't work with canmount=noauto and I dread the alternative of keeping an encryption root blacklist up to date via systemctl mask or similar.

It is correct for all local mountpoints to be pulled in by local-fs. That is its point. This is consistent with what happens in /etc/fstab.

This appears to drift into a philosophical question. I have large-ish dataset hierarchies (as in: a lot of datasets with a lot of nesting) and I'd much prefer if [after setting a config value] systemd did the pure minimum explicitly required and let something else handle the rest a little bit later. If nothing else, it means way less spam on my screen when [re]booting :D

The load-key units should NOT be pulled-in by local-fs, but by the mount units. That is what this PR does, right?

Yes and no. The load-key units are only referenced by the .mount units, so only .mount units that get depended on (either through a unit requirement or a symlink) pull in their key-load units. That was always the case, I didn't change the unit definitions at all (except for the /bin/sh fix).

Before the commits in this pr, the generator would skip processing lines of datasets that can't be mounted automatically. This would lead to also skipping generating key-load units for such datasets, which might be required to mount a different dataset. I reordered the code so the key-load unit is generated anyway.
As said before, the unit merely exists and does nothing, unless it is pulled in from somewhere, usually a .mount unit.

So the consensus is that I should pull in the commit from the other branch?

No.

Meh, as I said, I'd be strongly in favour of having this behaviour available, even if it's not the default.

@aerusso
Copy link
Contributor

aerusso commented Nov 23, 2019

@InsanePrawn

I did some git wizardry and think/hope i got the path commit right. :)

Unfortunately, I think your last rebase (or at least what's pushed) clobbered the removal half of the moving of the code (it just duplicates it now).

The problem is that ALL MOUNTABLE ZFS DATASETS are a requirement for local-fs.target without said patch.

The point I'm trying to make is: either those datasets are to be mounted (and local-fs.target is when they should be mounted by), or you don't want them mounted (so set them canmount=noauto). If you want some other services mounted before local-fs.target, make sure those services actually don't need everything in local-fs.target, and drop that dependency (i.e., make those dependencies more fine-tuned). That's a micro-optimization that will benefit all filesystems, not just ZFS---and I recognize it is a lot of work.

Also: What if your /srv/backup dataset is an encryption root with a passphrase? What if it's actually /home/tony and that damn tony guy doesn't give you his passphrase? Do you really want systemd to prompt you for all encryption root passphrases that are relevant to any dataset that can be mounted?

No. Those datasets should be canmount=noauto, and won't have units created. Maybe instead, those mount units should be created, but not WantedBy=local-fs.target. If you don't want those units mounted at boot, don't set canmount=auto.

Note how zfs mount -a would just quietly skip those, unless you add -l, which is a behaviour i rely on. I don't want to load the key for zroot/abc at boot time, but after I zfs load-key zroot/abc, i expect zfs mount -a to also mount zroot/abc and children. Doesn't work with canmount=noauto and I dread the alternative of keeping an encryption root blacklist up to date via systemctl mask or similar.

Well, you have to decide whether you want the units to try to be mounted automatically, and whether or not they should try to load the keys. If you want the keys automatically loaded, then if you mount a dataset automatically, that's what happens. You should be able to mask mount units, though, to get the special case behavior you're asking for here.

This appears to drift into a philosophical question. I have large-ish dataset hierarchies (as in: a lot of datasets with a lot of nesting) and I'd much prefer if [after setting a config value] systemd did the pure minimum explicitly required and let something else handle the rest a little bit later. If nothing else, it means way less spam on my screen when [re]booting :D

I'm sorry, this is a really special case you're asking for: mounting local filesystems after local-fs.target. It doesn't make sense for this kind of specific micro-optimization to be in mainline. It's really a system administration question: just mask the unit and call zfs mount yourself whenever you want in your boot process. Or set it to canmount=noauto, which is really what you're asking for: manual control over when you mount it.

@InsanePrawn InsanePrawn force-pushed the sysdgenerator branch 2 times, most recently from 6a5cc53 to 0bddd62 Compare November 23, 2019 19:04
pathdep="RequiresMountsFor='${p_keyloc#file://}'"
keyloadcmd="@sbindir@/zfs load-key '${dataset}'"
elif [ "${p_keyloc}" = "prompt" ] ; then
keyloadcmd="@bindir@/sh -c 'set -eu;"\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be /bin/sh now, right?

Previously the generator would skip a dataset if it wasn't mountable by
'zfs mount -a' (legacy/none mountpoint, canmount off/noauto). This also
skipped the generation of key-load units for such datasets, breaking
the dependency handling for mountable child datasets.

Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
@aerusso
Copy link
Contributor

aerusso commented Nov 23, 2019

@InsanePrawn Up to my one comment (the rebase clobbered the /bin/sh change), LGTM.

Also, there's an argument for making the mount units for canmount=noauto, but not Want=ing them anywhere by default. Then the system administrator could play with them as desired. Would that be a reasonable compromise?

@rlaager
Copy link
Member

rlaager commented Nov 23, 2019

We should ideally create units for noauto and not Wants them, as this sounds like it is the behavior of the stock fstab generator. I was going to suggest that as a separate change.

However, if we have two datasets with the same mountpoint and one is noauto and one is “on”, we would have a problem. The unit name comes from the mountpoint (and AFAIK this is required). We would need to give precedence to the auto one. The current code handles this by skipping all noauto mounts. But once we lift that, it will be a problem.

The code currently refuses to overwrite existing mount files. If we changed that to only apply for noauto, then the auto one would overwrite if it came later, but would not be overwritten if it came first. This would achieve the right result without making the generator take two passes.

@aerusso, do you see any problem with this change? Was there a good reason to not overwrite existing mountfiles?

@aerusso
Copy link
Contributor

aerusso commented Nov 23, 2019

I didn't overwrite existing units out of an abundance of caution. I'm happy with the PR as is. If we start talking about multiple datasets with the same mountpoint, I'm starting to think that we should figure out if there is any consensus in the larger community. As a ZFS-user I think in terms of "<dataset> mounted at <mountpoint>," but most Linux users probably think of "the filesystem mounted at <mountpoint>", and the systemd interface is built around that.

I vote LGTM as is, and then have a more thorough discussion for a follow-up patch.

@rlaager
Copy link
Member

rlaager commented Nov 24, 2019

@aerusso Agreed on merging this PR as is. Can you change your review to approved?

Also, I did check (now that I'm not on a cell phone), and the systemd.mount documentation says, "Mount units must be named after the mount point directories they control." So when we have multiple datasets with the same mountpoint (e.g. in root-on-ZFS scenarios, with multiple root filesystems), we can only write out one mount unit. Clearly we need the auto to override the noauto one(s) in that case.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Nov 27, 2019
behlendorf pushed a commit that referenced this pull request Nov 27, 2019
Systemd will ignore units that try to execute programs from non-absolute
paths. Use hardcoded /bin/sh instead.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9611
behlendorf pushed a commit that referenced this pull request Nov 27, 2019
Previously the generator would skip a dataset if it wasn't mountable by
'zfs mount -a' (legacy/none mountpoint, canmount off/noauto). This also
skipped the generation of key-load units for such datasets, breaking
the dependency handling for mountable child datasets.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9611
@behlendorf
Copy link
Contributor

Merged. Thanks for sorting this out.

tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 26, 2019
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes openzfs#9611
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 26, 2019
Systemd will ignore units that try to execute programs from non-absolute
paths. Use hardcoded /bin/sh instead.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes openzfs#9611
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 26, 2019
Previously the generator would skip a dataset if it wasn't mountable by
'zfs mount -a' (legacy/none mountpoint, canmount off/noauto). This also
skipped the generation of key-load units for such datasets, breaking
the dependency handling for mountable child datasets.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes openzfs#9611
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 27, 2019
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes openzfs#9611
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 27, 2019
Systemd will ignore units that try to execute programs from non-absolute
paths. Use hardcoded /bin/sh instead.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes openzfs#9611
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 27, 2019
Previously the generator would skip a dataset if it wasn't mountable by
'zfs mount -a' (legacy/none mountpoint, canmount off/noauto). This also
skipped the generation of key-load units for such datasets, breaking
the dependency handling for mountable child datasets.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes openzfs#9611
tonyhutter pushed a commit that referenced this pull request Jan 23, 2020
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9611
tonyhutter pushed a commit that referenced this pull request Jan 23, 2020
Systemd will ignore units that try to execute programs from non-absolute
paths. Use hardcoded /bin/sh instead.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9611
tonyhutter pushed a commit that referenced this pull request Jan 23, 2020
Previously the generator would skip a dataset if it wasn't mountable by
'zfs mount -a' (legacy/none mountpoint, canmount off/noauto). This also
skipped the generation of key-load units for such datasets, breaking
the dependency handling for mountable child datasets.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9611
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants