Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly importing ZFS pools, and nixos-generate-config perl-ness #42178

Closed
Baughn opened this issue Jun 18, 2018 · 13 comments
Closed

Correctly importing ZFS pools, and nixos-generate-config perl-ness #42178

Baughn opened this issue Jun 18, 2018 · 13 comments
Labels
2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md

Comments

@Baughn
Copy link
Contributor

Baughn commented Jun 18, 2018

Issue description

It's a long story, so bear with me for a minute. There's the ZFS pool issue; and then there's what I did in an attempt to fix it.

ZFS pools first.

These, like most volume managers, provide a level of indirection between the underlying device and the filesystem being mounted. nixos-generate-config already has a degree of understanding of this, even if just through robust parsing; if you use it, it'll generate segments looking like this:

  fileSystems."/home" =
    { device = "rpool/home";
      fsType = "zfs";
    };

--which are correctly handled by the rest of NixOS, generating the appropriate systemd units for mounting the filesystem.

Well, in principle. There's one problem: The mapping from device to filesystem is lost, which means systemd won't wait until the device nodes exist. There's an attempt to work around that with udevadm settle, but this doesn't actually work, as on modern systems init may be executed before the devices have even joined the bus. udevadm settle only waits until any pending udev events are processed, it doesn't wait for devices it doesn't know about.

I have on multiple occasions (#16901 and #42087) attempted to fix this, by hacky methods such as sleep loops.

Unfortunately, as I was coding the newest hack, I realized there's a problem with that approach: ZFS will happily import a pool with some missing devices, so long as there's sufficient redundancy for the pool to function. In layman's terms, if you have a simple two-disk mirror, it will still import the pool and let you mount it even if one disk is missing.

This is only natural. Mirroring would be pointless if that wasn't possible. Unfortunately, ZFS will not afterwards re-join the second disk to the pool, unless if the user explicitly asks it to by way of calling zpool online.

So, in summary: The practical effect of the wait-loop code is to make it possible that ZFS pools are imported in degraded mode, opening users up to data loss that they weren't expecting. This is terrible, and I've spent the last few hours trying to think of a quick fix... without much success. I want to say it's unlikely in practice, but all I can say is I haven't observed the behaviour myself.

I'm not sure how to quickly fix that, in a safe manner that can be backported to -stable. Help?

===

Now, the second issue: Fixing it properly requires generating systemd dependencies for the backing devices, in the form of After= dependencies on device units like dev-disk-by\x2did-nvme\x2deui.0025385871b09294\x2dpart2.device.

These cannot be generated at build-time, since the build might not be run on the machine the pool exists on. (And it would be a breach of hermeticity.) It could be done at bootloader install time, and that's an option I've considered, though it's also one I feel rather uncomfortable with. Ideally, I'd like the devices to be defined in hardware-configuration.nix -- that is what we already do for all non-pooled filesystems, in practice.

Which means that nixos-generate-config should be generating this, and I've spent the last two hours trying to make it so. It's not particularly complicated.

...except, nixos-generate-config is written in Perl. I've never learned Perl, and it seems a bad choice for everything but historical reasons.

So would anyone mind terribly if I start the process of converting it to Rust, by way of making at least this particular case use a Rust binary?

It won't expand the runtime closure. Rust binaries are statically linked; nothing but glibc is needed at runtime.

@jcrben
Copy link
Contributor

jcrben commented Jun 19, 2018

Conversation on IRC starting at https://botbot.me/freenode/nixos/2018-06-18/?msg=101240239&page=8

@Mic92
Copy link
Member

Mic92 commented Jun 19, 2018

There is also a prototype for zfs upstream, that does this at boot time: openzfs/zfs#4943

@Mic92
Copy link
Member

Mic92 commented Jun 19, 2018

Complementary would be to integrate also this mount generator: openzfs/zfs#7329

@aerusso
Copy link

aerusso commented Jun 20, 2018

I noticed this referenced in openzfs/zfs#4943, and I wanted to point to zfs-import.target merged in openzfs/zfs#6764 (and the yet to be merged integration of this target into zfs dracut integration, openzfs/zfs#6964).

I was actually unaware of some of the complexities of getting all devices to become available---udevadm settle has always been sufficient in the cases I've seen. Ideally, all this wait logic should be hidden behind zfs-import.target (and possibly later zfs-import@pool.service). However, at the moment I don't believe it addresses your concerns over partially available pools.

I've rebased the zready patch, and half-integrated it with the zfs-mount-generator, but there were many comments in that PR that need to be addressed. The zready patch would allow for much faster boots on systems that have nonessential pools that take a long time to import, so merging it would certainly be preferred.

@Mic92
Copy link
Member

Mic92 commented Jun 20, 2018

For non-root pools this is definitely the better choice.

@Baughn
Copy link
Contributor Author

Baughn commented Jun 21, 2018

The 'big hammer' I've got in #42269 is a better approach than requiring static configuration of the pool members, I think; it's only slightly slower than having the correct systemd dependencies would be, so we can wait for upstream.

I'm still going to try to rewrite nixos-generate-config, on the principle that probably more people can contribute if it isn't Perl. There are other changes I might like to make to it, but for now I'll be sticking to a 1:1 rewrite.

We'll see how the community feels about that in the corresponding PR, I suppose.

@jcrben
Copy link
Contributor

jcrben commented Oct 16, 2018

How's that rewrite? Been reading nixos-generate-config.pl lately 😅 (altho it's prolly easier going for me for me than Rust would be, but at least I'd feel better investing the time into Rust)

@Mic92
Copy link
Member

Mic92 commented Oct 19, 2018

You have to be quick otherwise it might be python3: #48378

@Baughn
Copy link
Contributor Author

Baughn commented Oct 19, 2018

@jcrben Real life got in the way. Suffice to say that I'm only just starting to look at it.

I don't really mind if it's Python, I just don't like Perl very much. :D

That said, thinking about rewriting the zfs-pool-import script in rust now. The shell scripting is getting over the top, and this is something that has to work.

@L-as
Copy link
Member

L-as commented Jan 24, 2019

Has there been any progress on this?

@Baughn
Copy link
Contributor Author

Baughn commented Jan 25, 2019

Not really. It's on my todo-list... somewhere, but I found other work that has a higher reward-to-effort ratio. If you want to take over, be my guest; otherwise it'll happen eventually.

@stale
Copy link

stale bot commented Jun 3, 2020

Thank you for your contributions.

This has been automatically marked as stale because it has had no activity for 180 days.

If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity.

Here are suggestions that might help resolve this more quickly:

  1. Search for maintainers and people that previously touched the related code and @ mention them in a comment.
  2. Ask on the NixOS Discourse.
  3. Ask on the #nixos channel on irc.freenode.net.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 3, 2020
@Mic92
Copy link
Member

Mic92 commented Jun 3, 2020

Closing this issue in favor of #89396
If we use this generator the problem should no longer appear.

@Mic92 Mic92 closed this as completed Jun 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md
Projects
None yet
Development

No branches or pull requests

5 participants