-
-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nixos/clevis: init & support for zfs, bcachefs and luks #257525
Conversation
To-do: write a test to check the fallback on interactive passphrase |
Maybe rebase it on #247037 ? |
027ded9
to
e20e20c
Compare
43a2a72
to
d86644d
Compare
We will have to rebase on #261884 when it is merged. |
@ofborg test installer |
Looks like ofborg is not able to run all the tests before timing out. what should we do about that? Target staging instead? |
We should run them locally and merge.
Le lun. 27 nov. 2023, 19:12, Julien Malka ***@***.***> a
écrit :
… Looks like ofborg is not able to run all the tests before timing out. what
should we do about that? Target staging instead?
—
Reply to this email directly, view it on GitHub
<#257525 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACMZRA5WMBF2WGC5QJ2G2TYGTJYZAVCNFSM6AAAAAA5IPL45SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRYGM3DONRYGY>
.
You are receiving this because your review was requested.Message ID:
***@***.***>
|
Ran the tests, installer.nix:
installer-systemd-stage-1.nix:
I had to exclude tests that are already broken on Hydra. @RaitoBezarius you have access to the machine on which I performed the tests so if you want to check go ahead. |
@JulienMalka can you also run the installer-systemd-stage-1 tests? |
I've done that already, the results are in the previous message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge conflict with release notes.
@@ -1081,6 +1096,35 @@ in | |||
boot.initrd.preLVMCommands = mkIf (!config.boot.initrd.systemd.enable) (commonFunctions + preCommands + concatStrings (mapAttrsToList openCommand preLVM) + postCommands); | |||
boot.initrd.postDeviceCommands = mkIf (!config.boot.initrd.systemd.enable) (commonFunctions + preCommands + concatStrings (mapAttrsToList openCommand postLVM) + postCommands); | |||
|
|||
boot.initrd.systemd.services = let devicesWithClevis = filterAttrs (device: _: (hasAttr device clevis.devices)) luks.devices; in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way you could address @RaitoBezarius's comment would be to make a service per device, and wed it to the systemd.device showing up. I think that's pretty clean, and it's a 1:1 map to the code in nixos/modules/system/boot/luksroot.nix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I think I'll address these remarks in a follow-up PR.
Co-Authored-By: Julien Malka <julien@malka.sh>
Co-Authored-By: Camille Mondon <camillemondon@free.fr>
Fixed conflicts with |
Merging because it is constantly entering into merge conflicts with release notes and we plenty tested this PR by now. |
Many thanks to everyone that took part of the review process of this PR, especially @ElvishJerricco, @RaitoBezarius, @philiptaron :) |
Yay! Congratulations @JulienMalka. |
@@ -623,6 +629,9 @@ in | |||
fi | |||
poolImported "${pool}" || poolImport "${pool}" # Try one last time, e.g. to import a degraded pool. | |||
fi | |||
|
|||
${concatMapStringsSep "\n" (elem: "clevis decrypt < /etc/clevis/${elem}.jwe | zfs load-key ${elem}") (filter (p: (elemAt (splitString "/" p) 0) == pool) clevisDatasets)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this add calls to clevis
even when boot.initrd.clevis.enable
is false?
I think that's why I started getting this evaluation error yesterday: https://gist.github.com/aij/bb5c49f0afbde2b3efdfc8f30c1e7b6e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have clevis devices set but clevis disabled ? Or do you have nothing clevis related in your config but still evaluation errors ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for reporting that, I am proposing a fix here: #272061
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the delay. FWIW, I had nothing clevis related in my configs.
Fix looks good. Thanks!
@@ -17,6 +17,9 @@ let | |||
cfgZED = config.services.zfs.zed; | |||
|
|||
selectModulePackage = package: config.boot.kernelPackages.${package.kernelModuleAttribute}; | |||
clevisDatasets = map (e: e.device) (filter (e: (hasAttr e.device config.boot.initrd.clevis.devices) && e.fsType == "zfs" && (fsNeededForBoot e)) config.system.build.fileSystems); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is mishandling fileSystems with device = null
(the default).
For example, I'm mounting /boot
by label with
fileSystems."/boot" =
{ label = "boot";
fsType = "ext4";
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think you are right
First, thanks for all the effort that must have gone into this! Unfortunately I'm still struggling to get this working. Sry for reusing the thread but I guess all the experts are here and I feel many non-experts would appreciate some more guidance here ;) I would like my tang server to unlock my LUKS device (setup via standard nix-os ui installer - full disk encryption). Things I tried so far... Approach via clevis luks bind
This creates an entry in the LUKS header such that Outcome: Nothing happens -> The password prompt still shows up, nothing regarding clevis is logged. Could it be that what I want is still not possible and this issue status is actually still correct (OPEN atm.): #121636 I think people are also curious on that thread as well... Approach via secretFile - JWEClevis.md actually only mentions a Outcome: Boot Log shows My nix configs (used for both approaches)Only the secretFile is different and only enabled in the latter approach! hardware-configuration.nix ...shortened to relevant parts. I learned you should not alter that file because it could be overwritten in the future - but should not make a difference - a module is a module?!
Any help or hint appreciated! UPDATE: it works now, not 100% what was the problem though. PS: snippet for creating the .jwe file (fill out stuff in
|
@gernotfeichter Hello, ideally not to ping all the people suscribed to this PR can we continue this on Matrix ? My username is JulienMalka:matrix.org |
"initrd-switch-root.target" | ||
"shutdown.target" | ||
]; | ||
wants = [ "systemd-udev-settle.service" ] ++ optional clevis.useTang "network-online.target"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
systemd-udev-settle is broken by design and should not be used, ever. See #73095 (comment).
I don't know what you're trying to depend on, but this is most likely not the correct way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use that to wait for TPM2 to be available. Do you know of another synchronization point for that ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you know the exact device path (say under /dev/ or /sys) in advance you can add a dependency on a device unit (see systemd.device(5)
). Otherwise you could listen for the specific udev events using udevadm monitor -s <subsystem>/<devicetype>
in you script. There's an example here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems TPM2 devices get linked to /dev/tpmrm, so you should probably add a Wants=dev-tpmrm0.device
dependency.
I don't know if systemd handles TPMs devices by default, though. If not you'll also have to add a TAG+=systemd
rule on those devices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, look what was just commited to systemd: systemd/systemd@4e1f003
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, in the future, we should definitely change this to use tpm2.target
. At the moment, I don't know if we can add dependencies on dev-tpmrm0.device
directly, because that would cause boot to delay waiting for that device to timeout in the event that one doesn't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tpm2 target does exactly this: it adds a dependecy on dev-tpmrm0.device
, so it will time out without a TPM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rnhmjoj But tpm2.target
is not in the initial transaction by default. It only gets added by systemd-tpm2-generator
if it detects that the TPM2 exists in /dev
or according to the firmware (via efi variables). Units aren't supposed to have Wants=tpm2.target
, only After=tpm2.target
so that they're ordered properly when a TPM exists.
Early boot programs that intend to access the TPM2 device should hence order themselves after this target unit, but not pull it in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Anyway, it seems that this service does require a TPM, so should it be if not a timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rnhmjoj I don't think a TPM is required necessarily? I think the .jwe
determines if a TPM is required.
@@ -154,6 +157,9 @@ let | |||
poolImported "${pool}" || poolImport "${pool}" # Try one last time, e.g. to import a degraded pool. | |||
fi | |||
if poolImported "${pool}"; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the entire zfs pool is encrypted it will ask the password before this check and clevis won't be able to decrypt the pool.
Jun 11 22:42:05 localhost zfs-import-rpool-start[327]: importing ZFS pool "rpool"...
Jun 11 22:42:05 localhost zfs-import-rpool-start[652]: Key load error: Keys must be loaded for encryption root of 'rpool/safe/root' (rpool).
I have this config:
{ config, pkgs, ... }:
{
boot.initrd.clevis.enable = true;
boot.initrd.clevis.devices."rpool/safe/root".secretFile = "/etc/initrd/clevis/rpool.jwe";
}
The filesystems config is roughly like this:
{
fileSystems."/" =
{ device = "rpool/safe/root";
fsType = "zfs";
};
fileSystems."/nix" =
{ device = "rpool/local/nix";
fsType = "zfs";
};
fileSystems."/boot" =
{ device = "/dev/disk/by-uuid/E08C-AFC5";
fsType = "vfat";
options = [ "fmask=0022" "dmask=0022" ];
};
}
% zfs list -rHo name,keylocation,keystatus -t volume,filesystem rpool
rpool prompt available
rpool/local none available
rpool/local/nix none available
rpool/reserved none available
rpool/safe none available
rpool/safe/root none available
With this config it tries to execute zfs load-key rpool/safe/root
which fails with Key load error: Keys must be loaded for encryption root of 'rpool/safe/root'
, but it works if I execute zfs load-key rpool
:
-bash-5.2# zfs load-key rpool/safe/root
Key load error: Keys must be loaded for encryption root of 'rpool/safe/root' (rpool).
-bash-5.2# zfs load-key rpool
Enter passphrase for 'rpool':
If I use boot.initrd.clevis.devices."rpool".secretFile = "/etc/initrd/clevis/rpool.jwe";
it fails to evaluate:
Failed assertions:
- No filesystem or LUKS device with the name rpool is declared in your configuration.
This looks like a module deficiency to me.
Description of changes
Work done with equal contribution from @camillemndn. This work has been founded as part of an NGI Assure grant from NLNet.
This PR adds support for unattended decryption of encrypted boot device using Clevis and Tang. In this first PR, we add:
Remaining to do before merging this PR:
includeBuildDependencies
option