kexec fails due to IMA being enforced on Azure VMs #128
Seems like GCP changed something about their boot process: the same instance types that failed to kexec now just work. This fixes Secure Boot as well: #128
We now pass this flag, but it's not clear to me what else is needed.
@Ma27 I will investigate more thoroughly during the coming week and report back if I find a solution.
@Mic92, @AkechiShiro:
Hi @henrirosten, I'm not sure what you mean by Azure Gen 2 Standard B images. Is the securityType of the VM TrustedLaunch? nixos-anywhere fails to kexec due to a missing signature (Secure Boot being enabled and enforced). Even disabling Integrity Measurement doesn't seem to be enough. For more context, trying to modprobe unsigned kernel drivers also fails.
'Standard' is the Azure security type that disables Secure Boot and IMA. 'B'-series refers to Azure VM sizes, which are deployed on the hardware types and processors described here: https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-b-series-burstable
There must be a way forward by which we could publish an Azure-compatible NixOS image to the official Azure Marketplace. We just need to try to work with the Lanzaboote folks and see if we can find a way to combine NixOS and Lanzaboote to get at least Secure Boot support; IMA will have to be disabled at first. vTPM probably doesn't matter much at first. But making nixos-anywhere compatible with other Secure Boot distributions seems to be a very non-trivial feat. The only workaround I see is to temporarily disable Secure Boot, run nixos-anywhere, then re-enable it. But what will happen then, since nixos-anywhere doesn't ship Lanzaboote in the NixOS image, I believe? @Mic92: Would a PR showcasing the steps to use nixos-anywhere on Azure Gen 2 VMs that have been created using
There is some documentation for anyone interested in testing their non-official NixOS VM image: https://learn.microsoft.com/en-us/partner-center/marketplace/azure-vm-image-test
If anyone is interested in working on this, I'd be willing to help make progress on it slowly, as far as I can commit the time.
@AkechiShiro you mean a guide that describes how to install on Azure with nixos-anywhere? Sure. It could be dropped here: https://github.com/nix-community/nixos-anywhere/tree/main/docs/howtos
Here is one idea: shouldn't it be possible to kexec into the original kernel but with
@Mic92 I will try that soon. I've tried this on a Debian 11 cloud image and was still stuck with a weird issue I couldn't debug at all, but I'll need to check and retry.
If it was just an old kernel, then eaf2d21 might solve it.
Hi @Mic92, I tried running as root on a machine with Secure Boot disabled and
I got this output after the reboot (after the kexec, I believe). It seems like something bad happened?
username login:
[ 10.089786] CPU1 failed to report alive state
[ 10.129163] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 10.129779] #PF: supervisor read access in kernel mode
[ 10.129779] #PF: error_code(0x0000) - not-present page
[ 10.129779] PGD 0 P4D 0
[ 10.129779] Oops: 0000 [#1] PREEMPT SMP PTI
[ 10.129779] CPU: 0 PID: 11 Comm: kworker/u4:0 Not tainted 6.6.10 #1-NixOS
[ 10.129779] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 07/12/2023
[ 10.129779] Workqueue: eval_map_wq tracer_init_tracefs_work_func
[ 10.129779] RIP: 0010:event_create_dir+0x29/0x5d0
[ 10.129779] Code: 90 41 57 41 56 41 55 41 54 49 89 f4 55 53 48 83 ec 18 48 8b 46 28 4c 8b 6e 10 48 c7 c6 71 0b 9a 87 48 89 7c 24 08 48 89 04 24 <49> 8b 45 10 48 8b 18 48 89 df e8 58 40 8e 00 85 c0 0f 84 ec 04 00
[ 10.129779] RSP: 0000:ffffa21d00093dd8 EFLAGS: 00010296
[ 10.129779] RAX: 0000000000000000 RBX: ffff8bc14020e1e0 RCX: ffff8bc140808080
[ 10.129779] RDX: 0000000000000000 RSI: ffffffff879a0b71 RDI: ffff8bc140442b40
[ 10.129779] RBP: ffffffff88155260 R08: ffff8bc140b6c060 R09: 0000000000038ee0
[ 10.129779] R10: ffff8bc140c3f080 R11: 006e776f64726165 R12: ffff8bc14020e1e0
[ 10.129779] R13: 0000000000000000 R14: ffff8bc1402ed405 R15: ffffffff8875c948
[ 10.129779] FS: 0000000000000000(0000) GS:ffff8bc1fbc00000(0000) knlGS:0000000000000000
[ 10.129779] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 10.129779] CR2: 0000000000000010 CR3: 000000003d220001 CR4: 00000000003706f0
[ 10.129779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 10.129779] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 10.129779] Call Trace:
[ 10.129779]
[ 10.129779] ? __die+0x23/0x70
[ 10.129779] ? page_fault_oops+0x17d/0x4b0
[ 10.129779] ? exc_page_fault+0x6d/0x150
[ 10.129779] ? asm_exc_page_fault+0x26/0x30
[ 10.129779] ? event_create_dir+0x29/0x5d0
[ 10.129779] ? event_create_dir+0x123/0x5d0
[ 10.129779] __trace_early_add_event_dirs+0x33/0x70
[ 10.129779] event_trace_init+0x98/0xf0
[ 10.129779] tracer_init_tracefs_work_func+0xa/0x2e0
[ 10.129779] process_one_work+0x174/0x340
[ 10.129779] worker_thread+0x27b/0x3a0
[ 10.129779] ? __pfx_worker_thread+0x10/0x10
[ 10.129779] kthread+0xe8/0x120
[ 10.129779] ? __pfx_kthread+0x10/0x10
[ 10.129779] ret_from_fork+0x34/0x50
[ 10.129779] ? __pfx_kthread+0x10/0x10
[ 10.129779] ret_from_fork_asm+0x1b/0x30
[ 10.129779]
[ 10.129779] Modules linked in:
[ 10.129779] CR2: 0000000000000010
[ 10.129779] ---[ end trace 0000000000000000 ]---
[ 10.129779] RIP: 0010:event_create_dir+0x29/0x5d0
[ 10.129779] Code: 90 41 57 41 56 41 55 41 54 49 89 f4 55 53 48 83 ec 18 48 8b 46 28 4c 8b 6e 10 48 c7 c6 71 0b 9a 87 48 89 7c 24 08 48 89 04 24 <49> 8b 45 10 48 8b 18 48 89 df e8 58 40 8e 00 85 c0 0f 84 ec 04 00
[ 10.129779] RSP: 0000:ffffa21d00093dd8 EFLAGS: 00010296
[ 10.129779] RAX: 0000000000000000 RBX: ffff8bc14020e1e0 RCX: ffff8bc140808080
[ 10.129779] RDX: 0000000000000000 RSI: ffffffff879a0b71 RDI: ffff8bc140442b40
[ 10.129779] RBP: ffffffff88155260 R08: ffff8bc140b6c060 R09: 0000000000038ee0
[ 10.129779] R10: ffff8bc140c3f080 R11: 006e776f64726165 R12: ffff8bc14020e1e0
[ 10.129779] R13: 0000000000000000 R14: ffff8bc1402ed405 R15: ffffffff8875c948
[ 10.129779] FS: 0000000000000000(0000) GS:ffff8bc1fbc00000(0000) knlGS:0000000000000000
[ 10.129779] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 10.129779] CR2: 0000000000000010 CR3: 000000003d220001 CR4: 00000000003706f0
[ 10.129779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 10.129779] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 10.129779] note: kworker/u4:0[11] exited with irqs disabled
Wait, I gave it a second try and it's now working, so
The IP address was preserved, but the DNS server probably needs to be tweaked. I'm not sure what the default one is; I will edit if I find the answer. EDIT 2: Running the second kexec to install NixOS (using GRUB) with the default example plus some additional configuration left the machine unable to boot: it's stuck in Hyper-V's UEFI, which reports that it found no suitable boot system. I'll have to try systemd-boot, and I may also need to tweak disko's configuration.
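On why the IP survived: the kexec script trace posted below shows it dumping `ip --json addr` plus the IPv4 and IPv6 routes into the new initrd before the kexec. A minimal sketch of reading such a dump, assuming iproute2's JSON layout (a list of links, each with `ifname` and an `addr_info` list). The helper `ipv4_addresses` and the sample data are hypothetical. Nameservers are not part of that output, which may be why DNS needs separate tweaking, though I haven't verified that.

```python
import json
from typing import Dict, List


def ipv4_addresses(ip_json: str) -> Dict[str, List[str]]:
    """Map interface name -> IPv4 addresses in CIDR form from `ip --json addr` output."""
    result: Dict[str, List[str]] = {}
    for link in json.loads(ip_json):
        cidrs = [
            f"{info['local']}/{info['prefixlen']}"
            for info in link.get("addr_info", [])
            if info.get("family") == "inet"  # skip inet6 entries
        ]
        if cidrs:
            result[link["ifname"]] = cidrs
    return result


# Hypothetical sample shaped like iproute2's JSON output (values made up).
SAMPLE = json.dumps([
    {"ifname": "lo", "addr_info": [{"family": "inet", "local": "127.0.0.1", "prefixlen": 8}]},
    {"ifname": "eth0", "addr_info": [
        {"family": "inet", "local": "10.0.0.4", "prefixlen": 24},
        {"family": "inet6", "local": "fe80::1", "prefixlen": 64},
    ]},
])

if __name__ == "__main__":
    print(ipv4_addresses(SAMPLE))  # {'lo': ['127.0.0.1/8'], 'eth0': ['10.0.0.4/24']}
```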
So it seems
+ init=/nix/store/nadvk7k5qam9iq19kshbk2c045hkd5q6-nixos-system-nixos-23.11pre-git/init
+ kernelParams=console=tty0 console=ttyS0,115200 loglevel=4
+ readlink -f /root/kexec/kexec/run
+ dirname /root/kexec/kexec/run
+ SCRIPT_DIR=/root/kexec/kexec
+ TMPDIR=/root/kexec/kexec mktemp -d
+ INITRD_TMP=/root/kexec/kexec/tmp.mI4YwicutB
+ cd /root/kexec/kexec/tmp.mI4YwicutB
+ trap cleanup EXIT
+ mkdir -p ssh
+ extractPubKeys /root
+ home=/root
+ key=/root/.ssh/authorized_keys
+ test -e /root/.ssh/authorized_keys
+ grep -o \(\(ssh\|ecdsa\|sk\)-[^ ]* .*\) /root/.ssh/authorized_keys
+ key=/root/.ssh/authorized_keys2
+ test -e /root/.ssh/authorized_keys2
+ test -n root
+ sh -c echo ~root
+ sudo_home=/root
+ extractPubKeys /root
+ home=/root
+ key=/root/.ssh/authorized_keys
+ test -e /root/.ssh/authorized_keys
+ grep -o \(\(ssh\|ecdsa\|sk\)-[^ ]* .*\) /root/.ssh/authorized_keys
+ key=/root/.ssh/authorized_keys2
+ test -e /root/.ssh/authorized_keys2
+ test -e /etc/ssh/authorized_keys.d/root
+ test -n root
+ test -e /etc/ssh/authorized_keys.d/root
+ test -e /etc/ssh/ssh_host_dsa_key
+ cp -a /etc/ssh/ssh_host_dsa_key ssh
+ test -e /etc/ssh/ssh_host_dsa_key.pub
+ cp -a /etc/ssh/ssh_host_dsa_key.pub ssh
+ test -e /etc/ssh/ssh_host_ecdsa_key
+ cp -a /etc/ssh/ssh_host_ecdsa_key ssh
+ test -e /etc/ssh/ssh_host_ecdsa_key.pub
+ cp -a /etc/ssh/ssh_host_ecdsa_key.pub ssh
+ test -e /etc/ssh/ssh_host_ed25519_key
+ cp -a /etc/ssh/ssh_host_ed25519_key ssh
+ test -e /etc/ssh/ssh_host_ed25519_key.pub
+ cp -a /etc/ssh/ssh_host_ed25519_key.pub ssh
+ test -e /etc/ssh/ssh_host_rsa_key
+ cp -a /etc/ssh/ssh_host_rsa_key ssh
+ test -e /etc/ssh/ssh_host_rsa_key.pub
+ cp -a /etc/ssh/ssh_host_rsa_key.pub ssh
+ /root/kexec/kexec/ip --json addr
+ /root/kexec/kexec/ip -4 --json route
+ /root/kexec/kexec/ip -6 --json route
+ [ -f /etc/machine-id ]
+ cp /etc/machine-id machine-id
+ find .
+ gzip -9
+ cpio -o -H newc
27 blocks
+ kexecSyscallFlags=
+ + sort -c -V uname -r
+ printf %s\n 6.1 6.5.0-1010-azure
+ kexecSyscallFlags=--kexec-syscall-auto
+ /root/kexec/kexec/kexec --load /root/kexec/kexec/bzImage --kexec-syscall-auto --initrd=/root/kexec/kexec/initrd --no-checks --command-line init=/nix/store/nadvk7k5qam9iq19kshbk2c045hkd5q6-nixos-system-nixos-23.11pre-git/init console=tty0 console=ttyS0,115200 loglevel=4
machine will boot into nixos in 6s...
+ echo machine will boot into nixos in 6s...
+ test -e /dev/kmsg
+ exec
ssh: connect to host localhost port 22: Connection refused
....
Endless repeat of the last line. On the VM, NixOS did kexec successfully and the ssh service is running:
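In the trace above, the script compares 6.1 with the running kernel release (6.5.0-1010-azure) through sort -c -V, and because the two are already in version order it sets kexecSyscallFlags=--kexec-syscall-auto. A rough Python equivalent of that check, assuming only the leading numeric part of the release matters; sort -V has more elaborate rules (suffixes such as -rc1), so treat this as an approximation. The function names are hypothetical.

```python
import platform
import re
from typing import List, Tuple


def version_key(release: str) -> Tuple[int, ...]:
    """Leading dotted numbers of a kernel release, e.g. '6.5.0-1010-azure' -> (6, 5, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def kexec_syscall_flags(release: str, minimum: str = "6.1") -> List[str]:
    """Mirror of the traced check: add --kexec-syscall-auto when release >= minimum."""
    if version_key(release) >= version_key(minimum):
        return ["--kexec-syscall-auto"]
    return []


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r` on Linux.
    print(kexec_syscall_flags(platform.release()))
```

Tuple comparison gives the "equal counts as sorted" behaviour of sort -c, so a release of exactly 6.1 also gets the flag.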
By following some tips from nix-community/nixos-anywhere#112 and also https://github.com/tiiuae/ghaf-infra/blob/main/docs/nixos-anywhere.md?plain=1#L138-L149 I was able to install NixOS using nixos-anywhere. I also had to use --post-kexec-ssh-port, as the port wasn't the default one. I will try to document the steps and create a PR in the future.
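The endless "ssh: connect to host localhost port 22: Connection refused" in the earlier trace is what polling the wrong port looks like; --post-kexec-ssh-port tells nixos-anywhere which port to use once the kexec'd system is up. As an illustration only (this is not how nixos-anywhere implements it), a hypothetical helper that waits for a TCP port to accept connections, with an explicit timeout instead of retrying forever:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves something is listening, not that it is sshd.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # ConnectionRefusedError, timeouts, unreachable host, ...
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical target: the VM's address and a non-default sshd port.
    print(wait_for_port("127.0.0.1", 2222, timeout=5))
```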
I think nixos-anywhere could automate this kexec step as well if it detects a locked-down kernel.
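One signal such a detection could use is the kernel lockdown LSM, which exposes its state in /sys/kernel/security/lockdown with the active mode in brackets (for example "none [integrity] confidentiality"); in integrity or confidentiality mode the kernel refuses the plain kexec_load path. A minimal sketch, assuming securityfs is mounted and the script runs as root; the helper names are hypothetical. As far as I understand, IMA appraisal can still block kexec while lockdown reports none, so this would only be one of the checks.

```python
from pathlib import Path
from typing import Optional

LOCKDOWN_FILE = Path("/sys/kernel/security/lockdown")


def parse_lockdown(text: str) -> str:
    """Return the bracketed (active) mode from the lockdown file's contents."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active lockdown mode in {text!r}")


def lockdown_mode(path: Path = LOCKDOWN_FILE) -> Optional[str]:
    """Active lockdown mode, or None if the file is missing (LSM absent or securityfs unmounted)."""
    try:
        return parse_lockdown(path.read_text())
    except FileNotFoundError:
        return None


def kexec_load_blocked_by_lockdown(mode: Optional[str]) -> bool:
    # Both "integrity" and "confidentiality" disable the unsigned kexec_load path.
    return mode in ("integrity", "confidentiality")


if __name__ == "__main__":
    mode = lockdown_mode()
    print(mode, kexec_load_blocked_by_lockdown(mode))
```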
By lockdown, do you mean IMA being configured and enabled? However, for Secure Boot-enabled machines we still don't have a solution. I believe the only way to get Secure Boot on Azure would be to first contact Microsoft to find out whether there is a process for this. But there is probably no way to use nixos-anywhere unless we can sign the kernels with the key enrolled on the Azure machine.
Is IMA not the mechanism that is in place when the machine was booted with Secure Boot?
I think IMA is kind of an extension of Secure Boot that covers more files, but on my test machine I did disable Secure Boot. I'll do some tests with Secure Boot on and ima_appraise=off and report the results. So far, Secure Boot off with IMA appraisal off worked, so with just Secure Boot off it should work too. Note: sometimes the kexec also seems to fail, and the machine is more or less frozen after a null pointer dereference in the kernel, with a CPU core that seems just stuck.
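For the ima_appraise=off test, the idea is to reuse the current command line with that one parameter forced. A small sketch of building such a command line from /proc/cmdline before handing it to kexec's --command-line option; the function name is hypothetical, and it assumes no quoted values containing spaces. As far as I know, ima_appraise= is only honoured with CONFIG_IMA_APPRAISE_BOOTPARAM, and kernels that detect Secure Boot ignore it (logging "Secure boot enabled: ignoring ima_appraise=..."), so this is only expected to help with Secure Boot off.

```python
from pathlib import Path


def with_param(cmdline: str, key: str, value: str) -> str:
    """Return `cmdline` with exactly one `key=value`, dropping earlier `key` / `key=...` tokens."""
    kept = [tok for tok in cmdline.split() if tok != key and not tok.startswith(key + "=")]
    kept.append(f"{key}={value}")
    return " ".join(kept)


if __name__ == "__main__":
    current = Path("/proc/cmdline").read_text()
    # Pass the result to kexec via --command-line when loading the original kernel.
    print(with_param(current, "ima_appraise", "off"))
```

Tokens that merely share the prefix, such as ima_appraise_tcb, are kept because only an exact key or key= match is dropped.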
@Mic92 it seems that if SecureBoot is enabled, it is not possible to kexec.
See, even with
Also
Maybe it should be stated in the README that kexec doesn't work with Secure Boot.
@usama8800 feel free to add it.
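Related to documenting this: a pre-flight check could tell users up front that Secure Boot is on. A minimal sketch reading the standard SecureBoot UEFI variable through efivarfs, where each file starts with a 4-byte attribute field followed by the variable data (a single byte, 1 meaning enabled); mokutil --sb-state reports the same from the shell. The helper names are hypothetical, and None means the state could not be read (legacy BIOS boot or efivarfs not mounted).

```python
import struct
from pathlib import Path
from typing import Optional

# SecureBoot lives under the EFI global-variable GUID.
SECURE_BOOT_VAR = Path(
    "/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
)


def parse_secure_boot(raw: bytes) -> bool:
    """Decode an efivarfs SecureBoot file: little-endian u32 attributes, then one data byte."""
    if len(raw) < 5:
        raise ValueError(f"unexpected SecureBoot variable length: {len(raw)}")
    (_attributes,) = struct.unpack("<I", raw[:4])  # e.g. 0x6 = BOOTSERVICE | RUNTIME access
    return raw[4] == 1


def secure_boot_enabled(path: Path = SECURE_BOOT_VAR) -> Optional[bool]:
    try:
        return parse_secure_boot(path.read_bytes())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print(secure_boot_enabled())
```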
kexec fails due to IMA (Integrity Measurement Architecture) being enforced on Azure. I'm using nixos-anywhere and just saw that the image used for unattended installs comes from here. See: nix-community/nixos-anywhere#189
I want to know: do I need to build a new image in order to use kexec -s instead of kexec?
It is due to IMA appraisal being enabled on Azure VMs. More details here: https://kernsec.org/pipermail/linux-security-module-archive/2018-October/008951.html
To build a compatible image, should I try to modify the build-images.sh script to my needs?
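On the kexec -s question: as I understand it, plain kexec uses the kexec_load syscall, where the kernel only receives memory segments and IMA or lockdown have nothing to verify, so it is refused when appraisal or lockdown is enforced (which is what the linked thread discusses). kexec -s uses kexec_file_load, which passes file descriptors so the kernel can check the image signature. That suggests kexec -s only helps if the NixOS kernel is signed with a key the running kernel trusts. A deliberately simplified model of that decision, as a sketch; real behaviour also depends on CONFIG_KEXEC_SIG_FORCE and the active IMA policy, and the names are hypothetical.

```python
from enum import Enum


class Syscall(Enum):
    KEXEC_LOAD = "kexec_load"            # plain `kexec --load`
    KEXEC_FILE_LOAD = "kexec_file_load"  # `kexec -s`


def kexec_allowed(syscall: Syscall, enforcing: bool, image_signed_by_trusted_key: bool) -> bool:
    """Simplified model: `enforcing` = lockdown active or IMA appraisal required for kexec."""
    if not enforcing:
        return True
    if syscall is Syscall.KEXEC_LOAD:
        # The kernel only receives memory segments, so there is nothing it can verify.
        return False
    # kexec_file_load gets a file descriptor and can check the image signature.
    return image_signed_by_trusted_key


if __name__ == "__main__":
    for sc in Syscall:
        for signed in (False, True):
            label = "signed" if signed else "unsigned"
            print(sc.value, label, kexec_allowed(sc, True, signed))
```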