kexec fails due to IMA being enforced on Azure VMs #128

Open
AkechiShiro opened this issue Aug 25, 2023 · 21 comments

Comments

@AkechiShiro

AkechiShiro commented Aug 25, 2023

kexec fails because IMA (Integrity Measurement Architecture) is enforced on Azure. I'm using nixos-anywhere and just saw that the image used for the unattended install comes from here.
See nix-community/nixos-anywhere#189

Do I need to build a new image in order to use kexec -s instead of plain kexec?

The failure is caused by IMA appraisal being enabled on Azure VMs:

[ 3099.239362] ima: impossible to appraise a kernel image without a file descriptor; try using kexec_file_load syscall.

More details here: https://kernsec.org/pipermail/linux-security-module-archive/2018-October/008951.html

To build a compatible image, should I modify the build-images.sh script to suit my needs?
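
As a rough illustration of the distinction behind this question: kexec-tools can load a kernel through the older kexec_load syscall or through the file-based kexec_file_load syscall, and the IMA message above asks for the latter. The helper below is hypothetical (it is not part of nixos-images); it only picks a kexec-tools flag depending on whether a log excerpt contains that IMA hint.

```shell
# Hypothetical helper: choose a kexec-tools syscall flag from a kernel log excerpt.
# --kexec-file-syscall (-s) selects kexec_file_load.
# --kexec-syscall-auto (-a) tries kexec_file_load first and falls back to kexec_load.
pick_kexec_flag() {
  case "$1" in
    *"try using kexec_file_load syscall"*) printf '%s\n' "--kexec-file-syscall" ;;
    *) printf '%s\n' "--kexec-syscall-auto" ;;
  esac
}
```

The returned flag would be placed on the kexec command line next to --load; whether that alone is enough on a given Azure image is not something this sketch can answer.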

Mic92 added a commit that referenced this issue Aug 26, 2023
Seems like GCP changed something about their boot process and the same instance types that failed to kexec, now just works.
This fixes secureboot as well: #128
@Mic92
Member

Mic92 commented Aug 26, 2023

We now pass this flag, but it's not clear to me what else is needed.

@AkechiShiro
Author

AkechiShiro commented Aug 26, 2023

@Ma27 I will investigate more thoroughly during the coming week and report back if I find a solution.
I will try to find a way to enroll/sign the kernel so that it is allowed to execute on Azure. If I get it working, I'll share the steps I took.

@henrirosten

henrirosten commented Oct 27, 2023

@Mic92, @AkechiShiro:
FYI: we have been successfully trialing nixos-anywhere with Azure Gen2 'Standard B' image types as described here: https://github.com/tiiuae/ghaf-infra/blob/main/docs/nixos-anywhere.md.

@AkechiShiro
Author

AkechiShiro commented Oct 27, 2023

Hi @henrirosten,

I'm not sure what you mean by Azure Gen 2 Standard B images. Is the securityType of the VM TrustedLaunch?
Could you give more information?

nixos-anywhere fails to kexec due to a missing signature (SecureBoot being enabled and enforced).

Even disabling Integrity Measurement doesn't seem to be enough.

For more context, trying to modprobe unsigned kernel drivers also fails.

@henrirosten

'Standard' is the Azure security type that disables secure boot and IMA.

'B'-series refers to Azure VM image sizes which are deployed on hardware types and processors as described here: https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-b-series-burstable.

@AkechiShiro
Author

AkechiShiro commented Oct 28, 2023

There must be a way forward, then, by which we could publish an Azure-compatible NixOS image on the official Azure Marketplace. We would need to work with the Lanzaboote folks and find a way to combine NixOS and Lanzaboote so that we have at least SecureBoot support. IMA will have to be disabled at first.

vTPM doesn't really matter at first, I'd guess. But making nixos-anywhere work on other distributions with SecureBoot enabled seems far from trivial. The only workaround I can see is to disable SecureBoot temporarily, run nixos-anywhere, and then enable it again. But what will happen then? I believe nixos-anywhere doesn't ship Lanzaboote in the NixOS image...

@Mic92: Would a PR documenting the steps to use nixos-anywhere on Azure Gen 2 VMs created with SecurityType TrustedLaunch (rather than Standard), by temporarily disabling SecureBoot, be acceptable for now? Or would it be useless?

There is some documentation for anyone interested in testing their non-official NixOS VM image: https://learn.microsoft.com/en-us/partner-center/marketplace/azure-vm-image-test

If anyone is interested in working on this, I'd be willing to make slow progress on it, as far as my time allows.

@Mic92
Member

Mic92 commented Oct 28, 2023

@AkechiShiro you mean having a guide that describes how to install on Azure with nixos-anywhere? Sure. Could be dropped here: https://github.com/nix-community/nixos-anywhere/tree/main/docs/howtos

@Mic92
Member

Mic92 commented Dec 24, 2023

Here is one idea: shouldn't it be possible to kexec into the original kernel but with ima_appraise=off, and then do the actual NixOS kexec afterwards?
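
A minimal sketch of the first hop of this idea. The helper name is hypothetical, and the kernel and initrd paths in the commented usage are assumptions that differ between distributions. The function only builds the new command line by appending ima_appraise=off to an existing one when it is not already there.

```shell
# Append a kernel parameter to a command line unless it is already present.
with_param() {
  cmdline=$1
  param=$2
  case " $cmdline " in
    *" $param "*) printf '%s\n' "$cmdline" ;;
    *) printf '%s\n' "${cmdline:+$cmdline }$param" ;;
  esac
}

# Hypothetical usage (paths are assumptions; run as root; `kexec -e` boots
# the loaded kernel immediately, without a clean shutdown):
#   kexec --load /boot/vmlinuz --initrd=/boot/initrd.img \
#     --command-line="$(with_param "$(cat /proc/cmdline)" ima_appraise=off)"
#   kexec -e
```

Once the machine is back up on the same kernel with the extra parameter, the regular NixOS kexec would be attempted as the second hop.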

@AkechiShiro
Author

@Mic92 I will try that soon. I already tried this on a Debian 11 cloud image and got stuck on a weird issue I couldn't debug at all, so I'll need to check and retry.

@Mic92
Member

Mic92 commented Dec 26, 2023

If it was just an old kernel, then eaf2d21 might solve it.

@AkechiShiro
Author

AkechiShiro commented Jan 12, 2024

Hi @Mic92,
Sorry it took me a while to give this a try.

I ran the following as root on a machine with Secure Boot disabled and ima_appraisal=off:

curl -L https://github.com/nix-community/nixos-images/releases/download/nixos-unstable/nixos-kexec-installer-noninteractive-x86_64-linux.tar.gz | tar -xzf- -C /root
/root/kexec/run

I got this output after the reboot (after the kexec, I believe). It seems like something bad happened?

username login: [   10.089786] CPU1 failed to report alive state
[   10.129163] BUG: kernel NULL pointer dereference, address: 0000000000000010
[   10.129779] #PF: supervisor read access in kernel mode
[   10.129779] #PF: error_code(0x0000) - not-present page
[   10.129779] PGD 0 P4D 0 
[   10.129779] Oops: 0000 [#1] PREEMPT SMP PTI
[   10.129779] CPU: 0 PID: 11 Comm: kworker/u4:0 Not tainted 6.6.10 #1-NixOS
[   10.129779] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 07/12/2023
[   10.129779] Workqueue: eval_map_wq tracer_init_tracefs_work_func
[   10.129779] RIP: 0010:event_create_dir+0x29/0x5d0
[   10.129779] Code: 90 41 57 41 56 41 55 41 54 49 89 f4 55 53 48 83 ec 18 48 8b 46 28 4c 8b 6e 10 48 c7 c6 71 0b 9a 87 48 89 7c 24 08 48 89 04 24 <49> 8b 45 10 48 8b 18 48 89 df e8 58 40 8e 00 85 c0 0f 84 ec 04 00
[   10.129779] RSP: 0000:ffffa21d00093dd8 EFLAGS: 00010296
[   10.129779] RAX: 0000000000000000 RBX: ffff8bc14020e1e0 RCX: ffff8bc140808080
[   10.129779] RDX: 0000000000000000 RSI: ffffffff879a0b71 RDI: ffff8bc140442b40
[   10.129779] RBP: ffffffff88155260 R08: ffff8bc140b6c060 R09: 0000000000038ee0
[   10.129779] R10: ffff8bc140c3f080 R11: 006e776f64726165 R12: ffff8bc14020e1e0
[   10.129779] R13: 0000000000000000 R14: ffff8bc1402ed405 R15: ffffffff8875c948
[   10.129779] FS:  0000000000000000(0000) GS:ffff8bc1fbc00000(0000) knlGS:0000000000000000
[   10.129779] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.129779] CR2: 0000000000000010 CR3: 000000003d220001 CR4: 00000000003706f0
[   10.129779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   10.129779] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   10.129779] Call Trace:
[   10.129779]  
[   10.129779]  ? __die+0x23/0x70
[   10.129779]  ? page_fault_oops+0x17d/0x4b0
[   10.129779]  ? exc_page_fault+0x6d/0x150
[   10.129779]  ? asm_exc_page_fault+0x26/0x30
[   10.129779]  ? event_create_dir+0x29/0x5d0
[   10.129779]  ? event_create_dir+0x123/0x5d0
[   10.129779]  __trace_early_add_event_dirs+0x33/0x70
[   10.129779]  event_trace_init+0x98/0xf0
[   10.129779]  tracer_init_tracefs_work_func+0xa/0x2e0
[   10.129779]  process_one_work+0x174/0x340
[   10.129779]  worker_thread+0x27b/0x3a0
[   10.129779]  ? __pfx_worker_thread+0x10/0x10
[   10.129779]  kthread+0xe8/0x120
[   10.129779]  ? __pfx_kthread+0x10/0x10
[   10.129779]  ret_from_fork+0x34/0x50
[   10.129779]  ? __pfx_kthread+0x10/0x10
[   10.129779]  ret_from_fork_asm+0x1b/0x30
[   10.129779]  
[   10.129779] Modules linked in:
[   10.129779] CR2: 0000000000000010
[   10.129779] ---[ end trace 0000000000000000 ]---
[   10.129779] RIP: 0010:event_create_dir+0x29/0x5d0
[   10.129779] Code: 90 41 57 41 56 41 55 41 54 49 89 f4 55 53 48 83 ec 18 48 8b 46 28 4c 8b 6e 10 48 c7 c6 71 0b 9a 87 48 89 7c 24 08 48 89 04 24 <49> 8b 45 10 48 8b 18 48 89 df e8 58 40 8e 00 85 c0 0f 84 ec 04 00
[   10.129779] RSP: 0000:ffffa21d00093dd8 EFLAGS: 00010296
[   10.129779] RAX: 0000000000000000 RBX: ffff8bc14020e1e0 RCX: ffff8bc140808080
[   10.129779] RDX: 0000000000000000 RSI: ffffffff879a0b71 RDI: ffff8bc140442b40
[   10.129779] RBP: ffffffff88155260 R08: ffff8bc140b6c060 R09: 0000000000038ee0
[   10.129779] R10: ffff8bc140c3f080 R11: 006e776f64726165 R12: ffff8bc14020e1e0
[   10.129779] R13: 0000000000000000 R14: ffff8bc1402ed405 R15: ffffffff8875c948
[   10.129779] FS:  0000000000000000(0000) GS:ffff8bc1fbc00000(0000) knlGS:0000000000000000
[   10.129779] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.129779] CR2: 0000000000000010 CR3: 000000003d220001 CR4: 00000000003706f0
[   10.129779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   10.129779] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   10.129779] note: kworker/u4:0[11] exited with irqs disabled

@AkechiShiro
Author

AkechiShiro commented Jan 13, 2024

Wait, I tried a second time and it's now working, so ima_appraisal=off does allow the kexec to happen with SecureBoot disabled.
The specific image used was Ubuntu 23.10.
EDIT: Network connectivity seems to be broken. I believe DHCP did not run anew to obtain an IP address.

(experimental, only tested for nixos-unstable) Static ip addresses and routes are restored after reboot. Interface that had dynamic addresses before are configured with DHCP and to accept prefixes from ipv6 router advertisement

The IP address was preserved, but the DNS server probably needs to be tweaked. I'm not sure what the default one is; I will edit this if I find the answer.

EDIT 2: Running the second kexec to install NixOS (using grub, with the default example plus some additional configuration) left the machine unbootable: it is stuck in Hyper-V's UEFI, which says it found no suitable boot system.

I'll have to try systemd-boot; I may also need to tweak disko's configuration.

@AkechiShiro
Author

AkechiShiro commented Jan 16, 2024

So it seems ima_appraisal=off is not even needed if SecureBoot is off; the first kexec happens successfully:

+ init=/nix/store/nadvk7k5qam9iq19kshbk2c045hkd5q6-nixos-system-nixos-23.11pre-git/init
+ kernelParams=console=tty0 console=ttyS0,115200 loglevel=4
+ readlink -f /root/kexec/kexec/run
+ dirname /root/kexec/kexec/run
+ SCRIPT_DIR=/root/kexec/kexec
+ TMPDIR=/root/kexec/kexec mktemp -d
+ INITRD_TMP=/root/kexec/kexec/tmp.mI4YwicutB
+ cd /root/kexec/kexec/tmp.mI4YwicutB
+ trap cleanup EXIT
+ mkdir -p ssh
+ extractPubKeys /root
+ home=/root
+ key=/root/.ssh/authorized_keys
+ test -e /root/.ssh/authorized_keys
+ grep -o \(\(ssh\|ecdsa\|sk\)-[^ ]* .*\) /root/.ssh/authorized_keys
+ key=/root/.ssh/authorized_keys2
+ test -e /root/.ssh/authorized_keys2
+ test -n root
+ sh -c echo ~root
+ sudo_home=/root
+ extractPubKeys /root
+ home=/root
+ key=/root/.ssh/authorized_keys
+ test -e /root/.ssh/authorized_keys
+ grep -o \(\(ssh\|ecdsa\|sk\)-[^ ]* .*\) /root/.ssh/authorized_keys
+ key=/root/.ssh/authorized_keys2
+ test -e /root/.ssh/authorized_keys2
+ test -e /etc/ssh/authorized_keys.d/root
+ test -n root
+ test -e /etc/ssh/authorized_keys.d/root
+ test -e /etc/ssh/ssh_host_dsa_key
+ cp -a /etc/ssh/ssh_host_dsa_key ssh
+ test -e /etc/ssh/ssh_host_dsa_key.pub
+ cp -a /etc/ssh/ssh_host_dsa_key.pub ssh
+ test -e /etc/ssh/ssh_host_ecdsa_key
+ cp -a /etc/ssh/ssh_host_ecdsa_key ssh
+ test -e /etc/ssh/ssh_host_ecdsa_key.pub
+ cp -a /etc/ssh/ssh_host_ecdsa_key.pub ssh
+ test -e /etc/ssh/ssh_host_ed25519_key
+ cp -a /etc/ssh/ssh_host_ed25519_key ssh
+ test -e /etc/ssh/ssh_host_ed25519_key.pub
+ cp -a /etc/ssh/ssh_host_ed25519_key.pub ssh
+ test -e /etc/ssh/ssh_host_rsa_key
+ cp -a /etc/ssh/ssh_host_rsa_key ssh
+ test -e /etc/ssh/ssh_host_rsa_key.pub
+ cp -a /etc/ssh/ssh_host_rsa_key.pub ssh
+ /root/kexec/kexec/ip --json addr
+ /root/kexec/kexec/ip -4 --json route
+ /root/kexec/kexec/ip -6 --json route
+ [ -f /etc/machine-id ]
+ cp /etc/machine-id machine-id
+ find .
+ gzip -9
+ cpio -o -H newc
27 blocks
+ kexecSyscallFlags=
+ + sort -c -V
uname -r
+ printf %s\n 6.1 6.5.0-1010-azure
+ kexecSyscallFlags=--kexec-syscall-auto
+ /root/kexec/kexec/kexec --load /root/kexec/kexec/bzImage --kexec-syscall-auto --initrd=/root/kexec/kexec/initrd --no-checks --command-line init=/nix/store/nadvk7k5qam9iq19kshbk2c045hkd5q6-nixos-system-nixos-23.11pre-git/init console=tty0 console=ttyS0,115200 loglevel=4
machine will boot into nixos in 6s...
+ echo machine will boot into nixos in 6s...
+ test -e /dev/kmsg
+ exec
ssh: connect to host localhost port 22: Connection refused
....
(the last line repeats endlessly)

But the VM loses network connectivity after the kexec, so the script cannot reach the VM to finish the "nixosification".

On the VM, NixOS did kexec successfully and the ssh service is running:

[nixos@nixos:~]$ systemctl status sshd
● sshd.service - SSH Daemon
     Loaded: loaded (/etc/systemd/system/sshd.service; enabled; preset: enabled)
     Active: active (running) since Tue 2024-01-16 ; 14s ago
    Process: 636 ExecStartPre=/nix/store/n7lpzrgsj5kmwsnm8fvv8cawr8qycym6-unit->
   Main PID: 639 (sshd)
         IP: 0B in, 0B out
         IO: 1.3M read, 0B written
      Tasks: 1 (limit: 4195)
     Memory: 3.4M
        CPU: 133ms
     CGroup: /system.slice/sshd.service
             └─639 "sshd: /nix/store/9fkxlh9gyxnb7bahc2rn0b5fhamgb63m-openssh-9>

nixos systemd[1]: Starting SSH Daemon...
nixos systemd[1]: Started SSH Daemon.
nixos sshd[639]: Server listening on 0.0.0.0 port 22.
nixos sshd[639]: Server listening on :: port 22.
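
The xtrace output earlier in this comment shows the run script comparing the running kernel release against 6.1 with `sort -c -V` before settling on --kexec-syscall-auto. A small sketch of that kind of check, assuming GNU sort (for -V); the function name is hypothetical and the real script may be structured differently.

```shell
# Succeed when release $2 is at least version $1, using GNU sort's version ordering.
# `sort -c` exits 0 when its input is already in order, i.e. when $1 <= $2.
version_at_least() {
  printf '%s\n' "$1" "$2" | sort -c -V 2>/dev/null
}

# Hypothetical usage, mirroring the trace:
#   if version_at_least 6.1 "$(uname -r)"; then
#     kexecSyscallFlags=--kexec-syscall-auto
#   fi
```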

@AkechiShiro
Author

AkechiShiro commented Jan 16, 2024

By following some tips from nix-community/nixos-anywhere#112 and also https://github.com/tiiuae/ghaf-infra/blob/main/docs/nixos-anywhere.md?plain=1#L138-L149, I was able to install NixOS using nixos-anywhere. I also had to use --post-kexec-ssh-port, as the port wasn't the default one.

I will try to document the steps and create a PR in the future.

EDIT: I'm still lacking internet connectivity despite being able to reach the virtual machine over ssh 🤔 (DNS seems to be working fine). (I was wrong; everything works as intended.)
EDIT 2: I also did the install with systemd-boot instead of grub.

@Mic92
Member

Mic92 commented Jan 17, 2024

I think nixos-anywhere could automate this kexec step as well if it detects a locked down kernel.
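
One possible signal for such detection, sketched under assumptions: kernels with the lockdown LSM expose their state in /sys/kernel/security/lockdown, with the active mode shown in brackets. Whether nixos-anywhere would use this signal is an assumption, and IMA appraisal is a separate setting that this file does not report. The helper name is hypothetical.

```shell
# Extract the active mode (the bracketed word) from lockdown file contents,
# e.g. "none [integrity] confidentiality" -> "integrity".
active_lockdown_mode() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# Hypothetical usage on a live system:
#   active_lockdown_mode "$(cat /sys/kernel/security/lockdown)"
```

Any mode other than `none` would suggest that an intermediate step is needed, or that kexec of an unsigned kernel will be refused outright.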

@AkechiShiro
Author

By locked down, do you mean that IMA is configured and enabled?

However, for machines with SecureBoot enabled we still don't have a solution. I believe the only way to have SecureBoot on Azure would be to first contact Microsoft and ask whether there is a process.

But there is probably no way to use nixos-anywhere unless we can sign the kernels with a key enrolled on the Azure machine.

@Mic92
Member

Mic92 commented Jan 17, 2024

Is IMA not the mechanism that is in place in case the machine was booted with secure boot?

@AkechiShiro
Author

AkechiShiro commented Jan 17, 2024

I think IMA is kind of an extension of SecureBoot that covers more files, but on my test machine I did disable SecureBoot. I'll do some tests with SecureBoot on and ima_appraise=off and report the results.

So far, SecureBoot off with IMA appraisal off worked.

With just SecureBoot off, it should then work too.

Note: the kexec also sometimes seems to fail, leaving the machine frozen after a null pointer dereference in the kernel, with a CPU core apparently stuck.

@AkechiShiro
Author

AkechiShiro commented Jan 19, 2024

@Mic92 it seems that if SecureBoot is enabled, it is not possible to kexec.

$ cat /proc/cmdline 
BOOT.... console=tty0 console=ttyS0,115200 earlyprintk=ttyS0,115200 consoleblank=0 ima_appraise=off

Even with ima_appraise=off, it fails:

[   60.022694] PEFILE: Unsigned PE binary
[   60.024444] kexec_file: Enforced kernel signature verification failed (-61).

Also, ima_appraisal does not seem to exist; the only valid spelling I can find online is ima_appraise=off. So without SecureBoot, adding ima_appraise=off may not be needed.
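
Given that result, a pre-flight check could stop before attempting the kexec at all. A minimal sketch, assuming the status text has the format printed by `mokutil --sb-state` ("SecureBoot enabled" or "SecureBoot disabled"); the function name and messages are hypothetical.

```shell
# Decide whether to attempt kexec, given Secure Boot status text in the
# format printed by `mokutil --sb-state`.
kexec_precheck() {
  case "$1" in
    *"SecureBoot enabled"*)
      echo "refuse: Secure Boot is on; an unsigned kernel will be rejected"
      return 1 ;;
    *)
      echo "ok: Secure Boot not reported as enabled"
      return 0 ;;
  esac
}

# Hypothetical usage:
#   kexec_precheck "$(mokutil --sb-state 2>/dev/null)" || exit 1
```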

@usama8800
Contributor

Maybe it should be stated in the README that kexec doesn't work with Secure Boot.

@Mic92
Member

Mic92 commented Sep 4, 2024

@usama8800 feel free to add it.


4 participants