
Freax13/cve-2024-21978-poc


SEV Firmware Vulnerability

This repo contains an exploit for a vulnerability in the SEV firmware. The exploit allows decrypting arbitrary memory of a running SEV-SNP guest.

Tested on version 1.55.16 (latest as of the time of writing).

Root Cause

The nv_paddr field of the SEV_INIT_EX command can be used to donate a chunk of memory to the firmware, so that it can be used instead of the persistent flash. If SEV-SNP is enabled, this memory has to be in the FIRMWARE state. The firmware checks this once while executing the SEV_INIT_EX command. From then on, the firmware assumes that this memory is in the FIRMWARE state and writes to it without any additional checks. That assumption is not always correct: nothing prevents the host from changing the state back to the HYPERVISOR state using the SNP_PAGE_RECLAIM command. Once the pages are in the HYPERVISOR state, they can be transitioned into other states, e.g. CONTEXT. Even though the pages are no longer in the FIRMWARE state, the firmware will still write to them, thus breaking the integrity required by certain page states.
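The check-once, write-later pattern can be illustrated with a toy model. This is only a sketch: `PageState`, `ToyFirmware`, `init_ex`, and `write_nv` are hypothetical names invented for illustration, the page-state set is reduced to the three states mentioned above, and nothing here talks to real hardware or reflects the actual firmware code.

```rust
/// Simplified page states named in the text (not the full SEV-SNP set).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PageState {
    Hypervisor,
    Firmware,
    Context,
}

/// Toy model of the firmware's view of the donated NV storage page.
pub struct ToyFirmware {
    /// Index of the page donated via `nv_paddr`, recorded at init time.
    nv_page: Option<usize>,
}

impl ToyFirmware {
    pub fn new() -> Self {
        Self { nv_page: None }
    }

    /// Models SEV_INIT_EX: the page state is checked here, and only here.
    pub fn init_ex(&mut self, rmp: &[PageState], page: usize) -> Result<(), &'static str> {
        if rmp.get(page) != Some(&PageState::Firmware) {
            return Err("nv_paddr is not in the FIRMWARE state");
        }
        self.nv_page = Some(page);
        Ok(())
    }

    /// Models a later NV write. Returns the state of the page that gets
    /// written to; note that the state is never re-validated.
    pub fn write_nv(&self, rmp: &[PageState]) -> Option<PageState> {
        self.nv_page.map(|page| rmp[page])
    }
}
```

In this model, calling `init_ex` on a FIRMWARE page, then flipping that entry to `Hypervisor` and on to `Context`, makes `write_nv` report a write into a CONTEXT page, which is the situation the exploit relies on.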

Exploit

We can exploit this memory corruption by targeting CONTEXT pages. CONTEXT pages are a powerful target, but there are some problems:

  1. CONTEXT pages are encrypted with a different key than other memory. As a result, it's not easy to control the plaintext even if we could control the ciphertext.
  2. We don't have a lot of control of the memory written by the firmware.

The memory corruption effectively fills the CONTEXT page with random data, so it's not easy to corrupt CONTEXT pages in a way that is useful to the attacker. To work around that, we can repeatedly trigger the bug to cause corruption and use the SNP_GUEST_STATUS command to read back relevant fields of the corrupted CONTEXT page until we observe values that are useful.

The SNP_DBG_DECRYPT command can be used to decrypt the memory of a SEV-SNP guest with the DEBUG policy enabled. If we can craft a CONTEXT so that it has the DEBUG flag set and contains the ASID of another guest, we can use it to decrypt the memory of the other guest even though it doesn't have DEBUG policy set.

It turns out that SNP_DBG_DECRYPT ignores most fields in the CONTEXT page: it only checks gctx->guest.asid, gctx->guest.policy_snp and gctx->guest.guest_flags. The chance of these fields being correct after the memory corruption is not high, but it's also not out of the realm of possibility. The good news is that we can read all of those fields using the SNP_GUEST_STATUS command.
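A rough sketch of the acceptance test applied to the values read back might look like the following. The `StatusView` struct and `looks_debuggable` function are hypothetical, the position of the DEBUG bit (bit 19 of the guest policy) is an assumption that should be checked against the SEV-SNP ABI specification, and the guest_flags check is left out because the text does not say which values are acceptable.

```rust
/// Assumed position of the DEBUG bit in the SNP guest policy (bit 19).
/// This is an assumption; verify it against the ABI specification.
pub const POLICY_DEBUG: u64 = 1 << 19;

/// Hypothetical view of the fields read back from a corrupted CONTEXT page.
pub struct StatusView {
    pub asid: u32,
    pub policy: u64,
}

/// Returns true if the corrupted context has a valid ASID and the DEBUG
/// policy bit set. `max_asid` is the highest valid ASID on the CPU.
pub fn looks_debuggable(status: &StatusView, max_asid: u32) -> bool {
    // ASID 0 means "no ASID assigned", so the valid range starts at 1.
    let asid_ok = (1..=max_asid).contains(&status.asid);
    let debug_ok = (status.policy & POLICY_DEBUG) != 0;
    asid_ok && debug_ok
}
```

With `max_asid` set to 509, a context reporting ASID 0x1f and a policy with the assumed DEBUG bit set would be accepted, while ASID 0 or ASID 510 would be rejected.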

In conclusion, we can exploit the bug with the following steps:

  1. Transition nv_paddr into the FIRMWARE state using the rmpupdate instruction.
  2. Execute the SEV_INIT_EX command.
  3. Transition nv_paddr back into the HYPERVISOR state using the SNP_PAGE_RECLAIM command.
  4. Create one or more CONTEXT pages at nv_paddr.
  5. Trick the firmware into writing to nv_paddr using the SEV_PDH_GEN command. This corrupts the CONTEXT pages.
  6. Use the SNP_GUEST_STATUS command to check whether SNP_DBG_DECRYPT would succeed; if not, go back to step 5. The main bottleneck here is that ASIDs are stored in a 32-bit int, but there are far fewer valid ASIDs (509 or 1006 depending on the CPU), so it will take quite a lot of attempts to get this right.
  7. Launch (and optionally run) a victim guest using the ASID found in the corrupted CONTEXT page. Keep track of the secrets page. This is possible because the SEV firmware tracks active ASIDs internally and doesn't inspect the active CONTEXT pages for duplicates.
  8. Use the corrupted CONTEXT page to execute SNP_DBG_DECRYPT on the secrets page of the victim guest.
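The loop formed by steps 5 and 6 can be sketched generically. This is not the PoC's code: `retry_until` is a hypothetical helper, and the three closures stand in for "trigger the corruption", "read back the status", and "would SNP_DBG_DECRYPT succeed?", whose real implementations depend on the patched kernel interface.

```rust
/// Repeats `trigger` followed by `read_back` until `accept` approves the
/// observed value or `max_iterations` is exhausted.
/// Returns the 1-based iteration count and the accepted value.
pub fn retry_until<T>(
    mut trigger: impl FnMut(),
    mut read_back: impl FnMut() -> T,
    accept: impl Fn(&T) -> bool,
    max_iterations: u64,
) -> Option<(u64, T)> {
    for iteration in 1..=max_iterations {
        // Step 5: cause the firmware to overwrite the page.
        trigger();
        // Step 6: inspect the result and decide whether it is usable.
        let observed = read_back();
        if accept(&observed) {
            return Some((iteration, observed));
        }
    }
    None
}
```

Keeping the three actions as closures makes it possible to exercise the loop logic without hardware, for example with a counter in place of the firmware.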

The exploit spends most of its time on steps 5 and 6. The chance of hitting all the right conditions is about 1/20,000,000 on an EPYC Milan, and we can make about 100 attempts per second, so we expect to hit the right conditions about once every two days (warning: these calculations are only approximations and I may have messed something up, but anecdotally, once every two days feels about right). We can speed this up by attacking not just one CONTEXT page at a time, but three CONTEXT pages at nv_paddr, nv_paddr+4096, and nv_paddr+8192 (SEV_PDH_GEN will corrupt three pages). Conveniently, those steps can be done ahead of launching the victim guest and only have to succeed once to attack an arbitrary number of guests (note that the PoC currently attacks just one guest, though).
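The estimate can be reproduced with a few lines of arithmetic. This sketch assumes that attempts are independent, that the 1/20,000,000 figure applies per CONTEXT page, and that attacking several pages at once simply multiplies the per-attempt success probability; the text does not confirm any of these, and `expected_days` is a hypothetical helper.

```rust
/// Expected number of days until the first success, assuming independent
/// attempts and a small per-page success probability.
pub fn expected_days(
    success_probability: f64,
    attempts_per_second: f64,
    pages_per_attempt: f64,
) -> f64 {
    const SECONDS_PER_DAY: f64 = 86_400.0;
    // Mean of a geometric distribution: 1 / p attempts until success.
    let expected_attempts = 1.0 / (success_probability * pages_per_attempt);
    expected_attempts / attempts_per_second / SECONDS_PER_DAY
}
```

Under these assumptions, one page at 100 attempts per second works out to 200,000 seconds, or roughly 2.3 days, and three pages would bring that down to roughly 0.8 days.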

Impact

Although I haven't been able to test this yet, I believe that once an attacker has used this vulnerability to leak the guest's virtual machine platform communication keys, she should be able to send guest messages to the firmware on the guest's behalf and use this to request attestation reports. This violates a key principle of SEV-SNP, under which only the guest should be able to request attestation reports.

Mitigation

A few commands that accept a FIRMWARE page (e.g. SNP_PAGE_RECLAIM, SNP_GCTX_CREATE, RING_BUFFER, maybe more?, maybe all just to be safe?) should check whether the page overlaps with nv_paddr and fail if it does.
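The proposed check boils down to an interval overlap test. The following is a minimal sketch, not firmware code: `overlaps_nv` is a hypothetical helper, and the length of the NV region is passed in as a parameter because the text does not state it.

```rust
pub const PAGE_SIZE: u64 = 4096;

/// Returns true if the pages starting at `page_paddr` overlap the NV region
/// `[nv_paddr, nv_paddr + nv_len)`. A command handler would reject the
/// request when this returns true.
pub fn overlaps_nv(page_paddr: u64, page_count: u64, nv_paddr: u64, nv_len: u64) -> bool {
    // Saturating arithmetic avoids wrap-around on hostile inputs.
    let page_end = page_paddr.saturating_add(page_count.saturating_mul(PAGE_SIZE));
    let nv_end = nv_paddr.saturating_add(nv_len);
    // Two half-open intervals overlap iff each starts before the other ends.
    page_paddr < nv_end && nv_paddr < page_end
}
```

For example, with a three-page NV region at 0x10000, a request for the page at 0x12000 would be rejected, while one for the page at 0x13000 would pass.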

Upgrade Mitigations

There's one more concern I have, which I'm not sure is valid and would love to hear y'all's opinion on: IIUC the SEV firmware can be upgraded without interrupting running guests. This implies to me that it would be possible to carry over the corrupted CONTEXT page from an old vulnerable firmware version to a new fixed firmware version. Would it be possible to start on an older vulnerable version, do the exploit described above, upgrade and commit the new fixed firmware, launch the guest with the new firmware (so that the old firmware version doesn't show up in the attestation report) and then use the corrupted CONTEXT page created using the old firmware to attack the guest created using the new version? Would a consumer of the attestation reports created by the new guest be able to tell that the old firmware version was running at some point before the new guest was launched? If not, are further mitigations required to prevent this from happening?

PoC Usage

  1. Apply the patches in the linux-patches folder to the tip of https://github.com/AMDESE/linux/commits/snp-host-v10. Build, install, and boot the kernel.
  2. Run the PoC.
    root@server:~/sev-exploit# cargo run --release
        Finished release [optimized] target(s) in 0.12s
        Running `target/release/sev-exploit`
    Corrupt guest context page so that ASID is in range 1..510
    Smallest ASID: 0x0000001f iterations: 14052175 zeros: 10539628 unique asids: 31500727 elapsed time: 1d 19h 20m 31s
    Creating VM with same ASID
    [03, 00, 00, 00, 00, 00, 00, 00, 11, 0f, a0, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, f0, 51, a5, 03, 3f, 69, 6b, 93, e8, d8, 61, 0d, 2e, 5a, 45, f1, ea, 6d, bf, 49, fe, e4, a9, 2d, 8d, af, 76, 5e, 2e, 56, e0, fa, a9, b3, a7, e0, bc, 09, d9, 4f, 28, 5c, 9f, 84, d2, 7e, 34, eb, ea, 3f, 29, 88, 30, 01, 28, 65, 8b, 73, 3c, 84, 00, ae, 4a, 74, a2, 7a, d1, c7, 4f, 63, 7f, 72, 7b, 3b, 2f, 08, b3, 1a, 8c, 99, 1b, ad, b5, 1d, 42, 0b, 4d, 98, d4, 7d, c1, 0b, d6, 2f, b4, 6c, 6b, 51, a2, 92, 17, 3b, 01, e8, 82, 11, 1e, cb, cb, a2, 8f, c9, b0, 52, 1d, 1d, b7, d2, 25, 8d, 32, a9, 7a, 6f, 86, e4, 40, 44, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 80, 00, 88, 00, 00, 00, 00, ee, ff, 00, 00, f0, ff, ff, ff, ff, ff, ff, ff, ff, 3f, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, <cut off zeros>]
    thread 'main' panicked at src/main.rs:170:13:
    not yet implemented: use the leaked secrets to send guest messages
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    

Note that it's normal for the exploit to run for a significant amount of time (on the order of hours if you're lucky, days if you're not). Running on an EPYC Genoa will likely be faster because there are almost twice as many valid ASIDs.

During steps 5 and 6 the PoC displays some metrics:

  • "Smallest ASID": The smallest ASID encountered so far. This is just a sanity check metric to make sure that we encounter smaller and smaller ASIDs over time.
  • "iterations": This metric is increased every time the bug is triggered.
  • "zeros": In about 1/4 of cases, the CONTEXT page is in a state where the firmware considers it not to have an ASID assigned yet. In those cases SNP_GUEST_STATUS will return 0 in the ASID field.
  • "unique asids": Another sanity check metric just to make sure that ASIDs are random and don't repeat after some time.
  • "elapsed time": Duration since the PoC was started.

In most cases decommissioning corrupted CONTEXT pages causes a crash of the firmware (most likely here). AFAICT crashes of the firmware cause a reset of the whole system. To avoid such crashes the kernel patches prevent CONTEXT pages from being decommissioned. One downside of this is that the ccp kernel module cannot be unloaded. Once the PoC has been started the whole system has to be rebooted before it can be started again (regardless of whether the PoC succeeded or was aborted).
