-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix spillslot size bug in SIMD by removing type-dependent spillslot allocation. #3645
Fix spillslot size bug in SIMD by removing type-dependent spillslot allocation. #3645
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a reasonable fix to me for now. Could an issue be filed about trying to do something better in the future? Also, could the sightglass benchmark suite be run to see if this has any impact on it?
I also think that it would be good to include the original wasm reproducer here as well:
(module
(func (result f32 f32)
i32.const 0
f32.convert_i32_s
v128.const i32x4 0 0 0 0
data.drop 0
f32x4.extract_lane 0
data.drop 0)
(export "" (func 0))
(data ""))
(where the "" export is run)
I'm a little uneasy about applying a blanket fix like this in the sense that it's making things "simpler" without any real path forward. For example the underlying bug here is that f32x4.extract_lane 0
is simply a mov
so we could fix just that one instruction but you're also worried about raw_bitcast
. This is the part that feels odd to me where there doesn't seem to be any recourse for concluding this doesn't affect raw_bitcast
.
I don't mean to say we shouldn't land this fix, I think we should land this ASAP, but I still think there's some food-for-thought style things to think about here.
f2bbbcc
to
7a0e1ef
Compare
Thanks! I ran a quick sanity-check with The reason I feel pretty strongly about doing it anyway is that there are "unknown unknowns": once I realized that regalloc.rs merges all vregs linked by move-elision into one equivalence class and gives us the type of an arbitrary one, that means we need to carefully audit all of our uses of move and move-like instructions for type safety. This is a safety issue that @julian-seward1 actually pointed out back in Jan 2020 when we were designing the regalloc.rs API; I pushed for using the vreg type to use smaller spillslots where possible, but given that this has now caused one real CVE and would have caused another CVE here if we had released SIMD support, I think that was a mistake, or at least we didn't adequately consider how to maintain the type-safety in the IR-to-registers mapping. One could reasonably ask: how do other compilers get this right, and use spillslots only as large as necessary? The answer is better discipline in all of the lowering patterns. I think one way of getting this would be to actually require an IR-level type to be passed down into the register allocator for each vreg; it would be opaque to the regalloc, but would be checked for every move (and would cause a panic if a cross-type move occurs). If we do that and fuzz all the panics out, we could probably move back to "spillslot only as large as actual type". I'll file an issue for this, and probably the right time to consider it is at the same time as we move over to regalloc2. Hopefully that seems reasonable; thoughts? |
cc @fitzgen and @jlb6740 while not super relevant to this PR per-se this seems relevant to sightglass.
That all sound reasonable to me, yeah, thanks for writing that up! I like the idea of passing opaque types to regalloc and regalloc could also possibly automatically determine that a move between two vregs isn't an equivalence class of vregs if the vregs have different types, assigning them separate live ranges and such. |
It isn't an overnight thing for me (at least for the realistic benchmarks, I don't run the shootout ones) usually just half an hour or so. |
Ah, OK, it's certainly possible I was holding it wrong; I remember cranking up the iterations and letting it run for a long time to try to get reasonable confidence intervals; and I remember you mentioning something similar (... on some searching, this comment) but maybe it's improved. Anyway, a separate topic; I'd love to be able to just ping a GitHub bot for this :-) |
I'm not a big fan of using extra stack space but I guess it makes sense as a temporary measure.
I think the existence of
I don't know if it is "programmer discipline" as much as "abstraction discipline." In V8, there are surely ways to make a mistake during lowering but my opinion is that this area is designed to be easier to use (no/less invariants to remember).
An alternate suggestion: perhaps regalloc should be smarter about |
7a0e1ef
to
bed1d8e
Compare
I just ran a few Sightglass benchmarks ( |
Yes, absolutely, "discipline in lowering patterns" was ambiguous, sorry! We should build guardrails into the abstractions that prevent us from doing the wrong thing by mistake or unknowingly; we clearly didn't do enough here. |
…llocation. This patch makes spillslot allocation, spilling and reloading all based on register class only. Hence when we have a 32- or 64-bit value in a 128-bit XMM register on x86-64 or vector register on aarch64, this results in larger spillslots and spills/restores. Why make this change, if it results in less efficient stack-frame usage? Simply put, it is safer: there is always a risk when allocating spillslots or spilling/reloading that we get the wrong type and make the spillslot or the store/load too small. This was one contributing factor to CVE-2021-32629, and is now the source of a fuzzbug in SIMD code that puns an arbitrary user-controlled vector constant over another stackslot. (If this were a pointer, that could result in RCE. SIMD is not yet on by default in a release, fortunately. In particular, we have not been particularly careful about using moves between values of different types, for example with `raw_bitcast` or with certain SIMD operations, and such moves indicate to regalloc.rs that vregs are in equivalence classes and some arbitrary vreg in the class is provided when allocating the spillslot or spilling/reloading. Since regalloc.rs does not track actual type, and since we haven't been careful about moves, we can't really trust this "arbitrary vreg in equivalence class" to provide accurate type information. In the fix to CVE-2021-32629 we fixed this for integer registers by always spilling/reloading 64 bits; this fix can be seen as the analogous change for FP/vector regs.
bed1d8e
to
833ebee
Compare
Fixes a recently-received fuzzbug exposed by differential fuzzing against V8.
This patch makes spillslot allocation, spilling and reloading all based
on register class only. Hence when we have a 32- or 64-bit value in a
128-bit XMM register on x86-64 or vector register on aarch64, this
results in larger spillslots and spills/restores.
Why make this change, if it results in less efficient stack-frame usage?
Simply put, it is safer: there is always a risk when allocating
spillslots or spilling/reloading that we get the wrong type and make the
spillslot or the store/load too small. This was one contributing factor
to CVE-2021-32629, and is now the source of a fuzzbug in SIMD code that
puns an arbitrary user-controlled vector constant over another
stackslot. (If this were a pointer, that could result in RCE. SIMD is
not yet on by default in a release, fortunately.)
In particular, we have not been particularly careful about using moves
between values of different types, for example with
raw_bitcast
orwith certain SIMD operations, and such moves indicate to regalloc.rs
that vregs are in equivalence classes and some arbitrary vreg in the
class is provided when allocating the spillslot or spilling/reloading.
Since regalloc.rs does not track actual type, and since we haven't been
careful about moves, we can't really trust this "arbitrary vreg in
equivalence class" to provide accurate type information.
In the fix to CVE-2021-32629 we fixed this for integer registers by
always spilling/reloading 64 bits; this fix can be seen as the analogous
change for FP/vector regs.