Skip to content

Commit

Permalink
Cranelift: Allow CLIF frontends to define their own stack maps
Browse files Browse the repository at this point in the history
Tracking GC references and producing stack maps is a significant amount of
complexity in `regalloc2`.

At the same time, GC reference value types are pretty annoying to deal with in
Cranelift itself. We know our `r64` is "actually" just an `i64` pointer, and we
want to do `i64`-y things with it, such as an `iadd` to compute a derived
pointer, but `iadd` only takes integer types and not `r64`s. We investigated
loosening that restriction and it was way too painful given the way that CLIF
type inference and its controlling type vars work. So to compute those derived
pointers, we have to first `bitcast` the `r64` into an `i64`. This is
unfortunate in two ways. First, because of arcane interactions between register
allocation constraints, stack maps, and ABIs this involves inserting unnecessary
register-to-register moves in our generated code which hurts binary size and
performance ever so slightly. Second, and much more seriously, this is a serious
footgun. If a GC reference isn't an `r64` right now, then it will not appear in
stack maps, and failure to record a live GC reference in a stack map means that
the collector could reclaim the object while you are still using it, leading to
use-after-free bugs! Very bad. And the mid-end needs to know
*not* to GVN these bitcasts or else we get similar bugs (see
bytecodealliance#8317).

Overall GC references are a painful situation for us today.

This commit is the introduction of an alternative. (Note, though, that we aren't
quite ready to remove the old stack maps infrastructure just yet.)

Instead of preserving GC references all the way through the whole pipeline and
computing live GC references and inserting spills at safepoints for stack maps
all the way at the end of that pipeline in register allocation, the
CLIF-producing frontend explicitly generates its own stack slots and spills for
safepoints. The only thing the rest of the compiler pipeline needs to know is
the metadata required to produce the stack map for the associated safepoint. We
can completely remove `r32` and `r64` from Cranelift and just use plain `i32`
and `i64` values. Or `f64` if the runtime uses NaN-boxing, which the old stack
maps system did not support at all. Or 32-bit GC references on a 64-bit target,
which was also not supported by the old system. Furthermore, we *cannot* get
miscompiles due to GVN'ing bitcasts that shouldn't be GVN'd because there aren't
any bitcasts hiding GC references from stack maps anymore. And in the case of a
moving GC, we don't need to worry about the mid-end doing illegal code motion
across calls that could have triggered a GC that invalidated the moved GC
reference because frontends will reload their GC references from the stack slots
after the call, and that loaded value simply isn't a candidate for GVN with the
previous version. We don't have to worry about those bugs by construction.

So everything gets a lot easier under this new system.

But this commit doesn't mean we are 100% done and ready to transition to the new
system, so what is actually in here?

* CLIF producers can mark values as needing to be present in a stack map if they
are live across a safepoint in `cranelift-frontend`. This is the
`FunctionBuilder::declare_needs_stack_map` method.

* When we finalize the function we are building, we do a simple, single-pass
liveness analysis to determine the set of GC references that are live at each
safepoint, and then we insert spills to explicit stack slots just before the
safepoint. We intentionally trade away the precision of a fixed-point liveness
analysis for the speed and simplicity of a single-pass implementation.

* We annotate the safepoint with the metadata necessary to construct its
associated stack map. This is the new
`cranelift_codegen::ir::DataFlowGraph::append_user_stack_map_entry` method and
all that stuff.

* These stack map entries are part of the CLIF and can be roundtripped through
printing and parsing CLIF.

* Each stack map entry describes a GC-managed value that is on the stack and how
to locate it: its type, the stack slot it is located within, and the offset
within that stack slot where it resides. Different stack map entries for the
same safepoint may have different types or a different width from the target's
pointer.

Here is what is *not* handled yet, and left for future follow up commits:

* Lowering the stack map entries' locations from symbolic stack slot and offset
pairs to physical stack frame offsets after register allocation.

* Coalescing and aggregating the safepoints and their raw stack map entries into
a compact PC-to-stack-map table during emission.

* Supporting moving GCs. Right now we generate spills into stack slots for live
GC references just before safepoints, but we don't reload the GC references from
the stack upon their next use after the safepoint. This involves rewriting uses
of the old, spilled values which could be a little finicky, but we think we have
a good approach.

* Port Wasmtime over to using this new stack maps system.

* Removing the old stack map system, including `r{32,64}` from Cranelift and GC
reference handling from `regalloc2`. (For the time being, the new system
generally refers to "user stack maps" to disambiguate from the old system where
it might otherwise be confusing.) If we wanted to remove the old system now,
that would require us to also port Wasmtime to the new system now, and we'd end
up with a monolithic PR. Better to do this incrementally and temporarily have
the old and in-progress new system overlap for a short period of time.

Co-Authored-By: Trevor Elliott <telliott@fastly.com>
  • Loading branch information
fitzgen and elliottt committed May 31, 2024
1 parent 38bc915 commit 4d72d17
Show file tree
Hide file tree
Showing 8 changed files with 1,144 additions and 9 deletions.
29 changes: 29 additions & 0 deletions cranelift/codegen/src/ir/dfg.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ use crate::ir::builder::ReplaceBuilder;
use crate::ir::dynamic_type::{DynamicTypeData, DynamicTypes};
use crate::ir::instructions::{CallInfo, InstructionData};
use crate::ir::pcc::Fact;
use crate::ir::user_stack_maps::{UserStackMapEntry, UserStackMapEntryVec};
use crate::ir::{
types, Block, BlockCall, ConstantData, ConstantPool, DynamicType, ExtFuncData, FuncRef,
Immediate, Inst, JumpTables, RelSourceLoc, SigRef, Signature, Type, Value,
Expand Down Expand Up @@ -105,6 +106,16 @@ pub struct DataFlowGraph {
/// primary `insts` map.
results: SecondaryMap<Inst, ValueList>,

/// User-defined stack maps.
///
/// Not to be confused with the stack maps that `regalloc2` produces. These
/// are defined by the user in `cranelift-frontend`. These will eventually
/// replace the stack maps support in `regalloc2`, but in the name of
/// incrementalism and avoiding gigantic PRs that completely overhaul
/// Cranelift and Wasmtime at the same time, we are allowing them to live in
/// parallel for the time being.
user_stack_maps: alloc::collections::BTreeMap<Inst, UserStackMapEntryVec>,

/// basic blocks in the function and their parameters.
///
/// This map is not in program order. That is handled by `Layout`, and so is the sequence of
Expand Down Expand Up @@ -155,6 +166,7 @@ impl DataFlowGraph {
Self {
insts: Insts(PrimaryMap::new()),
results: SecondaryMap::new(),
user_stack_maps: alloc::collections::BTreeMap::new(),
blocks: Blocks(PrimaryMap::new()),
dynamic_types: DynamicTypes::new(),
value_lists: ValueListPool::new(),
Expand All @@ -173,6 +185,7 @@ impl DataFlowGraph {
pub fn clear(&mut self) {
self.insts.0.clear();
self.results.clear();
self.user_stack_maps.clear();
self.blocks.0.clear();
self.dynamic_types.clear();
self.value_lists.clear();
Expand Down Expand Up @@ -561,6 +574,22 @@ impl DataFlowGraph {

self.clear_results(dest_inst);
}

/// Get the stack map entries associated with the given instruction.
pub fn user_stack_map_entries(&self, inst: Inst) -> Option<&[UserStackMapEntry]> {
self.user_stack_maps.get(&inst).map(|es| &**es)
}

/// Append a new stack map entry for the given call instruction.
///
/// # Panics
///
/// Panics if the given instruction is not a (non-tail) call instruction.
pub fn append_user_stack_map_entry(&mut self, inst: Inst, entry: UserStackMapEntry) {
let opcode = self.insts[inst].opcode();
assert!(opcode.is_call() && !opcode.is_return());
self.user_stack_maps.entry(inst).or_default().push(entry);
}
}

/// Where did a value come from?
Expand Down
2 changes: 2 additions & 0 deletions cranelift/codegen/src/ir/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ mod sourceloc;
pub mod stackslot;
mod trapcode;
pub mod types;
mod user_stack_maps;

#[cfg(feature = "enable-serde")]
use serde_derive::{Deserialize, Serialize};
Expand Down Expand Up @@ -64,6 +65,7 @@ pub use crate::ir::stackslot::{
};
pub use crate::ir::trapcode::TrapCode;
pub use crate::ir::types::Type;
pub use crate::ir::user_stack_maps::UserStackMapEntry;

use crate::entity::{entity_impl, PrimaryMap, SecondaryMap};

Expand Down
19 changes: 19 additions & 0 deletions cranelift/codegen/src/ir/user_stack_maps.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
use crate::ir;
use smallvec::SmallVec;

pub(crate) type UserStackMapEntryVec = SmallVec<[UserStackMapEntry; 4]>;

/// A stack map entry describes a GC-managed value and its location at a
/// particular instruction.
#[derive(Clone, PartialEq, Hash)]
#[cfg_attr(feature = "enable-serde", derive(Serialize, Deserialize))]
pub struct UserStackMapEntry {
/// The type of the value stored in this stack map entry.
pub ty: ir::Type,

/// The stack slot that this stack map entry is within.
pub slot: ir::StackSlot,

/// The offset within the stack slot where this entry's value can be found.
pub offset: u32,
}
26 changes: 24 additions & 2 deletions cranelift/codegen/src/write.rs
Original file line number Diff line number Diff line change
Expand Up @@ -441,7 +441,10 @@ pub fn write_operands(w: &mut dyn Write, dfg: &DataFlowGraph, inst: Inst) -> fmt
}
Call {
func_ref, ref args, ..
} => write!(w, " {}({})", func_ref, DisplayValues(args.as_slice(pool))),
} => {
write!(w, " {}({})", func_ref, DisplayValues(args.as_slice(pool)))?;
write_user_stack_map_entries(w, dfg, inst)
}
CallIndirect {
sig_ref, ref args, ..
} => {
Expand All @@ -452,7 +455,8 @@ pub fn write_operands(w: &mut dyn Write, dfg: &DataFlowGraph, inst: Inst) -> fmt
sig_ref,
args[0],
DisplayValues(&args[1..])
)
)?;
write_user_stack_map_entries(w, dfg, inst)
}
FuncAddr { func_ref, .. } => write!(w, " {}", func_ref),
StackLoad {
Expand Down Expand Up @@ -504,6 +508,24 @@ pub fn write_operands(w: &mut dyn Write, dfg: &DataFlowGraph, inst: Inst) -> fmt
Ok(())
}

fn write_user_stack_map_entries(w: &mut dyn Write, dfg: &DataFlowGraph, inst: Inst) -> fmt::Result {
let entries = match dfg.user_stack_map_entries(inst) {
None => return Ok(()),
Some(es) => es,
};
write!(w, ", stack_map=[")?;
let mut need_comma = false;
for entry in entries {
if need_comma {
write!(w, ", ")?;
}
write!(w, "{} @ {}+{}", entry.ty, entry.slot, entry.offset)?;
need_comma = true;
}
write!(w, "]")?;
Ok(())
}

/// Displayable slice of values.
struct DisplayValues<'a>(&'a [Value]);

Expand Down
35 changes: 35 additions & 0 deletions cranelift/filetests/filetests/parser/user_stack_maps.clif
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
; Parser tests for user stack maps on call instructions.

test cat

function %foo() system_v {
ss0 = explicit_slot 12, align = 4
sig0 = (i32) system_v
fn0 = colocated u0:0 sig0

block0:
v0 = iconst.i32 0
v1 = iconst.i32 1
v2 = iconst.i32 2
v3 = iconst.i32 3

stack_store v0, ss0
stack_store v1, ss0+4
stack_store v2, ss0+8
call fn0(v0), stack_map=[i32 @ ss0+0, i32 @ ss0+4, i32 @ ss0+8]
; check: call fn0(v0), stack_map=[i32 @ ss0+0, i32 @ ss0+4, i32 @ ss0+8]

stack_store v1, ss0
stack_store v2, ss0+4
call fn0(v0), stack_map=[i32 @ ss0+0, i32 @ ss0+4]
; check: call fn0(v0), stack_map=[i32 @ ss0+0, i32 @ ss0+4]

stack_store v2, ss0
call fn0(v1), stack_map=[i32 @ ss0+0]
; check: call fn0(v1), stack_map=[i32 @ ss0+0]

call fn0(v2)
; check: call fn0(v2)

return
}
Loading

0 comments on commit 4d72d17

Please sign in to comment.