Skip to content

Commit

Permalink
Transition points_to to a memory (static memory) memory-type.
Browse files Browse the repository at this point in the history
  • Loading branch information
cfallin committed Oct 11, 2023
1 parent 21716db commit 1bb20ec
Show file tree
Hide file tree
Showing 13 changed files with 307 additions and 182 deletions.
117 changes: 89 additions & 28 deletions cranelift/codegen/src/ir/memtype.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,65 @@
//! each field having a type and possibly an attached fact -- that we
//! can use in proof-carrying code to validate accesses to structs and
//! propagate facts onto the loaded values as well.
//!
//! Memory types are meant to be rich enough to describe the *layout*
//! of values in memory, but do not necessarily need to embody
//! higher-level features such as subtyping directly. Rather, they
//! should encode an implementation of a type or object system.
//!
//! Note also that it is a non-goal for now for this type system to be
//! "complete" or fully orthogonal: we have some restrictions now
//! (e.g., struct fields are only primitives) because this is all we
//! need for existing PCC applications, and it keeps the
//! implementation simpler.
//!
//! There are a few basic kinds of types:
//!
//! - A struct is an aggregate of fields and an overall size. Each
//! field has a *primitive Cranelift type*. This is for simplicity's
//! sake: we do not allow nested memory types because to do so
//! invites cycles, requires recursive computation of sizes, creates
//! complicated questions when field types are dynamically-sized,
//! and in general is more complexity than we need.
//!
//! The expectation (validated by PCC) is that when a checked load
//! or store accesses memory typed by a memory type, accesses will
//! only be to fields at offsets named in the type, and will be via
//! the given Cranelift type -- i.e., no type-punning occurs in
//! memory.
//!
//! The overall size of the struct may be larger than that implied
//! by the fields because (i) we may not want or need to name all
//! the actually-existing fields in the memory type, and (ii) there
//! may be alignment padding that we also don't want or need to
//! represent explicitly.
//!
//! - A static memory is an untyped blob of storage with a static
//! size. This is memory that can be accessed with any type of load
//! or store at any valid offset.
//!
//! Note that this is *distinct* from an "array of u8" kind of
//! representation of memory, if/when we can represent such a thing,
//! because the expectation with memory types' fields (including
//! array elements) is that they are strongly typed, only accessed
//! via that type, and not type-punned. We don't want to imply any
//! restriction on load/store size, or any actual structure, with
//! untyped memory; it's just a blob.
//!
//! Eventually we plan to also have:
//!
//! - A dynamic memory is an untyped blob of storage with a size given
//! by a global value (GV). This is otherwise just like the "static
//! memory" variant described above.
//!
//! - A dynamic array is a sequence of struct memory types, with a
//! length given by a global value (GV). This is useful to model,
//! e.g., tables.
//!
//! - A discriminated union is a union of several memory types
//! together with a tag field. This will be useful to model and
//! verify subtyping/downcasting for Wasm GC, among other uses.
use crate::ir::entities::MemoryType;
use crate::ir::pcc::Fact;
use crate::ir::Type;
use alloc::vec::Vec;
Expand All @@ -14,10 +71,17 @@ use alloc::vec::Vec;
use serde_derive::{Deserialize, Serialize};

/// Data defining a memory type.
///
/// A memory type corresponds to a layout of data in memory. It may
/// have a statically-known or dynamically-known size.
#[derive(Clone, PartialEq, Hash)]
#[cfg_attr(feature = "enable-serde", derive(Serialize, Deserialize))]
pub enum MemoryTypeData {
/// An aggregate consisting of certain fields at certain offsets.
///
/// Fields must be sorted by offset, must be within the struct's
/// overall size, and must not overlap. These conditions are
/// checked by the CLIF verifier.
Struct {
/// Size of this type.
size: u64,
Expand All @@ -26,25 +90,10 @@ pub enum MemoryTypeData {
fields: Vec<MemoryTypeField>,
},

/// An aggregate consisting of a single element repeated at a
/// certain stride, with a statically-known length (element
/// count). Layout is assumed to be contiguous, with a stride
/// equal to the element type's size (so, if the stride is greater
/// than the struct's packed size, be sure to include padding in
/// the memory-type definition).
StaticArray {
/// The element type. May be another array, a struct, etc.
element: MemoryType,

/// Number of elements.
length: u64,
},

/// A single Cranelift primitive of the given type stored in
/// memory.
Primitive {
/// The primitive type.
ty: Type,
/// A statically-sized untyped blob of memory.
Memory {
/// Accessible size.
size: u64,
},

/// A type with no size.
Expand Down Expand Up @@ -80,11 +129,8 @@ impl std::fmt::Display for MemoryTypeData {
write!(f, " }}")?;
Ok(())
}
Self::StaticArray { element, length } => {
write!(f, "static_array {element} * {length:#x}")
}
Self::Primitive { ty } => {
write!(f, "primitive {ty}")
Self::Memory { size } => {
write!(f, "memory {size:#x}")
}
Self::Empty => {
write!(f, "empty")
Expand All @@ -99,12 +145,27 @@ impl std::fmt::Display for MemoryTypeData {
pub struct MemoryTypeField {
/// The offset of this field in the memory type.
pub offset: u64,
/// The type of the value in this field. Accesses to the field
/// must use this type (i.e., cannot bitcast/type-pun in memory).
pub ty: MemoryType,
/// The primitive type of the value in this field. Accesses to the
/// field must use this type (i.e., cannot bitcast/type-pun in
/// memory).
pub ty: Type,
/// A proof-carrying-code fact about this value, if any.
pub fact: Option<Fact>,
/// Whether this field is read-only, i.e., stores should be
/// disallowed.
pub readonly: bool,
}

impl MemoryTypeData {
/// Provide the static size of this type, if known.
///
/// (The size may not be known for dynamically-sized arrays or
/// memories, when those memtype kinds are added.)
pub fn static_size(&self) -> Option<u64> {
match self {
Self::Struct { size, .. } => Some(*size),
Self::Memory { size } => Some(*size),
Self::Empty => Some(0),
}
}
}
114 changes: 48 additions & 66 deletions cranelift/codegen/src/ir/pcc.rs
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@
use crate::ir;
use crate::isa::TargetIsa;
use crate::machinst::{InsnIndex, LowerBackend, MachInst, VCode};
use cranelift_entity::PrimaryMap;
use std::fmt;

#[cfg(feature = "enable-serde")]
Expand Down Expand Up @@ -111,13 +112,6 @@ pub enum Fact {
max: u64,
},

/// A pointer value to a memory region that can be accessed.
PointsTo {
/// A description of the memory region this pointer is allowed
/// to access (size, etc).
region: MemoryRegion,
},

/// A pointer to a memory type.
Mem {
/// The memory type.
Expand All @@ -127,24 +121,10 @@ pub enum Fact {
},
}

/// A memory region that can be accessed. This description is attached
/// to a particular base pointer.
#[derive(Clone, Debug, Hash, PartialEq, Eq)]
#[cfg_attr(feature = "enable-serde", derive(Serialize, Deserialize))]
pub struct MemoryRegion {
/// Includes both the actual memory bound as well as any guard
/// pages. Inclusive, so we can represent the full range of a
/// `u64`. The range is unsigned.
pub max: u64,
}

impl fmt::Display for Fact {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
Fact::ValueMax { bit_width, max } => write!(f, "max({}, {:#x})", bit_width, max),
Fact::PointsTo {
region: MemoryRegion { max },
} => write!(f, "points_to(0x{:x})", max),
Fact::Mem { ty, offset } => write!(f, "mem({}, {:#x})", ty, offset),
}
}
Expand All @@ -167,20 +147,29 @@ macro_rules! bail {
/// A "context" in which we can evaluate and derive facts. This
/// context carries environment/global properties, such as the machine
/// pointer width.
#[derive(Debug)]
pub struct FactContext {
pub struct FactContext<'a> {
memory_types: &'a PrimaryMap<ir::MemoryType, ir::MemoryTypeData>,
pointer_width: u16,
}

impl FactContext {
impl<'a> FactContext<'a> {
/// Create a new "fact context" in which to evaluate facts.
pub fn new(pointer_width: u16) -> Self {
FactContext { pointer_width }
pub fn new(
memory_types: &'a PrimaryMap<ir::MemoryType, ir::MemoryTypeData>,
pointer_width: u16,
) -> Self {
FactContext {
memory_types,
pointer_width,
}
}

/// Computes whether `lhs` "subsumes" (implies) `rhs`.
pub fn subsumes(&self, lhs: &Fact, rhs: &Fact) -> bool {
match (lhs, rhs) {
// Reflexivity.
(l, r) if l == r => true,

(
Fact::ValueMax {
bit_width: bw_lhs,
Expand All @@ -201,22 +190,7 @@ impl FactContext {
// possible value range.
bw_lhs == bw_rhs && max_lhs <= max_rhs
}
(
Fact::PointsTo {
region: MemoryRegion { max: max_lhs },
},
Fact::PointsTo {
region: MemoryRegion { max: max_rhs },
},
) => {
// If the pointer is valid up to `max_lhs`, and
// `max_rhs` is less than or equal to `max_lhs`, then
// it is certainly valid up to `max_rhs`.
//
// In other words, we can always shrink the valid
// addressable region.
max_rhs <= max_lhs
}

_ => false,
}
}
Expand Down Expand Up @@ -250,19 +224,17 @@ impl FactContext {
bit_width: bw_max,
max,
},
Fact::PointsTo { region },
Fact::Mem { ty, offset },
)
| (
Fact::PointsTo { region },
Fact::Mem { ty, offset },
Fact::ValueMax {
bit_width: bw_max,
max,
},
) if *bw_max >= self.pointer_width && add_width >= *bw_max => {
let region = MemoryRegion {
max: region.max.checked_sub(*max)?,
};
Some(Fact::PointsTo { region })
let offset = offset.checked_add(i64::try_from(*max).ok()?)?;
Some(Fact::Mem { ty: *ty, offset })
}

_ => None,
Expand Down Expand Up @@ -343,15 +315,15 @@ impl FactContext {

/// Offsets a value with a fact by a known amount.
pub fn offset(&self, fact: &Fact, width: u16, offset: i64) -> Option<Fact> {
// If we eventually support two-sided ranges, we can
// represent (0..n) + m -> ((0+m)..(n+m)). However,
// right now, all ranges start with zero, so any
// negative offset could underflow, and removes all
// claims of constrained range.
let offset = u64::try_from(offset).ok()?;

match fact {
Fact::ValueMax { bit_width, max } if *bit_width == width => {
// If we eventually support two-sided ranges, we can
// represent (0..n) + m -> ((0+m)..(n+m)). However,
// right now, all ranges start with zero, so any
// negative offset could underflow, and removes all
// claims of constrained range.
let offset = u64::try_from(offset).ok()?;

let max = match max.checked_add(offset) {
Some(max) => max,
None => {
Expand All @@ -367,13 +339,12 @@ impl FactContext {
max,
})
}
Fact::PointsTo {
region: MemoryRegion { max },
Fact::Mem {
ty,
offset: mem_offset,
} => {
let max = max.checked_sub(offset)?;
Some(Fact::PointsTo {
region: MemoryRegion { max },
})
let offset = mem_offset.checked_sub(offset)?;
Some(Fact::Mem { ty: *ty, offset })
}
_ => None,
}
Expand All @@ -383,9 +354,20 @@ impl FactContext {
/// a memory access of the given size, is valid.
pub fn check_address(&self, fact: &Fact, size: u32) -> PccResult<()> {
match fact {
Fact::PointsTo {
region: MemoryRegion { max },
} => ensure!(u64::from(size) <= *max, OutOfBounds),
Fact::Mem { ty, offset } => {
let end_offset: i64 = offset
.checked_add(i64::from(size))
.ok_or(PccError::Overflow)?;
let end_offset: u64 =
u64::try_from(end_offset).map_err(|_| PccError::OutOfBounds)?;
match &self.memory_types[*ty] {
ir::MemoryTypeData::Struct { size, .. }
| ir::MemoryTypeData::Memory { size } => {
ensure!(end_offset <= *size, OutOfBounds)
}
ir::MemoryTypeData::Empty => bail!(OutOfBounds),
}
}
_ => bail!(OutOfBounds),
}

Expand All @@ -405,7 +387,7 @@ fn max_value_for_width(bits: u16) -> u64 {
/// Top-level entry point after compilation: this checks the facts in
/// VCode.
pub fn check_vcode_facts<B: LowerBackend + TargetIsa>(
_f: &ir::Function,
f: &ir::Function,
vcode: &VCode<B::MInst>,
backend: &B,
) -> PccResult<()> {
Expand All @@ -417,7 +399,7 @@ pub fn check_vcode_facts<B: LowerBackend + TargetIsa>(
// verify. We'll call into the backend to validate this
// fact with respect to the instruction and the input
// facts.
backend.check_fact(&vcode[inst], vcode)?;
backend.check_fact(&vcode[inst], f, vcode)?;
}
}
Ok(())
Expand Down
11 changes: 8 additions & 3 deletions cranelift/codegen/src/isa/aarch64/lower.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
use crate::ir::condcodes::{FloatCC, IntCC};
use crate::ir::pcc::PccResult;
use crate::ir::Inst as IRInst;
use crate::ir::{Opcode, Value};
use crate::ir::{Function, Opcode, Value};
use crate::isa::aarch64::inst::*;
use crate::isa::aarch64::pcc;
use crate::isa::aarch64::AArch64Backend;
Expand Down Expand Up @@ -131,7 +131,12 @@ impl LowerBackend for AArch64Backend {
Some(regs::pinned_reg())
}

fn check_fact(&self, inst: &Self::MInst, vcode: &VCode<Self::MInst>) -> PccResult<()> {
pcc::check(inst, vcode)
fn check_fact(
&self,
inst: &Self::MInst,
function: &Function,
vcode: &VCode<Self::MInst>,
) -> PccResult<()> {
pcc::check(inst, function, vcode)
}
}
Loading

0 comments on commit 1bb20ec

Please sign in to comment.