Verification of addressing / memory accesses with VeriWasm or similar #6090
I've been working on VeriWasm in Wasmtime the last few weeks, and decided to dump a summary of my investigation so far and ideas for the right approach to take. (I'm doing this partly because I've been asked to context switch back to something else, now that it's clear that VeriWasm proper cannot work and we need a new tool; I estimate it would take ~1 month to build what I describe in the last section below.)
(This may be better as an RFC, and I can clean it up, expand it, and post it as one once I have time to properly do so!)
Summary
We should include built-in support in Wasmtime to verify that compiled code from Wasm modules is accessing only the memory that it is supposed to access. Such verification would take the form of ahead-of-time static analysis on the code, so that it would not require dynamic checks; and it should trust as little as possible in the existing compiler stack, to increase the confidence we have in its conclusions.
The ideal would be to verify machine code bytes with no other input, but in practice, Wasmtime will have to tell the tool what to assume about details such as runtime data structures; and certain kinds of analysis, like full recovery of a CFG in the presence of indirect branches, may require whole-program analysis that is too expensive to be practical.
This issue proposes a scheme that embeds a notion of "memory capabilities" in CLIF, along with (static) assertions on values (mostly with respect to ranges), and carries these through to the VCode level in Cranelift. The Wasm lowering would be modified to annotate every load/store with an associated memory capability that permits it; and we can then check the uses of these capabilities at the other end of the pipeline, after instruction selection is done. This is inspired by, but in some sense the dual to, the approach taken by VeriWasm.
History: VeriWasm
In Johnson et al. [1], the VeriWasm tool is described, with an approach that performs a static analysis of disassembled machine code to prove that all heap accesses in AOT-compiled Wasm code are properly to the Wasm heap, and cannot escape the sandbox. (The tool also checks other properties, such as control-flow integrity, but that is outside the scope of this issue.)
VeriWasm was designed to verify code produced by Lucet, another AOT Wasm compiler that has since been subsumed by Wasmtime and EOL'd.
VeriWasm works by lifting machine code to a simple IR that describes the semantics of the code in a partial way -- recovering just enough of the dataflow to be able to see all address computations -- and then performing iterative dataflow analysis over that IR using an analysis lattice.
The analysis values computed to describe program values include:

- `HeapBase`: this value is the heap base;
- `Bounded4GB`: this value is less than 2^32.

The basic idea of the analysis is to mark the result of any 32-bit instruction as `Bounded4GB` in the abstract interpretation, and then allow only memory accesses of the form `[HeapBase + Bounded4GB]`.
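For concreteness, here is a minimal sketch of that analysis in Rust; the names and shapes are my assumptions, not VeriWasm's actual code:

```rust
// Minimal sketch (assumed shapes, not VeriWasm's actual code) of the lattice
// described above: abstract values per register, a transfer rule that marks
// any 32-bit result as bounded, and the access-legality check.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum AbsValue {
    /// This value is the heap base.
    HeapBase,
    /// This value is known to be < 2^32.
    Bounded4GB,
    /// Nothing is known.
    Unknown,
}

/// Lattice join at control-flow merges: keep a fact only if both paths agree.
fn join(a: AbsValue, b: AbsValue) -> AbsValue {
    if a == b { a } else { AbsValue::Unknown }
}

/// Transfer rule: any instruction producing a 32-bit result is Bounded4GB.
fn transfer_result(is_32_bit_op: bool) -> AbsValue {
    if is_32_bit_op { AbsValue::Bounded4GB } else { AbsValue::Unknown }
}

/// An access `[base + offset]` is legal only in the HeapBase + Bounded4GB form.
fn access_is_legal(base: AbsValue, offset: AbsValue) -> bool {
    base == AbsValue::HeapBase && offset == AbsValue::Bounded4GB
}

fn main() {
    // A 32-bit result used as an offset from the heap base is allowed...
    let offset = transfer_result(true);
    assert!(access_is_legal(AbsValue::HeapBase, offset));
    // ...but merging paths that disagree about a register loses the fact.
    assert_eq!(join(AbsValue::HeapBase, offset), AbsValue::Unknown);
}
```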
Issues with porting VeriWasm to Wasmtime
The obvious next step is to port VeriWasm to work with Wasmtime instead of Lucet. To understand why this is difficult, it may be useful to know how Lucet's generated code differed from Wasmtime's. Both use Cranelift, but there are two key deltas:
- Lucet used simpler data structures: its implicit first argument to every Wasm function was a pointer to the start of the (one single) Wasm memory, and this memory was always statically bounded; there was no separate `vmctx`, and no multi-memory or multi-table support. Furthermore, Lucet normally used (and VeriWasm only supported) static bounds-checking, i.e., the use of guard regions up to a 4GiB heap size.
- The version of Cranelift used at the time that Lucet and VeriWasm were developed was much simpler than what we have today. In particular, the address-mode support was fairly primitive and predictable: heap accesses were always of the form `[rBase + rOffset]`, where `rBase` was the heap base and `rOffset` was the offset (into the Wasm heap), computed separately. In contrast, Wasmtime uses modern Cranelift, which can fold more work into addressing modes and perform more clever instruction selection. It also includes Spectre-guard logic, which is difficult to analyze from first principles without pattern-matching the whole thing.

Taken together, these two facts mean that we need a much more general analysis; the example below illustrates what, in principle, we need to be able to analyze.
Example
Here is the simplest possible dynamic bounds-check example, compiled for x86-64 by Wasmtime. This is compiled from
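a module of roughly this shape (the exact `test.wat` may have differed; this is a minimal reconstruction consistent with the 4-byte `i32.load` access discussed below):

```wat
(module
  (memory 1)
  (func (export "load") (param i32) (result i32)
    ;; a single load: the 4-byte access at this index must be bounds-checked
    local.get 0
    i32.load))
```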
with `wasmtime compile --dynamic-memory-guard-size 0 --static-memory-guard-size 0 --static-memory-maximum-size 0 test.wat`.

There are several things going on in the compiled code that we need to be able to analyze:
- recognize values bounded to 32 bits (the `Bounded4GB` from VeriWasm's Lucet analysis);
- follow such a bound through arithmetic such as the `add`;
- know that `rcx` being zero is significant later (so, track either a `Zero` state or all known constant values);
- understand the load of the dynamic heap bound and its comparison against `rax` (which up to this point is just an arbitrary 32-bit value in the program), and join this with the `cmova` below to understand the "clamp to zero if out of bounds" idiom; then represent "pointer to this offset in heap, unless this offset plus 4 is above this other value"... symbolically;
- and finally conclude that the access through `r8` at offset 0x28 is legal.

The basic issue here is that we need to (i) build up a massive body of facts about all register values in the program, and (ii) only later pattern-match uses of certain combinations of them to generate legal pointers into dynamically-bounded memory regions. So we need to symbolically summarize expressions in general, and we need some algebraic language of additions, scaling operations, and bounding operations that can be simplified, normalized, and used to do this recognition.
Dead-end Prototype: Symbolic Expressions
In this branch of VeriWasm (which goes with this branch of Wasmtime and this branch of `yaxpeax-core`), I prototyped a system that basically attempted the above. It defined an analysis lattice that could represent additions, scaling (multiplication by a constant), `umin` (unsigned min, a bounding operator, the basic nonlinear element), loads, known constant values, and a few miscellaneous atoms like `vmctx`, "some pointer into `.text`", and `Unknown`.

The analysis used a set of rewrite rules to try to simplify expressions: e.g., an arbitrary 32-bit instruction might produce `UMin(Unknown, Const(0xffffffff))`; then adding `4` might produce `Add(Const(4), UMin(Unknown, Const(0xffffffff)))`, which simplifies to `UMin(Unknown, Const(0x100000003))` because `Add` distributes over `UMin`. (Wrapping is handled later, when we analyze ranges; the operators are defined not to wrap, because otherwise one needs to handle arbitrary piecewise segments rather than ranges and it just gets too messy.)

The Wasmtime branch above was modified to generate expressions in this expression language that signify memory regions that are allowed to be accessed. These region expressions are then used alongside the analysis results in the above symbolic-expression form, with an analysis that computes the affine bound (a constant, or a base plus some scaled symbolic dynamic parameter), to try to prove each memory access safe.
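For concreteness, here is a minimal Rust sketch of the shape of such an expression language and of the `Add`-over-`UMin` rewrite; the names and representation are my simplification, not the prototype's actual code:

```rust
// Sketch of the symbolic-expression lattice and one rewrite rule. The
// operators are defined not to wrap, so constant folding here assumes no
// overflow (the values of interest fit comfortably in u64).

#[derive(Clone, Debug, PartialEq)]
enum SymExpr {
    Const(u64),
    Unknown,
    Add(Box<SymExpr>, Box<SymExpr>),
    UMin(Box<SymExpr>, Box<SymExpr>),
}

use SymExpr::*;

fn simplify(e: SymExpr) -> SymExpr {
    match e {
        Add(a, b) => match (simplify(*a), simplify(*b)) {
            // Constants fold.
            (Const(x), Const(y)) => Const(x + y),
            // Add distributes over UMin:
            // Add(k, UMin(u, c)) => UMin(Add(k, u), Add(k, c)).
            (Const(k), UMin(u, c)) | (UMin(u, c), Const(k)) => UMin(
                Box::new(simplify(Add(Box::new(Const(k)), u))),
                Box::new(simplify(Add(Box::new(Const(k)), c))),
            ),
            // Unknown absorbs addition: we know nothing about the sum.
            (Unknown, _) | (_, Unknown) => Unknown,
            (x, y) => Add(Box::new(x), Box::new(y)),
        },
        UMin(a, b) => UMin(Box::new(simplify(*a)), Box::new(simplify(*b))),
        other => other,
    }
}

fn main() {
    // An arbitrary 32-bit instruction result is UMin(Unknown, 0xffffffff);
    // adding 4 simplifies to UMin(Unknown, 0x100000003) as described above.
    let e = Add(
        Box::new(Const(4)),
        Box::new(UMin(Box::new(Unknown), Box::new(Const(0xffff_ffff)))),
    );
    assert_eq!(
        simplify(e),
        UMin(Box::new(Unknown), Box::new(Const(0x1_0000_0003)))
    );
}
```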
This more-or-less works (not quite in the above branch, but in principle can be fixed) for static memories, but falls down completely for dynamically-bounded structures, either heaps or tables. The basic reason is that to do the in-bounds proof, one basically needs to build a full computer algebra system that can manipulate the symbolic expressions above in order to solve systems of inequalities. This is not only far too complex for a purpose-built tool that we need to trust, but is also very unlikely to be scalable.
Fixing the Approach
I believe we can fix the verifier by using two basic realizations:
- The above is a kind of "forward search", in that it eagerly generates all possible facts about all values in the program, and then later tries to find a path through those facts to a proof that an address is safe. This is a structure that is forced by the lattice-based dataflow analysis (which works in the forward direction).
- This eager-expansion approach is also forced by the lack of any guidance from the compiler: even though, when building the CLIF, `cranelift-wasm` knows what it is doing and what ranges values should have with respect to bounds checks, the analyzer intentionally tries to derive these facts from first principles (ostensibly to exclude the compiler from the TCB, but this also happens to make the problem intractable!).

Instead, we could try an approach that lets the compiler leave us hints about what facts we should be deriving and carrying forward about each value, and mark only those values that are important for address computation. This doesn't have to put the compiler in the TCB; rather, the hints are a sort of assertion or ascription that is checked. (This is, e.g., the difference between a proof-search tool and a proof verifier.)
In a phrase, the approach I want to propose is proof-carrying code.
Proposal: Memory Capabilities and Proof-Carrying Code
The general approach is in four steps:
1. Describe the areas of memory that the code is allowed to access, as part of the CLIF function body of any function verified by this framework (this will not be required by default for all functions). These are essentially capabilities: they describe a set of conditions (the address must satisfy these properties) and in return grant permission to load or store. Loads and stores in the CLIF can then be annotated with these capabilities. For example, there might be a capability to access `vmctx[0..sizeof(vmctx_t)]`; another capability, chaining off of that, to access `(*(vmctx+8))[0..*(vmctx+16)]` (i.e., a dynamically-bounded memory with base and length fields in the vmctx struct); and another for a table. (A possible representation is sketched in the first code block after this list.)

2. Add assertions on certain SSA values that describe their ranges. These should be placed at defs (operators or blockparams) of the SSA values, such that validation can be done in linear time, without an iterative dataflow analysis. We should write a set of semantics for some operators sufficient to validate the kinds of assertions we want to make. For example, if we know `v0 < 10` and `v1 < 10`, then given `v2 = iadd v0, v1` the framework should be able to validate the assertion that `v2 < 20` (see the second code block after this list). Perhaps we also have some easy implicit assertions, for efficiency: e.g., every `i32`-typed value is implicitly `<= u32::MAX`.

3. We then carry these assertions through to the VCode. This is simpler than it may at first seem, IMHO: we know already that a given vreg corresponds to a given SSA value, so we "just" need to transcribe the annotations onto the vregs. The egraph optimization infrastructure will need to replicate annotations as it clones nodes during elaboration, and rewrite rules that could affect address computation will need to create new assertions, but this is necessary work: of course it is the case that we should need to prove that our handling of `iadd` or `uextend` on an address is correct!

4. We then check the VCode, prior to register allocation: validate each assertion, and confirm that memory accesses use legal capabilities. (The ABI code will need to hold an ambient capability for the stack, unless we special-case those instructions.) Checking prior to register allocation is important because it allows us to leverage SSA to be flow-insensitive in our analysis: we don't need to track "`v1` at this program point" as we evaluate an assertion, but only `v1` itself.
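As a concrete (hypothetical) illustration of step 1, here is one way the capabilities might be represented; the types, field names, and vmctx offsets are assumptions of mine, not an existing Cranelift API:

```rust
// Hypothetical sketch of step 1's memory capabilities (not an actual
// Cranelift API): a capability grants access to [base .. base + len),
// where base and len are constants or are loaded through an
// already-held capability ("chaining").

type CapId = u32;

/// Where a base or length value comes from.
enum Operand {
    /// A compile-time constant, e.g. sizeof(vmctx_t).
    Const(u64),
    /// The vmctx pointer itself.
    VmctxBase,
    /// A value loaded at `offset` within the region of another,
    /// already-held capability, e.g. *(vmctx + 8).
    LoadVia { cap: CapId, offset: u64 },
}

/// Permission to load/store anywhere in [base .. base + len).
struct Capability {
    base: Operand,
    len: Operand,
}

/// The capabilities from the example in step 1. The vmctx size (0x100) and
/// field offsets (8 for the heap base, 16 for the heap length) are illustrative.
fn example_caps() -> Vec<Capability> {
    vec![
        // cap 0: vmctx[0 .. sizeof(vmctx_t)]
        Capability { base: Operand::VmctxBase, len: Operand::Const(0x100) },
        // cap 1: (*(vmctx+8))[0 .. *(vmctx+16)], chained off cap 0
        Capability {
            base: Operand::LoadVia { cap: 0, offset: 8 },
            len: Operand::LoadVia { cap: 0, offset: 16 },
        },
    ]
}
```

And here is a minimal sketch of step 2's assertion validation for `iadd`, again with assumed types rather than real Cranelift ones:

```rust
use std::collections::HashMap;

type Value = u32; // stand-in for an SSA value id

/// A validated fact: value < max (exclusive unsigned bound, held as u64
/// so the arithmetic below cannot wrap).
#[derive(Clone, Copy)]
struct Bound { max: u64 }

struct Checker {
    /// Facts already validated, or implicit (e.g. every i32 is < 2^32).
    facts: HashMap<Value, Bound>,
}

impl Checker {
    /// Validate the assertion `result < claimed` for `result = iadd a, b`:
    /// from a < max_a and b < max_b we can prove a + b < max_a + max_b - 1.
    fn check_iadd(&mut self, result: Value, a: Value, b: Value, claimed: u64) -> bool {
        let (Some(ba), Some(bb)) = (self.facts.get(&a), self.facts.get(&b)) else {
            return false; // no fact about an operand: assertion not provable
        };
        let provable = ba.max.saturating_add(bb.max).saturating_sub(1);
        let ok = provable <= claimed;
        if ok {
            // Record the validated assertion for downstream uses.
            self.facts.insert(result, Bound { max: claimed });
        }
        ok
    }
}

fn main() {
    let mut c = Checker { facts: HashMap::new() };
    c.facts.insert(0, Bound { max: 10 }); // v0 < 10
    c.facts.insert(1, Bound { max: 10 }); // v1 < 10
    // v2 = iadd v0, v1: the claimed bound v2 < 20 is provable (v2 <= 18).
    assert!(c.check_iadd(2, 0, 1, 20));
}
```

In a real implementation these checks would be driven by the VCode instruction stream rather than explicit calls, but the shape of the per-operator rule is the point here: each assertion is validated locally at a def, with no iterative search.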
at all.Footnotes
[1] Evan Johnson et al. "Доверяй, но проверяй: SFI Safety for Native Compiled Wasm." In NDSS 2021. https://cseweb.ucsd.edu/~lerner/papers/wasm-sfi-ndss2021.pdf
Comments

Hi! My approach follows what VeriWasm and your prototype do: I build 'symbolic values' that model address computations/bounds checks and solve inequalities on them. As you have mentioned above, that is quite complex and not super scalable. I'm very excited to see how your PCC approach will differ from what I've built. :)

Woah, that's really neat -- I wish I had known about this earlier! Looking it over, I definitely agree. It's really cool to see a working design point that can validate dynamic memories as well; thanks for this, and I'll study it in more detail!