Cranelift: Do not dedupe/GVN bitcasts from reference values #8317

Merged

fitzgen
Member

@fitzgen fitzgen commented Apr 8, 2024

Deduping bitcasts to integers from references can make the references no
longer live across safepoints, and instead only the bitcasted integer results
would be. Because the reference is no longer live after the safepoint, the
safepoint's stack map would not have an entry for the reference, which could
result in the collector reclaiming an object too early, which is basically a
use-after-free bug. Luckily, we sandbox the GC heap now, so such UAF bugs aren't
memory unsafe, but they could potentially result in denial of service
attacks. Either way, we don't want those bugs!

On the other hand, it is technically fine to dedupe bitcasts to reference
types. Doing so extends, rather than shortens, the live range of the GC
reference. This potentially adds it to more stack maps than it otherwise would
have been in, which means it might unnecessarily survive a GC it otherwise
wouldn't have. But that is fine. Shrinking live ranges of GC references, and
removing them from stack maps they otherwise should have been in, is the
problematic transformation.
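
To make the hazard concrete, here is a minimal sketch of a toy model, not Cranelift's actual IR or liveness code. It assumes straight-line code, and the `Inst`, `Ty`, `stack_maps`, `original`, and `after_gvn` names are hypothetical. A reference lands in a safepoint's stack map only if it is defined before the safepoint and used after it, so merging the second bitcast into the first drops the reference from the map.

```rust
use std::collections::BTreeSet;

/// Toy value types: a GC reference (like `r64`) or a plain integer.
#[derive(Clone, Copy, PartialEq, Eq)]
enum Ty {
    Ref,
    Int,
}

/// Toy instruction: an optional defined value, the values it uses, and
/// whether it is a safepoint (for example, a call that may trigger a GC).
struct Inst {
    def: Option<(u32, Ty)>,
    uses: Vec<u32>,
    safepoint: bool,
}

/// For each safepoint, in program order, collect the reference-typed values
/// that are defined before it and used after it (straight-line code only).
fn stack_maps(insts: &[Inst]) -> Vec<BTreeSet<u32>> {
    let mut maps = Vec::new();
    for (i, inst) in insts.iter().enumerate() {
        if !inst.safepoint {
            continue;
        }
        let mut live = BTreeSet::new();
        for before in &insts[..i] {
            if let Some((v, Ty::Ref)) = before.def {
                if insts[i + 1..].iter().any(|later| later.uses.contains(&v)) {
                    live.insert(v);
                }
            }
        }
        maps.push(live);
    }
    maps
}

/// Before GVN: the reference v0 is bitcast again after the safepoint.
fn original() -> Vec<Inst> {
    vec![
        Inst { def: Some((0, Ty::Ref)), uses: vec![], safepoint: false }, // v0 = some reference
        Inst { def: Some((1, Ty::Int)), uses: vec![0], safepoint: false }, // v1 = bitcast.i64 v0
        Inst { def: None, uses: vec![], safepoint: true },                // call (safepoint)
        Inst { def: Some((2, Ty::Int)), uses: vec![0], safepoint: false }, // v2 = bitcast.i64 v0
        Inst { def: None, uses: vec![2], safepoint: false },              // use v2
    ]
}

/// After deduping v2 into v1: only the integer v1 is live across the call.
fn after_gvn() -> Vec<Inst> {
    vec![
        Inst { def: Some((0, Ty::Ref)), uses: vec![], safepoint: false }, // v0 = some reference
        Inst { def: Some((1, Ty::Int)), uses: vec![0], safepoint: false }, // v1 = bitcast.i64 v0
        Inst { def: None, uses: vec![], safepoint: true },                // call (safepoint)
        Inst { def: None, uses: vec![1], safepoint: false },              // use v1
    ]
}
```

In this model, `stack_maps(&original())` reports v0 at the call, while `stack_maps(&after_gvn())` reports an empty stack map even though the object is still reachable through v1.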

fitzgen added 2 commits April 8, 2024 09:35
@fitzgen fitzgen requested review from a team as code owners April 8, 2024 19:58
@fitzgen fitzgen requested review from elliottt and alexcrichton and removed request for a team April 8, 2024 19:58
@github-actions github-actions bot added the cranelift, cranelift:area:machinst, and wasmtime:api labels Apr 8, 2024

github-actions bot commented Apr 8, 2024

Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "cranelift", "cranelift:area:machinst", "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

  • peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Member

@elliottt elliottt left a comment

Makes sense to me!

@fitzgen fitzgen added this pull request to the merge queue Apr 8, 2024
Merged via the queue into bytecodealliance:main with commit 1721fe3 Apr 8, 2024
19 checks passed
@fitzgen fitzgen deleted the dont-optimize-bitcast-from-r64 branch April 8, 2024 23:08
alexcrichton pushed a commit to alexcrichton/wasmtime that referenced this pull request Apr 11, 2024
…alliance#8317)

* Cranelift: Do not dedupe/GVN bitcasts from reference values

* Add additional logging and debug asserts for GC stuff
alexcrichton added a commit that referenced this pull request Apr 11, 2024
* cranelift: Include clobbers and outgoing args in stack limit (#8301)

When we compute the amount of space that we need in a stack frame for
the stack limit check, we were only counting spill-slots and explicit
stack-slots. However, we need to account for all uses of the stack which
occur before the next stack limit check. That includes clobbers and any
stack arguments we want to pass to callees.

The maximum amount that we could have missed by is essentially bounded
by the number of arguments which could be passed to a function. In
Wasmtime, that is limited by `MAX_WASM_FUNCTION_PARAMS` in
`wasmparser::limits`, which is set to 1,000, and the largest arguments
are 16-byte vectors, so this could undercount by about 16kB.
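
As a quick check of that bound, here is the arithmetic as a small sketch. The parameter limit is the value quoted above, redefined locally rather than imported from `wasmparser`; `LARGEST_ARG_BYTES` and `max_undercount_bytes` are hypothetical names, and the sketch assumes every argument is a 16-byte vector passed on the stack.

```rust
/// Parameter limit quoted in the text (redefined here, not imported).
const MAX_WASM_FUNCTION_PARAMS: usize = 1_000;

/// Assumed size of the largest argument: a 128-bit vector.
const LARGEST_ARG_BYTES: usize = 16;

/// Worst-case outgoing-argument bytes missed by the old stack limit check,
/// under the assumption that every argument is the largest possible.
fn max_undercount_bytes() -> usize {
    MAX_WASM_FUNCTION_PARAMS * LARGEST_ARG_BYTES
}
```

This gives 16,000 bytes, which is the "about 16kB" figure.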

This is not a security issue according to Wasmtime's security policy
(https://docs.wasmtime.dev/security-what-is-considered-a-security-vulnerability.html)
because it's the embedder's responsibility to ensure that the stack
where Wasmtime is running has enough extra space on top of the
configured `max_wasm_stack` size, and getting within 16kB of the host
stack size is too small to be safe even with this fixed.

However, this was definitely not the intended behavior when stack limit
checks or stack probes are enabled, and anyone with non-default
configurations or non-Wasmtime uses of Cranelift should evaluate whether
this bug impacts your use case.

(For reference: When Wasmtime is used in async mode or on Linux, the
default stack size is 1.5MB larger than the default WebAssembly stack
limit, so such configurations are typically safe regardless. On the
other hand, on macOS the default non-async stack size for threads other
than the main thread is the same size as the default for
`max_wasm_stack`, so that is too small with or without this bug fix.)

* fix: bindgen trappable_errors using unversion/versioned packages (#8305)

Signed-off-by: Brian H <brian.hardock@fermyon.com>

* Cranelift: Do not dedupe/GVN bitcasts from reference values (#8317)

* Cranelift: Do not dedupe/GVN bitcasts from reference values

* Add additional logging and debug asserts for GC stuff

* Handle out-of-bounds component sections (#8323)

* Handle out-of-bounds component sections

Fixes #8322

* Add a test that truncated component binaries don't cause panics

---------

Signed-off-by: Brian H <brian.hardock@fermyon.com>
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
Co-authored-by: Brian <brian.hardock@fermyon.com>
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
fitzgen added a commit to fitzgen/wasmtime that referenced this pull request Jun 1, 2024
Tracking GC references and producing stack maps is a significant amount of
complexity in `regalloc2`.

At the same time, GC reference value types are pretty annoying to deal with in
Cranelift itself. We know our `r64` is "actually" just an `i64` pointer, and we
want to do `i64`-y things with it, such as an `iadd` to compute a derived
pointer, but `iadd` only takes integer types and not `r64`s. We investigated
loosening that restriction and it was way too painful given the way that CLIF
type inference and its controlling type vars work. So to compute those derived
pointers, we have to first `bitcast` the `r64` into an `i64`. This is
unfortunate in two ways. First, because of arcane interactions between register
allocation constraints, stack maps, and ABIs this involves inserting unnecessary
register-to-register moves in our generated code which hurts binary size and
performance ever so slightly. Second, and much more seriously, this is a
footgun. If a GC reference isn't an `r64` right now, then it will not appear in
stack maps, and failure to record a live GC reference in a stack map means that
the collector could reclaim the object while you are still using it, leading to
use-after-free bugs! Very bad. And the mid-end needs to know
*not* to GVN these bitcasts or else we get similar bugs (see
bytecodealliance#8317).

Overall GC references are a painful situation for us today.

This commit is the introduction of an alternative. (Note, though, that we aren't
quite ready to remove the old stack maps infrastructure just yet.)

Instead of preserving GC references all the way through the whole pipeline and
computing live GC references and inserting spills at safepoints for stack maps
all the way at the end of that pipeline in register allocation, the
CLIF-producing frontend explicitly generates its own stack slots and spills for
safepoints. The only thing the rest of the compiler pipeline needs to know is
the metadata required to produce the stack map for the associated safepoint. We
can completely remove `r32` and `r64` from Cranelift and just use plain `i32`
and `i64` values. Or `f64` if the runtime uses NaN-boxing, which the old stack
maps system did not support at all. Or 32-bit GC references on a 64-bit target,
which was also not supported by the old system. Furthermore, we *cannot* get
miscompiles due to GVN'ing bitcasts that shouldn't be GVN'd because there aren't
any bitcasts hiding GC references from stack maps anymore. And in the case of a
moving GC, we don't need to worry about the mid-end doing illegal code motion
across calls that could have triggered a GC that invalidated the moved GC
reference because frontends will reload their GC references from the stack slots
after the call, and that loaded value simply isn't a candidate for GVN with the
previous version. We don't have to worry about those bugs by construction.

So everything gets a lot easier under this new system.

But this commit doesn't mean we are 100% done and ready to transition to the new
system, so what is actually in here?

* CLIF producers can mark values as needing to be present in a stack map if they
are live across a safepoint in `cranelift-frontend`. This is the
`FunctionBuilder::declare_needs_stack_map` method.

* When we finalize the function we are building, we do a simple, single-pass
liveness analysis to determine the set of GC references that are live at each
safepoint, and then we insert spills to explicit stack slots just before the
safepoint. We intentionally trade away the precision of a fixed-point liveness
analysis for the speed and simplicity of a single-pass implementation.

* We annotate the safepoint with the metadata necessary to construct its
associated stack map. This is the new
`cranelift_codegen::ir::DataFlowGraph::append_user_stack_map_entry` method and
its supporting code.

* These stack map entries are part of the CLIF and can be roundtripped through
printing and parsing CLIF.

* Each stack map entry describes a GC-managed value that is on the stack and how
to locate it: its type, the stack slot it is located within, and the offset
within that stack slot where it resides. Different stack map entries for the
same safepoint may have different types or a different width from the target's
pointer.
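
The pieces listed above can be sketched with a toy model. This is not the `cranelift-frontend` implementation and calls no real Cranelift APIs: it assumes straight-line code, gives each value its own slot at offset 0, and uses hypothetical names throughout (`Inst`, `StackMapEntry`, `build_stack_maps`, `example_program`). A single backward pass tracks which declared values are live and, at each safepoint, records an entry with a type, a stack slot, and an offset for each one.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy value types; entries at one safepoint may differ in type and width.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    I32,
    I64,
}

/// Where a GC-managed value lives at a safepoint: type, slot, and offset.
#[derive(Debug, Clone, PartialEq, Eq)]
struct StackMapEntry {
    ty: Ty,
    slot: u32,
    offset: u32,
}

/// Toy instruction for straight-line code.
struct Inst {
    def: Option<u32>,
    uses: Vec<u32>,
    safepoint: bool,
}

fn inst(def: Option<u32>, uses: &[u32], safepoint: bool) -> Inst {
    Inst { def, uses: uses.to_vec(), safepoint }
}

/// Single backward pass: `live` holds declared values used later on.
/// Returns, per safepoint instruction index, the entries for values that
/// are live across it. A real frontend would also emit the spill stores.
fn build_stack_maps(
    insts: &[Inst],
    needs_stack_map: &BTreeMap<u32, Ty>,
) -> BTreeMap<usize, Vec<StackMapEntry>> {
    let mut live: BTreeSet<u32> = BTreeSet::new();
    let mut slots: BTreeMap<u32, u32> = BTreeMap::new(); // value -> stack slot
    let mut maps: BTreeMap<usize, Vec<StackMapEntry>> = BTreeMap::new();
    for (i, inst) in insts.iter().enumerate().rev() {
        // A value defined by this instruction is not live across it.
        if let Some(d) = inst.def {
            live.remove(&d);
        }
        if inst.safepoint {
            let entries: Vec<StackMapEntry> = live
                .iter()
                .map(|v| {
                    let next = slots.len() as u32;
                    let slot = *slots.entry(*v).or_insert(next);
                    StackMapEntry { ty: needs_stack_map[v], slot, offset: 0 }
                })
                .collect();
            maps.insert(i, entries);
        }
        for u in &inst.uses {
            if needs_stack_map.contains_key(u) {
                live.insert(*u);
            }
        }
    }
    maps
}

/// v0 (64-bit) and v1 (32-bit) are declared as needing stack maps; v2 is not.
fn example_program() -> (Vec<Inst>, BTreeMap<u32, Ty>) {
    let insts = vec![
        inst(Some(0), &[], false),  // 0: v0 = ...
        inst(Some(1), &[], false),  // 1: v1 = ...
        inst(Some(2), &[], false),  // 2: v2 = ... (plain integer)
        inst(None, &[], true),      // 3: call (safepoint)
        inst(None, &[0, 1, 2], false), // 4: use v0, v1, v2
        inst(None, &[], true),      // 5: call (safepoint)
        inst(None, &[0], false),    // 6: use v0
    ];
    let needs = BTreeMap::from([(0, Ty::I64), (1, Ty::I32)]);
    (insts, needs)
}
```

In this example, the first call's stack map holds both v0 and v1, while the second holds only v0, because v1 has no use after it. The undeclared v2 never appears in either map.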

Here is what is *not* handled yet and is left for future follow-up commits:

* Lowering the stack map entries' locations from symbolic stack slot and offset
pairs to physical stack frame offsets after register allocation.

* Coalescing and aggregating the safepoints and their raw stack map entries into
a compact PC-to-stack-map table during emission.

* Supporting moving GCs. Right now we generate spills into stack slots for live
GC references just before safepoints, but we don't reload the GC references from
the stack upon their next use after the safepoint. This involves rewriting uses
of the old, spilled values which could be a little finicky, but we think we have
a good approach.

* Port Wasmtime over to using this new stack maps system.

* Removing the old stack map system, including `r{32,64}` from Cranelift and GC
reference handling from `regalloc2`. (For the time being, the new system
generally refers to "user stack maps" to disambiguate from the old system where
it might otherwise be confusing.) If we wanted to remove the old system now,
that would require us to also port Wasmtime to the new system now, and we'd end
up with a monolithic PR. Better to do this incrementally and temporarily have
the old and in-progress new system overlap for a short period of time.

Co-Authored-By: Trevor Elliott <telliott@fastly.com>
fitzgen added a commit to fitzgen/wasmtime that referenced this pull request Jun 6, 2024
fitzgen added a commit to fitzgen/wasmtime that referenced this pull request Jun 6, 2024
fitzgen added a commit to fitzgen/wasmtime that referenced this pull request Jun 6, 2024
fitzgen added a commit to fitzgen/wasmtime that referenced this pull request Jun 7, 2024
Tracking GC references and producing stack maps is a significant amount of
complexity in `regalloc2`.

At the same time, GC reference value types are pretty annoying to deal with in
Cranelift itself. We know our `r64` is "actually" just an `i64` pointer, and we
want to do `i64`-y things with it, such as an `iadd` to compute a derived
pointer, but `iadd` only takes integer types and not `r64`s. We investigated
loosening that restriction and it was way too painful given the way that CLIF
type inference and its controlling type vars work. So to compute those derived
pointers, we have to first `bitcast` the `r64` into an `i64`. This is
unfortunate in two ways. First, because of arcane interactions between register
allocation constraints, stack maps, and ABIs, this involves inserting unnecessary
register-to-register moves in our generated code, which hurts binary size and
performance ever so slightly. Second, and much more seriously, this is a
footgun. If a GC reference isn't an `r64` right now, then it will not appear in
stack maps, and failure to record a live GC reference in a stack map means that
the collector could reclaim the object while you are still using it, leading to
use-after-free bugs! Very bad. And the mid-end needs to know
*not* to GVN these bitcasts or else we get similar bugs (see
bytecodealliance#8317).

Overall, GC references are a painful situation for us today.

This commit is the introduction of an alternative. (Note, though, that we aren't
quite ready to remove the old stack maps infrastructure just yet.)

Instead of preserving GC references all the way through the whole pipeline and
computing live GC references and inserting spills at safepoints for stack maps
all the way at the end of that pipeline in register allocation, the
CLIF-producing frontend explicitly generates its own stack slots and spills for
safepoints. The only thing the rest of the compiler pipeline needs to know is
the metadata required to produce the stack map for the associated safepoint. We
can completely remove `r32` and `r64` from Cranelift and just use plain `i32`
and `i64` values. Or `f64` if the runtime uses NaN-boxing, which the old stack
maps system did not support at all. Or 32-bit GC references on a 64-bit target,
which was also not supported by the old system. Furthermore, we *cannot* get
miscompiles due to GVN'ing bitcasts that shouldn't be GVN'd because there aren't
any bitcasts hiding GC references from stack maps anymore. And in the case of a
moving GC, we don't need to worry about the mid-end doing illegal code motion
across calls that could have triggered a GC that invalidated the moved GC
reference because frontends will reload their GC references from the stack slots
after the call, and that loaded value simply isn't a candidate for GVN with the
previous version. We don't have to worry about those bugs by construction.
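
For readers unfamiliar with NaN-boxing, here is a minimal sketch of why a GC reference might be carried around as an `f64`. The encoding below (a quiet-NaN tag with a 48-bit payload) and the names `TAG`, `PAYLOAD_MASK`, `box_ref`, and `unbox_ref` are assumptions made up for illustration; real runtimes pick their own tag layouts.

```rust
/// Hypothetical tag: sign bit, all-ones exponent, and the quiet-NaN bit set.
const TAG: u64 = 0xFFF8_0000_0000_0000;
/// Low 48 bits hold the reference payload in this made-up scheme.
const PAYLOAD_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;

/// Pack a 48-bit address into the payload of a quiet NaN.
pub fn box_ref(addr: u64) -> f64 {
    assert!(addr & !PAYLOAD_MASK == 0, "address must fit in 48 bits");
    f64::from_bits(TAG | addr)
}

/// Recover the address if `v` carries our tag; ordinary doubles yield `None`.
pub fn unbox_ref(v: f64) -> Option<u64> {
    let bits = v.to_bits();
    if bits & !PAYLOAD_MASK == TAG {
        Some(bits & PAYLOAD_MASK)
    } else {
        None
    }
}
```

A value produced by `box_ref` is an ordinary `f64` as far as the compiler is concerned, so a stack map entry for it needs to say "this `f64` is GC-managed", which is what per-entry types allow.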

So everything gets a lot easier under this new system.

But this commit doesn't mean we are 100% done and ready to transition to the new
system, so what is actually in here?

* CLIF producers can mark values as needing to be present in a stack map if they
are live across a safepoint in `cranelift-frontend`. This is the
`FunctionBuilder::declare_needs_stack_map` method.

* When we finalize the function we are building, we do a simple, single-pass
liveness analysis to determine the set of GC references that are live at each
safepoint, and then we insert spills to explicit stack slots just before the
safepoint. We intentionally trade away the precision of a fixed-point liveness
analysis for the speed and simplicity of a single-pass implementation.

* We annotate the safepoint with the metadata necessary to construct its
associated stack map. This is the new
`cranelift_codegen::ir::DataFlowGraph::append_user_stack_map_entry` method and
the plumbing around it.

* These stack map entries are part of the CLIF and can be roundtripped through
printing and parsing CLIF.

* Each stack map entry describes a GC-managed value that is on the stack and how
to locate it: its type, the stack slot it is located within, and the offset
within that stack slot where it resides. Different stack map entries for the
same safepoint may have different types or a different width from the target's
pointer.
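
The liveness step above can be sketched roughly as follows. This is a simplified model, not the `cranelift-frontend` implementation: it handles a single straight-line block and ignores control flow entirely, and the `Value`, `Inst`, and `live_across_safepoints` names are hypothetical. It walks the instructions once, backwards, and records which flagged values are still live after each safepoint.

```rust
use std::collections::BTreeSet;

/// Hypothetical SSA value identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Value(pub u32);

/// Hypothetical instruction: the values it defines and uses, and whether
/// it is a safepoint (for example, a call that may trigger a GC).
pub struct Inst {
    pub defs: Vec<Value>,
    pub uses: Vec<Value>,
    pub is_safepoint: bool,
}

/// For each safepoint, return its instruction index together with the
/// flagged values that are live across it, i.e. used at some later point.
/// Those are the values that would need a spill and a stack map entry.
pub fn live_across_safepoints(
    insts: &[Inst],
    needs_stack_map: &BTreeSet<Value>,
) -> Vec<(usize, Vec<Value>)> {
    let mut live: BTreeSet<Value> = BTreeSet::new();
    let mut out = Vec::new();
    // Single backward pass over the block.
    for (idx, inst) in insts.iter().enumerate().rev() {
        // A value is not live before the instruction that defines it. This
        // also keeps a safepoint's own results out of its stack map.
        for d in &inst.defs {
            live.remove(d);
        }
        if inst.is_safepoint {
            out.push((idx, live.iter().copied().collect()));
        }
        // Only values the frontend flagged are tracked at all.
        for u in &inst.uses {
            if needs_stack_map.contains(u) {
                live.insert(*u);
            }
        }
    }
    out.reverse();
    out
}
```

In this model a value that is only passed to the safepoint itself, and never used afterwards, is not reported as live across it.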

Here is what is *not* handled yet, and left for future follow-up commits:

* Lowering the stack map entries' locations from symbolic stack slot and offset
pairs to physical stack frame offsets after register allocation.

* Coalescing and aggregating the safepoints and their raw stack map entries into
a compact PC-to-stack-map table during emission.

* Supporting moving GCs. Right now we generate spills into stack slots for live
GC references just before safepoints, but we don't reload the GC references from
the stack upon their next use after the safepoint. This involves rewriting uses
of the old, spilled values which could be a little finicky, but we think we have
a good approach.

* Porting Wasmtime over to using this new stack maps system.

* Removing the old stack map system, including `r{32,64}` from Cranelift and GC
reference handling from `regalloc2`. (For the time being, the new system
generally refers to "user stack maps" to disambiguate from the old system where
it might otherwise be confusing.) If we wanted to remove the old system now,
that would require us to also port Wasmtime to the new system now, and we'd end
up with a monolithic PR. Better to do this incrementally and temporarily have
the old and in-progress new system overlap for a short period of time.
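
As a rough illustration of the first two follow-up items (lowering symbolic locations and building a PC-keyed table), here is a sketch under made-up assumptions: every stack slot has a known base offset from the stack pointer once the frame is laid out, and safepoints are identified by a code offset. None of the names below (`SymbolicEntry`, `FrameEntry`, `lower_entries`, `build_table`) exist in Cranelift; they are hypothetical.

```rust
use std::collections::BTreeMap;

/// Symbolic location, as recorded before register allocation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SymbolicEntry {
    pub slot: usize,
    pub offset: u32,
    pub size: u32,
}

/// Physical location, relative to the stack pointer at the safepoint.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct FrameEntry {
    pub sp_offset: u32,
    pub size: u32,
}

/// Turn (slot, offset) pairs into SP-relative offsets. `slot_base[i]` is the
/// assumed SP-relative base of slot `i`. Returns `None` for an unknown slot
/// or on arithmetic overflow.
pub fn lower_entries(entries: &[SymbolicEntry], slot_base: &[u32]) -> Option<Vec<FrameEntry>> {
    let mut lowered = Vec::with_capacity(entries.len());
    for e in entries {
        let base = *slot_base.get(e.slot)?;
        lowered.push(FrameEntry {
            sp_offset: base.checked_add(e.offset)?,
            size: e.size,
        });
    }
    // Sort and drop exact duplicates so each map has a canonical form.
    lowered.sort();
    lowered.dedup();
    Some(lowered)
}

/// Build a table keyed by the safepoint's code offset. Safepoints with no
/// live GC references are omitted to keep the table small.
pub fn build_table(
    safepoints: &[(u32, Vec<SymbolicEntry>)],
    slot_base: &[u32],
) -> Option<BTreeMap<u32, Vec<FrameEntry>>> {
    let mut table = BTreeMap::new();
    for (pc, entries) in safepoints {
        let lowered = lower_entries(entries, slot_base)?;
        if !lowered.is_empty() {
            table.insert(*pc, lowered);
        }
    }
    Some(table)
}
```

A collector walking the stack would look up each frame's return address in such a table to find the slots it has to trace. How the real table ends up being encoded is left to those follow-up commits.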

Co-Authored-By: Trevor Elliott <telliott@fastly.com>
github-merge-queue bot pushed a commit that referenced this pull request Jun 7, 2024
Labels
cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift Issues related to the Cranelift code generator wasmtime:api Related to the API of the `wasmtime` crate itself