
Re-implement AArch64 atomic load and stores #3128

Merged: 2 commits, Aug 6, 2021

Conversation

@sparker-arm (Contributor) commented on Jul 29, 2021:

The AArch64 support was a bit broken and was using Armv7-style barriers, which aren't required with Armv8 acquire-release loads/stores. The backend now pattern-matches uextend(atomic_load) too.

The fallback CAS loops and RMW sequences for AArch64 have also been updated to use acquire-release exclusive instructions, which again removes the need for barriers. The CAS loop has also been further optimised by using the extending form of the cmp instruction.

Copyright (c) 2021, Arm Limited.
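
For background, here is a minimal, standalone Rust sketch (not part of this patch, added for illustration) of the acquire-release ordering the new code generation relies on. On AArch64, compilers typically lower the Release store and Acquire load below to stlr and ldar, which carry their ordering themselves, so no separate dmb barrier is needed:

    use std::sync::atomic::{AtomicU64, Ordering};

    static FLAG: AtomicU64 = AtomicU64::new(0);

    fn publish(value: u64) {
        // Release store: on Armv8 this typically becomes `stlr`,
        // with no trailing `dmb` barrier required.
        FLAG.store(value, Ordering::Release);
    }

    fn consume() -> u64 {
        // Acquire load: on Armv8 this typically becomes `ldar`,
        // with no leading `dmb` barrier required.
        FLAG.load(Ordering::Acquire)
    }

    fn main() {
        publish(42);
        println!("{}", consume());
    }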

@sparker-arm (Author):
@uweigand I made some changes in the s390x backend, and I'm really not sure they're right; could you take a look, please?

@github-actions bot added labels: cranelift (Issues related to the Cranelift code generator), cranelift:area:aarch64 (Issues related to AArch64 backend), cranelift:area:x64 (Issues related to x64 codegen), cranelift:meta (Everything related to the meta-language), cranelift:wasm (Jul 29, 2021)
@@ -4579,8 +4579,7 @@ pub(crate) fn define(
r#"
Atomically load from memory at `p`.

-This is a polymorphic instruction that can load any value type which has a memory
-representation. It should only be used for integer types with 8, 16, 32 or 64 bits.
+It should only be used for integer types with 32 or 64 bits.
@bjorn3 (Contributor) commented Jul 29, 2021:
Please don't change this. In rustc_codegen_cranelift I use atomic_load to load true i8 and i16 values without any zero-extension afterwards.

Member:
+1, it's better to be consistent with normal load instructions here and accept any type, I think.

Contributor (Author):
Okay.

Contributor (Author):
I'm attempting to revert the IR changes so we keep the atomic_load + uextend. During lowering, I'm trying to pattern-match from the uextend, which I think works fine, but the AtomicLoad node also appears to be lowered, so my output code ends up with two loads. Is there a way to explicitly replace a value or remove an instruction?

Member:
Ah, interesting! The lowering framework keeps track of reference counts for values; instructions are lowered in a backward pass where inputs are only produced if something depends on them, or if they have side effects. In this case, however, the load is considered to have a side effect (it can trap), which needs slightly different handling: it's not legal to duplicate the load (imagine if, say, a store were interleaved between the two copies).

In cases where we merge an effectful op (like a load), we need to call sink_inst() on the context as well; see this comment. An example of how this is used is here in the x64 backend.

Sorry, this is a bit subtle!
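
For illustration, a hedged sketch (not the actual patch) of how a backend might merge uextend(atomic_load) and sink the load. The helper names maybe_input_insn, get_output_reg and LowerCtx::sink_inst follow the Cranelift lowering API of this era, but the exact signatures and the emitted instruction are simplified assumptions here:

    Opcode::Uextend => {
        if let Some(load) = maybe_input_insn(ctx, inputs[0], Opcode::AtomicLoad) {
            // The atomic load can trap, so it must not be duplicated; sinking it
            // tells the framework that the merged copy is the only one and that
            // the original instruction will not be lowered separately.
            ctx.sink_inst(load);
            let rd = get_output_reg(ctx, outputs[0]).only_reg().unwrap();
            // ... emit a single zero-extending acquire load (ldarb/ldarh) into `rd` ...
        } else {
            // ... ordinary uextend lowering ...
        }
    }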

Contributor:
is_mergeable_load returns None for ty.bits() < 32.

Member:
That's specific to the x64 backend, because of the way that narrow loads become wider in some cases; I imagine that doesn't apply here since the widening loads don't reach beyond their width.

ig.push(
Inst::new(
"atomic_store",
r#"
Atomically store `x` to memory at `p`.

-This is a polymorphic instruction that can store any value type with a memory
-representation. It should only be used for integer types with 8, 16, 32 or 64 bits.
+This is a polymorphic instruction that can store a 32 or 64-bit value.
Contributor:
Same here.

cbnz x24, again
-dmb ish
Contributor:
Would it make sense to keep ARMv7 support and only emit the new sequence for ARMv8?

Member:
We certainly will need a similar sequence when we complete the ARM32 (ARMv7) backend, but that doesn't share code with the aarch64 (aka ARMv8) one we're in here, so I think it's fine/expected that we avoid ARMv7-isms here.

ctx.emit(match (ty_bits(access_ty), ty_bits(ty)) {
(8, 32) => Inst::Load32ZExt8 { rd, mem },
(8, 64) => Inst::Load64ZExt8 { rd, mem },
(16, 32) => Inst::LoadRev16 { rd, mem },
Member:
This is wrong, because the lrvh instruction does not perform any extension. To support this, you'd have to do the same thing that is currently done for Uload16 and emit an explicit Extend after the load-reverse.

Also, I think you need to support the same set of operations for the little-endian case as for the big-endian case, which means the 16->64 and 32->64 extension cases also should be added. They need to be implemented via the same explicit Extend after the load.
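
A hedged sketch of the suggested shape (not the actual patch): for these little-endian extending cases the match arm needs to emit two instructions, the byte-reversing load followed by an explicit zero-extension, because lrvh/lrv do not extend. Inst::LoadRev16/Inst::LoadRev32 appear in the diff above; the explicit extension step is left as a comment since its exact fields are not shown here:

    (16, 32) | (16, 64) => {
        ctx.emit(Inst::LoadRev16 { rd, mem });
        // ... follow with an explicit 16-bit -> 32/64-bit zero-extend of `rd`,
        // as the existing Uload16 lowering does ...
    }
    (32, 64) => {
        ctx.emit(Inst::LoadRev32 { rd, mem });
        // ... follow with an explicit 32-bit -> 64-bit zero-extend of `rd` ...
    }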

Contributor (Author):
Okay, thanks.

block0(v0: i64):
-v1 = atomic_load.i16 little v0
+v1 = atomic_uload16.i32 little v0
return v1
}

; check: lrvh %r2, 0(%r2)
Member:
The test case is likewise wrong. You can look at load-little.clif for a complete set of load test cases. Those should be usable for atomic loads as well (aligned loads are atomic by default on the architecture).

Second commit: Along with the x64 and s390x changes. Now pattern matching the uextend(atomic_load) in the aarch64 backend.
@sparker-arm changed the title from "Re-implement atomic load and stores" to "Re-implement AArch64 atomic load and stores" on Aug 5, 2021
@cfallin (Member) left a comment:
LGTM with the updates; thanks for the patience (and sorry for the delay in looking again)!

@cfallin cfallin merged commit 9c550fc into bytecodealliance:main Aug 6, 2021