The purpose of this RFC is to provide an API to access to low level, ARM specific SIMD and non-SIMD instructions on stable Rust.
TODO (stable channel, less error prone than writing your own assembly, no
extra build dependency to assembly external files (e.g arm-none-eabi-gcc
))
TODO
The Arm architecture includes features that go beyond the set of operations available to C/C++ programmers. The intention of the Arm C Language Extensions (ACLE) is to allow the writing of applications and middleware code that is portable across compilers, and across Arm architecture variants, while exploiting the advanced features of the Arm architecture.
Reference: Section 7.3 "Memory barriers" of ACLE.
The following API will be available under {core,std}::arch::{arm,aarch64}
and
will be available for both the "arm" and "aarch64" architectures unless
indicated otherwise (see target_arch
).
/// Generates a DMB (data memory barrier) instruction or equivalent CP15 instruction.
///
/// DMB ensures the observed ordering of memory accesses. Memory accesses of the
/// specified type issued before the DMB are guaranteed to be observed (in the
/// specified scope) before memory accesses issued after the DMB.
///
/// For example, DMB should be used between storing data, and updating a flag
/// variable that makes that data available to another core.
///
/// The __dmb() intrinsic also acts as a compiler memory barrier of the
/// appropriate type.
pub unsafe fn __dmb<A>(arg: A) where A: Dmb { /* .. */ }
/// Generates a DSB (data synchronization barrier) instruction or equivalent
/// CP15 instruction.
///
/// DSB ensures the completion of memory accesses. A DSB behaves as the
/// equivalent DMB and has additional properties. After a DSB instruction
/// completes, all memory accesses of the specified type issued before the DSB
/// are guaranteed to have completed.
///
/// The __dsb() intrinsic also acts as a compiler memory barrier of the
/// appropriate type.
pub unsafe fn __dsb<A>(arg: A) where A: Dsb { /* .. */ }
/// Generates an ISB (instruction synchronization barrier) instruction or
/// equivalent CP15 instruction.
///
/// This instruction flushes the processor pipeline fetch buffers, so that
/// following instructions are fetched from cache or memory.
///
/// An ISB is needed after some system maintenance operations. An ISB is also
/// needed before transferring control to code that has been loaded or modified
/// in memory, for example by an overlay mechanism or just-in-time code
/// generator. (Note that if instruction and data caches are separate,
/// privileged cache maintenance operations would be needed in order to unify
/// the caches.)
///
/// The only supported argument for the __isb() intrinsic is 15, corresponding
/// to the SY (full system) scope of the ISB instruction.
pub unsafe fn __isb<A>(arg: A) where A: Isb { /* .. */ }
// Arguments to the above intrinsics
/// Full system is the required shareability domain, reads and writes are the
/// required access types
pub struct SY;
/// Full system is the required shareability domain, writes are the required
/// access type
pub struct ST;
/// Full system is the required shareability domain, reads are the required
/// access type
#[cfg(target_arch = "aarch64")]
pub struct LD;
/// Inner Shareable is the required shareability domain, reads and writes are
/// the required access types
pub struct ISH;
/// Inner Shareable is the required shareability domain, reads are the required
/// access type
#[cfg(target_arch = "aarch64")]
pub struct ISHLD;
/// Inner Shareable is the required shareability domain, writes are the required
/// access type
pub struct ISHST;
/// Non-shareable is the required shareability domain, reads and writes are the
/// required access types
pub struct NSH;
/// Non-shareable is the required shareability domain, reads are the required
/// access type
#[cfg(target_arch = "aarch64")]
pub struct NSHLD;
/// Non-shareable is the required shareability domain, writes are the required
/// access type
pub struct NSHST;
/// Outer Shareable is the required shareability domain, reads and writes are
/// the required access types
pub struct OSH;
/// Outher Shareable is the required shareability domain, reads are the required
/// access type
#[cfg(target_arch = "aarch64")]
pub struct OSHLD;
/// Outer Shareable is the required shareability domain, writes are the required
/// access type
pub struct OSHST;
// The following `struct`s implement the `Dmb` and `Dsb` traits:
// SY, LD, ST, ISH, ISHLD, ISHST, NSH, NSHST, OSH, OSHLD, OSHST
//
// Only SY implements the `Isb` trait
use core::arch::arm::{self, SY}
unsafe {
// omitted: write to the SCB peripheral to invalidate some cache
arm::__dsb(SY);
arm::__isb(SY);
}
In C, this would be written as:
// ommitted part
__dsb(0xF);
__dmb(0xF);
Quoting ACLE (Section 7.3):
The intrinsics in this section are available for all targets. They may be no-ops (i.e. generate no code, but possibly act as a code motion barrier in compilers) on targets where the relevant instructions do not exist, but only if the property they guarantee would have held anyway. On targets where the relevant instructions exist but are implemented as no-ops, these intrinsics generate the instructions.
Furthermore the table 10.1 in ACLE indicates:
Instruction | Flags | Arch. | Intrinsic or C code |
---|---|---|---|
DMB | 8, 7, 6-M | __dmb | |
DSB | 8, 7, 6-M | __dsb | |
ISB | 8, 7, 6-M | __isb |
Where the architecture numbers mean (quoting Section 10.1 of ACLE ):
Architecture 8 means Armv8-A AArch32 and AArch64, 8-32 means Armv8-AArch32 only.
Architecture 7 means Armv7-A and Armv7-R.
In the sequence of Thumb-only architectures { 6-M, 7-M, 7E-M } each architecture includes its predecessor instruction set.
Thus the memory barriers will be implemented as follows:
pub trait Dmb: sealed::Trait {
unsafe fn dmb(&self);
}
mod sealed {
trait Trait {}
}
pub unsafe fn __dmb<A>(arg: A) where A: Dmb {
arg.dmb()
}
impl Dmb for SY {
#[cfg(any(
target_feature = "mclass", // 6-M
target_feature = "v7", // 7
target_arch = "aarch64" // 8
))]
unsafe fn dmb(&self) {
asm!("dmb 0xF" : : : "memory" : "volatile");
}
#[cfg(not(/* like above */)]
unsafe fn dmb(&self) {
// No-op but still a compiler barrier because of "memory"
asm!("" : : : "memory" : "volatile");
}
}
That is on sub-architectures where the DMB instruction doesn't exist, the
__dmb
intrinsic will be equivalent to a compiler barrier.
References: Section 7.4 "Hints" and Section "Section 7.7 NOP" of ACLE.
The following API will be available under {core,std}::arch::{arm,aarch64}
and
will be available for both the "arm" and "aarch64" architectures.
/// Generates a WFI (wait for interrupt) hint instruction, or nothing.
///
/// The WFI instruction allows (but does not require) the processor to enter a
/// low-power state until one of a number of asynchronous events occurs.
pub unsafe fn __wfi() { /* .. */ }
/// Generates a WFE (wait for event) hint instruction, or nothing.
///
/// The WFE instruction allows (but does not require) the processor to enter a
/// low-power state until some event occurs such as a SEV being issued by
/// another processor.
pub unsafe fn __wfe() { /* .. */ }
/// Generates a SEV (send a global event) hint instruction.
///
/// This causes an event to be signaled to all processors in a multiprocessor
/// system. It is a NOP on a uniprocessor system.
pub unsafe fn __sev() { /* .. */ }
/// Generates a send a local event hint instruction.
///
/// This causes an event to be signaled to only the processor executing this
/// instruction. In a multiprocessor system, it is not required to affect the
/// other processors.
pub unsafe fn __sevl() { /* .. */ }
/// Generates a YIELD hint instruction.
///
/// This enables multithreading software to indicate to the hardware that it is
/// performing a task, for example a spin-lock, that could be swapped out to
/// improve overall system performance.
pub unsafe fn __yield() { /* .. */ }
/// Generates a DBG instruction.
///
/// This provides a hint to debugging and related systems. The argument must be
/// a constant integer from 0 to 15 inclusive. See implementation documentation
/// for the effect (if any) of this instruction and the meaning of the
/// argument. This is available only when compliling for AArch32.
#[rustc_args_required_const(0)]
pub unsafe fn __dbg(_: u32) { /* .. */ }
/// Generates an unspecified no-op instruction.
///
/// Note that not all architectures provide a distinguished NOP instruction. On
/// those that do, it is unspecified whether this intrinsic generates it or
/// another instruction. It is not guaranteed that inserting this instruction
/// will increase execution time.
pub unsafe fn __nop() { /* .. */ }
Quoting Section 7.4 "Hints" of ACLE :
The intrinsics in this section are available for all targets. They may be no-ops (i.e. generate no code, but possibly act as a code motion barrier in compilers) on targets where the relevant instructions do not exist. On targets where the relevant instructions exist but are implemented as no-ops, these intrinsics generate the instructions
So like in the implementation of memory barriers these intrinsics will do nothing on some sub-architectures.
Reference: Section 9 "System register access" of ACLE.
The following API will be available under {core,std}::arch::{arm,aarch64}
and
will be available for both the "arm" and "aarch64" architectures.
/// Reads a 32-bit system register
pub unsafe fn __arm_rsr<R>(special_register: R) -> u32
where
R: Rsr
{
/* .. */
}
/// Reads a 64-bit system register
pub unsafe fn __arm_rsr64<R>(special_register: R) -> u64
where
R: Rsr64
{
/* .. */
}
/// Reads a system register containing an address
pub unsafe fn __arm_rsrp<R>(special_register: R) -> *const c_void
where
R: Rsrp
{
/* .. */
}
/// Writes a 32-bit system register
pub unsafe fn __arm_wsr<R>(special_register: R, value: u32)
where
R: Wsr
{
/* .. */
}
/// Writes a 64-bit system register
pub unsafe fn __arm_wsr64<R>(special_register: R, value: u64)
where
R: Wsr64
{
/* .. */
}
/// Writes a system register containing an address
pub unsafe fn __arm_wsrp<R>(special_register: R, value: *const c_void)
where
R: Wsrp
{
/* .. */
}
The values that can be used for the special_register
argument depend on the
target subarchitecture and whether the register is a 32-bit register or a 64-bit
register.
Possible 32-bit system registers (see __arm_rsr
, __arm_rsrp
, __arm_wsr
,
and __arm_wsrp
) include:
-
The values accepted in the
spec_reg
field of the MRS instruction, for example CPSR. See ARMARM for more details. -
The values accepted in the
spec_reg
field of the MSR (immediate) instruction. See ARMARM for more details. -
The values accepted in the
spec_reg
field of the VMRS instruction, for example FPSID. See ARMARM for more details. -
The values accepted in the
spec_reg
field of the VMSR instruction, for example FPSCR. See ARMARM for more details. -
The values accepted in the
spec_reg
field of the MSR and MRS instructions with virtualization extensions, for example ELR_Hyp. See ARMARM for more details. -
The values specified in Special register encodings used in Armv7-M system instructions, for example PRIMASK. See ARMv7M for more details.
Possible 64-bit system registers (see __arm_rsr64
, __arm_rsrp64
,
__arm_wsr64
, and __arm_wsrp64
) include:
- The values accepted in the pstatefield of the MSR (immediate) instruction. See ARMARMv8 for more details.
use core::arch::arm::{BASEPRI, self};
unsafe {
let new_val: u32 = /* .. */;
let f = /* some closure */;
let old_val = arm::__arm_rsr(BASEPRI);
// start of critical section
arm::__arm_wsr(BASEPRI, new_val);
f();
// end of critical section
arm::__arm_wsr(BASEPRI, old_val);
}
In C you would write:
uint32_t old_val, new_val;
void* f;
new_val = /* .. */;
f = /* .. */;
old_val = __arm_rsr("BASEPRI");
__arm_wsr("BASEPRI", new_val)
f();
__arm_wsr("BASEPRI", old_val)
The important part of the implementation is double checking that special
register struct
s are only available on the sub-architectures where they are
physically present.
pub trait Rsr: sealed::Trait {
unsafe fn rsr(&self) -> u32;
}
pub trait Wsr: sealed::Trait {
unsafe fn wsr(&self, value: u32);
}
mod sealed {
trait Trait {}
}
pub unsafe fn __arm_rsr<R>(special_register: R) -> u32
where
R: Rsr
{
special_register.rsr();
}
pub unsafe fn __arm_wsr<R>(special_register: R, value: u32)
where
R: Wsr
{
special_register.wsr(value)
}
#[cfg(target_feature = "mclass")]
pub struct BASEPRI;
#[cfg(target_feature = "mclass")]
impl Rsr for BASEPRI {
fn rsr(&self) -> u32 {
let r: u32;
asm!("mrs $0, BASEPRI" : "=r"(r) ::: "volatile");
r
}
}
#[cfg(target_feature = "mclass")]
impl Wsr for BASEPRI {
fn wsr(&self, value: u32) {
asm!("msr BASEPRI, $0" :: "r"(value) : "memory" : "volatile")
}
}
Reference: Section 8.5 "32-bit SIMD intrinsics" of ACLE.
TODO, but at least I should tell you that this is not about NEON. ACLE says so in Section 8.5.1 "Availability":
Armv6 introduced instructions to perform 32-bit SIMD operations (i.e. two 16-bit operations or four 8-bit operations) on the Arm general-purpose registers. These instructions are not related to the much more versatile Advanced SIMD (NEON) extension, whose support is described in Advanced SIMD (NEON) intrinsics.
And in Section 5.4.9 "32-bit SIMD instructions"
__ARM_FEATURE_SIMD32 is defined to 1 if the 32-bit SIMD instructions are supported and the intrinsics defined in 32-bit SIMD intrinsics are available. This also implies support for the GE global flags which indicate byte-by-byte comparison results.
__ARM_FEATURE_SIMD32 is deprecated in ACLE 2.0 for A-profile. Users are encouraged to use NEON Intrinscs as an equivalent for the 32-bit SIMD intrinsics functionality. However they are fully supported for M and R-profiles. This is defined for AArch32 only.
So these are mainly Cortex-M and Cortex-R specific intrinsics.
This section will list the SIMD intrinsics to expose in core::arch::arm
and
will propose that they are only exposed on +mclass
and +rclass
ARMv7
targets.
ARM C Language Extensions Q2 2018
ARM Architecture Reference Manual (7-A / 7-R)