docs: Move atomic operation overview by architecture to its own markdown file
taiki-e committed Nov 23, 2024
1 parent 6a9aec8 commit d82f7c8
Showing 11 changed files with 301 additions and 154 deletions.
3 changes: 3 additions & 0 deletions .github/.cspell/project-dictionary.txt
@@ -8,6 +8,7 @@ armasm
beqz
Bicc
bnez
boundedly
casp
cbnz
ccmp
@@ -26,6 +27,7 @@ fistp
gaisler
getex
GRLIB
Halfword
hwsync
IMAFD
inequal
@@ -87,6 +89,7 @@ rcpc
risbg
rsbegin
rsend
RVWMO
scompare
seqz
sete
2 changes: 2 additions & 0 deletions README.md
@@ -55,6 +55,8 @@ Currently, x86, x86_64, Arm, AArch64, RISC-V, LoongArch64, Arm64EC, s390x, MIPS,
\[7] Requires Rust 1.84+.<br>
\[8] Requires nightly due to `#![feature(asm_experimental_arch)]`.<br>

See also [Atomic operation overview by architecture](https://github.com/taiki-e/atomic-maybe-uninit/blob/HEAD/src/arch/README.md) for more information about atomic operations in these architectures.

Feel free to submit an issue if your target is not supported yet.

## Related Projects
280 changes: 280 additions & 0 deletions src/arch/README.md
@@ -0,0 +1,280 @@
<!-- omit in toc -->
# Atomic operation overview by architecture

This directory contains architecture-specific atomic implementations.

This document describes, for each architecture, the operations that are considered atomic.

- [AVR](#avr)
- [M68k](#m68k)
- [MSP430](#msp430)
- [PowerPC](#powerpc)
- [RISC-V](#risc-v)
- [s390x](#s390x)
- [SPARC](#sparc)

TODO: write sections for AArch64, Arm, Hexagon, LoongArch, MIPS, x86, Xtensa

## AVR

target_arch: avr<br>
Implementation: [avr.rs](avr.rs)

Refs: [AVR® Instruction Set Manual, Rev. DS40002198B](https://ww1.microchip.com/downloads/en/DeviceDoc/AVR-InstructionSet-Manual-DS40002198.pdf)

This architecture is always single-core and the following operations are atomic:

- Operations that complete within a single instruction.<br>
  This is because the currently executing instruction must be completed before entering the
  interrupt service routine.<br>
  (Refs: [AVR® Interrupts](https://developerhelp.microchip.com/xwiki/bin/view/products/mcu-mpu/8-bit-avr/structure/interrupts/))<br>
  The following two kinds of instructions are related to memory access:
  - 8-bit load/store
  - XCH, LAC, LAS, LAT: 8-bit swap, fetch-and-{clear,or,xor} (xmegau family)

- Operations performed while all interrupts are disabled.<br>
  However, pure operations that are not affected by compiler fences may be moved out of the
  critical section by compiler optimizations. (Note: a correct implementation of interrupt
  disabling and restoring must imply compiler fences, e.g., asm without nomem/readonly.)
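
The second bullet can be illustrated with a minimal sketch of a critical section. It is written so that it also builds on a host machine: the interrupt flag is simulated with a static, and `disable_interrupts`, `restore_interrupts`, `with_interrupts_disabled`, and `increment` are hypothetical names that do not exist in this crate. On a real single-core target the two helpers would change the hardware interrupt-enable bit with inline asm instead.

```rust
use core::cell::Cell;
use core::sync::atomic::{compiler_fence, AtomicBool, Ordering};

// Hypothetical stand-in for the hardware interrupt-enable bit (assumption:
// on real hardware this lives in the status register, not in memory).
static INTERRUPTS_ENABLED: AtomicBool = AtomicBool::new(true);

/// Hypothetical helper: disables interrupts and returns the previous state.
fn disable_interrupts() -> bool {
    let was_enabled = INTERRUPTS_ENABLED.swap(false, Ordering::Relaxed);
    // Prevent memory accesses in the critical section from being hoisted above this point.
    compiler_fence(Ordering::SeqCst);
    was_enabled
}

/// Hypothetical helper: restores the state saved by `disable_interrupts`.
fn restore_interrupts(was_enabled: bool) {
    // Prevent memory accesses in the critical section from sinking below this point.
    compiler_fence(Ordering::SeqCst);
    INTERRUPTS_ENABLED.store(was_enabled, Ordering::Relaxed);
}

/// Runs `f` with (simulated) interrupts disabled.
fn with_interrupts_disabled<R>(f: impl FnOnce() -> R) -> R {
    let saved = disable_interrupts();
    let result = f();
    restore_interrupts(saved);
    result
}

/// A 16-bit read-modify-write needs more than one instruction on an 8-bit CPU,
/// so it is wrapped in the critical section. Returns the previous value.
fn increment(counter: &Cell<u16>) -> u16 {
    with_interrupts_disabled(|| {
        let old = counter.get();
        counter.set(old.wrapping_add(1));
        old
    })
}
```

The fences only constrain memory accesses. A pure computation inside the closure has no such constraint, which is why the text warns that it may still be moved out of the critical section.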

## M68k

target_arch: m68k<br>
Implementation: [m68k.rs](m68k.rs)

Refs: [M68000 FAMILY Programmer's Reference Manual](https://www.nxp.com/docs/en/reference-manual/M68000PRM.pdf)

The following instructions are atomic if the address is properly aligned and the specified storage meets the requirements:

- Load/Store Instructions
  - {8,16,32}-bit

- Multiprocessor Instructions
  - TAS: 8-bit TAS (M68000 or later)
  - CAS: {8,16,32}-bit CAS (M68020 or later)
  - CAS2: {16,32}-bit double CAS (M68020 or later)

  (Refs: Section 3.1.11 "Multiprocessor Instructions" of M68000 FAMILY Programmer's Reference Manual)

Note that CAS2 is not yet supported in LLVM (as of 19).
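
The list above contains CAS but no fetch-and-modify instruction, so other read-modify-write operations are typically built as a retry loop around CAS. The following is a portable sketch of that pattern using the standard library atomics rather than this crate's API. `fetch_max_via_cas` is a hypothetical name chosen for illustration.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Atomically stores `max(current, candidate)` using only compare-and-swap
/// and returns the previous value.
fn fetch_max_via_cas(target: &AtomicU32, candidate: u32) -> u32 {
    let mut current = target.load(Ordering::Relaxed);
    loop {
        let new = current.max(candidate);
        // The exchange succeeds only if nobody changed the value since it was read.
        match target.compare_exchange_weak(current, new, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(previous) => return previous,
            // Another update won the race (or the weak CAS failed spuriously): retry.
            Err(observed) => current = observed,
        }
    }
}
```

The same loop shape works for any update function, as long as the new value is recomputed from the freshly observed value on every retry.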

## MSP430

target_arch: msp430<br>
Implementation: [msp430.rs](msp430.rs)

Refs: [MSP430x5xx and MSP430x6xx Family User's Guide, Rev. Q](https://www.ti.com/lit/ug/slau208q/slau208q.pdf)

This architecture is always single-core and the following operations are atomic:

- Operations that complete within a single instruction.<br>
  This is because the currently executing instruction must be completed before entering the
  interrupt service routine.<br>
  (Refs: [Section 1.3.4.1 "Interrupt Acceptance" of MSP430x5xx and MSP430x6xx Family User's Guide, Rev. Q](https://www.ti.com/lit/ug/slau208q/slau208q.pdf#page=59))

- Operations performed while all interrupts are disabled.<br>
  However, pure operations that are not affected by compiler fences may be moved out of the
  critical section by compiler optimizations. (Note: a correct implementation of interrupt
  disabling and restoring must imply compiler fences, e.g., asm without nomem/readonly.)

## PowerPC

target_arch: powerpc, powerpc64<br>
Implementation: [powerpc.rs](powerpc.rs)

Refs: Power ISA ([3.1C](https://files.openpower.foundation/s/9izgC5Rogi5Ywmm), [2.07B](https://ibm.ent.box.com/s/jd5w15gz301s5b5dt375mshpq9c3lh4u))

The following instructions are atomic if the address is properly aligned and the specified storage meets the requirements:

- Load/Store Instructions
  - All {8,16,32}-bit and 64-bit (powerpc64-only) single load/store instructions other than Move Assist instructions
  - lq/stq: 128-bit load/store (powerpc64-only)<br>
    Compatibility: ISA 2.07 or later (the instructions have existed since ISA 2.03, but in pre-2.07 ISAs they were privileged, big-endian mode only, and had no documented atomicity guarantee)
    - ISA 2.07B: included in the requirements of server processors as Load/Store Quadword category
    - ISA 3.1C: included in the Linux Compliancy subset and AIX Compliancy subset
  - plq/pstq: 128-bit load/store (powerpc64-only)<br>
    (Note: Not mentioned in "Single-Copy Atomicity" section, but GCC [uses them for 128-bit load/store](https://github.com/gcc-mirror/gcc/commit/3bcdb5dec72b6d7b197821c2b814bc9fc07f4628))<br>
    Compatibility: ISA 3.1 or later
    - ISA 3.1C: included in the Linux Compliancy subset and AIX Compliancy subset

  (Refs: Section 1.4 "Single-Copy Atomicity" of Power ISA 3.1C Book II)

- Load And Reserve and Store Conditional Instructions (aka LL/SC)
  - l{b,h}arx/st{b,h}cx.: {8,16}-bit LL/SC<br>
    Compatibility: ISA 2.06 or later
    - ISA 2.07B: included in the requirements as Base category
    - ISA 3.1C: included in all compliancy subsets
  - lwarx/stwcx.: 32-bit LL/SC<br>
    Compatibility: PPC or later
    - ISA 2.07B: included in the requirements as Base category
    - ISA 3.1C: included in all compliancy subsets
  - ldarx/stdcx.: 64-bit LL/SC (powerpc64-only)<br>
    Compatibility: PPC or later
    - ISA 2.07B: included in the requirements of 64-bit processors as 64-bit category
    - ISA 3.1C: included in the Linux Compliancy subset and AIX Compliancy subset
  - lqarx/stqcx.: 128-bit LL/SC (powerpc64-only)<br>
    Compatibility: ISA 2.07 or later
    - ISA 2.07B: included in the requirements of server processors as Load/Store Quadword category
    - ISA 3.1C: included in the Linux Compliancy subset and AIX Compliancy subset

  (Refs: Section 4.6.2 "Load And Reserve and Store Conditional Instructions" of Power ISA 3.1C Book II)

- Atomic Memory Operation (AMO) Instructions
  - l{w,d}at: {32,64}-bit swap, fetch-and-{add,and,or,xor,max,min}, etc. (powerpc64-only)<br>
    <!-- (Others: Compare and Swap Not Equal, Fetch and Increment Bounded, Fetch and Increment Equal, Fetch and Decrement Bounded) -->
  - st{w,d}at: {32,64}-bit add, and, or, xor, max, min, etc. (powerpc64-only)<br>
    <!-- (Others: Store Twin) -->

  Compatibility: ISA 3.0 or later
  - ISA 3.1C: included in the AIX Compliancy subset

  (Refs: Section 4.5 "Atomic Memory Operations" of Power ISA 3.1C Book II)

Load-store instructions are atomic only if properly aligned. LL/SC and AMO instructions require
proper alignment, otherwise the system alignment error handler is invoked or the results are boundedly undefined.<br>
(Refs: Section 1.4 "Single-Copy Atomicity", 4.6.2 "Load And Reserve and Store Conditional Instructions", and 4.5 "Atomic Memory Operations" of Power ISA 3.1C Book II)

Note that plq/pstq is not yet supported in LLVM (as of 19).

None of the above instructions imply a memory barrier.

- A sync (sync 0, sync 0,0, hwsync) instruction can be used as both an “import barrier” and an “export barrier”.<br>
  Compatibility: POWER1 or later
  - ISA 2.07B: included in the requirements as Base category
  - ISA 3.1C: included in all compliancy subsets
- A lwsync (sync 1, sync 1,0) instruction can be used as both an “import barrier” and an “export barrier”,
  if the specified storage location is in storage that is neither Write Through Required nor Caching Inhibited.
- An “import barrier” can be constructed by a branch that depends on the loaded value (even a branch
  that depends on a comparison of the same register is okay), followed by an isync instruction.<br>
  Compatibility: POWER1 or later
  - ISA 2.07B: included in the requirements as Base category
  - ISA 3.1C: included in all compliancy subsets

(Refs: Section 1.7.1 "Storage Access Ordering" and Section B.2 "Lock Acquisition and Release, and Related Techniques" of Power ISA 3.1C Book II)

sync corresponds to SeqCst semantics, lwsync corresponds to Acquire/Release semantics, and isync
with appropriate sequence corresponds to Acquire semantics.
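
The correspondence above can be written down as a small lookup. This is only a sketch of the mapping stated in the text, not code from this crate: `barrier_for` is a hypothetical name, and the function covers standalone barriers only, not the branch-plus-isync sequence that can serve as an acquire barrier after a load.

```rust
use std::sync::atomic::Ordering;

/// Returns the weakest standalone PowerPC barrier instruction that provides
/// the given ordering under the mapping described above, or `None` when no
/// barrier is needed.
///
/// Assumption: the storage is neither Write Through Required nor Caching
/// Inhibited, which is the condition under which lwsync is usable.
fn barrier_for(order: Ordering) -> Option<&'static str> {
    match order {
        Ordering::Relaxed => None,
        Ordering::Acquire | Ordering::Release | Ordering::AcqRel => Some("lwsync"),
        Ordering::SeqCst => Some("sync"),
        // `Ordering` is non-exhaustive; fall back to the strongest barrier.
        _ => Some("sync"),
    }
}
```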

## RISC-V

target_arch: riscv32, riscv64<br>
Implementation: [riscv.rs](riscv.rs)

Refs: [RISC-V Instruction Set Manual](https://github.com/riscv/riscv-isa-manual)

The following instructions are atomic if the address is properly aligned and the specified storage meets the requirements:

- Load/Store Instructions (relaxed load/store)
  - All {8,16,32}-bit (for RV32 & RV64) and 64-bit (for RV64) load/store instructions<br>
    Note: Currently, there is no guaranteed 128-bit atomic load/store even on RV128.<br>
    (Refs: [Section "Memory Model Primitives" of RVWMO Memory Consistency Model, Version 2.0](https://github.com/riscv/riscv-isa-manual/blob/riscv-isa-release-8b9dc50-2024-08-30/src/rvwmo.adoc#memory-model-primitives))

- Load-Acquire and Store-Release Instructions
  - (experimental) Zalasr extension: {8,16,32}-bit (for RV32 & RV64) and 64-bit (for RV64) acquire/seqcst load, release/seqcst store<br>
    (Refs: [RISC-V Zalasr Specification](https://github.com/riscv/riscv-zalasr))

- Load-Reserved/Store-Conditional (LR/SC) Instructions (aka LL/SC)
  - Zalrsc extension: 32-bit (for RV32 & RV64) and 64-bit (for RV64)<br>
    (Refs: ["Zalrsc" Extension for Load-Reserved/Store-Conditional Instructions](https://github.com/riscv/riscv-isa-manual/blob/riscv-isa-release-8b9dc50-2024-08-30/src/a-st-ext.adoc#zalrsc-extension-for-load-reservedstore-conditional-instructions))

- Atomic Memory Operation (AMO) Instructions
  - Zaamo extension: 32-bit (for RV32 & RV64) and 64-bit (for RV64) swap, fetch_{add,and,or,xor,max,min}<br>
    (Refs: ["Zaamo" Extension for Atomic Memory Operations](https://github.com/riscv/riscv-isa-manual/blob/riscv-isa-release-8b9dc50-2024-08-30/src/a-st-ext.adoc#zaamo-extension-for-atomic-memory-operations))
  - Zabha extension: {8,16}-bit swap, fetch_{add,and,or,xor,max,min}<br>
    (Refs: ["Zabha" Extension for Byte and Halfword Atomic Memory Operations, Version 1.0](https://github.com/riscv/riscv-isa-manual/blob/riscv-isa-release-8b9dc50-2024-08-30/src/zabha.adoc))

- Atomic Compare-and-Swap (CAS) Instructions
  - Zacas extension: {32,64}-bit (for RV32 & RV64) and 128-bit (for RV64)<br>
    (Refs: ["Zacas" Extension for Atomic Compare-and-Swap (CAS) Instructions, Version 1.0.0](https://github.com/riscv/riscv-isa-manual/blob/riscv-isa-release-8b9dc50-2024-08-30/src/zacas.adoc))
  - Zacas and Zabha extensions: {8,16}-bit<br>
    (Refs: ["Zabha" Extension for Byte and Halfword Atomic Memory Operations, Version 1.0](https://github.com/riscv/riscv-isa-manual/blob/riscv-isa-release-8b9dc50-2024-08-30/src/zabha.adoc))

Of the above instructions, all except relaxed load/store can specify the memory ordering.<br>
The mappings from the C/C++ atomic operations are described in the [RISC-V Atomics ABI Specification](https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/draft-20240829-13bfa9f54634cb60d86b9b333e109f077805b4b3/riscv-atomic.adoc).

Note: The "A" extension comprises the instructions provided by the Zalrsc and Zaamo extensions,
and the Zabha and Zacas extensions depend on the Zaamo extension.
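
Without Zabha, the list above offers LR/SC and AMO only at 32-bit and 64-bit widths. A common way to get 8-bit or 16-bit read-modify-write in that situation is to operate on the containing aligned 32-bit word and mask out the other bytes. This technique is an assumption added for illustration and is not described in this document. The sketch below is portable Rust in which `compare_exchange_weak` stands in for an LR/SC pair, and `swap_u8_in_word` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Atomically replaces one byte of a 32-bit word and returns the old byte.
/// `lane` selects the byte by significance (0 = least significant), which
/// keeps this sketch independent of endianness.
fn swap_u8_in_word(word: &AtomicU32, lane: u32, new: u8) -> u8 {
    assert!(lane < 4, "a 32-bit word has four byte lanes");
    let shift = lane * 8;
    let mask = 0xFF_u32 << shift;
    let mut current = word.load(Ordering::Relaxed);
    loop {
        // Replace only the selected byte and keep the other three unchanged.
        let updated = (current & !mask) | (u32::from(new) << shift);
        match word.compare_exchange_weak(current, updated, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(previous) => return ((previous & mask) >> shift) as u8,
            // The word changed (possibly in a different byte) or the weak CAS
            // failed spuriously: retry with the value that was observed.
            Err(observed) => current = observed,
        }
    }
}
```

A retry can be triggered by a write to a neighboring byte, so this emulation may loop more often than a native byte-sized operation would.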

## s390x

target_arch: s390x<br>
Implementation: [s390x.rs](s390x.rs)

Refs: z/Architecture Principles of Operation ([Fourteenth Edition](https://publibfp.dhe.ibm.com/epubs/pdf/a227832d.pdf))

The following instructions are atomic if the address is properly aligned and the specified storage meets the requirements:

- Load/Store Instructions
  - All {8,16,32,64}-bit load/store instructions that have Single-Access References<br>
    (Refs: Section "Storage-Operand Fetch References", "Storage-Operand Store References", and "Storage-Operand Consistency" of z/Architecture Principles of Operation, Fourteenth Edition)
  - LPQ/STPQ: 128-bit load/store (arch1 or later)<br>
    (Refs: Section "LOAD PAIR FROM QUADWORD" and "STORE PAIR TO QUADWORD" of z/Architecture Principles of Operation, Fourteenth Edition)

- Instructions that have Interlocked-Update References
  - TS: 8-bit TAS (360 or later)<br>
    <!-- (TEST AND SET) -->
  - CS{,Y,G}, CDS{,Y,G}: {32,64,128}-bit CAS (CS,CDS: 370 or later, CSG,CDSG: arch1 or later, CSY,CDSY: long-displacement facility added in arch3)<br>
    <!-- (COMPARE AND SWAP, COMPARE DOUBLE AND SWAP) -->
  - LAA{,G}, LAAL{,G}, LAN{,G}, LAO{,G}, LAX{,G}: {32,64}-bit fetch-and-{add,and,or,xor} (interlocked-access facility 1 added in arch9)<br>
    <!-- (LOAD AND ADD, LOAD AND ADD LOGICAL, LOAD AND AND, LOAD AND OR, LOAD AND EXCLUSIVE OR) -->
  - A{,G}SI, AL{,G}SI: {32,64}-bit add with immediate (interlocked-access facility 1 added in arch9)<br>
    <!-- (Storage-and-immediate formats of ADD IMMEDIATE and ADD LOGICAL WITH SIGNED IMMEDIATE) -->
  - NI{,Y}, OI{,Y}, XI{,Y}: 8-bit {and,or,xor} with immediate (interlocked-access facility 2 added in arch10)<br>
    <!-- (Storage-and-immediate formats of AND, OR, and EXCLUSIVE OR) -->
  <!-- - (Others: COMPARE AND REPLACE DAT TABLE ENTRY, COMPARE AND SWAP AND PURGE, COMPARE AND SWAP AND STORE, STORE CHARACTERS UNDER MASK (conditional)) -->

  (Refs: Section "Storage-Operand Update References" of z/Architecture Principles of Operation, Fourteenth Edition)

Of the above instructions, those that have Interlocked-Update References,
other than STORE CHARACTERS UNDER MASK, perform serialization.<br>
(Refs: Section "CPU Serialization" of z/Architecture Principles of Operation, Fourteenth Edition)

The following instructions are usually used as standalone serialization:

- BCR 15,0 (360 or later)
- BCR 14,0 (fast-BCR-serialization facility added in arch9)

(Refs: Section "BRANCH ON CONDITION" of z/Architecture Principles of Operation, Fourteenth Edition)

Serialization corresponds to SeqCst semantics, and all memory accesses have Acquire/Release semantics.

## SPARC

target_arch: sparc, sparc64<br>
Implementation: [sparc.rs](sparc.rs)

Refs: The SPARC Architecture Manual ([Version 9, Version 8](https://sparc.org/technical-documents))

The following instructions are atomic if the address is properly aligned and the specified storage meets the requirements:

- Load/Store Instructions
  - V7 or later: {8,16,32}-bit
  - V8+,V9: 64-bit

  (Refs: Section D.4.1 "Value Atomicity" of the SPARC Architecture Manual, Version 9)

- Compare-and-Swap Instructions
  - V8+,V9: {32,64}-bit CAS
  - V8 with LEONCASA: 32-bit CAS

  (Refs: Section 8.4.6 "Hardware Primitives for Mutual Exclusion" of the SPARC Architecture Manual, Version 9)

- SWAP Instructions (deprecated in V9)
  - V7 or later: 32-bit swap

  (Refs: Section 8.4.6 "Hardware Primitives for Mutual Exclusion" and A.57 "Swap Register with Memory" of the SPARC Architecture Manual, Version 9)

- Load Store Unsigned Byte Instructions
  - V7 or later: 8-bit TAS

  (Refs: Section 8.4.6 "Hardware Primitives for Mutual Exclusion" of the SPARC Architecture Manual, Version 9)

Memory access instructions require proper alignment, but some instructions are
implementation-dependent and may work with insufficient alignment.<br>
(Refs: Section 6.3.1.1 "Memory Alignment Restrictions" of the SPARC Architecture Manual, Version 9)

Which memory barrier the above instructions imply depends on the memory model used.
V8+ and V9 have three memory models: Total Store Order (TSO), Partial Store Order (PSO), and Relaxed
Memory Order (RMO). V8 has only TSO and PSO. Implementation of TSO (or a more strongly ordered model
which implies TSO) is mandatory, and PSO and RMO are optional.<br>
(Refs: Section 8.4.4 "Memory Models" of the SPARC Architecture Manual, Version 9)
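
An 8-bit TAS primitive such as the one listed above is enough to build a simple spinlock. The following is a portable sketch using the standard library, where `AtomicBool::swap(true, ..)` plays the role of the test-and-set instruction. `TasLock` is a hypothetical type for illustration and is not part of this crate.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Minimal test-and-set spinlock (hypothetical type for illustration).
pub struct TasLock {
    locked: AtomicBool,
}

impl TasLock {
    pub const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }

    /// One test-and-set attempt. Returns `true` if the lock was acquired.
    pub fn try_lock(&self) -> bool {
        // `swap(true)` marks the lock as taken and reports the old value in
        // one atomic step; the lock is ours only if it was previously free.
        !self.locked.swap(true, Ordering::Acquire)
    }

    /// Spins until the lock is acquired.
    pub fn lock(&self) {
        while !self.try_lock() {
            std::hint::spin_loop();
        }
    }

    /// Releases the lock. The caller must currently hold it.
    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
```

The `Acquire` and `Release` orderings request the barriers portably, so the sketch does not depend on which of the memory models described above is in effect.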
15 changes: 2 additions & 13 deletions src/arch/avr.rs
@@ -3,19 +3,8 @@
/*
AVR
This architecture is always single-core and the following operations are atomic:
- Operation that is complete within a single instruction.
This is because the currently executing instruction must be completed before entering the
interrupt service routine.
(Refs: https://developerhelp.microchip.com/xwiki/bin/view/products/mcu-mpu/8-bit-avr/structure/interrupts/)
The following two kinds of instructions are related to memory access:
- 8-bit load/store
- XCH, LAC, LAS, LAT: 8-bit swap,fetch-and-{clear,or,xor} (xmegau family)
- Operations performed in a situation where all interrupts are disabled.
However, pure operations that are not affected by compiler fences (note: the correct interrupt
disabling and restoring implementation must implies compiler fences, e.g., asm without nomem/readonly)
may be moved out of the critical section by compiler optimizations.
See "Atomic operation overview by architecture" for atomic operations in this architecture:
https://github.com/taiki-e/atomic-maybe-uninit/blob/HEAD/src/arch/README.md#avr
Refs:
- AVR® Instruction Set Manual, Rev. DS40002198B
13 changes: 2 additions & 11 deletions src/arch/m68k.rs
@@ -3,17 +3,8 @@
/*
M68k
This architecture provides the following atomic instructions:
- Load/Store Instructions
- {8,16,32}-bit
- Multiprocessor Instructions
- TAS: 8-bit TAS (M68000 or later)
- CAS: {8,16,32}-bit CAS (M68020 or later)
- CAS2: {16,32}-bit double CAS (M68020 or later)
(Refs: Section 3.1.11 "Multiprocessor Instructions" of M68000 FAMILY Programmer's Reference Manual)
Note that CAS2 is not yet supported in LLVM.
See "Atomic operation overview by architecture" for atomic operations in this architecture:
https://github.com/taiki-e/atomic-maybe-uninit/blob/HEAD/src/arch/README.md#m68k
Refs:
- M68000 FAMILY Programmer's Reference Manual
12 changes: 2 additions & 10 deletions src/arch/msp430.rs
@@ -3,16 +3,8 @@
/*
MSP430
This architecture is always single-core and the following operations are atomic:
- Operation that is complete within a single instruction.
This is because the currently executing instruction must be completed before entering the
interrupt service routine.
(Refs: Section 1.3.4.1 "Interrupt Acceptance" of MSP430x5xx and MSP430x6xx Family User's Guide, Rev. Q: https://www.ti.com/lit/ug/slau208q/slau208q.pdf#page=59)
- Operations performed in a situation where all interrupts are disabled.
However, pure operations that are not affected by compiler fences (note: the correct interrupt
disabling and restoring implementation must implies compiler fences, e.g., asm without nomem/readonly)
may be moved out of the critical section by compiler optimizations.
See "Atomic operation overview by architecture" for atomic operations in this architecture:
https://github.com/taiki-e/atomic-maybe-uninit/blob/HEAD/src/arch/README.md#msp430
Refs:
- MSP430x5xx and MSP430x6xx Family User's Guide, Rev. Q