[AArch64]: 128-bit Sequentially Consistent load allows reordering before prior store when armv8 and armv8.4 implementations are Mixed #81978
Comments
@llvm/issue-subscribers-backend-aarch64 Author: Luke Geeson (lukeg101)
Consider the following litmus test:
```
C SB
{ int128_t *x = 0; int128_t *y = 0}
P0 (_Atomic __int128 *x, _Atomic __int128 *y) {
P1 (_Atomic __int128 *x, _Atomic __int128 *y) {
exists (P0:r0=0 /\ P1:r0 = 0)

{ P0:r0=0; P1:r0=1; }

P0: P1:

{ P0:r0=0; P1:r0=0; } <--- Forbidden by source model, bug!

DMB ISH; LDP; DMB ISH
```
@tmatheson-arm @Wilco1 @efriedma-quic Eli, you asked this question (the review is archived, so I cannot comment on the Phabricator link): https://reviews.llvm.org/D141429#inline-1378324
First of all, the views of the authors expressed in this comment are not endorsed by Arm or any other company mentioned. I do not represent Arm. This is another one of those compiler bugs that arise when we mix compatible implementations of atomics. Wilco and I have been looking into whether concurrency bugs arise when implementations are mixed. We actually found the above when looking back at this Phabricator link. It turns out there are compiler bugs that non-mixed testing (i.e. prior work) can miss. When validating that replacing I've developed a tool to automatically search for these kinds of 'mixing bugs', given a set of tests and compiler profiles that generate atomics as input.
The following AArch64 litmus test was generated by the tool (and tweaked by me). It can be fed into the memory model tool:
This allows the most reordering while still being correct.
Hi @efriedma-quic, are you aware of other cases where mixing has caused issues? I'd like to understand a bit more about the Windows case and how it was used, and about any other cases you know of.
Consider the following litmus test:
where `P0:r0 = 0` means thread `P0`, local variable `r0` has value `0`.
Building either P0/P1 for v8-a or v8.4 passes as expected. When simulating this test under the C/C++ model from its initial state, the outcome of execution in the exists clause is forbidden by the source model. The allowed outcomes are:
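As a rough cross-check of why the source model forbids the `exists` outcome, the following C sketch enumerates every sequentially consistent interleaving of a store-buffering test. It assumes the classic shape, in which each thread stores 1 to its own location and then loads the other thread's location; that shape is an assumption about the test body, and the names `run` and `sc_outcomes` are hypothetical helpers made up for this illustration. It models plain `int` values on a single shared memory and is not a substitute for a memory model simulator.

```c
#include <assert.h>

/* Events of the ASSUMED store-buffering shape, in per-thread program order. */
enum { P0_STORE_X, P0_LOAD_Y, P1_STORE_Y, P1_LOAD_X };

/* Execute one total order of the four events on a single shared memory.
   Returns the outcome encoded as (P0:r0 << 1) | P1:r0. */
int run(const int order[4]) {
    int x = 0, y = 0;            /* initial state: both locations are 0 */
    int p0_r0 = 0, p1_r0 = 0;
    for (int i = 0; i < 4; i++) {
        switch (order[i]) {
        case P0_STORE_X: x = 1;     break;
        case P0_LOAD_Y:  p0_r0 = y; break;
        case P1_STORE_Y: y = 1;     break;
        case P1_LOAD_X:  p1_r0 = x; break;
        default: assert(0 && "unknown event");
        }
    }
    return (p0_r0 << 1) | p1_r0;
}

/* Enumerate every interleaving that keeps each thread's program order.
   Returns a bitmask in which bit k is set when outcome k is reachable. */
unsigned sc_outcomes(void) {
    unsigned seen = 0;
    for (unsigned mask = 0; mask < 16; mask++) {  /* bit set: slot goes to P0 */
        int order[4], p0 = 0, p1 = 0, valid = 1;
        for (int slot = 0; slot < 4 && valid; slot++) {
            if (mask & (1u << slot)) {
                if (p0 == 2) valid = 0; else order[slot] = P0_STORE_X + p0++;
            } else {
                if (p1 == 2) valid = 0; else order[slot] = P1_STORE_Y + p1++;
            }
        }
        if (valid) seen |= 1u << run(order);
    }
    return seen;
}
```

Calling `sc_outcomes()` and testing bit 0 of the result tells you whether `P0:r0=0 /\ P1:r0=0` is reachable in this simplified model. Under the stated assumptions it is not, because reaching it would require each load to precede the other thread's store while following its own.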
However, consider compiling `P0` to target armv8.4-a (https://godbolt.org/z/dxTrbGxoG) using clang trunk (`dmb ish; stp; dmb ish; ldp; dmb ish`), compiling the `store` on `P1` to target armv8.0-a using clang (`ldaxp; stlxp; cbnz` loop), and compiling the `load` on `P1` to target armv8.4-a (`ldp; dmb ish`) using clang. When compiled, the assembly is as follows:
The compiled program has the following outcomes when simulated under the AArch64 model (rename P0:X1 to P0:r0 and P1:X1 to P1:r0 to match with source outcomes):
which is due to the fact that the effects of `LDP` on `P1` can be reordered before the effects of `STLXP` on `P1`, since there is no leading `DMB` barrier to prevent the reordering.
Since there is no barrier, we propose to fix the bug by adding said barrier before `LDP`:
Which prevents the buggy outcome under the AArch64 memory model.
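The effect of the missing barrier can be illustrated with a deliberately tiny C model. This is a sketch under strong assumptions, not the AArch64 memory model: each thread has just two globally visible events (its store becoming visible and its load reading memory), and a per-thread `fenced` flag stands in for a barrier between the store and the load. The function name `both_zero_reachable` and the event names are hypothetical, and the test shape (each thread stores 1, then loads the other location) is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Globally visible events in this toy model:
   W0: P0's store to x becomes visible, R0: P0's load of y reads memory,
   W1: P1's store to y becomes visible, R1: P1's load of x reads memory. */
enum { W0, R0, W1, R1 };

/* Index of event ev within a total order of the four events. */
static int pos(const int order[4], int ev) {
    for (int i = 0; i < 4; i++)
        if (order[i] == ev) return i;
    assert(0 && "event missing from order");
    return -1;
}

/* Returns true when P0:r0=0 /\ P1:r0=0 is reachable.
   A "fenced" thread keeps its store visible before its load; an unfenced
   thread lets the load take effect before the store (ASSUMED simplification
   of a load with no leading barrier). */
bool both_zero_reachable(bool p0_fenced, bool p1_fenced) {
    for (int a = 0; a < 4; a++)
    for (int b = 0; b < 4; b++)
    for (int c = 0; c < 4; c++) {
        if (a == b || a == c || b == c) continue;
        int order[4] = { a, b, c, 6 - a - b - c };  /* last event is the remaining one */
        if (p0_fenced && pos(order, W0) > pos(order, R0)) continue;
        if (p1_fenced && pos(order, W1) > pos(order, R1)) continue;

        int x = 0, y = 0, p0_r0 = 1, p1_r0 = 1;
        for (int i = 0; i < 4; i++) {
            switch (order[i]) {
            case W0: x = 1;     break;
            case R0: p0_r0 = y; break;
            case W1: y = 1;     break;
            case R1: p1_r0 = x; break;
            }
        }
        if (p0_r0 == 0 && p1_r0 == 0) return true;
    }
    return false;
}
```

In this model, `both_zero_reachable(true, false)` corresponds to the mixed case (a fully fenced `P0` and a `P1` whose load has no leading barrier), while `both_zero_reachable(true, true)` corresponds to adding the barrier before the load on `P1`. Only the first admits the forbidden outcome.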
Besides using a `DMB`, it is feasible to use `LDAR`, making the SC `LDP` stronger so it no longer reorders with `STLXP`. Note it is also possible to make `STLXP` stronger by adding a `DMB` after the loop (but that's not what is done for other atomic sizes).
I have validated this bug whilst discussing with Wilco from Arm's compiler teams.
This bug would not have been caught in normal execution, but only when multiple implementations are mixed together.