In contrast to ARM, the code allowed inside an LR/SC sequence (the instructions between LR and SC) is very limited on RISC-V platforms. A constrained LR/SC loop may contain at most 16 instructions, and only a subset of the base "I" instructions and their compressed "C" forms is permitted. Since additional loads and stores are also excluded, instrumenting an instruction inside the sequence will most likely turn it into an "unconstrained LR/SC loop", causing the trailing SC to always fail on our test device. The ISA only guarantees that "constrained LR/SC loops" eventually succeed.
How unconstrained LR/SC loops are handled is a hardware implementation detail. On a SiFive U54, unconstrained LR/SC loops never succeed, which leads to deadlocks in some cases.
The approach taken to fix this issue is to translate the LR/SC sequence into a mix of a software-emulated and a hardware atomic sequence. The following figure hopefully gives you an idea of how it works:
The actual implementation stores the value of register x in the dbm_thread structure and uses only one temporary scratch register. The ordering flags aq and rl are not considered in the software-emulated part (LR replaced by LD), which may lead to side effects (we did not encounter any).

Benchmarks
In terms of performance, the implementation appears to have no negative effect on real-world applications. In all four applications, LR/SC sequences were executed 40-60 times per run.