-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stable assembly operations #63
Comments
Is there any sequence of instructions for which a single call to an I am thinking here about compiling in debug mode without optimizations were the code generated might differ, potentially in ways that break it. |
I’ve always been a strong proponent of the "intrinsic" functions that map down to a single instruction. For example, instead of writing Of course they have their downsides, but for most of the use cases in embedded, intrinsics are more than sufficient. In my experience using the IAR toolchain, which exposes such intrinsics for most of the important instructions, I’ve only had to maintain assembly to implement exactly two things: atomic byte increment and the context switch handler. Both of them implemented in an external assembly file. I believe that we could work on adding intrinsics to rustc (LLVM doesn’t have a platform intrinsic for ARM cpsid, for example, but it has a ... So basically, what your proposal says. |
@gnzlbg yes, compiler is free to insert whatever code it wishes between the separate |
Thanks @nagisa that makes sense. So my opinion is that it only makes sense to do this after we have stabilized inline assembly, which kind of defeats the point of doing this in the first place. IIUC the idea behind this is that if we expose the hardware ISA via intrinsics then nothing will break if LLVM changes the syntax or semantics of inline assembly because we will be able to fix this in core. But this is the problem: core will break. That is, somebody will need to fix core, and the amount of work required to fix core is linearly proportional to the usage of inline assembly within core. This might be a lot of work already, but if we go this route that will turn into a titanic amount of work. The same would happen if we decided to add another backed to rustc and had to reimplement all usages of inline assembly in "Cretonne-syntax". So IMO the only way to make upgrading to a new syntax/backend realistic is to turn the porting effort from O(N) in the usage of inline assembly to O(1). The only way I can think of to achieve this is to stabilize the inline assembly macro enough (e.g. like this https://internals.rust-lang.org/t/pre-rfc-inline-assembly/6443), so that upgrading can be done in Rust by "just" mapping Rust's inline assembly syntax to whatever syntax a new or upgraded backed uses. This will be a lot of work, but at least its independent from how often inline assembly is used. Once we are there, we might just as well stabilize inline assembly instead of pursuing this. Libraries that implement this can be developed in the ecosystem, and "core" intrinsics can continue to be added to Then there is also the issue that two consecutive |
The intrinsics would be implemented in the backend, rather than libcore. While it is true, that it would increase burden when upgrading a backend, it wouldn’t be any greater than the burden of adapting whatever we stabilise as our inline assembly implementation. |
With volatile assembly statements, the "code" would be limited to code that would be necessary to satisfy the constraints of the |
why can't global_asm! be stabilized? Isn't that just like an external assembly file? for riscv it would be quite a few intrinsics I believe. this is what I'm doing to r/w csr regs: #[cfg(target_arch = "riscv")]
macro_rules! csr_asm {
($op:ident, $csr:expr, $value:expr) => (
{
let res: usize;
unsafe {
asm!(concat!(stringify!($op), " $0, ", stringify!($csr), ", $1")
: "=r"(res)
: "r"($value)
:
: "volatile");
}
res
}
)
} the $csr value isn't an operand that can be loaded from a register, so there would need to be an intrinsic for each csr. |
For reference, here is ARM's C Language Extensions 2.0 document: ARM® C Language Extensions Release 2.0 I believe these were implemented by ARM's in-house compilers and by GCC according to this page: 6.59.7 ARM C Language Extensions (ACLE) I believe that the stdsimd group knows about the NEON extensions but probably hasn't put much thought into other ARM intrinsics. Having a standard to point to should make it easier to get them implemented. For instance, section "8.4 Hints" describe |
I've taken a look over the full instruction set listing, here's what stands out Assembly operations that are often required for nontrivial programs:
Somewhat more obscure stuff:
Unlike AVR-GCC, LLVM transparently handles accesses to and from program memory, meaning that that whole class of operations doesn't require Although currently unimplemented, I believe LLVM's atomic intrinsics directly could be mapped to the RMW atomic instructions (FWIW these map to something like |
|
I discussed this with @alexcrichton during Rust All Hands and he said he was fine with adding an stable API for assembly ops that have an unstable implementation (e.g. inline |
@japaric would those go into |
@gnzlbg That can be bikeshed in the RFC |
In order to understand which intrinsics might be needed for the various Cortex-M processors, I The first group of intrinsics is "Core Register Access". These are mainly wrappers around https://www.keil.com/pack/doc/CMSIS/Core/html/group__Core__Register__gr.html
The second group of intrinsics provides access to CPU instructions. Each of these Some of these instructions have direct equivalents in core::intrinsics or https://www.keil.com/pack/doc/CMSIS/Core/html/group__intrinsic__CPU__gr.html
In the first group, disable_fault_irq, disable_irq, enable_fault_irq, enable_irq seem to be pretty critical. The rest of the get_ / set_ functions are more specialized. In the second group, NOP, WFI, WFE, SEV, ISB, DSB are ones that I am familiar with and use often. REVSH, RBIT, and RRX seem like they would be primarily for optimization. SSAT and USAT provide more flexibility than the core saturated math primitives by allowing selection of a bit width. LDRxT and STRxT are mainly for unprivileged access checking. The rest should be covered by built-ins and atomics. |
Do gcc, clang, or msvc provide any of these as functions ? |
As far as I can tell, all three implement them via the ACLE (ARM C Language Extensions) specification. gcc - 6.59.7 ARM C Language Extensions (ACLE) clang - The arm_acle.h header is shown in their documentation msvc - ARM Intrinsics |
All of it? The |
Thanks for the pointer back to coresimd - I looked at this briefly a while back but didn't dig in to figure out the specifics of how to get the additional intrinsics implemented. coresimd/simd_llvm.rs provides the clue:
Unfortunately onlysimd intrinsics are listed there. I'd never seen the Looking further, stdsimd issue #112 mentions link_llvm_intrinsic which enables Implement all x86 vendor intrinsics shows how to create new intrinsics and also gave me enough information to find the list of LLVM arm intrinsics: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/IR/IntrinsicsARM.td So, as @gnzlbg points out, we should be able to do this through pull requests to stdsimd. |
For ACLE using |
Update: rust-lang/stdarch#437 is tracking adding these assembly operations (instructions) to stdsimd. |
Discussed (triaged) in the last meeting: This is more of a nice to have as it's not required for embedded Rust on stable. If we are to get this done by the edition release these are the final deadlines:
|
Triage: Several CMSIS intrinsics have been implemented and are now available in Most of the functionality in The stabilization path for the non-SIMD subset of these intrinsics is being discussed in rust-lang/stdarch#518 (comment). |
As this mainly affects the Cortex-M ecosystem we should have someone on the @rust-embedded/cortex-m team champion this work. (This doesn't mean that you have to implement this; mentoring / helping a collaborator to implement this is also valid). Solving rust-lang/stdarch#437 (comment) is the first step towards implementing this. |
I have been doing work with asm instructions before, so I can certainly help. Reading the referenced comments and PRs I do not quite see the issue, as it is started the difference between HAL and compiler spec. @japaric To clarify, do you want help simply pushing it forward (is seems you already did the implementation) or to make the decision on CMSIS / ACLE discussion? |
@korken89 we want to stabilize the non-SIMD instructions (WFI, CPSID, etc.) as that would let us drop the build dependency on |
@japaric Thanks for the clarification! I have never written an RFC before, but I'd very much like to learn. |
A summary of "What is ACLE" and "What is CMSIS" from the point-of-view of "What should a compiler implement" would probably be a good start. It isn't really necessary to put that into an RFC or anything right now. Just posting it there as a comment would be enough to keep the discussion moving and unblock future work. The only thing we are waiting for right now is for somebody to make a good case for which intrinsics should be implemented and why. We tried that before, but we have learned new things, so it is time to briefly re-evaluate whether CMSIS was the right choice or not (or whether we should provide only ACLE, or also ACLE, etc.). |
Hmm the last two comments there, including yours, kind of settle it from a logical point of view. It should be ACLE. Any reason you reconsider that statement here? |
Well, I have no idea what ACLE and CMSIS are, never used them, etc. The only things I know about ARM are from reviewing stdsimd PRs and implementing some bit and pieces here and there, and I honestly have the feeling that I still haven't gorked how the whole ARM architecture ecosystem fits together. So from my point-of-view, my statements about ARM should have zero weight. What I was hoping is that somebody would come along the thread with an authoritative answer or source of truth about how We have been making lists about how to organize Are the any resources that explain how the whole family of architectures / ISAs are related, which extensions are there, which can be changed dynamically, etc. ? Maybe that would help me. The whole v5/.../v8 architecture families +/- 32/64 +/- thumb +/- neon/... +- {r,m,..}class +- hard/soft floats +/-... is like a big spaghetti mess inside my head. It is hard to me to tell which code is allowed in which binary for which architecture. What are the ISAs, what are the ISA extensions, which code for which extensions can I include in my binary as long as I only execute it on hardware that implements the extension (that is, which extensions can actually be detected and used at run-time). For example, I can generate an arm64 binary that contains neon instructions as long as these are not executed at run-time, but IIUC the same cannot be done for arm32v7+neon. |
I was mainly referring to this specific comment: rust-lang/stdarch#518 (comment) His Github profile says he works for ARM. If true, I think we have our authority :) Edit: One comment above states the same: rust-lang/stdarch#518 (comment) |
Triage: A pre-RFC discussing what should go in |
Update(last week): a PR that implement acle (see pre-RFC in #184) has been sent to the stdsimd repo: rust-lang/stdarch#557. Also see rust-lang/stdarch#558. Update(today): as discussed in today's meeting I'm going to remove this issue from the edition milestone. This means that this is no longer a priority for us and that we will prioritize the issues in the edition milestone over this. The rationale is that the binary blob trick already lets us implement the most used subset of CMSIS without depending on an external assembler, e.g. arm-none-eabi-gcc (although with runtime and code size overhead). |
207: [RFC] Initial Cortex-A team r=hannobraun a=andre-richter This is a proposal for creating a Cortex-A team. This is based on and will close issue #182. Please refrain from approving until I removed the [DRAFT] from the title and every proposed member has his points on the agenda. @raw-bin, @wizofe and me will be the initial members. The following work is planned for this team: ### Maintain a set of ecosystem crates under the embedded WG - [ ] Write a tool that generates accessors for the ARM System Register XML. - [ ] Move [cortex-a](https://github.com/andre-richter/cortex-a) to the WG and keep extending/completing it. - [ ] If feasible, feed the XML accessors into the crate. - [ ] Add ready-to-use idioms for bare-metal use cases. - [ ] MMU idioms - [ ] GIC idioms - [ ] ...? - [ ] Move [register-rs](https://github.com/rust-osdev/register-rs) to the WG. CC @phil-opp for sign-off. - [ ] Discuss if something the likes of `cortex-m-rt` is feasible for cortex-a. ### Provide resources for learning - [ ] Move [rust-raspi3-tutorial](https://github.com/andre-richter/rust-raspi3-tutorial) to the WG ### Make it possible to build AArch64 kernels / programs on stable. - [ ] Add a `rust-std` component for `aarch64-unknown-none` - [ ] Differentiate into further aarch64 targets, e.g. a non-fp version of `aarch64-unknown-none`, and add them as well. ### Stabilization - [ ] Push for a stable core::arch::arm module (see #63) to make the `cortex-a` crate (see its asm module) work on stable w/o depending on GCC. ------------------ @raw-bin and @wizofe, kindly let me know your agenda points and I will edit them into this comment. I understand your Redox work will most probably be a part of it, but I want to leave it to you guys to describe it. Once we are set, I will remove the [DRAFT] and open this for members to approve. @jamesmunns I tend to agree to your thought about having a microprocessor group. However, I think that cortex-a alone is such a complex architecture, it can easily keep the initial three of us busy for years to come. I would argue that we would need separate teams for PPC, RISC-V and the likes. Co-authored-by: Andre Richter <andre-richter@users.noreply.github.com>
Update: a PR that exposes the ACLE API on ARM and AArc64 recently landed in The next steps here would be to use the Finally, to me it's a bit unclear if a full RFC is needed in this case given the |
We have stabilized other APIs since the The merged PR is awesome, and definitely something worth using on nightly (it landed already) so that we can start ironing implementation bugs, and getting experience with the design. However, the API exposed by the PR does not do 1:1 what C does, it is something different than what was agreed on in the It will be hard to have a discussion about that without writing down somewhere "What does C do", "Why this API is designed differently", "What are the trade-offs involved in this design", "What are the alternatives", "How can we close the gap with C in the future", etc. and using that as a starting point to discuss these things. |
FWIW I personally still feel that it's ok to "only" FCP these APIs. Such an FCP would, however, want to be advertised to relevant parties (aka the compiler team and the embedded WG). @gnzlbg while for x86/x86_64 we have C precedent, wasm was a platform where we didn't have any C precedent so we made our own. I think that's fine to do in limited bases within platforms. For example it sounds like ARM doesn't have a widely-agreed-upon set of C APIs like x86/x86_64 does for intrinsics, so having our own idioms which take this into account for ARM seems reasonable. As for RFC vs FCP, I think an RFC is probably a little too heavyweight for this in that we've already had an RFC for
I definitely agree with this, but this is also what I imagined an FCP proposal encompassing :) |
For WASM we coordinated this a bit with the spec and the clang/LLVM devs to make sure that things "matched". For PowerPC, even though that's not stable yet, @lu-zero contacted the spec developers and they agreed to include the Rust API into the spec.
AFAIK the ACLE headers are just normal C headers, and most (all?) C and C++ programs use those. Isn't this the case? (I can't recall the exact result of the ACLE vs CMSIS discussion).
Instead of mapping the C headers, the current implementation designed a new "object-oriented" API over the registers and intrinsics that has no prior art (and limited experience on nightly), even though the API looks really good. If the FCP proposal discussed those issues "mini-RFC-style" then that's fine for me, but the other FCPs that we had for similar things were basically a one liner "This is what C / theSpec does, we did that." and that was it. I don't think that a one-liner FCP would be enough this time. |
Hm ok, I was unaware that it had switched far enough away to a more object-oriented style, sorry about that! If that's the case then I agree that is a big enough change it may want an RFC |
Maybe one could draft the FCP post first, and then we can discuss which process this should follow? With a draft of the FCP it should be pretty obvious how much new design there is in here. |
Note, as of Rust 1.34, it may be possible to achieve this in a limited fashion using linker-plugin-lto functionality. Please see the related cortex-m issue, and this test repo demonstrating this works. As a note, this isn't really inline ASM, as originally described, but instead a way to define ASM in an external C file, that can be linked in as a static archive ( |
Revisiting as part of today's issue triage. Our ACLE intrinsics never made it, but |
Triage(2018-08-21)
A pre-RFC discussing what should go in
core::arch::arm
have been opened in #184Update - 2018-07-27
Intrinsics like
__NOP
are now available incore::arch::arm
. Path towards stabilization is being discussed in rust-lang/stdarch#518 (comment).Help wanted
We are looking for someone / people to help us write an RFC on stabilization of the ARM Cortex intrinsics in core::arch::arm. Details in #63 (comment).
One of the features that ties embedded development to the nightly channel is the
asm!
macro, andby extension the
global_asm!
macro.In some cases it's possible to turn an unstable
asm!
call into a stable FFI call that invokes somesubroutine that comes from a pre-compiled external assembly file. This alternative comes at the cost
of a function call overhead per
asm!
invocation.In other cases the function call overhead of the FFI call breaks the intended semantics of the
original
asm!
call; this can be seen when reading registers like the Program Counter (PC) or theLink Register (LR).
The
asm!
feature is hard to stabilize because it's directly tied to LLVM;rustc
literally passesthe contents of
asm!
invocations to LLVM's internal assembler. It's not possible to guarantee thatthe syntax of LLVM assembly won't change across LLVM releases so stabilizing the
asm!
feature inits current form is not possible.
This ticket is about exploring making operations that require assembly available on the stable
channel. This proposal is not about stabilizing the
asm!
macro itself.The main idea is that everything that requires assembly and can be implemented as external assembly
files should use that approach. For everything that requires inlining the assembly operation the
proposal is that the
core
crate will provide such functionality.This idea is not unlike what's currently being done in stdsimd land:
core::arch
providesfunctions that are thin wrappers around unstable LLVM intrinsics that provide SIMD functionality.
Similarly
core::asm
(module up for bikeshedding) would provide functionality that requiresasm!
but in a stable fashion. Example below:
This way the functionality can be preserved across LLVM upgrades; the maintainers of the
core
crate would update the implementation to match the current LLVM assembly syntax.
TODO (outdated)
provided in
core::asm
.cortex-m
that movesmost of the assembly operations to external assembly files by enabling some Cargo feature. What can't be moved to assembly files would be a candidate for inclusion in
core::asm
.cc @gnzlbg @nagisa could we get your thoughts on this? does this sound feasible, or does this sound
like a bad idea?
cc @dvc94ch @dylanmckay @pftbest could you help me identify assembly operations that require inline assemly on AVR, MSP430 and RICSV?
The text was updated successfully, but these errors were encountered: