Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove cmsis module; add acle module #557

Merged
merged 31 commits into from
Feb 18, 2019
Merged

remove cmsis module; add acle module #557

merged 31 commits into from
Feb 18, 2019

Conversation

japaric
Copy link
Member

@japaric japaric commented Sep 3, 2018

ACLE (ARM C Language Extensions) is more general (supports ARMv4 to ARMv8) than
CMSIS (ARMv7-M and ARMv7-R)

Previous discussion: rust-embedded/wg#184

TODO in follow up PRs:

  • Move src/arm/dsp.rs into src/acle/{dsp,simd32}.rs. Also the names used in arm::dsp don't match ACLE -- they should start with __. cc @paoloteti

  • Add 64-bit system registers. cc @andre-richter

Pre-requisite: whitelist the aclass, v5te, v6k and v6t2 target features in rustc.

closes #437

r? @gnzlbg

Copy link
Contributor

@gnzlbg gnzlbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the way the registers are extended with traits and used is a bit opinionated. I would like to see how the C code compares to the Rust code for some very simple tasks, like just reading the value of a register.

Typically we've tried to map as closely to C as possible - I am not opposed to something that is stylistically better, but the RFC will have to justify this.

All of this is pretty much "untested", but I think that we should merge this anyways and improve that afterwards - before the RFCs this will need to be tested and verified somehow but I don't want to block that work, nor ecosystem adoption, on any of that.

cc @alexcrichton once this is merged we should try to update stdsimd upstream so that the arm ecosystem can start depending on these. That will catch more bugs than we ever could during review and hopefully serve as a test-bed for these.

@gnzlbg gnzlbg mentioned this pull request Sep 3, 2018
@japaric
Copy link
Member Author

japaric commented Sep 3, 2018

Pre-requisite: whitelist the aclass, v5te, v6k and v6t2 target features in rustc

See rust-lang/rust#53926

@gnzlbg

I think the way the registers are extended with traits and used is a bit opinionated. I would like to see how the C code compares to the Rust code for some very simple tasks, like just reading the value of a register.

ACLE uses strings. For example usage, this is how CMSIS is implemented using ACLE. We could use strings but they are error prone (e.g. what should __rsr("NOT_A_REGISTER") and __rsr("ASPR") (that's a typo; it's "APSR") do? panic?) and IDK if rustc_args_required_const supports string literals?

@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 3, 2018

and IDK if rustc_args_required_const supports string literals?

It should, but if it doesn't that should be fixable, it just needs to support &'static str. In any case, I am fine with the struct approach, the RFC will just have to explain why things are done the way they are.

@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 3, 2018

Ping me when CI is green.

@japaric
Copy link
Member Author

japaric commented Sep 3, 2018

@gnzlbg CI is now green.

bors added a commit to rust-lang/rust that referenced this pull request Sep 3, 2018
@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 3, 2018

@alexcrichton this direction is pretty much the consensus of the embedded-wg and this looks good to me. Do you want to / have time to review this?

@paoloteti
Copy link
Contributor

@japaric @gnzlbg

Move src/arm/dsp.rs into src/acle/{dsp,simd32}.rs. Also the names used in arm::dsp don't match ACLE -- they should start with __. cc @paoloteti

Sure, I'll do! Can We keep __ away on Rust? It is more a "C convention" that sound strange in Rust.
On MIPS We use __builtin_msa_* with (__) / __builtin prefix as described in the ISA spec?
BTW I don't have a strong opinion on __: I'm fine with or without it.

Copy link
Contributor

@gnzlbg gnzlbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parched commented here rust-embedded/wg#184 (comment) .

I think the following points should be addressed:

Arm generally prefers cross architecture portable solutions over adding intrinsics. For example, C11 atomics and in rust std::sync::atomic::spin_loop_hint is the portable YIELD. [...] yield isn't needed becuase it's already available as std::sync::atomic::spin_loop_hint.

Is there a "better" implementation of the YIELD intrinsic (e.g. using llvm intrinsics) than using inline assembly? What does spin_loop_hint use? If spin_loop_hint is available on core, maybe we should just call that here, or maybe, once this is merged, it might be worth it to update it to call this intrinsic. I am unsure what to do here.

The barrier and hint intrinsics don't need to be marked unsafe, only the system register access ones need to be unsafe.

We should keep all intrinsics unsafe for the time being, and if someone wants to relax that for some intrinsics, they can do so in a backwards compatible way in a follow up RFC after the main work is done.

To increase the portability, don't make the barrier argument availability conditional on target, just upgrade to next strongest when not available.

I think this could be built at zero-cost in a library on top of this one and I am unsure on whether we should add this type of portability here.

How about using an enum for the barrier parameter, rather than the specialization of some generic parameter? This will require a match for now as Rust can't yet enforce compile time arguments, but that match should be removed by the compiler when it's inlined. And hopefully be backwards compatible with const arguments in the future. What do you think @gnzlbg?

I think this is a good idea, and that we should make sure that it is possible to experiment with these APIs in a library on top of this one (maybe once this PR is merged those interested could try to build that and expose it on crates.io). It might be more worth it for the ecosystem to use a thin safer more ergonomic layer than these intrinsics directly.

Could wfi and nop be portable functions instead like above? Most architectures have equivalent instructions, and those that don't could just do nothing/compiler barrier.

I think it should also be possible to implement this in a crate on top of this.

Unfortunately rsr/wsr must take a &str parameter to allow for arbitrary "S_x_x_x_x" register names in future. For now, that means a match on the value to make a constant to pass to LLVM, but that should be fine given the limited number of named registers required for this initial M profile support. In future when compile time arguments are supported the match can be removed and all strings allowed.

@japaric what do you think? It might be worth it to just expose here a "stringy"-based API like ACLE does on C, and just expose an idiomatic API on top of it in crates.io.

Implementation of these intrinsics should probably use llvm.{arm,aarch64}.{hint,dmb,dsb.isb} and llvm.{read,write}_register rather than inline assembly as this gives LLVM (potentially) better understanding of what is going on.

@japaric there are examples of using link_llvm_intrinsics for this through the library (just grep for llvm.). I think this is worth addressing.

I would just put all the intrinsics in std::arch::arm (gated on target in the
future if needed) rather than duplicating them in std::arch::aarch64.

@parched sadly core/std are extremely explicit in that architecture-specific functionality is always behind a module containing the architecture name (arm/aarch64). The arm module is not available when targetting aarch64 (the same applies to x86/x86_64). This can be annoying, but is a std convention that might be an uphill battle to change at this point, and this can be worker around pretty easily in an external crate.


So from my POV, the bigger issue is that of whether we should expose a stringly typed API like ACLE does, or invent something new. As long as something better can be built on top of it at zero cost, I'd suggest "stick to what C does" here: it will get this merged sooner, the RFC will be smaller, the RFC process will be shorter because there will be less to discuss, etc. We are basically minimizing the risk of being incompatible with future changes / additions to ACLE.

@parched
Copy link

parched commented Sep 4, 2018

@gnzlbg

Is there a "better" implementation of the YIELD intrinsic (e.g. using llvm intrinsics) than using inline assembly?

You can check what IR clang lowers these to, in this case it's llvm.arm.hint(1). spin_loop_hint does just use inline asm though. My question is, does this need to be added here if it's already available in rust? In general I think it's wise to only add things here that have a reasonable use case to reduce the maintenance and chance for error.

We should keep all intrinsics unsafe for the time being, and if someone wants to relax that for some intrinsics, they can do so in a backwards compatible way in a follow up RFC after the main work is done.

Ok, that seems reasonable to me.

@parched sadly core/std are extremely explicit in that architecture-specific functionality is always behind a module containing the architecture name (arm/aarch64). ...

Ok, that's fair enough.

@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 4, 2018

@parched

You can check what IR clang lowers these to, in this case it's llvm.arm.hint(1). spin_loop_hint does just use inline asm though. My question is, does this need to be added here if it's already available in rust? In general I think it's wise to only add things here that have a reasonable use case to reduce the maintenance and chance for error.

That makes sense. I personally think this intrinsic belongs here, and that spin lock hint should just be upgraded to call it once we land this. I don't know if this intrinsic is only useful for spin lock hint, or if it potentially can have other uses. If its the later, then putting it here makes it more "discoverable".

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to me yeah! I don't personally know this well enough in the sense that I could provide any sort of API-design feedback or knowledge of the actual spec, but I'm willing to trust the embedded WG!

@andre-richter
Copy link
Member

The first time I got a chance to look at this.
I think I like how the shareability domains are structs that implement barrier instructions as traits. Did you come up with that for especially this PR or is this scheme used elsewhere already?

May I ask in which place it will be documented how to properly use this ACLE implementation? For example that using dsb() can/has to be combined with the provided shareability domain structs might not be clear until you browse through all the code.

Or will we spare this effort here because ideally it's us who wrap this right away into the cortex-m/r/a crates and elaborately document there?

@japaric
Copy link
Member Author

japaric commented Sep 15, 2018

Meta-comment: The embedded WG has removed "stable core::arch::arm" from the 2018 milestone since we have a good enough stable global_asm! hack that doesn't have a build dependency on (arm-none-eabi-)GCC. We'd still love to have these intrinsics because they can inline into user code but it's no longer a priority for this year.


@parched

How about using an enum for the barrier parameter, rather than the specialization of some generic parameter?

What's the motivation? The current approach is extensible and doesn't depend on unimplemented features.

Using an enum means that, to account for future additions (e.g. ARMv8.5 gets a new shareability domain), a hidden variant will have to be added to it. Which means that intrinsics that use the enum will have to include a match arm that panics on use of the hidden variant. Which means that unoptimized calls to the intrinsic will include the panicking branch, which could pull in the formatting machinery bloating the binary by a few KB.

If generics are so undesirable I would prefer to directly expose this:

#[rustc_args_required_const(0)]
pub fn __dmb(val: i32) {
    extern "C" {
        #[link_name = "llvm.arm.dmb"]
        fn __dmb(val: i32);
    }

    __dmb(val)
}

It errors at compile time ("LLVM ERROR: Cannot select: intrinsic %llvm.arm.dmb") when you use an invalid value (e.g. greater than 16).

Unfortunately rsr/wsr must take a &str parameter to allow for arbitrary "S_x_x_x_x" register names in future. (..).

Could you give examples of these arbitrary registers? I'm aware of registers of the form cp<coprocessor>:<opc1>:c<CRn>:c<CRm>:<opc2> where <coprocessor> and the other placeholders can take integer values from 0 to 7 or 0 to 15. Are you referring to those or to something else? (I wouldn't use the word "arbitrary" for those; there's a known number of them defined by ARM; "arbitrary" to me sounds like some ARM licensee can come up with any random register name and then only their toolchain would support them)

In future when compile time arguments are supported the match can be removed and all strings allowed.

Could you elaborate how compile time arguments let us specify arbitrary register names w/o a match? To me the solution would require string interpolation via a macro (see below).

macro_rules! __rsr {
    ($register:expr) => {
        {
            let r: u32;
            asm!(concat!("mrs $0,", $register) : "=r"(r) : : : "volatile");
            r
        }
    }
}

Implementation of these intrinsics should probably use llvm.{arm,aarch64}.{hint,dmb,dsb.isb} and llvm.{read,write}_register

SGTM but the LLVM docs say about llvm.{read,write}_register:

Warning: So far it only works with the stack pointer on selected architectures (ARM, AArch64, PowerPC and x86_64). Significant amount of work is needed to support other registers and even more so, allocatable registers.

Which mean I can't use those two in this PR.


@gnzlbg

what do you think? It might be worth it to just expose here a "stringy"-based API like ACLE does on C, and just expose an idiomatic API on top of it in crates.io.

I would prefer to avoid matches with panicking branches in this API because then anything built on top of this API will inherit them. Sure, the panicking branches will be optimized away in release but unoptimized code will keep them and will be bigger than it needs to be (because of the formatting machinery).

If we could get the miri in the compiler to pre-process functions that have the rustc_args_required_const attribute and eliminate all the unreachable branches then I'd be totally OK with matches and stringy APIs.


Given that we are not time pressured I would personally prefer to optimize this API to be as slim as inline asm! regardless of the applied optimization level.

@paoloteti
Copy link
Contributor

@japaric

Warning: So far it only works with the stack pointer on selected architectures (ARM, AArch64, PowerPC and x86_64). Significant amount of work is needed to support other registers and even more so, allocatable registers.

Probably this remark is outdated. I'm sure llvm.arm.write_register support other registers (probably all), but it is not a function you can link using link_name=. You need to write few lines of LLVM IR (at least in LLVM v6.0.x). clang use the LLVM API to generate this IR and its metadata.

@parched
Copy link

parched commented Sep 16, 2018

@japaric

What's the motivation? The current approach is extensible and doesn't depend on unimplemented features.

It just seems a bit more intuitive and nicer use to me, like atomic::fence's memory order parameter, and it closer matches the semantics of the instruction. What advantage does the generic approach have over explicit dmb_ish, dmb_osh etc.?

Which means that unoptimized calls to the intrinsic will include the panicking branch, which could pull in the formatting machinery bloating the binary by a few KB.

Yes, that's unfortunate, (sync::atomic suffers the same problem), I guess I was hoping in the future with const arguments that could be removed or if the user really cared they would be using an optimised build.

If generics are so undesirable I would prefer to directly expose this:

That could be fine, especially if the goal here is to be completely low level without any user friendliness. Although, I would prefer user friendliness if possible.

Could you give examples of these arbitrary registers?

I mean the likes of implementation defined registers which rustc/LLVM will never be able to know all the names of, e.g., L2CTLR_EL1 on the Cortex-A73 which is S3_1_C11_C0_2.

Could you elaborate how compile time arguments let us specify arbitrary register names w/o a match.

I was thinking by just delegating to llvm.read_register but @paoloteti mentions that might not be possible? Did you try it with .i32/.i64 suffix @paoloteti?

@paoloteti
Copy link
Contributor

@parched

I was thinking by just delegating to llvm.read_register but @paoloteti mentions that might not be possible? Did you try it with .i32/.i64 suffix @paoloteti?

llvm.read_register.i32 needs a "Type Metadata" as parameter and I know only two ways to generate metadata: using the LLVM Metadata class (as clang does) or with few lines of IR:

define i32 @read_apsr() nounwind {
entry:
  %0 = call i32 @llvm.read_register.i32(metadata !0)
  ret i32 %0
}
declare i32 @llvm.read_register.i32(metadata) nounwind
!0 = !{!"apsr"}

Previous IR code generate mrs r0, apsr.

By the way, due to the signature issue, if I try with #[link_name= ..] I get

Intrinsic has incorrect argument type!                                          
LLVM ERROR: Broken function found, compilation aborted! 

@parched
Copy link

parched commented Sep 16, 2018

Ah I see, I didn't notice the parameter was metadata. Looks like read/write_register would need special handling by the compiler then.

@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 17, 2018

clang use the LLVM API to generate this IR and its metadata.

We probably need a platform-intrinsic for this in rustc that appropriate calls the builder API.

I would prefer to avoid matches with panicking branches in this API because then anything built on top of this API will inherit them.

This does not need matches with panicking branches AFAICT, unreachable_unchecked would do.

@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 17, 2018

The only thing I am still unsure of is whether to use the str API that ACLE uses or the one that this PR uses. Initially, the one that @japaric suggested sounded good, but then @parched mentioned

Unfortunately rsr/wsr must take a &str parameter to allow for arbitrary "S_x_x_x_x" register names in future.

AFAICT we would need to handle those names in the intrinsic internals anyways, and if the intrinsic implementation does not support that, then newer register names won't work. I would prefer if we keep the approach that @japaric proposes here, since that would produce a compilation error in this case (e.g. because the register type is not available). This means we would need to add new types to support newer registers, but the implementation has to be upgraded anyways for those to work.

@paoloteti
Copy link
Contributor

@gnzlbg

I would prefer if we keep the approach that @japaric proposes here, since that would produce a compilation error in this case (e.g. because the register type is not available).

It is quite the same. The constant string used by clang in rsr/wsr generate a metadata entry.
For example !0 = !{!"apsr"} . If the string apsr is unknown to LLVM you still have an error.
The term 'arbitrary name' is not correct at all: string/name shall be known to LLVM in any case.
clang just check that the string is constant (inlined) and manage rsr/wsr as a real language extention/builtin instead here We are speaking about ACLE as a library.
I use ARM Compiler day-by-day and I'm more incline to @parched suggestion that sound more familiar to me, but it is just a personal choice.

@parched
Copy link

parched commented Sep 18, 2018

AFAICT we would need to handle those names in the intrinsic internals anyways, and if the intrinsic implementation does not support that, then newer register names won't work. I would prefer if we keep the approach that @japaric proposes here, since that would produce a compilation error in this case (e.g. because the register type is not available). This means we would need to add new types to support newer registers, but the implementation has to be upgraded anyways for those to work.

The AArch64 system registers take the form S[2-3]_[0-7]_C[0-15]_C[0-15]_[0-7] which mean there's 32768 of them if I've done my maths right. Rust won't need to support all of these but it will need to support the few hundred architecturally defined ones as well as the 2048 reserved for implementation definitions.

Ideally the intrinsic would validate they take the form of an architecturally defined name or the pattern above before letting LLVM error but that will likely depend on some detailed const argument support in rust. In C these instrinsics are normally compiler builtins so are not limited to what the language itself can do, could we do that here?

@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 18, 2018

In C these instrinsics are normally compiler builtins so are not limited to what the language itself can do, could we do that here?

We can't do this in stdsimd, but we can add a rustc intrinsic that does this, and expose it in stable Rust from core::arch. That way we could require the argument to be a compile-time constant, validate it at compile-time without relying on LLVM to optimize match statements, and even produce type checking errors instead of monomorphization time errors which is what a stringly-typed solution in stdsimd would produce (stable Rust does not have any monomorphization time errors right now, and we shouldn't introduce any here in anything that we plan to stabilize).

@paoloteti
Copy link
Contributor

@gnzlbg Just to summarize: basically you are saying to extend arm.json in platform-intrinsics to generate IR for read/write_register ?

@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 18, 2018

@paoloteti

basically you are saying to extend arm.json in platform-intrinsics to generate IR for read/write_register ?

No, I am suggesting to implement new fn read_register(&str) -> .../fn write_register(&str) -> ... platform-intrinsics that have the same semantics as the ACLE intrinsics. This isn't particularly hard, and would allow us to require &str to be a compile-time constant already, as well as be able to match the &str at compile-time, and generate the appropriate code.

@gnzlbg
Copy link
Contributor

gnzlbg commented Feb 18, 2019

If there are no more concerns / comments can we land this nightly?

I'll merge this as soon as Travis-CI is green (please ping me if I do not immediately do so, I do not get automatically pinged when that happens).

We can iterate on the API issues (string-API supporting all registers vs no-string-APIs supporting only some) in tree.

Co-Authored-By: japaric <jorge@japaric.io>
@mati865
Copy link
Contributor

mati865 commented Feb 18, 2019

Yet another error:

error[E0433]: failed to resolve: use of undeclared type or module `dsp`
   --> crates/core_arch/src/acle/simd32.rs:436:24
    |
436 |             assert_eq!(dsp::__qadd(-10, 60), 50);
    |                        ^^^ use of undeclared type or module `dsp`

@japaric
Copy link
Member Author

japaric commented Feb 18, 2019

I have fixed most tests; now I'm seeing 8 errors like this one with assert_instr:

---- core_arch::acle::simd32::assert___shsub8_shsub8 stdout ----
disassembly for shsub8_shim_shsub8:
         0: push {r4, r5, r6, r7, r8, lr}
         1: ldrb r5, [sp, #36] ; 0x24
         2: uxtb r2, r2
         3: ldrb lr, [sp, #28]
         4: uxtb r0, r0
         5: ldrb r6, [sp, #32]
         6: orr r2, r2, r3, lsl #8
         7: ldrb r8, [sp, #24]
         8: orr r0, r0, r1, lsl #8
         9: orr r1, r6, r5, lsl #8
        10: ldr ip, [pc, #56] ; 7eac <__shsub8_shim_shsub8+0x68>
        11: ldr r4, [pc, #56] ; 7eb0 <__shsub8_shim_shsub8+0x6c>
        12: pkhbt r0, r0, r2, lsl #16
        13: orr r2, r8, lr, lsl #8
        14: mov r7, #42 ; 0x2a
        15: pkhbt r1, r2, r1, lsl #16
        16: add ip, pc, ip
        17: shsub8 r0, r0, r1
        18: ldr r4, [pc, r4]
        19: uxth r1, r0
        20: lsr r2, r0, #16
        21: lsr r3, r0, #24
        22: lsr r1, r1, #8
        23: str ip, [r4]
        24: str r7, [r4, #4]
        25: pop {r4, r5, r6, r7, r8, pc}
        26: .word 0x0007100b
        27: .word 0x0009612c
thread 'core_arch::acle::simd32::assert___shsub8_shsub8' panicked at 'instruction found, but the disassembly contains too many instructions: #instructions = 28 >= 22 (limit)', crates/stdsimd-test/src/lib.rs:188:9

The instruction we want is there but there's a bunch of other stuff around. Should I increase the instruction limit to make these tests pass?

@gnzlbg
Copy link
Contributor

gnzlbg commented Feb 18, 2019

Should I increase the instruction limit to make these tests pass?

Yes, if you have inspected the assembly and it looks reasonable, you can increase the limit for particular instructions here: https://github.com/rust-lang-nursery/stdsimd/blob/master/crates/stdsimd-test/src/lib.rs#L135

@japaric
Copy link
Member Author

japaric commented Feb 18, 2019

@gnzlbg Travis is green

@gnzlbg gnzlbg merged commit e8ed083 into rust-lang:master Feb 18, 2019
gnzlbg pushed a commit that referenced this pull request Feb 18, 2019
gnzlbg pushed a commit that referenced this pull request Feb 18, 2019
gnzlbg pushed a commit that referenced this pull request Feb 18, 2019
gnzlbg pushed a commit that referenced this pull request Feb 18, 2019
gnzlbg pushed a commit that referenced this pull request Feb 18, 2019
gnzlbg pushed a commit that referenced this pull request Feb 18, 2019
also make these available on architectures that don't have a dedicated DMB / DSB
/ ISB instruction

addresses #557 (comment)
gnzlbg pushed a commit that referenced this pull request Feb 18, 2019
gnzlbg pushed a commit that referenced this pull request Feb 18, 2019
gnzlbg pushed a commit that referenced this pull request Feb 18, 2019
gnzlbg pushed a commit that referenced this pull request Feb 18, 2019
gnzlbg pushed a commit that referenced this pull request Feb 18, 2019
gnzlbg pushed a commit that referenced this pull request Feb 18, 2019
gnzlbg pushed a commit that referenced this pull request Feb 18, 2019
gnzlbg pushed a commit that referenced this pull request Feb 18, 2019
@japaric japaric deleted the acle branch February 18, 2019 18:33
@gnzlbg
Copy link
Contributor

gnzlbg commented Feb 18, 2019

Thanks a lot everybody! As mentioned, we can continue to iterate on this in tree as we gain experience with it.


I've released core_arch version 0.1.4 on crates.io with these changes so that you can start using this from crates.io right away - you can re-compile that easily with whatever target-features you want without having to use Xargo to re-compile libcore.

I'll send a PR upstream to update stdsimd to a version including this ASAP so that one can use it on nightly, but that always takes a while.


@japaric and @rust-lang-nursery/embedded-wg : after drafting an RFC for this and achieving consensus within the embedded WG on it, it might be wise to achieve consensus on the draft with some of the interested parties that have shown up in these issues and PRs before submitting a final RFC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement ARM intrinsics for thumbv6 / thumbv7
8 participants