C like SIMD intrinsics for Rust #1639

siavashserver · 2016-06-05T03:47:27Z

One of the important features missing from Rust for me and others involved in game development, DSP, video encoding, etc. is proper SIMD support.

It's disappointing to see that such an important feature is still not a core feature of Rust, or at least receiving more love from developers, after waiting since Rust's early days.

The current means of writing SIMD code in Rust, like https://crates.io/crates/simd, which try to provide a unified interface across architectures and abstract things away, are way suboptimal. I like the idea of https://crates.io/crates/llvmint though.

Please give us m128, m128i, m128d, etc. data types and access to whatever is available inside xmmintrin.h, etc. Let us deal with platform abstractions and safety ourselves.

Thanks!


BurntSushi · 2016-06-05T04:46:22Z

Could you explain more about why the simd crate is "way suboptimal"?

sanxiyn · 2016-06-05T04:46:44Z

I am unsure what is being requested here, because #[repr(simd)] already gives m128 and friends, and the llvmint crate already gives everything inside xmmintrin.h and friends.

Is the request that repr_simd be stabilized? Is the request that llvmint be included? (If so, why? It's one line away with Cargo.) Is the request for better documentation? Is the request to rename "simd" and "llvmint" to the corresponding names in C so that they are more familiar?

briansmith · 2016-06-05T05:57:00Z

> Is the request that repr_simd be stabilized? Is the request that llvmint be included?

Whatever is needed to use llvmint using Rust Stable.

siavashserver · 2016-06-05T05:59:27Z

> Could you explain more about why the simd crate is "way suboptimal"?

I have not looked deeper, but a lot of functionality available in C is missing there: to name a few, prefetching (prefetchnta, etc.) and different load/store variants (movntdqa and lddqu, for example).

> Is the request that repr_simd be stabilized?

No, for the reasons given above.

> Is the request that llvmint be included? (If so, why? It's one line away with Cargo.)

Kinda. SIMD support shouldn't be treated as eye candy when advertising Rust as a high-performance systems programming language. It's not cool at all to see such an essential feature handled by a single person as a hobby.

> Is the request for better documentation?

Not really. It would be really appreciated if you used exactly the same naming conventions as the C intrinsics or the assembly instructions, to reduce confusion. Besides, they are already documented very well in Intel's software manuals and all over the Internet, so there is no need to redo that (for example: https://software.intel.com/sites/landingpage/IntrinsicsGuide/).

> Is the request to rename "simd" and "llvmint" to the corresponding names in C so that they are more familiar?

See above.

briansmith · 2016-06-05T06:04:16Z

> It would be really appreciated if you used exactly the same naming conventions as the C intrinsics or the assembly instructions, to reduce confusion. Besides, they are already documented very well in Intel's software manuals and all over the Internet,

I agree with this. Intel's names may not be the best, but using "better" names only makes things worse, IMO. I would like to have the exact same names, and some better names for the generic abstraction on top.

sanxiyn · 2016-06-05T07:44:13Z

Thanks for the replies. The initial post was quite light on details of what "C-like SIMD" entails, but the replies were illuminating. The following is my understanding; please correct me if I misunderstood.

First, SIMD should be stable. I actually think everyone agrees. The reason it is not yet stable is that people disagree on the rest.

Second, SIMD should use Intel naming conventions. Note that these are not C naming conventions. As I understand it, there are three different naming conventions: Intel, GCC, and LLVM. Rust's llvmint crate mechanically follows LLVM. (It is not the case that Rust invented a new naming scheme.) The suggestion is to mechanically follow Intel instead. Presumably, it is also suggested to follow ARM etc. for other platforms. The reason is that this allows reuse of existing documentation. I completely agree here, and the reason is clear.

Here is an example with movntdqa: Intel: _mm_stream_load_si128. GCC: __builtin_ia32_movntdqa. LLVM: llvm.x86.sse41.movntdqa. Rust llvmint: sse41_movntdqa. I note that Intel deviates most from actual assembly name here.

Third, parts of SIMD support currently maintained as external libraries should be brought into the main Rust distribution. I am still unclear why, but it seems to me that the hope is that inclusion in the main distribution will bring more contributions. I agree it may be the case. Should it be part of std, or live under rust-lang-nursery and eventually move to rust-lang like regex?

I am still unclear why current SIMD support shouldn't be stabilized. RFC 1199 defined the feature gate repr_simd for types, and platform_intrinsics for operations. It is true some operations are missing, but operations can be added incrementally; they are not breaking changes. So missing operations can't be the reason not to stabilize. I don't think any types are missing.

I note that movntdqa and lddqu are in llvmint. For prefetchnta, llvmint only includes the architecture-generic form, not the Intel-specific form, but the architecture-generic form is a superset of the Intel-specific one. Specifically, _mm_prefetch(p, i) = prefetch(p, 0, i, 1), and prefetchnta is the case i = 0 (_MM_HINT_NTA). The 0 means read (1 is write), and the final 1 means data (0 is instruction). See http://llvm.org/docs/LangRef.html#llvm-prefetch-intrinsic for the LLVM documentation.
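To make the prefetch mapping concrete, here is a small pure-Rust model (no real intrinsics are called) of how an `_mm_prefetch` read hint translates into the arguments of LLVM's generic `llvm.prefetch(address, rw, locality, cache_type)`, following the formula above. The constants mirror the `_MM_HINT_*` values from `xmmintrin.h`; the struct and function names are hypothetical, and only read prefetches into the data cache are modeled.

```rust
/// Read-prefetch locality hints, mirroring the _MM_HINT_* values in xmmintrin.h.
const MM_HINT_NTA: u8 = 0; // prefetchnta
const MM_HINT_T2: u8 = 1; // prefetcht2
const MM_HINT_T1: u8 = 2; // prefetcht1
const MM_HINT_T0: u8 = 3; // prefetcht0

/// Arguments of LLVM's generic prefetch intrinsic, minus the address.
#[derive(Debug, PartialEq)]
struct LlvmPrefetchArgs {
    rw: u8,         // 0 = read, 1 = write
    locality: u8,   // 0 = no temporal locality .. 3 = keep in cache
    cache_type: u8, // 0 = instruction cache, 1 = data cache
}

/// Hypothetical helper: _mm_prefetch(p, hint) == prefetch(p, 0, hint, 1).
fn mm_prefetch_as_llvm(hint: u8) -> LlvmPrefetchArgs {
    assert!(hint <= MM_HINT_T0, "only the four read hints are modeled");
    LlvmPrefetchArgs { rw: 0, locality: hint, cache_type: 1 }
}

/// The x86 instruction each read hint selects.
fn x86_instruction(hint: u8) -> &'static str {
    match hint {
        MM_HINT_NTA => "prefetchnta",
        MM_HINT_T2 => "prefetcht2",
        MM_HINT_T1 => "prefetcht1",
        MM_HINT_T0 => "prefetcht0",
        _ => panic!("unsupported hint"),
    }
}
```

So `prefetchnta` corresponds to `mm_prefetch_as_llvm(MM_HINT_NTA)`, i.e. `prefetch(p, 0, 0, 1)`.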

eddyb · 2016-06-05T07:59:29Z

> I am still unclear why current SIMD support shouldn't be stabilized.

One reason I wouldn't stabilize yet is that shuffles don't use generic constant parameters.
I've managed to hack it up on the MIR side so that the argument is guaranteed to be constant, but it doesn't scale and it definitely doesn't allow proxying.

The main reason, however, is that there's no type-safety, no bounds on the operations.
Also, the existing intrinsic function system is a terrible misuse of the FFI import feature, from the old days of ad-hoc infrastructure necessary to bootstrap, and shouldn't be exposed in this way.
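The constant-operand problem can be illustrated with const generics, which later became available on stable Rust for integer parameters (Rust 1.51). The sketch below is a plain-array stand-in, not a SIMD intrinsic: `shuffle4` and its `[f32; 4]` representation are assumptions for illustration. Because the lane indices are const generic parameters, every call site must supply compile-time constants, and a generic wrapper can forward them unchanged, which is the kind of proxying a MIR-side special case cannot offer.

```rust
/// Two-input shuffle: lane k of the result is lane I_k of the
/// 8-lane concatenation [a0, a1, a2, a3, b0, b1, b2, b3].
/// The indices are const generic parameters, so they must be
/// compile-time constants at every call site.
fn shuffle4<const I0: usize, const I1: usize, const I2: usize, const I3: usize>(
    a: [f32; 4],
    b: [f32; 4],
) -> [f32; 4] {
    let lanes = [a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]];
    // In this plain-array sketch an index >= 8 panics at run time;
    // a real shuffle intrinsic would reject it at compile time.
    [lanes[I0], lanes[I1], lanes[I2], lanes[I3]]
}

/// A generic wrapper can forward its constant indices unchanged...
fn forward<const I0: usize, const I1: usize, const I2: usize, const I3: usize>(
    a: [f32; 4],
    b: [f32; 4],
) -> [f32; 4] {
    shuffle4::<I0, I1, I2, I3>(a, b)
}

/// ...and a concrete wrapper fixes them, here interleaving the low halves.
fn interleave_low(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    forward::<0, 4, 1, 5>(a, b)
}
```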

sanxiyn · 2016-06-05T07:59:51Z

By the way, Rust follows LLVM naming because Rust ultimately compiles to LLVM. While I think the rationale for Intel naming is strong, it does mean that an Intel-to-LLVM name mapping has to be maintained by Rust.
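Such a mapping could be as simple as a table kept alongside the intrinsic definitions. The sketch below records the four names from the movntdqa example given earlier in this thread and looks up the LLVM name from the Intel one; the `IntrinsicNames` struct, `TABLE`, and `llvm_name_for` are hypothetical, and a real table would need one entry per intrinsic.

```rust
/// One intrinsic under the naming conventions discussed in this thread.
#[derive(Debug, Clone, Copy, PartialEq)]
struct IntrinsicNames {
    intel: &'static str,   // Intel Intrinsics Guide / C headers
    gcc: &'static str,     // GCC builtin
    llvm: &'static str,    // LLVM intrinsic
    llvmint: &'static str, // Rust llvmint crate
}

/// Hypothetical mapping table; only the entry quoted in this thread is filled in.
const TABLE: &[IntrinsicNames] = &[IntrinsicNames {
    intel: "_mm_stream_load_si128",
    gcc: "__builtin_ia32_movntdqa",
    llvm: "llvm.x86.sse41.movntdqa",
    llvmint: "sse41_movntdqa",
}];

/// Look up the LLVM name that an Intel-named intrinsic would lower to.
fn llvm_name_for(intel: &str) -> Option<&'static str> {
    TABLE.iter().find(|n| n.intel == intel).map(|n| n.llvm)
}
```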

sanxiyn · 2016-06-05T08:04:47Z

@eddyb, I am aware of those problems. But the OP said SIMD shouldn't be stabilized because prefetchnta, movntdqa, and lddqu are missing, which is a non sequitur. First, they aren't missing; second, even if they were missing, that's no reason not to stabilize.

eddyb · 2016-06-05T08:10:25Z

@sanxiyn Oh, I see. Sorry for misunderstanding.

nagisa · 2016-06-05T12:05:50Z

Duplicate of #280

siavashserver · 2016-06-05T12:23:47Z

@sanxiyn Thanks for clearing things up!

I do agree with the points you made: the current SIMD proposal (RFC #1199), plus the missing features from llvmint, plus following platform-specific (Intel/ARM/...) naming conventions sounds perfect to me.

Is it possible to give SIMD stabilization a little higher priority, so we can get our hands on it in maybe less than 6 months? Pleeaassee! :)

skade · 2016-06-05T12:57:25Z

@siavashserver Please be aware that we get a lot of requests for "higher priority". The best way to bring these features to a conclusion is to find people competent with SIMD and Rust, and maybe to help out with code submissions by implementing the RFC proposal. I'm aware that this isn't always possible.

There are many things that are disappointing because they are missing, but engineering time is the biggest problem here, and no one is served by a mediocre stable feature.

eddyb · 2016-06-05T13:02:54Z

@skade FWIW, work is underway on const generic parameters (MIR is central to supporting them), but it might be a while before we have anything tangible; the guts of the compiler have to change significantly.

eddyb · 2016-06-05T13:07:13Z

In other words, the language features a better SIMD design is blocked on are relatively high-priority; they're just quite non-trivial.

The changes to intrinsic naming and the missing intrinsics (if any) can be done in parallel, if an RFC is proposed to use some convention that predates LLVM. The implementation that governs those is actually a Python script and some JSON files, so the contribution barrier should be low.

siavashserver · 2016-06-05T15:18:18Z

Sorry folks, I'm a bit confused. Do you mean that we won't see SIMD support sorted out until 2017 because of a lack of manpower and its complexity, and because there are more important things to deal with first?

bstrie · 2016-06-05T16:49:06Z

@siavashserver Our ideal vision of stable SIMD involves adding support for numeric type parameters. That itself demands massive changes to, and expansion of, how we handle compile-time evaluation (see also https://github.com/solson/miri, a new Rust interpreter(!!!) that we'll likely be adding to the compiler), all of which is blocked on the ongoing massive overhaul of our middle-end, a.k.a. MIR, whose last major blocker was cleared just hours ago (see rust-lang/rust#33622). We've got tons of people working on these things (though more help wouldn't hurt!), but these are enormous efforts with lots of design and iteration required. Meanwhile, lots of other initiatives are vying for priority: MIR landing will unblock not just CTFE but also incremental recompilation and non-lexical borrows, both of which the official Rust devs will likely prioritize over CTFE. That doesn't mean CTFE won't see attention, just that it might require community support to accelerate its development in the short term.

There are too many factors to give an accurate time estimate, but given the blockers and the number of other things on our plate, I wouldn't expect stable SIMD to land any sooner than 2017, yes.

siavashserver · 2016-06-05T17:05:01Z

@bstrie Thank you very much for clarification!

eefriedman · 2016-06-05T19:44:06Z

Porting clang's xmmintrin.h to Rust is orthogonal to the work required to stabilize SIMD support. It could even be added to the standard library and stabilized before the standard library exposes any platform-independent SIMD types and operations. I'm not sure why everyone's saying that it's blocked by general SIMD stabilization.

It would probably be best to start off with an independent "immintrin" crate based on llvmint and simd. As far as I know, there isn't any actual missing functionality from those crates; they just don't expose the same names as the Intel headers. The resulting crate could then be imported into the standard library with only minor changes after going through the RFC process.

If anyone wants to pursue this, I can answer any questions; I have experience with the vector intrinsics in clang and LLVM.
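A crate like the proposed "immintrin" would mostly be a renaming layer over LLVM-named operations. The sketch below shows that shape with a scalar stand-in: a module using an llvmint-style name (`sse2_pavg_b`, modeled on an LLVM `llvm.x86.sse2.pavg.b`-style intrinsic) and an Intel-named wrapper, `_mm_avg_epu8`, on top. The module names and the `[u8; 16]` type are assumptions; a real crate would wrap the actual intrinsics and vector types instead of looping over arrays.

```rust
/// Stand-in for an LLVM-named backend (hypothetical; scalar code in
/// place of the real vector intrinsic).
mod llvm_style {
    /// Rounding average of unsigned bytes, the semantics of x86 `pavgb`.
    pub fn sse2_pavg_b(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
        let mut out = [0u8; 16];
        for i in 0..16 {
            // Widen to u16 so that a + b + 1 cannot overflow.
            out[i] = ((a[i] as u16 + b[i] as u16 + 1) >> 1) as u8;
        }
        out
    }
}

/// Intel-named facade: same operation, name taken from the Intrinsics Guide.
mod intel_style {
    #[inline]
    pub fn _mm_avg_epu8(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
        super::llvm_style::sse2_pavg_b(a, b)
    }
}
```

Keeping the facade as one-line forwarding functions means the Intel documentation applies to it directly, while the backend can keep whatever naming the compiler uses.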

briansmith · 2016-06-05T19:56:09Z

> It would probably be best to start off with an independent "immintrin" crate based on llvmint and simd. As far as I know, there isn't any actual missing functionality from those crates

Good point. In fact, llvmint exposes lots of stuff that isn't SIMD that's useful.

> Here is an example with movntdqa: Intel: _mm_stream_load_si128. GCC: __builtin_ia32_movntdqa. LLVM: llvm.x86.sse41.movntdqa. Rust llvmint: sse41_movntdqa. I note that Intel deviates most from actual assembly name here.

Also a good point. In order for the Intel names to make sense, everything (including the types) would have to be named the Intel way. At least in the example above, the LLVM/Rust name is clear because it's the name of the instruction.

I would be very happy to have llvmint working on Stable Rust as-is.

eddyb · 2016-06-05T19:58:43Z

> I would be very happy to have llvmint working on Stable Rust as-is.

We will never stabilize something LLVM-specific like that, not without an abstraction layer on top.

comex · 2016-06-05T23:41:31Z

FWIW, it doesn't seem necessary to block all of SIMD on an issue (immediate operands) which only affects a small fraction of intrinsics (though, for the record, more than a handful). Omitting those intrinsics from an initial stabilization would be a bit weird, but not the end of the world.

A potential alternative is stabilizing inline assembly, so someone could create an unofficial SIMD crate that simulates intrinsics using that. They would then be free to use hacks like faking integer generics using array types or whatever.

nikomatsakis · 2016-06-06T08:50:02Z

There is a lot of focus here on stabilization -- but I am wondering how crucial it is that things be stable versus available in nightly builds?

nikomatsakis · 2016-06-06T08:50:47Z

(In other words, I got the impression that none of the existing crates were exporting the full functionality that was desired, but I'm not totally sure about that.)

bstrie · 2016-06-06T19:31:17Z

@nikomatsakis I'm not sure what you're proposing. A highly desired feature available only via an unstable interface will become de facto standardized if it's in nightly for long enough, especially since nightly is infectious (for want of an unstable feature, the lib was nightly; for want of a nightly lib, the app was nightly). The similar situation we're in now, with syntax extensions unstable for so long, is seen as universally undesirable. But at least we can break syntax extensions without forcing downstream consumers of syntax extensions to rewrite their code, whereas a de jure unstable, de facto stable SIMD interface could lock us in forever (or at least force us to support a deprecated interface forever).

BurntSushi · 2016-06-15T15:48:29Z

> The main reason, however, is that there's no type-safety, no bounds on the operations.

Could you expand a bit more on this?

> Also, the existing intrinsic function system is a terrible misuse of the FFI import feature, from the old days of ad-hoc infrastructure necessary to bootstrap, and shouldn't be exposed in this way.

Do you have any suggestion for an alternative?

eddyb · 2016-06-15T17:28:10Z

We've been hashing this out a bit on IRC, and two things came out of it.
First, some of the ideas for intrinsics that have been floating around for a while but didn't make it into an RFC:

```rust
// Proposal sketch: the #[intrinsic] attributes and array-typed const
// parameters are hypothetical syntax, not something rustc accepts.
trait Vector {
    const ELEMS: usize;
    fn add(self, other: Self) -> Self;
    fn mul(self, other: Self) -> Self;
    fn shuffle<const I: [usize; Self::ELEMS]>(self, other: Self) -> Self;
    /* more general and per-platform functions */
}

// Everything is intentionally self-recursive below.
#[intrinsic="simd"]
impl<V: Vector> Vector for V {
    #[intrinsic="simd_elems"]
    const ELEMS: usize = V::ELEMS;

    #[intrinsic="simd_add"]
    fn add(self, other: Self) -> Self { self.add(other) }

    #[intrinsic="simd_mul"]
    fn mul(self, other: Self) -> Self { self.mul(other) }

    #[intrinsic="simd_shuffle"]
    fn shuffle<const I: [usize; V::ELEMS]>(self, other: V) -> V {
        self.shuffle::<I>(other)
    }
    /* ... */
}
```

Such a scheme could preserve all of the genericity of the current "platform intrinsics", it would just be a bounded interface that accepts only SIMD types and also allows other crates to build more interesting generic abstractions on top of the libcore Vector trait.
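For comparison, the bounded-interface part of this idea can be expressed in today's stable Rust without compiler intrinsics. The sketch below keeps `ELEMS`, `add`, and `mul` from the proposal (not tied to `core::ops::{Add, Mul}`), implements them with scalar loops for a hypothetical `F32x4` type, and shows generic code written only against the trait bound. `shuffle` is left out because const generic parameters of array type, such as `const I: [usize; N]`, are still not available on stable Rust.

```rust
/// Bounded interface: only types that implement Vector get these operations.
trait Vector: Copy {
    const ELEMS: usize;
    fn add(self, other: Self) -> Self;
    fn mul(self, other: Self) -> Self;
}

/// Hypothetical 4-lane f32 vector backed by a plain array.
#[derive(Clone, Copy, Debug, PartialEq)]
struct F32x4([f32; 4]);

impl Vector for F32x4 {
    const ELEMS: usize = 4;

    fn add(self, other: Self) -> Self {
        let mut out = self.0;
        for (o, b) in out.iter_mut().zip(other.0) {
            *o += b;
        }
        F32x4(out)
    }

    fn mul(self, other: Self) -> Self {
        let mut out = self.0;
        for (o, b) in out.iter_mut().zip(other.0) {
            *o *= b;
        }
        F32x4(out)
    }
}

/// Generic code written only against the trait: lane-wise a * b + c.
fn mul_add<V: Vector>(a: V, b: V, c: V) -> V {
    a.mul(b).add(c)
}
```

Another crate could implement `Vector` for its own types and reuse `mul_add` unchanged, which is the "build more interesting generic abstractions on top" point.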

Secondly, there is the question of whether being so generic is necessary, or good, especially with all of the platform-specific intrinsics, and @BurntSushi gave an example of _mm_cmpestri:

```rust
#[repr(simd)]
struct Simd128(...);

#[intrinsic="x86_pcmpestri128"]
fn _mm_cmpestri(a: Simd128, la: i32, b: Simd128, lb: i32, imm8: i32) -> i32 {
    _mm_cmpestri(a, la, b, lb, imm8)
}
```

Monomorphic platform intrinsics could still be adapted by a third-party crate to be used with arbitrary SIMD types by transmuting first, which would work especially well if that crate handled defining the SIMD types (such as with a macro), at which point the crate could expose only a relevant subset of the APIs.
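The transmute-based adaptation might look roughly like this. `Simd128` and `simd128_xor` are hypothetical stand-ins for a monomorphic platform type and intrinsic (scalar code, not a real intrinsic), and `define_simd_type!` is an assumed helper that a third-party crate could provide to define typed vectors exposing only the operations it chooses. `std::mem::transmute` checks at compile time that both sides have the same size, so an invocation with the wrong lane count fails to build.

```rust
/// Monomorphic 128-bit type, standing in for a platform SIMD register type.
#[derive(Clone, Copy)]
#[repr(transparent)]
struct Simd128([u8; 16]);

/// Stand-in for a monomorphic platform intrinsic: bitwise XOR of 128 bits.
fn simd128_xor(a: Simd128, b: Simd128) -> Simd128 {
    let mut out = a.0;
    for (o, x) in out.iter_mut().zip(b.0) {
        *o ^= x;
    }
    Simd128(out)
}

/// Defines a typed 128-bit vector and adapts the monomorphic intrinsic to it.
macro_rules! define_simd_type {
    ($name:ident, $elem:ty, $lanes:expr) => {
        #[derive(Clone, Copy, Debug, PartialEq)]
        struct $name([$elem; $lanes]);

        impl $name {
            fn xor(self, other: Self) -> Self {
                // SAFETY: both sides are 16 bytes of primitive numeric data,
                // so every bit pattern is valid for either type.
                unsafe {
                    let a = std::mem::transmute::<[$elem; $lanes], Simd128>(self.0);
                    let b = std::mem::transmute::<[$elem; $lanes], Simd128>(other.0);
                    $name(std::mem::transmute::<Simd128, [$elem; $lanes]>(simd128_xor(a, b)))
                }
            }
        }
    };
}

define_simd_type!(U32x4, u32, 4);
define_simd_type!(I16x8, i16, 8);
```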

I am torn, and I probably need to go back and re-read the discussion on the platform intrinsics RFC to find arguments against either the fully generic or the specific-types option, both of which seem viable.

EDIT: Remove usage of core::ops::{Add, Mul} after @Diggsey pointed out that a specific SIMD type might want to not expose the built-in SIMD arithmetic, such as a quaternion.

BurntSushi · 2016-06-16T08:30:42Z

@eddyb Just to be more explicit, I think your comment implies that we'd have to stabilize things like _mm_cmpestri (and a whole boatload of other intrinsics) for it to be used by an external crate on Rust stable.

eddyb · 2016-06-16T09:00:51Z

@BurntSushi Yes, I don't see how it could be used from Rust stable if we don't... stabilize it. Unless I'm missing something fundamental here.

eefriedman · 2016-07-12T07:38:03Z

The immintrin crate now officially exists! https://crates.io/crates/immintrin / https://github.com/eefriedman/rust-immintrin . I'm not sure exactly what possessed me to spend a day working on this, but hopefully it's useful.

seanjensengrey · 2016-07-24T22:46:53Z

@eefriedman @sanxiyn @huonw

On the Intel Intrinsics Guide

https://software.intel.com/sites/landingpage/IntrinsicsGuide/

They actually source the intrinsics data in XML from

https://software.intel.com/sites/landingpage/IntrinsicsGuide/files/data-3.3.14.xml

with records like

```xml
  <intrinsic tech="SSE3" vexEq="TRUE" rettype="__m128d" name="_mm_addsub_pd">
    <type>Floating Point</type>
    <CPUID>SSE3</CPUID>
    <category>Arithmetic</category>
    <parameter varname="a" type="__m128d"/>
    <parameter varname="b" type="__m128d"/>
    <description>Alternatively add and subtract packed double-precision (64-bit) floating-point elements in "a" to/from packed elements in "b", and store the results in "dst".</description>
    <operation>
FOR j := 0 to 1
        i := j*64
        IF (j is even) 
                dst[i+63:i] := a[i+63:i] - b[i+63:i]
        ELSE
                dst[i+63:i] := a[i+63:i] + b[i+63:i]
        FI
ENDFOR
        </operation>
    <instruction name="addsubpd" form="xmm, xmm"/>
    <perfdata arch="Haswell" lat="3" tpt="1"/>
    <perfdata arch="Ivy Bridge" lat="3" tpt="1"/>
    <perfdata arch="Sandy Bridge" lat="3" tpt="1"/>
    <perfdata arch="Westmere" lat="3" tpt="1"/>
    <perfdata arch="Nehalem" lat="3" tpt="1"/>
    <header>pmmintrin.h</header>
  </intrinsic>
```

see rust-lang/rust#28079
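Records like this carry enough detail to derive reference implementations for testing. As an illustration (written by hand, not generated from the XML, with a hypothetical function name), here is a scalar Rust model of the `_mm_addsub_pd` operation pseudocode above, using `[f64; 2]` in place of `__m128d`: even-indexed lanes subtract and odd-indexed lanes add.

```rust
/// Scalar reference model of _mm_addsub_pd (SSE3 `addsubpd`), following the
/// <operation> pseudocode: dst[j] = a[j] - b[j] for even j, a[j] + b[j] for odd j.
fn addsub_pd_reference(a: [f64; 2], b: [f64; 2]) -> [f64; 2] {
    let mut dst = [0.0f64; 2];
    for j in 0..2 {
        dst[j] = if j % 2 == 0 { a[j] - b[j] } else { a[j] + b[j] };
    }
    dst
}
```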

mayah · 2016-08-04T15:00:25Z

Hi, I'm also making a similar one (https://crates.io/crates/x86intrin). When I started creating my library privately a month ago, I knew about this thread, but I didn't know someone was making a similar one, so I was a bit surprised when I found the similar crate.

Anyway, I'm currently depending on mine while trying to port my app to Rust.
The naming convention is a bit different between mine and @eefriedman's. I actually need AVX2, so I'll proceed with my implementation.

I'm not sure it's good that multiple similar libraries exist, but anyway, it's good that more people are using and testing Rust SIMD implementations.

hsivonen · 2016-08-29T11:45:25Z

@nikomatsakis,

> There is a lot of focus here on stabilization -- but I am wondering how crucial it is that things be stable versus available in nightly builds?

If useful features are de facto "forever" in nightly only, then to use the useful things, one has to use nightly and stop paying attention to stable. If enough people do, stable becomes pointless and the features that are theoretically subject-to-change can't really be changed without breaking too many consumers. More concretely, if a feature I want to use in Firefox is in nightly Rust only, the obvious options are advocating making the feature available in stable Rust or advocating Firefox moving to nightly Rust.

@eddyb,

> We will never stabilize something LLVM-specific like that, not without an abstraction layer on top.

For most of the llvmint, the functionality is not LLVM-specific but ISA-specific. The naming may be LLVM-specific but naming is a total bikeshed (as evidenced by the ISA vendors, GCC and LLVM having different names). Specifically, to have GCC compatibility, clang already has to be able to map GCC names to LLVM names. If the concern about "LLVM-specific" is about being able to change rustc to use another backend or to enable an independent compiler implementation with a non-LLVM back end, using ISA vendor naming or using LLVM naming is equally hard if the future back end uses neither (e.g. GCC naming).

As for the concrete present, since the llvmint crate exists, using its naming would be the most straightforward option.

As for an abstraction layer beyond trivial name substitution, a SIMD API as part of the standard library, which is allowed to use compiler intrinsics internally, has failed to materialize. As long as llvmint isn't allowed to compile outside nightly as part of the ecosystem outside the standard library, a SIMD crate that'd compile outside nightly is not allowed to emerge as part of the ecosystem outside the standard library.

Moreover, there are things that don't fit a safe SIMD crate. In the SIMD domain, manual choice between fast but aligned access vs. slower but unaligned access isn't a good fit for a safe API. Furthermore, there are intrinsics that are outside the SIMD domain, e.g. AES and GCM-related instructions.
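As an illustration of hiding the aligned/unaligned choice inside a safe function, here is a sketch using the `std::arch` intrinsics that Rust stabilized after this discussion. `_mm_load_ps` requires a 16-byte-aligned pointer, while `_mm_loadu_ps` accepts any address; the hypothetical `load4` checks the alignment at run time and picks one, so callers never write `unsafe`. The SIMD path is only compiled on x86_64, where SSE is part of the baseline, with a plain copy elsewhere.

```rust
/// Loads the first four f32 values of `data`, using the aligned SSE load
/// when the pointer happens to be 16-byte aligned and the unaligned one
/// otherwise. Panics if `data` has fewer than four elements.
#[cfg(target_arch = "x86_64")]
fn load4(data: &[f32]) -> [f32; 4] {
    use std::arch::x86_64::{_mm_load_ps, _mm_loadu_ps, _mm_storeu_ps};

    assert!(data.len() >= 4, "need at least four elements");
    let p = data.as_ptr();
    let mut out = [0.0f32; 4];
    // SAFETY: SSE is always available on x86_64, `p` points to at least four
    // readable f32 values, and `_mm_load_ps` is only used when `p` is
    // 16-byte aligned.
    unsafe {
        let v = if (p as usize) % 16 == 0 {
            _mm_load_ps(p)
        } else {
            _mm_loadu_ps(p)
        };
        _mm_storeu_ps(out.as_mut_ptr(), v);
    }
    out
}

/// Portable fallback for other architectures: a plain copy.
#[cfg(not(target_arch = "x86_64"))]
fn load4(data: &[f32]) -> [f32; 4] {
    assert!(data.len() >= 4, "need at least four elements");
    [data[0], data[1], data[2], data[3]]
}
```

The same pattern extends to safe functions that run a whole loop of intrinsics internally, which is closer to what the next paragraph proposes than a per-operation wrapper.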

I think the practical way forward is to allow the llvmint crate to compile on stable and to allow the ecosystem to create safe abstractions on top (not necessarily as per-operation abstractions but potentially as safe functions that use a bunch of intrinsics internally).

Concretely, I'm writing a crate that is supposed to replace a C++ component in Gecko. The pre-existing C++ component accelerates a central operation using SSE2 intrinsics. This leaves the following options:

- Regressing performance. (Obviously not OK.)
- Calling to C++ for the operation that needs SSE2. (In principle, this would be a kind of admission of defeat for Rust. On the practical level, this would complicate the build, especially for a crate that is supposed to work outside Gecko as well. And it would prevent inlining.)
- Calling to ASM code assembled into an object file separately from Rust. (All the downsides of the previous point plus the downside of having to deal with instruction scheduling manually instead of letting LLVM deal with it.)
- Inline asm!. (Not available in stable Rust either, plus the downside of having to deal with instruction scheduling manually instead of letting LLVM deal with it.)
- Waiting for the standard library to gain a safe abstraction for all the necessary SIMD operations. (Realistically, not going to happen within an acceptable timeframe.)
- Using nightly Rust in Firefox. (This would make the Rust team and Linux distros sad.)
- Stable Rust allowing llvmint to work. (This seems like the least bad of these options.)

(As for the non-ISA-specific LLVM intrinsics, leaving them out but allowing the ISA-specific ones would make things easier for future non-LLVM back ends, but would favor writing ISA-specific code where ISA-independent code would be possible, which is probably a sadder near-term outcome than making future non-LLVM back ends replicate LLVM's cross-ISA intrinsics.)

BurntSushi · 2016-12-01T13:46:47Z

For anyone following this issue, we have a (very) large ongoing thread about a path to resolving this specific issue: https://internals.rust-lang.org/t/getting-explicit-simd-on-stable-rust/4380

petrochenkov · 2018-02-24T14:30:53Z

Closing in favor of #2325

eddyb mentioned this issue Jun 17, 2016: Tracking issue for SIMD support rust-lang/rust#27731 (Closed)

seanjensengrey mentioned this issue Jul 24, 2016: use intel intrinsics xml data eefriedman/rust-immintrin#2 (Open)

nrc added the T-lang (relevant to the language team) and T-libs-api (relevant to the library API team) labels Aug 4, 2016

louy2 mentioned this issue Aug 5, 2016: Rewrite SIMD intrinsic with Rust? raphlinus/font-rs#9 (Closed)

EugeneGonzalez mentioned this issue Sep 20, 2016: Provide hardware-accelerated assembly versions EugeneGonzalez/bit_reverse#4 (Open)

frewsxcv mentioned this issue Feb 28, 2017: SIMD operations in Rust? rust-lang/rust#40140 (Closed)

petrochenkov closed this as completed Feb 24, 2018
