Work-in-progress

This library provides a zero-overhead abstraction over `std::arch` to enforce many intrinsic pre-conditions at compile-time.

Consider the following `std::arch` intrinsics:

- `_mm_castps_si128(a: __m128) -> __m128i`. Its name is informative, but its signature is not. This intrinsic casts a `[4 x f32]` 128-bit wide floating point vector into a `[4 x i32]` integer vector, and passing it any other kind of input, like a `[2 x f64]` vector, is going to produce garbage. This library uses the `std::simd` portable vector types to prevent these errors from happening while simultaneously making the intrinsics more convenient to use:

  ```rust
  // With std::arch this is an error:
  let y: i32x4 = std::arch::_mm_castps_si128(f32x4::splat(3.14));
  // Two transmutes are required:
  let y: i32x4 = transmute(std::arch::_mm_castps_si128(transmute(f32x4::splat(3.14))));
  // The transmutes easily allow mistakes:
  let y: i64x2 = transmute(std::arch::_mm_castps_si128(transmute(f64x2::splat(3.14))));
  // With typed_arch this works correctly:
  let y: i32x4 = _mm_castps_si128(f32x4::splat(3.14));
  // And this does not compile:
  let y: i64x2 = _mm_castps_si128(f64x2::splat(3.14)); // ERROR: expected f32x4
  ```

- `_mm_store_pd(mem_addr: *mut f64, a: __m128d)`. This intrinsic requires `mem_addr` to be aligned to a 16-byte boundary. Otherwise, a general-protection exception is generated, and your program will most likely crash. With `typed_arch`, passing this intrinsic an unaligned pointer is a compilation error.

- `_mm_round_ps(a: __m128, rounding: i32)`. This intrinsic requires a `rounding` parameter, which is actually a bit-set for which only certain bit patterns make sense. With `typed_arch`, passing this intrinsic an invalid rounding mode is a compilation error.

Many other pitfalls like these are prevented by `typed_arch` at compile-time.

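
The alignment and rounding-mode checks described above can be pictured with a small sketch in plain Rust. This is not `typed_arch`'s actual API: the names `Aligned16`, `store_pd`, and `Rounding` are hypothetical and chosen only for illustration, and the store is a plain copy so that the sketch does not depend on any target feature. The idea is simply that a pre-condition moves from the documentation into a type.

```rust
/// Hypothetical wrapper (an assumption, not typed_arch's type):
/// two `f64` lanes that the compiler always places on a 16-byte boundary.
#[repr(C, align(16))]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Aligned16(pub [f64; 2]);

/// Hypothetical stand-in for an aligned store. Because the destination is
/// `&mut Aligned16` rather than `*mut f64`, a caller cannot hand it an
/// address that is not 16-byte aligned without writing `unsafe` code.
pub fn store_pd(dst: &mut Aligned16, value: [f64; 2]) {
    // A real wrapper would call the intrinsic here; this sketch just copies.
    dst.0 = value;
}

/// Hypothetical rounding mode. Only the listed variants can be constructed,
/// so an arbitrary `i32` bit pattern is rejected by the type checker.
/// The discriminants follow the `_MM_FROUND_TO_*` encodings.
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rounding {
    Nearest = 0x00,
    Down = 0x01,
    Up = 0x02,
    TowardZero = 0x03,
}

impl Rounding {
    /// The integer that would be forwarded to the underlying intrinsic.
    pub fn bits(self) -> i32 {
        self as i32
    }
}
```

With types like these, passing a bare `*mut f64` to `store_pd`, or an integer such as `7` where a `Rounding` is expected, fails to type-check instead of failing at run time.
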
The following table displays which target features are implemented and documented:

| feature | Impl | Docs |
|---|---|---|
| mmx | ✓ | wip |
| sse | ✗ | ✗ |
| sse2 | ✓ | ✗ |
| sse3 | ✗ | ✗ |
| ssse3 | ✗ | ✗ |
| sse41 | ✗ | ✗ |
| sse42 | ✗ | ✗ |
| sse4a | ✗ | ✗ |
| avx | ✗ | ✗ |
| avx2 | ✗ | ✗ |
| aes | ✗ | ✗ |
| abm | ✗ | ✗ |
| tbm | ✗ | ✗ |
| fxsr | ✗ | ✗ |
| bswap | ✗ | ✗ |
| eflags | ✗ | ✗ |
| cpuid | ✗ | ✗ |
| pclmulqdq | ✗ | ✗ |
| rdrand | ✗ | ✗ |
| rdtsc | ✗ | ✗ |
| sha | ✗ | ✗ |
| xsave | ✗ | ✗ |