Generic #simd type and intrinsics #1807

Merged: 86 commits merged into master from simd-dev on May 31, 2022
Conversation

gingerBill (Member) commented May 26, 2022

#simd[N]T

where T must be an integer, float, or boolean, with no specific endianness. It is a semi-opaque data type which will not allow for indexing. All operators for array programming that are supported by [N]T are supported for #simd[N]T.

The lane width (len(#simd[N]T) == N) of the simd vector type must be one of the following: 1, 2, 4, 8, 16, 32, 64 (a power of two in the range 1..=64).
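
As a minimal sketch of declaring and using such a type (the values below are illustrative, assuming the construction syntax and the array-programming operators described above):

a := #simd[4]f32{1, 2, 3, 4}
b := #simd[4]f32{10, 20, 30, 40}
c := a + b      // lane-wise addition via the usual array-programming operators
d := c * c - a  // lane-wise multiply and subtract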

Note: Platform-specific SIMD intrinsics will be added in the future.

Rationale for a Generic SIMD Interface

Each platform has completely different instructions which may not exist or behave the same elsewhere, and which have completely different trade-offs depending on the context of their use. In many cases you want to be able to use those specific instructions in different ways depending on the specific platform, but there are numerous reasons for wanting a highish-level generic abstraction for SIMD:

  • The foundations allow for a much more controlled and expressive platform-specific implementation (specific instructions) later on.
  • Better semantic information by having a more expressive type system which aids both the programmer and the compiler (for vectorization optimizations and debugging purposes).
  • An easier way to design around SIMD from the get-go (with a generic approach, assuming you can vectorize the code), whilst still allowing the programmer to write the platform-specific code when necessary.
  • Many targets have already made a semi-generalized SIMD interface which you now have to deal with, of which LLVM IR (without lowering into the specific intrinsics) and WASM (an upcoming "proposal" which has not been merged yet) are the two most prominent examples. The generic layer I have already covers pretty much everything those two targets support (except for a couple of pseudo-instructions).

The end goal has always been to have a very well designed inline assembler embedded into Odin and have that assembly language partially understand its parent language. The inline assembler has been put on the back-burner for a little while until other things have been implemented and finalized.

The inline assembler is the only language feature still missing from Odin; when that is done, I will write up the full language specification and then have v1.0.

Rationale for Opaqueness

Indexing

Lane/element indexing is currently not allowed on a #simd type because replacing an element is done by creating a new vector rather than by updating the current vector in place.

x = simd.extract(v, 1)
v = simd.replace(v, 1, x)

v[1] could be allowed for extracting but it would not make much sense for replacing:

v[1] = x
// would be equivalent to
v = simd.replace(v, 1, x)

This behaviour would necessitate another addressing mode in Odin for it to work correctly. Keeping to procedure calls is much more consistent, less surprising, and makes it clearer what is actually happening.
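
As a concrete sketch of that pattern, a hypothetical helper (the double_lane name and values are illustrative, not part of this PR) which doubles a single lane purely through procedure calls:

double_lane :: proc(v: #simd[4]f32, i: uint) -> #simd[4]f32 {
    x := simd.extract(v, i)           // read one lane
    return simd.replace(v, i, x*2)    // build and return a new vector
}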

Intrinsics

simd_add(a, b) -> c
simd_sub(a, b) -> c
simd_mul(a, b) -> c
simd_div(a, b) -> c // floats only

// Keeps Odin's Behaviour
// (x << y) if y <= mask else 0
simd_shl(a, b) -> c
simd_shr(a, b) -> c

// Similar to C's Behaviour
// x << (y & mask)
simd_shl_masked(a, b) -> c
simd_shr_masked(a, b) -> c
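// e.g. for u32 lanes (mask == 31), with a == {1, 1, 1, 1} and b == {0, 8, 31, 40}
// (illustrative values, not from this PR):
//     simd_shl(a, b)        == {1, 256, 0x8000_0000, 0}    // 40 > 31, so that lane becomes 0
//     simd_shl_masked(a, b) == {1, 256, 0x8000_0000, 256}  // 40 & 31 == 8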

// Saturation Arithmetic
simd_add_sat(a, b) -> c
simd_sub_sat(a, b) -> c

simd_and(a, b) -> c
simd_or(a, b) -> c
simd_xor(a, b) -> c
simd_and_not(a, b) -> c

simd_neg(a) -> b

simd_abs(a) -> b

simd_min(a, b) -> c
simd_max(a, b) -> c
simd_clamp(v, min, max) -> w

// Return a vector of unsigned integers, each the same size as the input element type
// NOT A BOOLEAN
// element-wise:
//     false => 0x00...00
//     true  => 0xff...ff
simd_lanes_eq(a, b) -> c
simd_lanes_ne(a, b) -> c
simd_lanes_lt(a, b) -> c
simd_lanes_le(a, b) -> c
simd_lanes_gt(a, b) -> c
simd_lanes_ge(a, b) -> c

// extract :: proc(a: #simd[N]T, idx: uint) -> T
simd_extract(a, idx) -> e
// replace :: proc(a: #simd[N]T, idx: uint, elem: T) -> #simd[N]T
simd_replace(a, idx, e) -> b

simd_reduce_add_ordered(a) -> b
simd_reduce_mul_ordered(a) -> b
simd_reduce_min(a) -> b
simd_reduce_max(a) -> b
simd_reduce_and(a) -> b
simd_reduce_or(a) -> b
simd_reduce_xor(a) -> b

// shuffle :: proc(a, b: #simd[N]T, #const indices: ..int) -> #simd[len(indices)]T
// The builtin `swizzle` procedure works on #simd values as well as on arrays
simd_shuffle(a, b, ..) -> c

// select :: proc(cond: #simd[N]boolean_or_integer, true, false: #simd[N]T) -> #simd[N]T
simd_select(cond, a, b) -> c
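// e.g. an element-wise minimum built from a comparison mask (illustrative):
//     mask := simd_lanes_lt(a, b)        // 0xff...ff where a < b, 0x00...00 otherwise
//     m    := simd_select(mask, a, b)    // picks from a where the mask is all ones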


simd_sqrt(a) -> b
simd_ceil(a) -> b
simd_floor(a) -> b
simd_trunc(a) -> b
simd_nearest(a) -> b
simd_to_bits(a) -> b

simd_lanes_reverse(a) -> b

// simd_lanes_rotate_left({1, 2, 3, 4}, 1) == {2, 3, 4, 1}
simd_lanes_rotate_left(a, offset) -> b
// simd_lanes_rotate_right({1, 2, 3, 4}, 1) == {4, 1, 2, 3}
simd_lanes_rotate_right(a, offset) -> b

// Supported for both integers and simd vectors of integers
count_ones(a) -> b
count_zeros(a) -> b
count_trailing_zeros(a) -> b
count_leading_zeros(a) -> b
reverse_bits(a) -> b

// Supported for both floats and simd vectors of floats
fused_mul_add(a, b, c) -> d

non_temporal_load(a) -> b
non_temporal_store(a, b)
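
As a small end-to-end sketch combining a few of the intrinsics above (the dot4 procedure and its values are hypothetical, not part of this PR):

package main

import "core:fmt"
import "core:intrinsics"

dot4 :: proc(a, b: #simd[4]f32) -> f32 {
    // lane-wise multiply, then an ordered horizontal sum
    return intrinsics.simd_reduce_add_ordered(intrinsics.simd_mul(a, b))
}

main :: proc() {
    v := dot4(#simd[4]f32{1, 2, 3, 4}, #simd[4]f32{1, 1, 1, 1})
    fmt.println(v) // 1*1 + 2*1 + 3*1 + 4*1 == 10
}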

New Package core:simd

Contains aliases of the intrinsics.simd_* procedures as just simd.*, e.g. intrinsics.simd_ceil is simd.ceil.

Utility procedures within core:simd:

to_array_ptr :: proc(v: ^#simd[$LANES]$E) -> ^[LANES]E {...}
to_array     :: proc(v: #simd[$LANES]$E) -> [LANES]E {...}
from_array   :: proc(v: $A/[$LANES]$E) -> #simd[LANES]E {...}
from_slice   :: proc($T: typeid/#simd[$LANES]$E, slice: []E) -> T {...}

bit_not  :: proc(v: $T/#simd[$LANES]$E) -> T where intrinsics.type_is_integer(E) {...}
copysign :: proc(v, sign: $T/#simd[$LANES]$E) -> T where intrinsics.type_is_float(E) {...}
signum   :: proc(v: $T/#simd[$LANES]$E) -> T where intrinsics.type_is_float(E) {...}
recip    :: proc(v: $T/#simd[$LANES]$E) -> T where intrinsics.type_is_float(E) {...}
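
A hypothetical usage sketch of these utilities together with the aliases (the slice contents are illustrative):

package main

import "core:fmt"
import "core:simd"

main :: proc() {
    data := []f32{1, 2, 3, 4}
    v := simd.from_slice(#simd[4]f32, data)
    v = simd.add(v, v)              // alias of intrinsics.simd_add
    fmt.println(simd.to_array(v))   // each lane doubled: 2, 4, 6, 8
}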

New Package core:simd/x86

Support for x86-specific instructions from the following instruction sets:

  • ABM
  • ADX
  • cmpxchg16b
  • FXSR
  • pclmulqdq
  • SHA
  • SSE
  • SSE2
  • SSE3
  • SSE4.1
  • SSE4.2
  • SSSE3

Attributes

TODO for future platform-specific work:

  • Compile-time Conditional CPU feature checks - @(require_target_feature=<string>)
  • Compile-time Unconditional CPU feature checks - @(enable_target_feature=<string>)

Compile Intrinsics

  • intrinsics.x86_cpuid :: proc(ax, cx: u32) -> (eax, ebx, ecx, edx: u32) ---
  • intrinsics.xgetbv :: proc(cx: u32) -> (eax, edx: u32) ---
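
For example, a hypothetical runtime feature check built on intrinsics.x86_cpuid (the has_sse42 helper is illustrative and assumes import "core:intrinsics"; the bit position is the standard SSE4.2 bit in CPUID leaf 1, ECX):

has_sse42 :: proc() -> bool {
    _, _, ecx, _ := intrinsics.x86_cpuid(1, 0)
    return (ecx & (1 << 20)) != 0 // CPUID.01H:ECX bit 20 == SSE4.2
}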

gingerBill marked this pull request as ready for review May 30, 2022 15:17
gingerBill requested a review from Kelimion May 30, 2022 15:17
gingerBill merged commit a1f15c2 into master May 31, 2022
gingerBill deleted the simd-dev branch May 31, 2022 10:53