Generic #simd type and intrinsics #1807

Merged: 86 commits merged into master from simd-dev on May 31, 2022
Conversation

gingerBill (Member) commented May 26, 2022

#simd[N]T

where T must be an integer, float, or boolean, with no specific endianness. It is a semi-opaque data type which will not allow for indexing. All operators for array programming that are supported by [N]T are supported for #simd[N]T.

The lane width (len(#simd[N]T) == N) of the simd vector type must be one of the following: 1, 2, 4, 8, 16, 32, 64 (a power of two in the range 1..=64).
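
As a minimal sketch of declaring and using such a type (the values below are illustrative, assuming the construction syntax and the array-programming operators described above):

a := #simd[4]f32{1, 2, 3, 4}
b := #simd[4]f32{10, 20, 30, 40}
c := a + b      // lane-wise addition via the usual array-programming operators
d := c * c - a  // lane-wise multiply and subtract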

Note: Platform-specific SIMD intrinsics will be added in the future.

Rationale for a Generic SIMD Interface

Each platform has completely different instructions which may not exist or behave the same elsewhere, and which have completely different trade-offs depending on the context of their use. In many cases you want to be able to use those specific instructions in different ways depending on the specific platform, but there are numerous reasons for wanting a highish-level generic abstraction for SIMD:

  • The foundations allow for a much more controlled and expressive platform-specific implementation (specific instructions) later on.
  • Better semantic information by having a more expressive type system which aids both the programmer and the compiler (for vectorization optimizations and debugging purposes).
  • An easier way to design around SIMD from the get-go (with a generic approach, assuming you can vectorize the code), whilst still allowing the programmer to write the platform-specific code when necessary.
  • Many targets have already made a semi-generalized SIMD interface which you now have to deal with, of which LLVM IR (without lowering into the specific intrinsics) and WASM (an upcoming "proposal" which has not been merged yet) are the two most prominent examples. The generic layer I have already covers pretty much everything those two targets support (except for a couple of pseudo-instructions).

The end goal has always been to have a very well designed inline assembler embedded into Odin and have that assembly language partially understand its parent language. The inline assembler has been put on the back-burner for a little while until other things have been implemented and finalized.

The inline assembler is the only language feature still missing from Odin; when that is done, I will write up the full language specification and then have v1.0.

Rationale for Opaqueness

Indexing

Lane/element indexing is currently not allowed on a #simd type because replacing an element is done by creating a new vector rather than by updating the current vector in place.

x = simd.extract(v, 1)
v = simd.replace(v, 1, x)

v[1] could be allowed for extracting but it would not make much sense for replacing:

v[1] = x
// would be equivalent to
v = simd.replace(v, 1, x)

This behaviour would necessitate another addressing mode in Odin for it to work correctly. Keeping to procedure calls is much more consistent, less surprising, and makes it clearer what is actually happening.
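
As a concrete sketch of that pattern, a hypothetical helper (the double_lane name and values are illustrative, not part of this PR) which doubles a single lane purely through procedure calls:

double_lane :: proc(v: #simd[4]f32, i: uint) -> #simd[4]f32 {
    x := simd.extract(v, i)           // read one lane
    return simd.replace(v, i, x*2)    // build and return a new vector
}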

Intrinsics

simd_add(a, b) -> c
simd_sub(a, b) -> c
simd_mul(a, b) -> c
simd_div(a, b) -> c // floats only

// Keeps Odin's Behaviour
// (x << y) if y <= mask else 0
simd_shl(a, b) -> c
simd_shr(a, b) -> c

// Similar to C's Behaviour
// x << (y & mask)
simd_shl_masked(a, b) -> c
simd_shr_masked(a, b) -> c
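// e.g. for u32 lanes (mask == 31), with a == {1, 1, 1, 1} and b == {0, 8, 31, 40}
// (illustrative values, not from this PR):
//     simd_shl(a, b)        == {1, 256, 0x8000_0000, 0}    // 40 > 31, so that lane becomes 0
//     simd_shl_masked(a, b) == {1, 256, 0x8000_0000, 256}  // 40 & 31 == 8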

// Saturation Arithmetic
simd_add_sat(a, b) -> c
simd_sub_sat(a, b) -> c

simd_and(a, b) -> c
simd_or(a, b) -> c
simd_xor(a, b) -> c
simd_and_not(a, b) -> c

simd_neg(a) -> b

simd_abs(a) -> b

simd_min(a, b) -> c
simd_max(a, b) -> c
simd_clamp(v, min, max) -> w

// Return a vector of unsigned integers, each the same size as the input element type
// NOT A BOOLEAN
// element-wise:
//     false => 0x00...00
//     true  => 0xff...ff
simd_lanes_eq(a, b) -> c
simd_lanes_ne(a, b) -> c
simd_lanes_lt(a, b) -> c
simd_lanes_le(a, b) -> c
simd_lanes_gt(a, b) -> c
simd_lanes_ge(a, b) -> c

// extract :: proc(a: #simd[N]T, idx: uint) -> T
simd_extract(a, idx) -> e
// replace :: proc(a: #simd[N]T, idx: uint, elem: T) -> #simd[N]T
simd_replace(a, idx, e) -> b

simd_reduce_add_ordered(a) -> b
simd_reduce_mul_ordered(a) -> b
simd_reduce_min(a) -> b
simd_reduce_max(a) -> b
simd_reduce_and(a) -> b
simd_reduce_or(a) -> b
simd_reduce_xor(a) -> b

// shuffle :: proc(a, b: #simd[N]T, #const indices: ..int) -> #simd[len(indices)]T
// The builtin `swizzle` procedure works on #simd values as well as on arrays
simd_shuffle(a, b, ..) -> c

// select :: proc(cond: #simd[N]boolean_or_integer, true, false: #simd[N]T) -> #simd[N]T
simd_select(cond, a, b) -> c
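// e.g. an element-wise minimum built from a comparison mask (illustrative):
//     mask := simd_lanes_lt(a, b)        // 0xff...ff where a < b, 0x00...00 otherwise
//     m    := simd_select(mask, a, b)    // picks from a where the mask is all ones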


simd_sqrt(a) -> b
simd_ceil(a) -> b
simd_floor(a) -> b
simd_trunc(a) -> b
simd_nearest(a) -> b
simd_to_bits(a) -> b

simd_lanes_reverse(a) -> b

// simd_lanes_rotate_left({1, 2, 3, 4}, 1) == {2, 3, 4, 1}
simd_lanes_rotate_left(a, offset) -> b
// simd_lanes_rotate_right({1, 2, 3, 4}, 1) == {4, 1, 2, 3}
simd_lanes_rotate_right(a, offset) -> b

// Supported for both integers and simd vectors of integers
count_ones(a) -> b
count_zeros(a) -> b
count_trailing_zeros(a) -> b
count_leading_zeros(a) -> b
reverse_bits(a) -> b

// Supported for both floats and simd vectors of floats
fused_mul_add(a, b, c) -> d

non_temporal_load(a) -> b
non_temporal_store(a, b)
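
As a small end-to-end sketch combining a few of the intrinsics above (the dot4 procedure and its values are hypothetical, not part of this PR):

package main

import "core:fmt"
import "core:intrinsics"

dot4 :: proc(a, b: #simd[4]f32) -> f32 {
    // lane-wise multiply, then an ordered horizontal sum
    return intrinsics.simd_reduce_add_ordered(intrinsics.simd_mul(a, b))
}

main :: proc() {
    v := dot4(#simd[4]f32{1, 2, 3, 4}, #simd[4]f32{1, 1, 1, 1})
    fmt.println(v) // 1*1 + 2*1 + 3*1 + 4*1 == 10
}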

New Package core:simd

Contains aliases of the intrinsics.simd_* procedures as just simd.*, e.g. intrinsics.simd_ceil is simd.ceil.

Utility procedures within core:simd:

to_array_ptr :: proc(v: ^#simd[$LANES]$E) -> ^[LANES]E {...}
to_array     :: proc(v: #simd[$LANES]$E) -> [LANES]E {...}
from_array   :: proc(v: $A/[$LANES]$E) -> #simd[LANES]E {...}
from_slice   :: proc($T: typeid/#simd[$LANES]$E, slice: []E) -> T {...}

bit_not  :: proc(v: $T/#simd[$LANES]$E) -> T where intrinsics.type_is_integer(E) {...}
copysign :: proc(v, sign: $T/#simd[$LANES]$E) -> T where intrinsics.type_is_float(E) {...}
signum   :: proc(v: $T/#simd[$LANES]$E) -> T where intrinsics.type_is_float(E) {...}
recip    :: proc(v: $T/#simd[$LANES]$E) -> T where intrinsics.type_is_float(E) {...}
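
A hypothetical usage sketch of these utilities together with the aliases (the slice contents are illustrative):

package main

import "core:fmt"
import "core:simd"

main :: proc() {
    data := []f32{1, 2, 3, 4}
    v := simd.from_slice(#simd[4]f32, data)
    v = simd.add(v, v)              // alias of intrinsics.simd_add
    fmt.println(simd.to_array(v))   // each lane doubled: 2, 4, 6, 8
}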

New Package core:simd/x86

Support for x86-specific instructions from the following instruction sets:

  • ABM
  • ADX
  • cmpxchg16b
  • FXSR
  • pclmulqdq
  • SHA
  • SSE
  • SSE2
  • SSE3
  • SSE4.1
  • SSE4.2
  • SSSE3

Attributes

TODO for future platform-specific work:

  • Compile-time Conditional CPU feature checks - @(require_target_feature=<string>)
  • Compile-time Unconditional CPU feature checks - @(enable_target_feature=<string>)

Compile Intrinsics

  • intrinsics.x86_cpuid :: proc(ax, cx: u32) -> (eax, ebx, ecx, edx: u32) ---
  • intrinsics.xgetbv :: proc(cx: u32) -> (eax, edx: u32) ---
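
For example, a hypothetical runtime feature check built on intrinsics.x86_cpuid (the has_sse42 helper is illustrative and assumes import "core:intrinsics"; the bit position is the standard SSE4.2 bit in CPUID leaf 1, ECX):

has_sse42 :: proc() -> bool {
    _, _, ecx, _ := intrinsics.x86_cpuid(1, 0)
    return (ecx & (1 << 20)) != 0 // CPUID.01H:ECX bit 20 == SSE4.2
}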

gingerBill marked this pull request as ready for review May 30, 2022 15:17
gingerBill requested a review from Kelimion May 30, 2022 15:17
gingerBill merged commit a1f15c2 into master May 31, 2022
gingerBill deleted the simd-dev branch May 31, 2022 10:53