ABI alignment of vector types #347

kito-cheng · 2022-10-17T15:16:13Z

We should define the ABI alignment of (scalable) vector types, that could be separated sub-item from the full vector ABI.

Vector extension only require vector load/store align to the element width, e.g. require 8 byte alignment for element width=64, however some RISC-V core implementation might require larger alignment for best performance.

So we have following options:

Alignment as the element width.
Require higher alignment for vector types, e.g. align to 8 byte or 16 byte for all vector type for any element width.

And we have LMUL in the vector extension, we might also consider that in the alignment if needed.

NOTE: ABI alignment is the minimal requirement, compiler/programmer could set that alignment to larger than the ABI alignment is also conformance to the ABI

aswaterman · 2022-10-17T20:35:42Z

And we have LMUL in the vector extension, we might also consider that in the alignment if needed.

Most implementations (especially in applications processors) will be designed to run LMUL=1 code efficiently, and so using greater alignment when LMUL>1 will probably not be profitable.

My gut feeling is that using the existing stack alignment (16 bytes) is the way to go, but I don't have a strong argument for this proposal. One consideration is that V requires VLEN >= 128, and VLEN=128 will be a popular choice for many apps processors, so 16 bytes seems like a natural choice.

Of course, when optimizing for a specific target, we can increase the alignment when stack-allocating vectors, without breaking the ABI. So if we picked an ABI alignment of 16 bytes, we wouldn't be screwing the VLEN=256 implementations too badly.

nick-knight · 2022-10-17T23:58:22Z

Although not for the faint of heart, some programmers do perform type-punning on vector registers, reinterpreting the bits with the effect of fissioning or fusing consecutive vector elements. (In fact, the V-extension intrinsics provide auxiliary intrinsics to assist.) Without mandating stricter-than-element-width alignment, when compiling a C program like the following,

// vec_u8 passed on stack, according to current calling convention
void foo (vuint8m1_t vec_u8) {
  // 8-bit arithmetic with very high vector register pressure    
  vuint64m1_t vec_u64 = vreinterpret_v_u8m1_u64m1(vec_u8); // need to reload it
  // 64-bit arithmetic
}

the compiler may need to reload vec_u8 with a vl1re8.v, which may set the wrong expectations on an implementation that internally rearranges vector data for different element widths (to reduce datapath wiring costs). On the other hand, if alignment were >= 8B, the compiler could safely generate the appropriate vl1re*.v.

I admit this is a contrived scenario. I mention this only to point out a subtlety with Kito's first option,

Alignment as the element width

kito-cheng · 2022-11-10T18:01:46Z

Sounds 128 bits alignment should be most nature choose for the alignment of vector types for all different LMUL, and that's also resolve the potential issue which @nick-knight mentioned, actually I heard same issue during collecting issue from different community guys, so the issue might not be existing in synthetic benchmark/testcase I think.

Although one arguments is we could have VLEN=32 or VLEN=64, 128 bits alignment might waste some stack space for those 2 configurations, but that should be rare configuration, and even zve32* or zve64* configure still could have VLEN >= 128.

This topic will put into next psABI call :)

workingjubilee · 2023-02-23T05:53:21Z

It's worth noting, since the case of VLEN of 256 or 512 was discussed, that x86-64's AVX, the closest comparison, specifies that __m256 and __m512 are aligned to 32 and 64 bytes. However, in practice, the VEX encoding of these SIMD instructions guarantees, in general, that using these instructions off-alignment is acceptable (with I believe the notable exception of vmovaps?), whereas the SSE instructions and their legacy encodings sometimes enforces 16 byte alignment. So you would not be alone in having these types have a cut-off point at 16 byte alignment, even if it is more "effective" than "by-spec" in that case.

programmerjake · 2023-02-23T09:37:44Z

note that rust's project-portable-simd may define vector types (not necessarily the same ones as are used for C FFI) to require alignment be small enough that the vector types have no padding (all Rust types have size that is a multiple of alignment), this allows reinterpreting pointers to aligned portions of any valid array slice as pointers to vector types (e.g. &[u32] -> &[Simd<u32, 7>]): rust-lang/portable-simd#319

workingjubilee · 2023-02-23T20:06:49Z

That's not unambiguously decided, however, or at least, I believe the point regarding preferred type punning alignment is a good one.

programmerjake · 2023-02-23T23:50:41Z

well, conveniently every type combination that is valid to type pun (so doesn't try to e.g. type pun i32x3 to i64x2 where the last i64 is half undef) already satisfies the alignment requirements if we decide the alignment is gcd(sizeof(element) * length, next_power_of_2(sizeof(element) * length), some_global_constant_limit) (compatible with what i proposed for portable-simd) because:

assuming some_global_constant_limit >= sizeof(element) for all possible element types (handles wanting vector alignment to not increase without bound)
assuming element types are power-of-2 sized which afaict is true for RVV
assuming sizeof(target_element) * target_length <= sizeof(source_element) * source_length aka. that the type pun is valid because the target vector's elements are completely within the source elements and not taking bytes from after the end of the source elements
every type pun where sizeof(target_element) <= sizeof(source_element) the alignment works out so the target vector type needs the same or less alignment than the source (always the same alignment if sizeof(target_element) * target_length == sizeof(source_element) * source_length)
every type pun where sizeof(target_element) > sizeof(source_element) the assumption that the type pun is valid means that the source vector's length is a multiple of sizeof(target_element) / sizeof(source_element) which is a power of two therefore the alignment again works out so the target vector type needs the same or less alignment than the source (always the same alignment if sizeof(target_element) * target_length == sizeof(source_element) * source_length)

The issue of Vector alignment is discussed in #347. It is mentioned that aligning to 128 bytes might deliver better performance on some RISC-V cores, but this behavior could lead to considerable stack wastage on zve32 and zve64 cores. For instance, in order to ensure a vector value in the stack conforms to the ABI specification, we could potentially waste up to 96 bits per vector object in stack for zve32, and the performance difference isn't always evident across all core implementations. Therefore, this proposal sets the alignment of vector types to element alignment, to avoid wasting a significant amount of stack space in zve32 and zve64 configurations. Also, since the ABI only specify the minimum alignment and doesn't limit the compiler from adopting higher alignment for specific CPUs. Fix #347.

kito-cheng · 2023-05-18T09:45:22Z

Further discussion move to here: #380 :)

The issue of Vector alignment is discussed in #347. It is mentioned that aligning to 128 bytes might deliver better performance on some RISC-V cores, but this behavior could lead to considerable stack wastage on zve32 and zve64 cores. For instance, in order to ensure a vector value in the stack conforms to the ABI specification, we could potentially waste up to 96 bits per vector object in stack for zve32, and the performance difference isn't always evident across all core implementations. Therefore, this proposal sets the alignment of vector types to element alignment, to avoid wasting a significant amount of stack space in zve32 and zve64 configurations. Also, since the ABI only specify the minimum alignment and doesn't limit the compiler from adopting higher alignment for specific CPUs. Fix #347.

kito-cheng added this to the Post 1.0 milestone Oct 17, 2022

programmerjake mentioned this issue Feb 24, 2023

non-power-of-2 Simd types have wrong size rust-lang/portable-simd#319

Closed

kito-cheng mentioned this issue May 18, 2023

[Vector ABI] Type size and alignment for vector types. #380

Merged

kito-cheng closed this as completed in #380 Jan 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ABI alignment of vector types #347

ABI alignment of vector types #347

kito-cheng commented Oct 17, 2022

aswaterman commented Oct 17, 2022

nick-knight commented Oct 17, 2022 •

edited

Loading

kito-cheng commented Nov 10, 2022

workingjubilee commented Feb 23, 2023

programmerjake commented Feb 23, 2023

workingjubilee commented Feb 23, 2023 •

edited

Loading

programmerjake commented Feb 23, 2023

kito-cheng commented May 18, 2023

ABI alignment of vector types #347

ABI alignment of vector types #347

Comments

kito-cheng commented Oct 17, 2022

aswaterman commented Oct 17, 2022

nick-knight commented Oct 17, 2022 • edited Loading

kito-cheng commented Nov 10, 2022

workingjubilee commented Feb 23, 2023

programmerjake commented Feb 23, 2023

workingjubilee commented Feb 23, 2023 • edited Loading

programmerjake commented Feb 23, 2023

kito-cheng commented May 18, 2023

nick-knight commented Oct 17, 2022 •

edited

Loading

workingjubilee commented Feb 23, 2023 •

edited

Loading