ABI alignment of vector types #347
Comments
Most implementations (especially in applications processors) will be designed to run LMUL=1 code efficiently, and so using greater alignment when LMUL>1 will probably not be profitable. My gut feeling is that using the existing stack alignment (16 bytes) is the way to go, but I don't have a strong argument for this proposal. One consideration is that V requires VLEN >= 128, and VLEN=128 will be a popular choice for many apps processors, so 16 bytes seems like a natural choice. Of course, when optimizing for a specific target, we can increase the alignment when stack-allocating vectors, without breaking the ABI. So if we picked an ABI alignment of 16 bytes, we wouldn't be screwing the VLEN=256 implementations too badly.
Although not for the faint of heart, some programmers do perform type-punning on vector registers, reinterpreting the bits with the effect of fissioning or fusing consecutive vector elements. (In fact, the V-extension intrinsics provide auxiliary intrinsics to assist.) Without mandating stricter-than-element-width alignment, when compiling a C program like the following,
the compiler may need to reload I admit this is a contrived scenario. I mention this only to point out a subtlety with Kito's first option,
It sounds like 128-bit alignment should be the most natural choice for the alignment of vector types across all LMUL values, and it also resolves the potential issue that @nick-knight mentioned. I actually heard about the same issue while collecting issues from people in different communities, so I think the issue is not confined to synthetic benchmarks or test cases. One counterargument is that we could have VLEN=32 or VLEN=64, where 128-bit alignment might waste some stack space, but those should be rare configurations, and even zve32* or zve64* configurations could still have VLEN >= 128. This topic will be put on the agenda for the next psABI call :)
It's worth noting, since the case of VLEN of 256 or 512 was discussed, that x86-64's AVX, the closest comparison, specifies that
Note that Rust's project-portable-simd may define vector types (not necessarily the same ones as are used for C FFI) to require an alignment small enough that the vector types have no padding (all Rust types have a size that is a multiple of their alignment). This allows reinterpreting pointers to aligned portions of any valid array slice as pointers to vector types (e.g.
That's not unambiguously decided, however, or at least, I believe the point regarding preferred type punning alignment is a good one.
well, conveniently every type combination that is valid to type pun (so doesn't try to e.g. type pun
The issue of vector alignment is discussed in #347. It is mentioned that aligning to 128 bits might deliver better performance on some RISC-V cores, but this could lead to considerable stack wastage on zve32 and zve64 cores. For instance, to ensure that a vector value on the stack conforms to the ABI specification, we could potentially waste up to 96 bits per vector object on the stack for zve32, and the performance difference isn't always evident across all core implementations. Therefore, this proposal sets the alignment of vector types to the element alignment, to avoid wasting a significant amount of stack space in zve32 and zve64 configurations. Also, the ABI only specifies the minimum alignment and doesn't prevent the compiler from adopting a higher alignment for specific CPUs. Fix #347.
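The 96-bit figure above can be reproduced with a little arithmetic. The following is only a sketch, assuming the 128-bit ABI alignment discussed in this thread and a stack slot that is already aligned to the element width; the function name `worst_case_padding_bits` is hypothetical and not part of any ABI document.

```c
#include <assert.h>
#include <stddef.h>

/* Worst-case padding (in bits) needed to move an object that is already
 * aligned to elem_align_bits up to a boundary of abi_align_bits.
 * Assumption: both arguments are powers of two, so the smaller one
 * divides the larger one. */
size_t worst_case_padding_bits(size_t abi_align_bits, size_t elem_align_bits)
{
    assert(abi_align_bits != 0 && (abi_align_bits & (abi_align_bits - 1)) == 0);
    assert(elem_align_bits != 0 && (elem_align_bits & (elem_align_bits - 1)) == 0);

    /* If the ABI asks for no more than the element alignment, no extra
     * padding is ever required. */
    if (abi_align_bits <= elem_align_bits)
        return 0;

    /* The address is a multiple of elem_align_bits, so the next multiple
     * of abi_align_bits is at most this far away. */
    return abi_align_bits - elem_align_bits;
}
```

With 32-bit elements (zve32) and a 128-bit ABI alignment, `worst_case_padding_bits(128, 32)` gives 96; with 64-bit elements it gives 64, and with element alignment as the ABI alignment it gives 0.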
Further discussion has moved here: #380 :)
We should define the ABI alignment of (scalable) vector types; this could be split out as a separate sub-item from the full vector ABI.
The vector extension only requires vector loads/stores to be aligned to the element width, e.g. 8-byte alignment for element width=64. However, some RISC-V core implementations might require larger alignment for best performance.
So we have the following options:
We also have LMUL in the vector extension, so we might consider that in the alignment as well if needed.
NOTE: The ABI alignment is the minimum requirement; a compiler or programmer may use an alignment larger than the ABI alignment and still conform to the ABI.
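To illustrate the note above, here is a minimal sketch showing that rounding an address up to a stricter power-of-two boundary still satisfies a weaker minimum alignment. The helper names `is_aligned` and `align_up` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align. Assumption: align is a power of two. */
bool is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Round addr up to the next multiple of align (a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

For example, `align_up(0x1004, 16)` yields `0x1010`, which is also 8-byte aligned, so a compiler that chooses 16-byte alignment for a target still meets an 8-byte minimum. The converse does not hold: `0x1008` is 8-byte aligned but not 16-byte aligned.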