Standard Fixed-length Vector Calling Convention Variant #418
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -428,6 +428,179 @@ NOTE: `setjmp`/`longjmp` follow the standard calling convention, which clobbers | |
all vector registers. Hence, the standard vector calling convention variant | ||
won't disrupt the `jmp_buf` ABI. | ||
|
||
NOTE: Functions that use the standard vector calling convention | ||
variant follow an additional name mangling rule for {Cpp}. | ||
For more details, see <<Name Mangling for Standard Calling Convention Variant>>. | ||
|
||
=== Standard Fixed-length Vector Calling Convention Variant | ||
|
||
This section defines the calling convention variant for fixed-length vectors. | ||
The intention of this variant is to pass fixed-length vectors in vector | ||
registers. For the definition of a fixed-length vector, see | ||
<<Fixed-length vector>>. | ||
|
||
This variant is based on the standard vector calling convention variant: | ||
the register convention and the rules for passing arguments and return values | ||
are the same. | ||
|
||
NOTE: We define a separate calling convention variant because we want a | ||
flexible convention that exploits the variable-length nature of the vector | ||
extension while also accommodating embedded vector extensions such as | ||
`Zve32x`. | ||
|
||
ABI_VLEN refers to the width of a vector register in the calling convention | ||
variant. | ||
|
||
|
||
The ABI_VLEN must be no wider than the ISA's VLEN, meaning that the ISA may | ||
support wider vector registers than the ABI, but the ABI's VLEN cannot exceed | ||
the ISA's VLEN. | ||
|
||
ABI_VLEN represents the width (in bits) of the vector register available in the | ||
calling convention for fixed-length vectors. ABI_VLEN can vary from 32 bits | ||
(as in `Zve32x`) up to the maximum supported by the ISA. The flexibility of | ||
ABI_VLEN enables the convention to adapt to both low-end embedded systems and | ||
high-performance processors that utilize wider vector registers. | ||
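
The two constraints above (at least 32 bits, no wider than the ISA's VLEN) can be sketched as a small check. This is only an illustration, not part of the psABI: the function name `abi_vlen_is_valid` is hypothetical, and the power-of-two requirement is an assumption carried over from the way the vector extension constrains VLEN, not something this section states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not defined by the psABI.
 * Returns true if abi_vlen (in bits) is usable on a target whose
 * ISA string guarantees a VLEN of at least isa_vlen bits. */
bool abi_vlen_is_valid(unsigned abi_vlen, unsigned isa_vlen)
{
    /* Caller contract for this sketch: the ISA's VLEN must be non-zero. */
    assert(isa_vlen != 0);

    /* Assumption: ABI_VLEN is a power of two, like VLEN itself. */
    bool is_pow2 = abi_vlen != 0 && (abi_vlen & (abi_vlen - 1)) == 0;

    /* From the text: at least 32 bits (as in Zve32x) and no wider
     * than the ISA's VLEN. */
    return is_pow2 && abi_vlen >= 32 && abi_vlen <= isa_vlen;
}
```

Under these assumptions, `abi_vlen_is_valid(128, 32)` is false: a target that only guarantees a 32-bit VLEN cannot use an ABI_VLEN of 128.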
|
||
ABI_VLEN is a parameter of this calling convention variant. It can be set | ||
by a compiler command-line option or specified by a function attribute in | ||
the source code. | ||
|
||
NOTE: We suggest the toolchain implementation set the default value of ABI_VLEN | ||
Review comment: This isn't possible unless V or Zvl128b is in the ISA string, since ABI_VLEN must be less than or equal to the ISA VLEN. |
||
to 128, as it is the most common minimum requirement. However, it is not fixed | ||
at 128, since the ISA allows VLEN to be as small as 32 or 64 bits. A | ||
configurable ABI_VLEN also makes it possible to exploit a longer VLEN: users | ||
can build against a library optimized for a larger ABI_VLEN to make better use | ||
of cores with longer VLEN. | ||
|
||
A fixed-length vector argument is passed in a vector argument register if the | ||
size of the vector is less than or equal to ABI_VLEN bits. | ||
|
||
[NOTE] | ||
==== | ||
Even in the absence of specific vector extension support for certain element | ||
types, such as `__bf16`, `_Float16`, `float`, or `double`, the standard | ||
fixed-length vector calling convention rules still apply. For example, | ||
even without the support of extensions like `Zvfbfmin`, `Zve32f`, or `Zve64d`, | ||
these element types will be passed according to the calling convention rules | ||
outlined here. | ||
|
||
Additionally, data types such as `__int128_t`, which currently do not | ||
have direct support in any vector extension, will also follow these rules. | ||
This design ensures that the calling convention remains forward-compatible, | ||
minimizing the need for continuous adjustments as new extensions and data types | ||
are introduced in the future. | ||
|
||
The consistency in applying these rules to unsupported element types guarantees | ||
a smooth transition when future vector extensions become available, allowing for | ||
seamless integration of new features without requiring significant changes to | ||
the calling convention. | ||
==== | ||
|
||
A fixed-length vector argument is passed in two vector argument registers, | ||
similar to vector data arguments with LMUL=2, if the size of the vector is | ||
greater than ABI_VLEN bits and less than or equal to 2×ABI_VLEN bits. | ||
|
||
A fixed-length vector argument is passed in four vector argument registers, | ||
similar to vector data arguments with LMUL=4, if the size of the vector is | ||
greater than 2×ABI_VLEN bits and less than or equal to 4×ABI_VLEN bits. | ||
|
||
A fixed-length vector argument is passed in eight vector argument registers, | ||
similar to vector data arguments with LMUL=8, if the size of the vector is | ||
greater than 4×ABI_VLEN bits and less than or equal to 8×ABI_VLEN bits. | ||
|
||
|
||
[NOTE] | ||
==== | ||
Fixed-length vectors whose size is not a power of 2 are rounded up to the | ||
next power-of-2 size for the purpose of register allocation and handling. | ||
For instance, a vector type like `int32x3_t` (which contains three 32-bit | ||
integers) is treated as an `int32x4_t` (a 128-bit vector, as LMUL=1 when | ||
ABI_VLEN=128) in the ABI, and passed accordingly. This ensures consistency in | ||
how vectors are handled and simplifies argument passing. | ||
|
||
Example: Consider an `int32x3_t` vector (three 32-bit integers): | ||
|
||
- The vector's total size is 96 bits, which is not a power of 2. | ||
- The ABI rounds the size up to 128 bits (corresponding to `int32x4_t`), | ||
meaning the vector is passed using one vector argument register when | ||
ABI_VLEN=128. | ||
|
||
This rule applies to all non-power-of-2 fixed-length vectors, ensuring they | ||
are treated consistently across different ABI_VLEN settings. | ||
==== | ||
|
||
A fixed-length vector argument is passed by reference and is replaced in the | ||
argument list with the address if it is larger than 8×ABI_VLEN bits or if | ||
there is a shortage of vector argument registers. | ||
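
The size thresholds for a single fixed-length vector argument can be summarized in code. The following is a minimal sketch under stated assumptions, not a normative algorithm: the names `vls_pass_kind`, `vls_pass_info` and `classify_vls_arg` are hypothetical, and `free_vregs` is a simplification that counts remaining vector argument registers without modelling the register-group alignment rules of the standard vector calling convention variant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not defined by the psABI. */
enum vls_pass_kind { VLS_PASS_IN_VREGS, VLS_PASS_BY_REFERENCE };

struct vls_pass_info {
    enum vls_pass_kind kind;
    unsigned nregs; /* 1, 2, 4 or 8 when passed in registers, else 0 */
};

/* Round a size in bits up to the next power of two. */
static uint64_t round_up_pow2(uint64_t bits)
{
    uint64_t p = 1;
    while (p < bits)
        p <<= 1;
    return p;
}

/* Classify one fixed-length vector argument of size_bits bits.
 * free_vregs is a simplified count of unused vector argument registers. */
struct vls_pass_info classify_vls_arg(uint64_t size_bits, uint64_t abi_vlen,
                                      unsigned free_vregs)
{
    /* Bounds keep the arithmetic below free of overflow. */
    assert(size_bits > 0 && size_bits <= (UINT64_C(1) << 32));
    assert(abi_vlen >= 32 && abi_vlen <= (UINT64_C(1) << 32));

    struct vls_pass_info info = { VLS_PASS_BY_REFERENCE, 0 };

    /* Non-power-of-2 sizes are rounded up first (e.g. 96 -> 128 bits). */
    uint64_t rounded = round_up_pow2(size_bits);

    /* Smallest register group (1, 2, 4, 8) that holds the vector. */
    unsigned nregs = 1;
    while (nregs <= 8 && rounded > (uint64_t)nregs * abi_vlen)
        nregs *= 2;

    /* Larger than 8 x ABI_VLEN, or not enough registers: by reference. */
    if (nregs > 8 || nregs > free_vregs)
        return info;

    info.kind = VLS_PASS_IN_VREGS;
    info.nregs = nregs;
    return info;
}
```

With ABI_VLEN=128 and all registers free, a 96-bit `int32x3_t` needs one register and a 1024-bit vector needs eight, while a 2048-bit vector falls back to being passed by reference.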
|
||
A struct whose members are all fixed-length vectors is passed in vector | ||
argument registers like a vector tuple type if all members have the same | ||
length, that length is less than or equal to 4×ABI_VLEN bits, and the size of | ||
the whole struct is less than or equal to 8×ABI_VLEN bits. | ||
If there are not enough vector argument registers to pass the entire struct, | ||
it is passed by reference and is replaced in the argument list with the address. | ||
Otherwise, it uses the rule defined in the hardware floating-point calling | ||
convention. | ||
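
The struct rule above combines three size conditions with a register-availability check. A minimal sketch of that decision follows, assuming hypothetical names (`vls_struct_pass`, `classify_vls_struct`), a caller-supplied total struct size, and the same simplified `free_vregs` count of remaining vector argument registers (register-group alignment is not modelled).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not defined by the psABI. */
enum vls_struct_pass {
    VLS_STRUCT_AS_TUPLE,     /* passed in vector registers like a tuple */
    VLS_STRUCT_BY_REFERENCE, /* eligible, but not enough registers */
    VLS_STRUCT_FP_CC_RULES   /* falls back to the hardware FP CC rule */
};

/* member_bits[i] is the size in bits of the i-th fixed-length vector
 * member; struct_bits is the size of the whole struct in bits. */
enum vls_struct_pass classify_vls_struct(const uint64_t *member_bits, size_t n,
                                         uint64_t struct_bits,
                                         uint64_t abi_vlen,
                                         unsigned free_vregs)
{
    assert(member_bits != NULL && n > 0 && abi_vlen >= 32);

    /* All members must have the same length. */
    for (size_t i = 1; i < n; i++)
        if (member_bits[i] != member_bits[0])
            return VLS_STRUCT_FP_CC_RULES;

    /* Member <= 4 x ABI_VLEN and whole struct <= 8 x ABI_VLEN. */
    if (member_bits[0] > 4 * abi_vlen || struct_bits > 8 * abi_vlen)
        return VLS_STRUCT_FP_CC_RULES;

    /* Registers per member: smallest of 1, 2, 4 that holds the member. */
    uint64_t regs_per_member = 1;
    while (regs_per_member * abi_vlen < member_bits[0])
        regs_per_member *= 2;

    /* Not enough vector argument registers: pass by reference. */
    if (regs_per_member * n > free_vregs)
        return VLS_STRUCT_BY_REFERENCE;

    return VLS_STRUCT_AS_TUPLE;
}
```

For instance, under these assumptions a struct of two 128-bit vectors with ABI_VLEN=128 is treated like a two-field tuple when at least two registers are free, and is passed by reference when only one is.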
|
||
A struct containing just one fixed-length vector, or a fixed-length vector | ||
array of length one, is flattened into a single fixed-length vector argument | ||
if the size of the vector is less than or equal to 8×ABI_VLEN bits. | ||
|
||
Structs with zero-length fixed-length arrays use the rule defined in the hardware | ||
floating-point calling convention, which means they won't consume vector argument | ||
registers in either C or {Cpp}. | ||
||
|
||
A struct containing just one fixed-length vector array is passed as though it | ||
were a vector tuple type if the size of the base element of the array is less than | ||
or equal to 8×ABI_VLEN bits, and the size of the array is less than 8×ABI_VLEN | ||
bits. | ||
If there are not enough vector argument registers to pass the entire struct, | ||
it is passed by reference and is replaced in the argument list with the address. | ||
Otherwise, it uses the rule defined in the hardware floating-point | ||
calling convention. | ||
|
||
Unions with fixed-length vectors are always passed according to the integer | ||
calling convention. | ||
|
||
The details of the vector argument register rules are the same as in the | ||
standard vector calling convention variant. | ||
|
||
NOTE: Functions that use the standard fixed-length vector calling convention | ||
variant must be marked with STO_RISCV_VARIANT_CC. See <<Dynamic Linking>> | ||
for the meaning of STO_RISCV_VARIANT_CC. | ||
|
||
NOTE: Functions that use the standard fixed-length vector calling convention | ||
variant follow an additional name mangling rule for {Cpp}. | ||
For more details, see <<Name Mangling for Standard Calling Convention Variant>>. | ||
|
||
[NOTE] | ||
==== | ||
When ABI_VLEN is smaller than the VLEN, the number of vector argument | ||
registers utilized remains unchanged. However, in such cases, values are only | ||
placed in a portion of these vector argument registers, corresponding to the | ||
size of ABI_VLEN. The remaining portion of the vector argument registers, which | ||
extends beyond the ABI_VLEN, will remain idle. This means that while the full | ||
capacity of the vector argument registers may not be used, the allocation of | ||
these registers does not change, ensuring consistency in register usage regardless | ||
of the ABI_VLEN to VLEN ratio. | ||
|
||
Example: With ABI_VLEN at 32 bits and VLEN at 128 bits, consider passing an | ||
`int32x4_t` parameter (four 32-bit integers). | ||
|
||
Allocation: Four vector argument registers are allocated for | ||
`int32x4_t`, based on LMUL=4. | ||
|
||
Utilization: All four integers are placed in the first vector register, | ||
utilizing its full 128-bit capacity (VLEN), despite ABI_VLEN being 32 bits. | ||
|
||
Remaining Registers: The other three allocated registers remain unused and idle. | ||
==== | ||
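
The arithmetic in the example above can be written out as a short sketch. It follows the example's reading (registers are allocated according to ABI_VLEN, while the data fills registers at the hardware VLEN) and is not a normative definition; the names `vreg_usage` and `vls_vreg_usage` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not defined by the psABI. */
struct vreg_usage {
    unsigned allocated; /* registers reserved, derived from ABI_VLEN */
    unsigned occupied;  /* registers actually holding data, from VLEN */
    unsigned idle;      /* allocated but unused registers */
};

struct vreg_usage vls_vreg_usage(uint64_t size_bits, uint64_t abi_vlen,
                                 uint64_t vlen)
{
    /* ABI_VLEN must not exceed VLEN; the vector must fit in 8 registers. */
    assert(size_bits > 0 && abi_vlen >= 32 && abi_vlen <= vlen);
    assert(size_bits <= 8 * abi_vlen);

    struct vreg_usage u = { 1, 0, 0 };

    /* Allocation: smallest group of 1, 2, 4 or 8 registers by ABI_VLEN. */
    while ((uint64_t)u.allocated * abi_vlen < size_bits)
        u.allocated *= 2;

    /* Utilization: data is packed at the hardware VLEN. */
    u.occupied = (unsigned)((size_bits + vlen - 1) / vlen);
    u.idle = u.allocated - u.occupied;
    return u;
}
```

Calling `vls_vreg_usage(128, 32, 128)` reproduces the example: four registers allocated, one occupied, three idle.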
|
||
NOTE: In a single compilation unit, different functions may use different | ||
ABI_VLEN values. This means that ABI_VLEN is not uniform across the entire unit, | ||
allowing for function-specific optimization. However, this necessitates that | ||
users ensure consistency in ABI_VLEN between calling and called functions. It | ||
is the user's responsibility to verify that the ABI_VLEN matches on both sides | ||
of a function call to ensure correct operation and data handling. | ||
|
||
=== ILP32E Calling Convention | ||
|
||
IMPORTANT: RV32E is not a ratified base ISA and so we cannot guarantee the | ||
|
Review comment:

The variant itself seems fine, modulo nits, but how are we planning to enable it?

If it's automatically used by `-march=rva23 -mabi=ilp32d`, that will create major compatibility issues for binary distributions that use a fixed ABI and allow mixing packages at different architecture levels (either as an explicit user action, or as an implementation detail when rebuilding the distribution to change the architecture requirement).

If a new `-mabi=` value is required to enable use of the variant, it will be usable on closed systems where all packages are built at once, but not on binary distributions, since there is no expectation that binary code built with different `-mabi=` options is interoperable at all. This will include Debian and Alpine and might include Android and Fedora if their ABIs are finalized prior to the acceptance of this PR.

If it's enabled on a per-function basis using an attribute, or automatically for functions not visible across DSO boundaries, then it's effectively part of the definition of the attribute or a compiler implementation detail and may belong in riscv-c-api-doc or gccint, not here.

Reply:

My expectation is that it should be enabled on a per-function basis by an attribute, and I think there should be a riscv-c-api-doc PR for that; I will send one in the next few days.