crABI v1 #3470
// in std
pub trait BoxDrop<T: ?Sized>: Sized {
    fn box_drop(v: Pin<Box<T, Self>>);
}

impl<T: ?Sized, A: Allocator> BoxDrop<T> for A {
    #[inline]
    fn box_drop(v: Pin<Box<T, Self>>) {
        // Guard that frees the allocation even if dropping the value panics.
        struct DropPtr<T: ?Sized, A: Allocator>(*mut T, A, Layout);
        impl<T: ?Sized, A: Allocator> Drop for DropPtr<T, A> {
            #[inline]
            fn drop(&mut self) {
                if self.2.size() != 0 {
                    unsafe { self.1.deallocate(NonNull::new_unchecked(self.0).cast(), self.2) }
                }
            }
        }
        let l = Layout::for_value::<T>(&v);
        let (p, a) = Box::into_raw_with_allocator(unsafe { Pin::into_inner_unchecked(v) });
        let v = DropPtr(p, a, l);
        // Drop the pointee in place; the guard then deallocates the memory.
        unsafe { v.0.drop_in_place() }
    }
}

// the standard Box type
pub struct Box<T: ?Sized, A: BoxDrop<T> = Global>(...);

// replacement version of Drop for Box
impl<T: ?Sized, A: BoxDrop<T>> Drop for Box<T, A> {
    #[inline]
    fn drop(&mut self) {
        A::box_drop(Box::into_pin(unsafe { ptr::read(self) }))
    }
}

usage demo:

pub struct FooDropper;

impl BoxDrop<Foo> for FooDropper {
    fn box_drop(v: Pin<Box<Foo, FooDropper>>) {
        // Calling a foreign function requires `unsafe`.
        unsafe { drop_foo(v) };
    }
}

extern "crabi" {
    pub type Foo;
    pub fn make_foo() -> Pin<Box<Foo, FooDropper>>;
    pub fn drop_foo(v: Pin<Box<Foo, FooDropper>>);
}
@programmerjake That's an interesting alternative! Is there a way, rather than having to implement a trait, to instead have a single type parameterized with a function type?

I thought about it, but it's very annoying to give function types a name, since you currently have to use TAIT:

struct FnDropper<T: ?Sized, F: Fn(Pin<Box<T, FnDropper<T, F>>>)>(F);

type FooDropFn = impl Fn(Pin<Box<Foo, FnDropper<Foo, FooDropFn>>>);

extern "crabi" {
    type Foo;
    fn make_foo() -> Pin<Box<Foo, FnDropper<Foo, FooDropFn>>>;
    fn drop_foo(v: Pin<Box<Foo, FnDropper<Foo, FooDropFn>>>);
}

#[defining(FooDropFn)]
fn _f() -> FooDropFn {
    drop_foo
}
maybe better usage demo:

// std API
pub struct FFIDropper;

// like C++ unique_ptr but where deleter is defined by T
pub type FFIBox<T: ?Sized> = Box<T, FFIDropper>;

// user API, this impl could easily just be a
// #[box_drop = drop_foo] proc-macro annotation on Foo
impl BoxDrop<Foo> for FFIDropper {
    fn box_drop(v: Pin<FFIBox<Foo>>) {
        // Calling a foreign function requires `unsafe`.
        unsafe { drop_foo(v) };
    }
}

extern "crabi" {
    pub type Foo;
    pub fn make_foo() -> Pin<FFIBox<Foo>>;
    pub fn drop_foo(v: Pin<FFIBox<Foo>>);
}
lots more discussion about

@programmerjake I attempted to partially summarize that proposal in the alternatives section. I do agree that if we accepted that general proposal, it makes sense to use it for the specific case of crABI's handling of

Are there other languages interested in this proposal? Or is the target to enable an ABI between Rust code?

There is interest from Nim (disclaimer: I'm Nim's BDFL). But for Nim it would be really nice if "bit flags" which Nim maps to its Also quite discouraging is the lack of Swift support, IMO. A common ABI for subsets of Nim, Rust, Swift and C++ seems quite feasible.

Part of the goal here is to have a baseline level of support from any language that speaks C FFI, which then means that any language with a C FFI immediately has the ability to interoperate with crABI. Everything beyond that is then about the convenience of native support (whether language or library), rather than about whether it's supported or not.

So, anything supported would need to map to an underlying C data type that can be passed through the C ABI.

(I do expect that eventually we'll want to support a full object/trait protocol, but I'm trying to get there incrementally rather than trying to do it all at once. The initial round of crABI support is optimizing for ease of initial support/adoption, rather than completeness.)

@Araq This mechanism: https://nim-lang.org/docs/manual.html#set-type-bit-fields ? (Verifying: does Nim support bitfields wider than a base integral data type, or do they have to fit in a base integral data type?) That seems like a reasonable data type to support cross-language. At a minimum, that seems like an ideal candidate for crABI v1.1 (which I expect to follow closely on the heels of crABI v1.0).

Correct.
It supports bitsets wider than any integral data type indeed. But an ABI could limit it to an integral type.

- `Option<bool>` is passed using a single `u8`, where 0 is `Some(false)`, 1 is
  `Some(true)`, and 2 is `None`.
- `Option<char>` is passed using a single `u32`, where 0 through `0xD7FF` and
  `0xE000` through `0x10FFFF` are possible `char` values, and `0x110000` is
Is there any possibility that Unicode could expand the range of valid code points to include 0x110000? If so, would `u32::MAX` or one of the surrogate pair numbers be better?
> Is there any possibility that Unicode could expand the range of valid code points to include 0x110000?

Rust assumes it will never happen. That is not the choice I would make, but it is the choice that Rust makes. The Unicode Consortium supposedly will never do this, and it will cause breakage if they do, but of course it is not literally impossible that they might reverse their commitment not to do this.
With the way UTF-16 is designed, it's literally impossible to encode Unicode scalar values >= 0x110000. UTF-8/32 have ways to encode those out-of-range values, but Unicode almost certainly never will use them because they want UTF-16 to keep working, since it's used in so many places: Win32, Java, JavaScript, C#, VB.net, etc.
`char` is a Unicode Scalar Value: https://www.unicode.org/glossary/#unicode_scalar_value

Unicode scalar values outside the defined range are specifically forbidden by the Unicode Standard:

> Ill-formed: A Unicode code unit sequence that purports to be in a Unicode encoding form is called ill-formed if and only if it does not follow the specification of that Unicode encoding form.
> …
> Any code unit sequence that would correspond to a code point outside the defined range of Unicode scalar values would, for example, be ill-formed.

https://www.unicode.org/glossary/#ill_formed_code_unit_sequence

(So expanding the range would be a breaking change for a lot of existing Unicode software.)
What is the reason 0x110000 is used for the niche rather than `u32::MAX` for Rust in general?
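
For readers following the niche discussion, here is a minimal sketch of the two encodings quoted from the RFC above (`Option<bool>` as a `u8`, `Option<char>` as a `u32`). It assumes, as this thread does, that `0x110000` stands for `None`. The function and constant names are hypothetical and are not part of crABI or std.

```rust
/// Assumed sentinel for `None`: the first value above the Unicode scalar range.
pub const OPTION_CHAR_NONE: u32 = 0x11_0000;

/// Hypothetical helper: encode `Option<char>` as a single `u32`.
pub fn encode_option_char(value: Option<char>) -> u32 {
    match value {
        Some(c) => c as u32,
        None => OPTION_CHAR_NONE,
    }
}

/// Hypothetical helper: decode a `u32`, returning the raw value as an error
/// if it is a surrogate (0xD800..=0xDFFF) or lies above the sentinel.
pub fn decode_option_char(raw: u32) -> Result<Option<char>, u32> {
    if raw == OPTION_CHAR_NONE {
        Ok(None)
    } else {
        char::from_u32(raw).map(Some).ok_or(raw)
    }
}

/// Hypothetical helper: encode `Option<bool>` as 0, 1, or 2.
pub fn encode_option_bool(value: Option<bool>) -> u8 {
    match value {
        Some(false) => 0,
        Some(true) => 1,
        None => 2,
    }
}

/// Hypothetical helper: decode a `u8`, rejecting anything other than 0, 1, 2.
pub fn decode_option_bool(raw: u8) -> Result<Option<bool>, u8> {
    match raw {
        0 => Ok(Some(false)),
        1 => Ok(Some(true)),
        2 => Ok(None),
        other => Err(other),
    }
}
```

A language without native support would have to write this kind of check by hand at the boundary, which is the cost the "fewer niche optimizations" question is weighing.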

- Once `extern "C"` supports C-compatible handling of `u128` and `i128`,
  `extern "crabi"` should do the same.

- Extensible enums. To define types that allow for extension, crABI would
Related to this, there are probably situations where it would be useful to declare a size and/or alignment of an enum to be larger than necessary, for future compatibility to allow adding new data that would otherwise change the ABI.
For a struct this can be done with a padding field, but that is somewhat difficult to do for an enum. I suppose you could make an unused, hidden variant with a value of the right size and alignment.
Padding fields are not ideal either as they can end up in JSON or toString representations too easily where they are noise at best and a bug at worst.
Maybe the same issue: we have found it difficult to safely define a Rust type corresponding to a C or C++ `enum`, because an out-of-range representation of a Rust enum is immediate UB, unlike in C/C++. We had to resort to using a `repr(C)` or `repr(transparent)` `struct` on the Rust side in order to be able to gracefully handle errors on the C/C++ side.
and the corresponding C/C++ type:
https://github.com/zcash/zcash/blob/2112e467ee31ea95cf81904a6aae397fa3d031ae/src/rust/include/rust/zip339.h#L17-L38
(Rust here is stricter than C++17. In the latter, casting an integer outside the range of the enumeration values to the `enum` type is UB, but it is not UB to cast a value to the `enum` type that is within range but not one of the defined values.)
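
The workaround described above can be sketched roughly as follows. This is not the zcash code. `Language` and its constants are hypothetical names, and the wrapped integer is assumed to be a C `int`. Any bit pattern is then a valid Rust value, and unknown ones are reported rather than being UB.

```rust
use std::os::raw::c_int;

/// Hypothetical stand-in for a C `enum` with three defined values.
/// Being a transparent wrapper around an integer, every bit pattern is valid.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Language(pub c_int);

impl Language {
    pub const ENGLISH: Language = Language(0);
    pub const FRENCH: Language = Language(1);
    pub const JAPANESE: Language = Language(2);

    /// Returns a name for known values and `None` for anything else,
    /// so the caller can report an error across the FFI boundary.
    pub fn name(self) -> Option<&'static str> {
        match self {
            Language::ENGLISH => Some("english"),
            Language::FRENCH => Some("french"),
            Language::JAPANESE => Some("japanese"),
            _ => None,
        }
    }
}
```

The cost is that `match` on such a type is never exhaustive without a wildcard arm, which is roughly what an extensible crABI enum would also need to require.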

- Is there a better way we can handle tuple types? Having to use a distinct
  syntax like `cr#()` is onerous; one of the primary values of tuples is
  brevity. In the future, if we have variadic generics, we could potentially
Or, conversely, variadic generics could build on the crabi tuples. There is definitely some overlap here, since having a tuple with well-defined structure, with fields in the declaration order, is also useful for using recursion to process the first element of a tuple and pass the rest of the tuple to the next iteration.
As you note, there is some overlap, in that both crabi tuples and 'a tuple where you can borrow the tail by reference' (which could potentially be used for variadic generics) need the fields to be stored in order.
However, in other ways they have opposite goals. 'A tuple where you can borrow the tail by reference' would likely need to store, e.g., `cr#(u8, u8, u32)` like `cr#(u8, (u8, u32))`. First, this would be less efficient due to additional padding introduced. While crABI does already sacrifice layout efficiency to some extent, by giving up reordering and some niche optimizations, that doesn't mean it doesn't care about performance, and this would be an unnecessary extra loss. Second, given crABI's goal of being easy to bind to other languages, it's beneficial for `cr#(u8, u8, u32)` to translate directly to the obvious C struct equivalent (`struct { uint8_t a; uint8_t b; uint32_t c; }`), rather than to some other structure.
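
To make the padding argument concrete, here is a small sketch that uses `repr(C)` structs as stand-ins for the two candidate layouts (the `cr#()` syntax itself is not implemented). The struct names are hypothetical, and the sizes in the comments assume a target where `u32` has 4-byte alignment.

```rust
use std::mem::size_of;

/// Stand-in for `cr#(u8, u8, u32)` laid out like the obvious C struct.
#[repr(C)]
pub struct Flat {
    pub a: u8,
    pub b: u8,
    pub c: u32,
}

/// Stand-in for the tail `(u8, u32)` that could be borrowed by reference.
#[repr(C)]
pub struct Tail {
    pub b: u8,
    pub c: u32,
}

/// Stand-in for `cr#(u8, (u8, u32))`, the "borrowable tail" layout.
#[repr(C)]
pub struct Nested {
    pub a: u8,
    pub rest: Tail,
}

/// Returns `(flat, nested)` sizes in bytes.
/// With 4-byte `u32` alignment: Flat is 1 + 1 + 2 (padding) + 4 = 8,
/// Nested is 1 + 3 (padding) + 8 (Tail, itself padded) = 12.
pub fn layout_sizes() -> (usize, usize) {
    (size_of::<Flat>(), size_of::<Nested>())
}
```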
Given that this is intended for cross-language interfaces, there should probably be a formal, mostly language-agnostic specification of the ABI. Should this RFC discuss where that specification should go, and how it will be created? Will maintainers from other languages (such as Nim) be involved in that process?
I suggest adding to the RFC that crABI will not initially define a stable LLVM CFI mangling (see also #3296, ping @rcvalle). For an example of where this would be an issue, it's one thing to say that (quoting the RFC):
But under LLVM CFI, if

One potential solution is to define a standard C++ equivalent name for CFI purposes, e.g.

Another potential solution is to define a standard C struct name (say,

I think a better approach is to add an attribute to Clang to customize the CFI mangling for a C struct definition. After all, I'm not aware of any other implementations of LLVM CFI besides rustc and Clang, so it's not critical for us to fit into the existing constraints. We could then either keep using vendor extended types or go with the C++ equivalent name; which approach works better would depend on the design of said attribute. But I'm not volunteering to send a patch to Clang; I think that onus is on anyone who wants to use crABI in the context of CFI-protected (C/C++)-Rust interop.

If that doesn't happen soon, though, then what? Clearly there's no need to block anything about crABI not related to CFI. The question is whether it should block stabilizing the LLVM CFI mangling. There's an argument that it shouldn't: we could stabilize the CFI mangling as-is and it would still be useful to protect Rust-to-Rust calls. But given the desire to interoperate well with other languages, I think it would be better to wait on stabilizing the CFI mangling until we have a full answer for how it will interop with Clang.

Please leave name mangling unspecified until interop between different languages has been established in some prototypes. It also does not have to be part of crABI at all and could be a different spec altogether.

- Should we provide *fewer* niche optimizations? Those for `NonZero` and
  reference types provide obvious value; are those for `bool` and `char` really
  useful enough to justify the special case in languages that will have to
  handle them explicitly rather than automatically?
I haven't seen `Option<bool>` in many APIs, but some string APIs use `Option<char>`. Is there a way to count how popular these types are in libraries?
Just loosely, https://github.com/search?q=%22Option%3Cbool%3E%22&type=code shows 199k results for `Option<bool>` but only 24.6k for `Option<char>`: https://github.com/search?q=%22Option%3Cchar%3E%22&type=code

## `[T; N]` - fixed-size array by value

crABI supports passing a fixed-size array by value (as opposed to by
reference). To represent this via the C ABI, crABI treats this equivalently to
a C array of the same type passed by value.

Note that this means C code can use an array directly in a context where it
will be interpreted by-value (such as in a struct field), but needs to use a
structure with a single array field in contexts where it would otherwise be
interpreted as a pointer (such as in a function argument or return value).

For instance:

```rust
extern "crabi" fn func(rgb: [u16; 3]);
```

is equivalent to:

```c
struct func_rgb_arg {
    uint16_t array[3];
};
extern void func(struct func_rgb_arg rgb);
```

Note that crABI does *not* pass the length, since it's a compile-time constant;
the recipient must also know the correct size. (Use one of the slice-based
types for a type with a runtime-determined length.)
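
As a rough illustration of the lowering quoted above, the same shape can be written today with `extern "C"` and a one-field `repr(C)` wrapper. This is only a sketch: `FuncRgbArg` mirrors the C struct in the RFC text, while the return value of `func` is an assumption added here so the call has something observable.

```rust
/// Mirrors `struct func_rgb_arg { uint16_t array[3]; }` from the RFC text.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct FuncRgbArg {
    pub array: [u16; 3],
}

/// Takes the array by value through the C ABI. The length is not passed;
/// both sides must agree on `3` at compile time.
/// Returning the channel sum is an assumption made for illustration only.
pub extern "C" fn func(rgb: FuncRgbArg) -> u32 {
    rgb.array.iter().map(|&channel| u32::from(channel)).sum()
}
```

With native crABI support the wrapper would presumably be generated by the compiler, so the Rust side could keep writing `[u16; 3]` directly.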
I seem to remember issues about zero-sized types in C; what's the expected behaviour if `N` is `0`? I'd expect it to act like the other zero-sized types, i.e. the function parameter in its entirety would just not be there in the C ABI, but that's not entirely clear from the example.
I think the `[T; 0]` case might depend on whether the array is passed by reference (address and length) or value (just address). It seems like passing by value is ambiguous here.
Also, what's the expected behaviour for arrays containing zero-sized types?
Is it different for `[(); N]` and `[(); 0]`?
Is it different passing by value and by reference?
For the Rust ABI, all ZSTs are ignored. Most C ABIs follow this as well, but on a couple of archs ZSTs still consume a register: https://github.com/rust-lang/rust/blob/6ef7d16be0fb9d6ecf300c27990f4bff49d22d46/compiler/rustc_ty_utils/src/abi.rs#L419-L421 I think we should ignore them on all archs. The C side can simply always omit the args as necessary, but it feels wrong for x86_64-pc-windows-gnu and x86_64-pc-windows-msvc to have an incompatible crABI, given that it is possible to call MSVC libraries using MinGW just fine as long as you have the right import library, and in fact MinGW depends on this to call system libraries.

structures, unless the newer side restricts itself to features understood by an
older version.

This RFC defines crABI 1.0.
I know that we feel pretty confident in most of the decisions in this RFC, but since crABI has not seen actual usage, especially regarding interop with other languages, it feels prudent to me if we started at `0.1.0` and allowed ourselves breaking changes, and only decided to move to `1.0` once enough (different) eyes have been laid on the problem.
One way to reduce the complexity of the niche optimizations would be to only use two values for the niche value:
These can continue to use the zero value for
These could all use the all bits set (i.e. A second, further way to simplify the specification here and allow users to enable the niche optimization for their own types would be to define a

Slices translate to a by-value struct containing two fields: a pointer to the
element type, and a `size_t` number of elements, in that order.
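
A minimal sketch of that pointer-plus-length struct, for readers who want to see the shape. `RawSlice` and its methods are hypothetical names, and `usize` is assumed to match C's `size_t` on the target.

```rust
/// Hypothetical by-value representation of a shared slice:
/// element pointer first, then element count.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct RawSlice<T> {
    pub ptr: *const T,
    pub len: usize,
}

impl<T> RawSlice<T> {
    /// Splits a Rust slice into its pointer and element count.
    pub fn from_slice(slice: &[T]) -> Self {
        RawSlice { ptr: slice.as_ptr(), len: slice.len() }
    }

    /// Rebuilds the slice.
    ///
    /// # Safety
    /// `ptr` must be non-null, aligned, and valid for `len` elements for the
    /// whole lifetime `'a`, and the total size must not exceed `isize::MAX`.
    pub unsafe fn as_slice<'a>(&self) -> &'a [T] {
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}
```

The safety comment repeats the requirements of `std::slice::from_raw_parts`, which a foreign caller would have to uphold for the reconstructed slice to be sound.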
Does this properly translate to the same limitation as Rust slices, i.e. at most `isize::MAX` bytes but at most `usize::MAX` items (for ZSTs)?

As a special case, if an `enum` using `repr(crabi)` has exactly two variants,
one of which has no fields and the other of which has a single field, and the
single field type has a specific type (defined below) with a "niche" value,
What about `non_exhaustive`?
Is any `non_exhaustive` type eligible for a stable ABI like this? Or would it somehow be limited to extensions in ways that don't break the ABI? Then again, do we have restrictions on adding fields or other layout-affecting changes being made to extant ABI-declared types?
I believe `non_exhaustive` is orthogonal with "stable ABI".
For example, a Linux syscall struct must have a stable layout, but should be planned with later extension in mind, as it may gain additional variants/options in future kernel versions, but won't change the layout of existing syscalls, e.g. by including some reserved fields or leaving space in a bitset for to-be-added flags.
`non_exhaustive` would then remind consumers to defensively program against "timetravelers from the future" (e.g. `return -EINVAL;` in the Linux syscall example).
Another example would be io_uring, which keeps adding new opcodes as the kernel implementation grows (i.e. additionally-accepted values for the opcode discriminant).
`#[non_exhaustive]` doesn't have any effect for the crate that defines the type. Instead it has an effect on users. This means that if the user tried to use a newer version of the type, doing so while using the older version of the code that defined it would be UB without said code being able to report an error for the invalid discriminant. Also, extensions to `#[non_exhaustive]` types can only possibly work for the memory layout. The calling convention can drastically change even for adding a single field or enum variant.
that wouldn't work for (some of) the Win32 handle types, because iirc negative handles are perfectly valid, it's just

I'm referring to

*nightly-only* implementations of that version of crABI. Versions of crABI
should not be considered stable until available in stable Rust.

Future versions of crABI may also establish allow-by-default lints for the use
As a (wildly excited) outsider, this raises a question for me: will the deny-activation of such lints be enough info for the compiler to restrict itself to a lower subset of crABI?
Or must we reserve some kind of "version" specifier in the `repr` attribute? Something like `repr(crABI, u8, v=2.1)`. Perhaps a global annotation on the module level, to avoid repetition?
I'm thinking of cases like an enum that would have different possibilities of niche-optimisation under crABI v1, v2, v3..
Does the compiler have access to the active lint selection in the enum layout code?
Are lints even supposed to act like compiler flags to this extent?
Hm, I am quite a bit surprised that the RFC doesn’t talk about aliasing at all. Consider this crabi function declaration:
is the code calling the function required to ensure that src and dst do not overlap? Is the code implementing the function allowed to assume that the slices do not overlap? Do we just use Type Based Alias Analysis rules here by virtue of just deferring to C ABI?

If the Rust side is
TBAA is entirely incompatible with Rust.

Yes, but, if I understand this right, that's what the current RFC implicitly proposes to use, by saying that "we lower to C ABI". Taking the example from https://stefansf.de/post/type-based-alias-analysis/, the following crabi declaration: `extern void foo(int *x, short *y)` also carries the implicit constraint that We definitely don't want to have that in crabi, because that's not how Rust works, and not how an ideal ABI would work. But that means we need to explicitly define aliasing rules for crabi, as otherwise we are inheriting those from C.

If I have some crabi function which takes, say, I found the manual conversions were one of the annoying parts about using the C ABI in https://github.com/Neotron-Compute/Neotron-FFI/blob/develop/src/option.rs (e.g. here or here) But also, yay, I could basically delete the neotron-ffi crate.

You don't get UB in C just by having pointers that alias, only if you actually dereference them with incompatible types. So the full example from the post you linked has UB if

void foo(int *x, short *y) {
    *x = 40;
    *y = 0;
    *x += 2;
}

But this version would not, as long as the pointee was originally either a dynamic allocation or a variable of declared type

void foo(int *x, short *y) {
    *x = 40;
    memset(y, 0, sizeof(short));
    *x += 2;
}

As a result, there's no need for crABI to, say, munge the types to

A potential exception is that if a pointer to a Rust local or global variable is sent to C, the C side might want to know what the "declared type" is for the purpose of C's aliasing model. The right answer should be that the Rust variable is like a dynamic allocation and doesn't have a declared type. But I'm not sure if that works in all cases in the current implementation, or if it should be guaranteed for all Rust implementations. Still, it seems somewhat out of scope…

I would actually like to see a deny-by-default lint and compiler flag (which is a new thing, I think) that would, when allowed or warned, perform the transparent conversion between crabi::Option and std::Option (and the Result type as well). As this would be deny-by-default, you wouldn't fall into relying on it by accident, but it would still allow someone to quickly prototype an FFI API. I do feel strongly against having this in everyday programs, as it is indeed basically a deep memory copy, but this could allow some leeway when trying stuff. It is true that having to duplicate those types is sad, but as said in the RFC, there seems to be no other way.

I know I'm just one data point, but as a Rust user who is excited about crABI: I get it. I'll live. Better a bit of minor friction than lurking dragons. And it seems to me that it's pretty easy to leave the door open to creative solutions in future, so at least to me it doesn't seem like solving it now should be considered a blocker.

The translation of slices and similar uses structs containing pointer/length
pairs, rather than inlining the pointer and length as separate arguments.
[As noted above][types], this is typically passed and returned in an efficient
fashion on major targets. However, in some languages, such as C, this will
require separately defining a structure and then using that structure. This
still seems preferable, though, as combining the two into one struct allows for
uniform handling between arguments, return values, and fields, as well as
keeping the pointer and length more strongly associated.
Passing slices as two arguments (`ptr` then `len`) would allow Rust bindings to existing C code that does this to use slices directly. For example, POSIX `ssize_t send(int sockfd, const void *buf, size_t len, int flags)` could be exposed in Rust as

extern "crABI" { // or perhaps even "C" ?
    fn send(socket: c_int, buf: *const [u8], flags: c_int) -> ssize_t;
}
passing a structure of two fields is generally different than passing two arguments though, no?
Yes, it is generally different. My suggestion is to make them the same for slices with `extern "crABI"`/`extern "C"`, as a special case only for that specific combination.

If an `enum` specifies `repr(crabi)` but does not specify a discriminant type,
the `enum` is guaranteed to use the smallest discriminant type that holds the
maximum discriminant value used by a variant in the `enum`.

If the `enum` has no fields, or no fields with a non-zero size, crABI will
represent the `enum` as only its discriminant.
Suggested change:

If an `enum` specifies `repr(crabi)` but does not specify a discriminant type,
the `enum` is guaranteed to use the smallest discriminant type that holds the
maximum discriminant value used by a variant in the `enum`. Enums with zero
or one variants have a zero-sized discriminant.

If the `enum` has no fields with a non-zero size, crABI will represent the
`enum` as only its discriminant.
Clarify how zero-sized enums work
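
As a rough sketch of the "smallest discriminant type" rule in the RFC text above: the helper below picks a width from the maximum discriminant value. It assumes unsigned discriminants and does not model the zero- or one-variant case from the suggestion. `smallest_discriminant_bits`, `Small`, and `Wide` are hypothetical names, and the two enums use today's explicit `repr(u8)`/`repr(u16)` to show what the chosen width would look like.

```rust
/// Hypothetical helper: bit width of the smallest unsigned integer type
/// that can hold `max_discriminant`. Signed discriminants are not modeled.
pub fn smallest_discriminant_bits(max_discriminant: u128) -> u32 {
    match max_discriminant {
        0..=0xFF => 8,
        0x100..=0xFFFF => 16,
        0x1_0000..=0xFFFF_FFFF => 32,
        0x1_0000_0000..=0xFFFF_FFFF_FFFF_FFFF => 64,
        _ => 128,
    }
}

/// Maximum discriminant 255 fits in `u8`.
#[repr(u8)]
pub enum Small {
    A = 0,
    B = 255,
}

/// Maximum discriminant 256 needs `u16`.
#[repr(u16)]
pub enum Wide {
    A = 0,
    B = 256,
}
```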

crABI supports arbitrary `enum` types, if declared with `repr(crabi)`. These
are always passed using the same layout that Rust uses for enums with `repr(C)`
and a specified discriminant type:
<https://doc.rust-lang.org/reference/type-layout.html#combining-primitive-representations-of-enums-with-fields-and-reprc>
Suggested change:

crABI supports arbitrary `enum` types, representing discriminated unions, if
declared with `repr(crabi)`. These are always passed using the same layout that Rust
uses for enums with `repr(C)` and a specified discriminant type:
<https://doc.rust-lang.org/reference/type-layout.html#combining-primitive-representations-of-enums-with-fields-and-reprc>
Clarify what `enum` as a type is

Provide the initial version of a new ABI and in-memory representation
supporting interoperability between high-level programming languages that have
safe data types.
Suggested change:

Provide the initial version of a new C-Rust ABI (`crABI`) and in-memory
representation supporting interoperability between high-level programming languages that have
safe data types.
Introduce the origin of the term somewhere in the summary

- A repr for laying out data structures (`struct`, `union`, `enum`) compatible
  with crABI: `repr(crabi)`.
`union` is mentioned here but its ABI/layout is not mentioned in the doc; it could use a short section.

An implementation of crABI should document which version of crABI it
implements, which compactly conveys supported and unsupported functionality.
I can imagine some cases where ABI version winds up in the binary as a way to verify compatibility. It may not hurt to specify a preferred representation:

If ABI version needs to be encoded in a binary for any reason, it should be
stored as a struct representing major and minor versions as 8-bit integers. If
applicable, the symbol name should be `__crabi_version`.

```c
struct crabi_version {
    uint8_t major;
    uint8_t minor;
};

struct crabi_version __crabi_version = { .major = 1, .minor = 0 };
```
Having a global `__crabi_version` would cause conflicts if multiple static libraries that export a crABI interface are linked together. And using COMDAT to deduplicate wouldn't work either if both static libraries use a different crABI version.
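
For comparison, the suggested version struct could be mirrored on the Rust side roughly as below. This is only a sketch: `CrabiVersion` and `supports` are hypothetical names, and the compatibility rule (same major version, provider minor at least the required minor) is an assumption made here, not something the RFC states.

```rust
/// Mirrors the suggested C `struct crabi_version` (two 8-bit fields).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CrabiVersion {
    pub major: u8,
    pub minor: u8,
}

impl CrabiVersion {
    /// Assumed rule (not from the RFC): an implementation at version `self`
    /// can serve a peer that requires `required` when the major versions
    /// match and `self.minor` is at least `required.minor`.
    pub fn supports(self, required: CrabiVersion) -> bool {
        self.major == required.major && self.minor >= required.minor
    }
}
```

Passing such a value explicitly (for example from a per-library query function) rather than exporting one global symbol would sidestep the static-linking conflict raised above, at the cost of one more function in each exported interface.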
outside that range are passed or returned, and in particular the compiler may | ||
generate code that does not check this assumption, or may optionally include | ||
validation assertions when debugging. | ||
|
Maybe add a section `## T - owned values` and just specify that anything `#[repr(C)]` or `#[repr(crABI)]` can be passed by value. I know this is mentioned elsewhere, but a specific section would make this more in line with the other types listed.
# Rationale and alternatives | ||
[rationale-and-alternatives]: #rationale-and-alternatives | ||
|
||
Guaranteed niche optimization is the most uncertain part of the proposed crABI |
"Rationale and alternatives" is big enough that it could use subsections e.g. ## Guaranteed niche optimization
@programmerjake made a | ||
[proposal](https://github.com/rust-lang/rfcs/pull/3470#issuecomment-1674249638) | ||
([sample usage](https://github.com/rust-lang/rfcs/pull/3470#issuecomment-1674265515)) | ||
to modify the standard impl of `Drop` for `Box` to allow plugging in an | ||
arbitrary function (via a `BoxDrop` trait), to drop the `Box` as a whole. This | ||
would be generally useful (e.g. for object pooling), and would then permit | ||
crABI to define a `box_drop` function that calls an FFI function to free the | ||
object. If we accepted that proposal, it would make sense to use it to | ||
represent crABI boxes. |
Could this go in "Future possibilities"?
- Swift's stable ABI | ||
- The `abi_stable` crate (which aims for Rust-to-Rust stability, not | ||
cross-language interoperation, but it still serves as a useful reference) | ||
- `stabby` | ||
- UniFFI | ||
- Diplomat | ||
- C++'s various ABIs (and the history of its ABI changes). crABI does not, | ||
however, aim for compatibility with or supersetting of any particular C++ | ||
ABI. | ||
- Many, many interface description languages (IDLs). |
Do Swift, C++ or any others define an ABI for a slice?
The wasm component model has a list type, but currently only allows owned values. C++ has `string_view` as a `&str` counterpart and `span<T>` as a `&[T]` counterpart.
The slice type `[]T` is a sequence of a `*[cap]T` pointer to the slice backing store, an `int` giving the `len` of the slice, and an `int` giving the `cap` of the slice.

That sounds like it may be storage-compatible with the `struct slice { uint8_t *data; size_t len; }` defined here, which is kind of nice. It seems like C compatibility isn't necessarily a goal of Go's ABI though, so that might be it.
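To make the (pointer, length) layout under discussion concrete, here is a minimal Rust sketch of a struct mirroring `struct slice { uint8_t *data; size_t len; }`. The name `ByteSlice` and its methods are hypothetical, chosen only for illustration; nothing here is taken from the RFC text itself.

```rust
use std::marker::PhantomData;
use std::slice;

/// Hypothetical Rust mirror of `struct slice { uint8_t *data; size_t len; }`.
/// `PhantomData` is zero-sized, so the in-memory layout is pointer, then length.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct ByteSlice<'a> {
    data: *const u8,
    len: usize,
    _borrow: PhantomData<&'a [u8]>,
}

impl<'a> ByteSlice<'a> {
    /// Split a borrowed slice into its (pointer, length) pair.
    pub fn from_slice(s: &'a [u8]) -> Self {
        ByteSlice { data: s.as_ptr(), len: s.len(), _borrow: PhantomData }
    }

    /// Rebuild the borrowed slice from the stored pair.
    pub fn as_slice(&self) -> &'a [u8] {
        // SAFETY: `data` and `len` were taken from a valid `&'a [u8]`
        // in `from_slice`, and the lifetime ties us to that borrow.
        unsafe { slice::from_raw_parts(self.data, self.len) }
    }
}
```

The field order matters for the comparison with `string_view` and `span` in this thread: a (length, pointer) layout would have the same size but would not be interchangeable with this one.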
For what it's worth, `string_view` seems to use the opposite (`(len, pointer)`) on both GCC and Clang (thanks Miguel for the correction).
I guess you meant (size, pointer), but while libstdc++ uses that one, libc++ (i.e. LLVM's) and Microsoft's use (pointer, size) instead.
For `span` (dynamic), all three use (pointer, size).
Well, at least the versions I looked at.
- Should we provide *fewer* niche optimizations? Those for `NonZero` and | ||
reference types provide obvious value; are those for `bool` and `char` really | ||
useful enough to justify the special case in languages that will have to | ||
handle them explicitly rather than automatically? |
Just loosely, https://github.com/search?q=%22Option%3Cbool%3E%22&type=code shows 199k results for `Option<bool>` but only 24.6k for `Option<char>` (https://github.com/search?q=%22Option%3Cchar%3E%22&type=code).
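For readers weighing which niches are worth guaranteeing, the following sketch checks whether `Option<T>` occupies the same space as `T`. The helper name `option_is_free` is hypothetical. Note the assumption: Rust guarantees this property for `NonZero` integers and references, while for `bool` and `char` it is only what current rustc is observed to do, not a documented language guarantee.

```rust
use std::mem::size_of;

/// Returns true if wrapping `T` in `Option` adds no storage, which means
/// the compiler encoded `None` in an otherwise invalid bit pattern (a niche).
pub fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}
```

Calling `option_is_free::<std::num::NonZeroU32>()` or `option_is_free::<&u8>()` exercises the guaranteed cases, while `option_is_free::<u32>()` is a type with no spare bit patterns at all.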
|
||
Today, developers building projects incorporating multiple languages, or | ||
calling a library written in one language from another, often have to use the C | ||
ABI as a lowest-common-denominator for cross-language function calls. As a |
ABI as a lowest-common-denominator for cross-language function calls. As a | |
ABI as a lowest common denominator for cross-language function calls. As a |
("Lowest common denominator" doesn't have any compound adjectives.)
both languages have a safe type for counted UTF-8 strings. | ||
|
||
For popular pairs of languages, developers sometimes create higher-level | ||
binding layers for combining those languages. However, the creation of such |
binding layers for combining those languages. However, the creation of such | |
binding layers to support communication between those languages. However, the creation of such |
Any type whose ABI is already defined by C will be passed through crABI | ||
identically. Types defined by crABI that the C ABI does not support will be | ||
translated into a representation using types the C ABI supports (potentially | ||
indirectly via other crABI-supported types). |
In that case, has it been considered to extend the C ABI (as supported by Rust) rather than defining a new ABI? After all they will coincide for anything that is currently valid.
I can think of the following potential reasons not to:
1. If the platform's "official" C ABI were to newly define something differently to Rust's anticipation of it (say, defining a different way of passing `[u]int128_t` parameters), that would introduce an incompatibility.
   - Note that if that happened, the guarantee in the above paragraph couldn't hold either. A new ABI version would probably need to be created to restore it.
2. A new ABI might be able to make weaker stability guarantees initially.

(1) seems not convincing, but (2) could be.
specify a discriminant type, the enum is guaranteed to use the smallest | ||
discriminant type that holds the maximum discriminant value used by a variant | ||
in the enum. (This differs from the behavior of `repr(C)` enums without a | ||
discriminant type.) |
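As a rough illustration of the "smallest discriminant type" rule quoted above, the sketch below picks a width from the maximum discriminant value. It assumes non-negative discriminants and unsigned fixed-width integer types up to 64 bits; the quoted text does not settle signedness, and the function name is hypothetical.

```rust
/// Width in bits of the smallest unsigned integer type able to hold
/// `max_discriminant`. Assumption: discriminants are non-negative.
pub fn smallest_discriminant_bits(max_discriminant: u64) -> u32 {
    if max_discriminant <= u64::from(u8::MAX) {
        8
    } else if max_discriminant <= u64::from(u16::MAX) {
        16
    } else if max_discriminant <= u64::from(u32::MAX) {
        32
    } else {
        64
    }
}
```

Under this reading, an enum with three variants numbered 0 to 2 would get an 8-bit discriminant, which is where it can diverge from a C compiler that uses `int` for the same enum.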
I don't think this should differ. For a start it contradicts the paragraph about C ABI compatibility at lines 69-72 above. The representation of an `enum` in C is implementation-defined as far as the standard is concerned, but in practice you can assume it is a well-defined function, for each platform, of the minimum and maximum enumeration values. (Typically, it has the same representation as the smallest standard integer type that can represent the elements, potentially subject to some minimum width.)

I don't think users will expect `repr(crabi)` to differ from `repr(C)` here. It is better to write `repr(crabi, u8)`, for example, if that's what you mean. If it does differ, then that must be explicitly called out in the paragraph at lines 69-72.

We do need to specify what the niche value is for `Option` of an `enum` type, if we are supporting niche-value optimization for `enum`s (and I think we should). The obvious answer is "0 if possible, otherwise the all-ones value of the chosen integer type if possible, otherwise there is no niche-value optimization for this type".
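The niche-selection rule proposed in this comment can be sketched as a small function. This is only an illustration of the suggested rule, not anything the RFC specifies; the name `pick_niche` and the representation of discriminants as `u64` values in an unsigned field of `bits` width are assumptions.

```rust
/// Sketch of the suggested rule: use 0 as the `None` encoding if no variant
/// uses it, otherwise the all-ones value of the discriminant type if unused,
/// otherwise report that the enum has no niche.
pub fn pick_niche(discriminants: &[u64], bits: u32) -> Option<u64> {
    assert!((1..=64).contains(&bits), "unsupported discriminant width");
    // Avoid shifting by 64, which would overflow.
    let all_ones = if bits == 64 { u64::MAX } else { (1u64 << bits) - 1 };
    if !discriminants.contains(&0) {
        Some(0)
    } else if !discriminants.contains(&all_ones) {
        Some(all_ones)
    } else {
        None
    }
}
```

For example, an 8-bit enum with variants 1, 2 and 3 would encode `None` as 0, while one with variants 0, 1 and 2 would encode it as 255.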
larger than 0x10FFFF, or a value in the range 0xD800 to 0xDFFF inclusive. | ||
|
||
Note that there is no special handling for an array of values of this type, | ||
which is not equivalent to a string (unless using a UCS-4 encoding). |
It's not equivalent to either a NUL-terminated or a counted UCS-4 encoding either.
which is not equivalent to a string (unless using a UCS-4 encoding). | |
which is not equivalent to a string. |
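The validity rule for `char` quoted in the diff above (no value larger than 0x10FFFF, none in 0xD800 to 0xDFFF inclusive) can be written out as a check. The function name `is_valid_char` is hypothetical; the standard library's `char::from_u32` applies the same rule and serves as a reference.

```rust
/// Returns true if `v` is a Unicode scalar value, i.e. a bit pattern that
/// is legal for a `char`: at most 0x10FFFF and not a surrogate.
pub fn is_valid_char(v: u32) -> bool {
    !(v > 0x10FFFF || (0xD800..=0xDFFF).contains(&v))
}
```

An array of such values stores four bytes per element, which is one reason it is not interchangeable with a UTF-8 string holding the same characters.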
# crABI versioning and evolution | ||
[crabi-versioning-and-evolution]: #crabi-versioning-and-evolution | ||
|
||
crABI has has a major and minor version number, similar to semver. |
Just a small typo here: `has has`.
that already implies passing by pointer (such as a function argument or return | ||
value), or translate it explicitly to a pointer (e.g `uint16_t (*rgb)[3]`) in | ||
contexts where just writing the array would imply by-value (such as a struct | ||
field). |
The C types `uint16_t *` and `uint16_t (*)[3]` are distinct types. The wording here clearly specifies that the former be used, yet an example uses the latter.

It should be made clear which one to use. And I believe `&[T; N]` should always be lowered to `T (*)[N]`.

"C can represent this as an array (e.g. `uint16_t rgb[3]`) in contexts where that already implies passing by pointer (such as a function argument or return value)"

This can simply be removed. It is true that function parameters declared as arrays are pointers and that returning arrays is illegal, but that doesn't concern crABI.
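To illustrate the lowering suggested here, the sketch below shows a Rust function taking `&[u16; 3]`. A reference to a fixed-size array is a thin pointer, so on the C side it could be declared as `uint16_t (*rgb)[3]`. The function name `sum_rgb` is hypothetical, and the example uses `extern "C"` as a stand-in because `extern "crabi"` is not available in current compilers.

```rust
/// Hypothetical callee that receives a pointer to a whole three-element
/// array. The array length is part of the type, so no length is passed.
pub extern "C" fn sum_rgb(rgb: &[u16; 3]) -> u32 {
    rgb.iter().map(|&c| u32::from(c)).sum()
}
```

Because `&[u16; 3]` carries no runtime length, it has the same size as `*const u16`; the difference between `uint16_t *` and `uint16_t (*)[3]` is one of C-level typing rather than pointer representation on mainstream platforms.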
Note that the most eminently bikesheddable portion of this proposal is the handling of niches, and the crABI `Option` and `Result` types built around that. There are multiple open questions specifically about that.

I've also listed an open question about how to represent owned crABI pointers as Rust types: `Box<T>` versus `Box<T, NoDeallocate>` versus `Box<T, FFIDeallocate<obj_free>>`.
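To give a feel for what the `Box<T, FFIDeallocate<obj_free>>` option is trying to express, here is a sketch on stable Rust that avoids the unstable allocator API: an owning pointer whose destructor calls a foreign "free" function. The type `FfiBox` and its methods are hypothetical stand-ins, not part of the proposal.

```rust
use std::ops::Deref;
use std::ptr::NonNull;

/// Hypothetical owning pointer to an object allocated on the other side of
/// an FFI boundary. Dropping it calls the foreign deallocation function
/// instead of Rust's global allocator.
pub struct FfiBox<T> {
    ptr: NonNull<T>,
    free: unsafe extern "C" fn(*mut T),
}

impl<T> FfiBox<T> {
    /// # Safety
    /// `ptr` must point to a valid, uniquely owned `T` that `free` is the
    /// correct function to release.
    pub unsafe fn from_raw(ptr: NonNull<T>, free: unsafe extern "C" fn(*mut T)) -> Self {
        FfiBox { ptr, free }
    }
}

impl<T> Deref for FfiBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: validity is guaranteed by the contract of `from_raw`.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T> Drop for FfiBox<T> {
    fn drop(&mut self) {
        // SAFETY: we own the pointee and `free` is its matching deallocator.
        unsafe { (self.free)(self.ptr.as_ptr()) }
    }
}
```

Unlike the `Box<T, A>` variants, this sketch stores the function pointer at run time, so it is two words wide; encoding the deallocator in the type, as the open question proposes, would keep the pointer a single word.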