crABI v1 #3470

joshtriplett · 2023-08-11T05:05:41Z

Note that the most eminently bikesheddable portion of this proposal is the
handling of niches, and the crABI Option and Result types built around
that. There are multiple open questions specifically about that.

I've also listed an open question about how to represent owned crABI
pointers as Rust types: Box<T> versus Box<T, NoDeallocate> versus
Box<T, FFIDeallocate<obj_free>>.


programmerjake · 2023-08-11T06:05:08Z

for Box<T, A>, we could introduce a new trait:

// in std
pub trait BoxDrop<T: ?Sized>: Sized {
    fn box_drop(v: Pin<Box<T, Self>>);
}

impl<T: ?Sized, A: Allocator> BoxDrop<T> for A {
    #[inline]
    fn box_drop(v: Pin<Box<T, Self>>) {
        struct DropPtr<T: ?Sized, A: Allocator>(*mut T, A, Layout);
        impl<T: ?Sized, A: Allocator> Drop for DropPtr<T, A> {
            #[inline]
            fn drop(&mut self) {
                if self.2.size() != 0 {
                    unsafe { self.1.deallocate(NonNull::new_unchecked(self.0).cast(), self.2) }
                }
            }
        }
        let l = Layout::for_value::<T>(&v);
        let (p, a) = Box::into_raw_with_allocator(unsafe { Pin::into_inner_unchecked(v) });
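        // the guard frees the allocation when it goes out of scope, even if dropping `T` panics
        let guard = DropPtr(p, a, l);
        unsafe { ptr::drop_in_place(guard.0) }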
    }
}

// the standard Box type
pub struct Box<T: ?Sized, A: BoxDrop<T> = Global>(...);

// replacement version of Drop for Box
impl<T: ?Sized, A: BoxDrop<T>> Drop for Box<T, A> {
    #[inline]
    fn drop(&mut self) {
        A::box_drop(Box::into_pin(unsafe { ptr::read(self) }))
    }
}

usage demo:

pub struct FooDropper;

impl BoxDrop<Foo> for FooDropper {
    fn box_drop(v: Pin<Box<Foo, FooDropper>>) {
        drop_foo(v);
    }
}

extern "crabi" {
    pub type Foo;
    pub fn make_foo() -> Pin<Box<Foo, FooDropper>>;
    pub fn drop_foo(v: Pin<Box<Foo, FooDropper>>);
}
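Since `Box<T, A>` and `Allocator` are still unstable, the same idea can be approximated on stable Rust to see how it behaves. The sketch below is hypothetical (the names `Dropper`, `OwnedPtr`, and `BoxDropper` are made up for illustration): an owning pointer whose `Drop` impl delegates to a type-level dropper, which is roughly what `BoxDrop` would do for `Box<T, A>`. In real FFI, `drop_ptr` would call the foreign library's free function rather than `Box::from_raw`.

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the proposed `BoxDrop` trait: an implementor
/// decides how an owned `T` is destroyed and freed.
pub trait Dropper<T> {
    /// # Safety
    /// `ptr` must be uniquely owned and must not be used after this call.
    unsafe fn drop_ptr(ptr: NonNull<T>);
}

/// Hypothetical owning pointer whose destructor delegates to `D`,
/// loosely modelling `Box<T, D>` under the `BoxDrop` proposal.
pub struct OwnedPtr<T, D: Dropper<T>> {
    ptr: NonNull<T>,
    _owns: PhantomData<(T, D)>,
}

impl<T, D: Dropper<T>> OwnedPtr<T, D> {
    /// # Safety
    /// `ptr` must be valid for `D::drop_ptr` and not owned by anything else.
    pub unsafe fn from_raw(ptr: NonNull<T>) -> Self {
        OwnedPtr { ptr, _owns: PhantomData }
    }

    pub fn get(&self) -> &T {
        // SAFETY: `ptr` is valid and uniquely owned for the lifetime of `self`.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T, D: Dropper<T>> Drop for OwnedPtr<T, D> {
    fn drop(&mut self) {
        // SAFETY: we own `ptr` and never touch it again after this.
        unsafe { D::drop_ptr(self.ptr) }
    }
}

/// Counts how many times `BoxDropper` ran, so the delegation is observable.
pub static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Example dropper for pointers that came from `Box::into_raw`.
pub struct BoxDropper;

impl<T> Dropper<T> for BoxDropper {
    unsafe fn drop_ptr(ptr: NonNull<T>) {
        DROPS.fetch_add(1, Ordering::SeqCst);
        unsafe { drop(Box::from_raw(ptr.as_ptr())) }
    }
}
```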

joshtriplett · 2023-08-11T06:10:33Z

@programmerjake That's an interesting alternative!

Is there a way, rather than having to implement a trait, to instead have a single type parameterized with a function type?

programmerjake · 2023-08-11T06:21:59Z

> Is there a way, rather than having to implement a trait, to instead have a single type parameterized with a function type?

I thought about it, but it's very annoying to give function types a name, since you currently have to use TAIT:

struct FnDropper<T: ?Sized, F: Fn(Pin<Box<T, FnDropper<T, F>>>)>(F);

type FooDropFn = impl Fn(Pin<Box<Foo, FnDropper<Foo, FooDropFn>>>);
extern "crabi" {
    type Foo;
    fn make_foo() -> Pin<Box<Foo, FnDropper<Foo, FooDropFn>>>;
    fn drop_foo(v: Pin<Box<Foo, FnDropper<Foo, FooDropFn>>>);
}
#[defining(FooDropFn)]
fn _f() -> FooDropFn {
    drop_foo
}

programmerjake · 2023-08-11T06:27:29Z

maybe better usage demo:

// std API
pub struct FFIDropper;
// like C++ unique_ptr but where deleter is defined by T
pub type FFIBox<T: ?Sized> = Box<T, FFIDropper>;

// user API, this impl could easily just be a
// #[box_drop = drop_foo] proc-macro annotation on Foo
impl BoxDrop<Foo> for FFIDropper {
    fn box_drop(v: Pin<FFIBox<Foo>>) {
        drop_foo(v);
    }
}

extern "crabi" {
    pub type Foo;
    pub fn make_foo() -> Pin<FFIBox<Foo>>;
    pub fn drop_foo(v: Pin<FFIBox<Foo>>);
}
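The `#[box_drop = drop_foo]` annotation mentioned above could be approximated today with a declarative macro. A minimal sketch, assuming a hypothetical `FfiDrop` trait implemented on the pointee (so the deleter is defined by `T`, as in the `FFIBox` comment) and stand-in `make_foo`/`drop_foo` functions written in Rust so it runs; none of these names are an existing std API.

```rust
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical trait: the pointee type names its own FFI destructor.
pub trait FfiDrop {
    /// # Safety
    /// `ptr` must be uniquely owned and not used afterwards.
    unsafe fn ffi_drop(ptr: NonNull<Self>);
}

/// Hypothetical stable-Rust stand-in for `FFIBox<T>`.
pub struct FfiBox<T: FfiDrop>(NonNull<T>);

impl<T: FfiDrop> FfiBox<T> {
    /// # Safety
    /// `ptr` must be null or an owned pointer that `T::ffi_drop` can free.
    pub unsafe fn from_raw(ptr: *mut T) -> Option<Self> {
        NonNull::new(ptr).map(FfiBox)
    }

    pub fn get(&self) -> &T {
        // SAFETY: `FfiBox` uniquely owns a valid pointer.
        unsafe { self.0.as_ref() }
    }
}

impl<T: FfiDrop> Drop for FfiBox<T> {
    fn drop(&mut self) {
        // SAFETY: the pointer is owned and never used again.
        unsafe { T::ffi_drop(self.0) }
    }
}

/// Declarative-macro version of the suggested `#[box_drop = drop_foo]`.
macro_rules! box_drop {
    ($ty:ty => $free:path) => {
        impl FfiDrop for $ty {
            unsafe fn ffi_drop(ptr: NonNull<Self>) {
                unsafe { $free(ptr.as_ptr()) }
            }
        }
    };
}

// Stand-ins for a foreign library, implemented in Rust so the sketch runs.
#[repr(C)]
pub struct Foo {
    pub value: u32,
}

/// Counts calls to `drop_foo`.
pub static FREED: AtomicUsize = AtomicUsize::new(0);

pub extern "C" fn make_foo(value: u32) -> *mut Foo {
    Box::into_raw(Box::new(Foo { value }))
}

pub unsafe extern "C" fn drop_foo(foo: *mut Foo) {
    FREED.fetch_add(1, Ordering::SeqCst);
    unsafe { drop(Box::from_raw(foo)) }
}

box_drop!(Foo => drop_foo);
```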

programmerjake · 2023-08-11T07:42:58Z

lots more discussion about BoxDrop and FFIBox and stuff here: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/BoxDrop.20proposal/near/383840871

joshtriplett · 2023-08-11T07:46:54Z

@programmerjake I attempted to partially summarize that proposal in the alternatives section. I do agree that if we accepted that general proposal, it makes sense to use it for the specific case of crABI's handling of Box.

EdorianDark · 2023-08-12T16:49:03Z

> Provide the initial version of a new ABI and in-memory representation supporting interoperability between high-level programming languages that have safe data types.

Are there other languages interested in this proposal? Or is the target to enable an ABI between Rust code?

Araq · 2023-08-12T21:36:17Z

There is interest from Nim (disclaimer: I'm Nim's BDFL). But for Nim it would be really nice if "bit flags" which Nim maps to its set construct could become part of the spec. (Rust can only do the terrible low level bitwise operations here.)

Also quite discouraging is the lack of Swift support, IMO. A common ABI for subsets of Nim, Rust, Swift and C++ seems quite feasible.

joshtriplett · 2023-08-12T23:08:29Z

Part of the goal here is to have a baseline level of support from any language that speaks C FFI, which then means that any language with a C FFI immediately has the ability to interoperate with crABI. Everything beyond that is then about the convenience of native support (whether language or library), rather than about whether it's supported or not. So, anything supported would need to map to an underlying C data type that can be passed through the C ABI.

(I do expect that eventually we'll want to support a full object/trait protocol, but I'm trying to get there incrementally rather than trying to do it all at once. The initial round of crABI support is optimizing for ease of initial support/adoption, rather than completeness.)

@Araq This mechanism: https://nim-lang.org/docs/manual.html#set-type-bit-fields ? (Verifying: does Nim support bitfields wider than a base integral data type, or do they have to fit in a base integral data type?) That seems like a reasonable data type to support cross-language. At a minimum, that seems like an ideal candidate for crABI v1.1 (which I expect to follow closely on the heels of crABI v1.0).

Araq · 2023-08-13T00:34:47Z

> @Araq This mechanism: https://nim-lang.org/docs/manual.html#set-type-bit-fields ?

Correct.

> (Verifying: does Nim support bitfields wider than a base integral data type, or do they have to fit in a base integral data type?

It supports bitsets wider than any integral data type indeed. But an ABI could limit it to an integral type.
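For the case where the set fits in an integral type, a crABI-style mapping could be a `repr(transparent)` wrapper around an unsigned integer with one bit per element. A sketch under that assumption (the `Perm`/`PermSet` names and the "bit index = discriminant" rule are illustrative, not from the RFC), which also shows the set operations Rust users otherwise write by hand:

```rust
/// Elements of a hypothetical flag set; each discriminant is a bit index.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum Perm {
    Read = 0,
    Write = 1,
    Exec = 2,
}

/// Hypothetical bit set of up to 32 elements, passed across the ABI as a
/// plain `u32`; element `e` lives in bit `e as u32`.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
#[repr(transparent)]
pub struct PermSet(u32);

impl PermSet {
    pub const fn empty() -> Self {
        PermSet(0)
    }

    pub fn insert(&mut self, p: Perm) {
        self.0 |= 1u32 << (p as u32);
    }

    pub fn remove(&mut self, p: Perm) {
        self.0 &= !(1u32 << (p as u32));
    }

    pub fn contains(self, p: Perm) -> bool {
        (self.0 & (1u32 << (p as u32))) != 0
    }

    pub fn union(self, other: Self) -> Self {
        PermSet(self.0 | other.0)
    }

    pub fn intersection(self, other: Self) -> Self {
        PermSet(self.0 & other.0)
    }

    /// The raw integer that would actually cross the ABI boundary.
    pub fn bits(self) -> u32 {
        self.0
    }
}
```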

tmccombs · 2023-08-13T01:44:57Z

text/3470-crabi-v1.md

+- `Option<bool>` is passed using a single `u8`, where 0 is `Some(false)`, 1 is
+  `Some(true)`, and 2 is `None`.
+- `Option<char>` is passed using a single `u32`, where 0 through `0xD7FF` and
+  `0xE000` through `0x10FFFF` are possible `char` values, and `0x110000` is


Is there any possibility that Unicode could expand the range of valid code points to include 0x110000? If so, would u32::MAX or one of the surrogate code points be better?

> Is there any possibility that unicode could expand the range of valid code points to include 0x110000?

Rust assumes it will never happen. That is not the choice I would make, but it is the choice that Rust makes. The Unicode Consortium supposedly will never do this, and it will cause breakage if they do, but of course it is not literally impossible that they might reverse their commitment not to do this.

With the way UTF-16 is designed, it's literally impossible to encode Unicode scalar values >= 0x110000. UTF-8/32 have ways to encode those out-of-range values, but Unicode almost certainly will never use them, because they want UTF-16 to keep working since it's used in so many places: Win32, Java, JavaScript, C#, VB.NET, etc.

char is a Unicode Scalar Value:
https://www.unicode.org/glossary/#unicode_scalar_value

Unicode scalar values outside the defined range are specifically forbidden by the Unicode Standard:

> Ill-formed: A Unicode code unit sequence that purports to be in a Unicode encoding form is called ill-formed if and only if it does not follow the specification of that Unicode encoding form.
> …
> Any code unit sequence that would correspond to a code point outside the defined range of Unicode scalar values would, for example, be ill-formed.

https://www.unicode.org/glossary/#ill_formed_code_unit_sequence

(So expanding the range would be a breaking change for a lot of existing Unicode software.)

What is the reason 0x110000 is used for the niche rather than u32::MAX for Rust in general?
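For reference, the two encodings quoted from the RFC can be written as plain conversion functions. This is a sketch, not the RFC's implementation: it assumes the receiving side should reject any other bit pattern (e.g. a `u8` of 3, or a surrogate or out-of-range `u32`) instead of treating it as `None`.

```rust
/// `Option<bool>` as a `u8`: 0 = Some(false), 1 = Some(true), 2 = None.
pub fn encode_opt_bool(v: Option<bool>) -> u8 {
    match v {
        Some(false) => 0,
        Some(true) => 1,
        None => 2,
    }
}

/// Values above 2 are invalid; they are returned as `Err` rather than guessed.
pub fn decode_opt_bool(raw: u8) -> Result<Option<bool>, u8> {
    match raw {
        0 => Ok(Some(false)),
        1 => Ok(Some(true)),
        2 => Ok(None),
        other => Err(other),
    }
}

/// The niche value used for `None` in the RFC text.
const CHAR_NONE: u32 = 0x11_0000;

pub fn encode_opt_char(v: Option<char>) -> u32 {
    v.map_or(CHAR_NONE, |c| c as u32)
}

pub fn decode_opt_char(raw: u32) -> Result<Option<char>, u32> {
    if raw == CHAR_NONE {
        Ok(None)
    } else {
        // `char::from_u32` rejects surrogates (0xD800..=0xDFFF) and values above 0x10FFFF.
        char::from_u32(raw).map(Some).ok_or(raw)
    }
}
```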

tmccombs · 2023-08-13T01:53:36Z

text/3470-crabi-v1.md

+- Once `extern "C"` supports C-compatible handling of `u128` and `i128`,
+  `extern "crabi"` should do the same.
+
+- Extensible enums. To define types that allow for extension, crABI would


Related to this, there are probably situations where it would be useful to declare a size and/or alignment of an enum to be larger than necessary, for future compatibility to allow adding new data that would otherwise change the ABI.

For a struct this can be done with a padding field, but that is somewhat difficult to do for an enum. I suppose you could make an unused, hidden variant with a value of the right size and alignment.

Padding fields are not ideal either as they can end up in JSON or toString representations too easily where they are noise at best and a bug at worst.
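The hidden-variant approach might look like this, assuming the `repr(C, u32)` layout rules (a `u32` tag followed by a union of the variants' fields). The `Event` type and the 32-byte reservation are hypothetical; the compile-time assertion makes any accidental change to the size fail the build.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical enum whose payload area is reserved up front: future
/// variants with up to 32 bytes of 4-byte-aligned fields fit without
/// changing size or alignment.
#[repr(C, u32)]
pub enum Event {
    KeyPress { code: u32 },
    Resize { width: u32, height: u32 },
    /// Never constructed; only exists to reserve payload space.
    #[doc(hidden)]
    __Reserved([u32; 8]),
}

// 4-byte tag + 32-byte payload union = 36 bytes, 4-byte aligned.
const _: () = assert!(size_of::<Event>() == 36);
const _: () = assert!(align_of::<Event>() == 4);
```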

Maybe the same issue: we have found it difficult to safely define a Rust type corresponding to a C or C++ enum, because an out-of-range representation of a Rust enum is immediate UB, unlike in C/C++. We had to resort to using a repr(C) or repr(transparent) struct on the Rust side in order to be able to gracefully handle errors on the C/C++ side.

Example: https://github.com/zcash/zcash/blob/2112e467ee31ea95cf81904a6aae397fa3d031ae/src/rust/src/zip339_ffi.rs#L11-L37

and the corresponding C/C++ type:
https://github.com/zcash/zcash/blob/2112e467ee31ea95cf81904a6aae397fa3d031ae/src/rust/include/rust/zip339.h#L17-L38

(Rust here is stricter than C++17. In the latter, casting an integer outside the range of the enumeration values to the enum type is UB, but it is not UB to cast a value to the enum type that is within range but not one of the defined values.)
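A minimal sketch of that pattern, with hypothetical names (not the zcash code linked above): the FFI-facing type is a transparent newtype over the C enum's integer, so any value the C/C++ side sends is a valid Rust value, and unknown values are rejected explicitly when converting to a real Rust `enum`.

```rust
use std::convert::TryFrom;

/// FFI-facing mirror of a hypothetical C enum; every `u32` is a valid value.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RawLanguage(pub u32);

impl RawLanguage {
    pub const ENGLISH: Self = RawLanguage(0);
    pub const FRENCH: Self = RawLanguage(1);
    pub const JAPANESE: Self = RawLanguage(2);
}

/// The Rust-side enum; only constructed after validation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Language {
    English,
    French,
    Japanese,
}

impl TryFrom<RawLanguage> for Language {
    /// The unrecognized raw value is handed back so callers can report it.
    type Error = RawLanguage;

    fn try_from(raw: RawLanguage) -> Result<Self, Self::Error> {
        match raw {
            RawLanguage::ENGLISH => Ok(Language::English),
            RawLanguage::FRENCH => Ok(Language::French),
            RawLanguage::JAPANESE => Ok(Language::Japanese),
            other => Err(other),
        }
    }
}
```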

tmccombs · 2023-08-13T02:07:13Z

text/3470-crabi-v1.md

+
+- Is there a better way we can handle tuple types? Having to use a distinct
+  syntax like `cr#()` is onerous; one of the primary values of tuples is
+  brevity. In the future, if we have variadic generics, we could potentially


Or, conversely, variadic generics could build on the crabi tuples. There is definitely some overlap here, since a tuple with a well-defined structure, with fields in declaration order, is also useful for recursion: process the first element of a tuple, and pass the rest of the tuple to the next iteration.

As you note, there is some overlap, in that both crabi tuples and 'a tuple where you can borrow the tail by reference' (which could potentially be used for variadic generics) need the fields to be stored in order.

However, in other ways they have opposite goals. 'A tuple where you can borrow the tail by reference' would likely need to store, e.g., cr#(u8, u8, u32) like cr#(u8, (u8, u32)). First, this would be less efficient due to additional padding introduced. While crABI does already sacrifice layout efficiency to some extent, by giving up reordering and some niche optimizations, that doesn't mean it doesn't care about performance, and this would be an unnecessary extra loss. Second, given crABI's goal of being easy to bind to other languages, it's beneficial for cr#(u8, u8, u32) to translate directly to the obvious C struct equivalent (struct { uint8_t a; uint8_t b; uint32_t c; }), rather than to some other structure.
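To make the padding cost concrete, here is the flat layout next to a nested "borrowable tail" layout, both modelled with `repr(C)` structs. This is an illustration under the assumption that `u32` is 4-byte aligned (true on mainstream targets), not a proposal for how either tuple form would be represented.

```rust
use std::mem::size_of;

/// `cr#(u8, u8, u32)` as the obvious C struct:
/// a@0, b@1, 2 bytes padding, c@4 => 8 bytes.
#[repr(C)]
pub struct Flat {
    pub a: u8,
    pub b: u8,
    pub c: u32,
}

/// The tail `(u8, u32)` on its own: b@0, 3 bytes padding, c@4 => 8 bytes.
#[repr(C)]
pub struct Tail {
    pub b: u8,
    pub c: u32,
}

/// `cr#(u8, (u8, u32))`: a@0, 3 bytes padding, rest@4 => 12 bytes.
#[repr(C)]
pub struct Nested {
    pub a: u8,
    pub rest: Tail,
}
```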

tmccombs · 2023-08-13T02:11:31Z

Given that this is intended for cross-language interfaces, there should probably be a formal, mostly language-agnostic specification of the ABI. Should this RFC discuss where that specification should go, and how it will be created? Will maintainers from other languages (such as Nim) be involved in that process?

comex · 2023-08-13T20:02:12Z

I suggest adding to the RFC that crABI will not initially define a stable LLVM CFI mangling (see also #3296, ping @rcvalle).

For an example of where this would be an issue, it's one thing to say that (quoting the RFC):

extern "crabi" fn func(buf: &mut [u8]);

is equivalent to:

struct u8_slice {
    uint8_t *data;
    size_t len;
};
extern void func(struct u8_slice buf);

But under LLVM CFI, if func ends up being turned into a function pointer, it will get a hash based on the Itanium C++ name mangling of the function signature, which has to match between caller and callee ends. On the C side, it would be mangled based on the actual name of the struct (in this case, u8_slice). On the Rust side, slices are currently mangled as vendor extended types, which can't be expressed in C or C++.

One potential solution is to define a standard C++ equivalent name for CFI purposes, e.g. [u8] could be mangled as if it were a C++ type named ::rust::slice<u8>. That would help when binding to C++, but not when binding to C, let alone other languages.

Another potential solution is to define a standard C struct name (say, _rust_slice_u8), but that's just really gross (how would it deal with more complex types? manually writing out the mangling in the struct name?), and also wouldn't work well with C++.

I think a better approach is to add an attribute to Clang to customize the CFI mangling for a C struct definition. After all, I'm not aware of any other implementations of LLVM CFI besides rustc and Clang, so it's not critical for us to fit into the existing constraints. We could then either keep using vendor extended types or go with the C++ equivalent name; which approach works better would depend on the design of said attribute.

But I'm not volunteering to send a patch to Clang; I think that onus is on anyone who wants to use crABI in the context of CFI-protected (C/C++)-Rust interop.

If that doesn't happen soon, though, then what? Clearly there's no need to block anything about crABI not related to CFI. The question is whether it should block stabilizing the LLVM CFI mangling. There's an argument that it shouldn't: we could stabilize the CFI mangling as-is and it would still be useful to protect Rust-to-Rust calls. But given the desire to interoperate well with other languages, I think it would be better to wait on stabilizing the CFI mangling until we have a full answer for how it will interop with Clang.

Araq · 2023-08-14T05:00:40Z

Please leave name mangling unspecified until interop between different languages has been established in some prototypes. It also does not have to be part of crABI at all and could be a different spec altogether.


teor2345 · 2023-08-18T06:38:16Z

text/3470-crabi-v1.md

+- Should we provide *fewer* niche optimizations? Those for `NonZero` and
+  reference types provide obvious value; are those for `bool` and `char` really
+  useful enough to justify the special case in languages that will have to
+  handle them explicitly rather than automatically?


I haven't seen Option<bool> in many APIs, but some string APIs use Option<char>. Is there a way to count how popular these types are in libraries?

Just loosely, a GitHub code search shows 199k results for Option<bool> (https://github.com/search?q=%22Option%3Cbool%3E%22&type=code) but only 24.6k for Option<char> (https://github.com/search?q=%22Option%3Cchar%3E%22&type=code).

madsmtm · 2023-08-18T10:10:19Z

text/3470-crabi-v1.md

+## `[T; N]` - fixed-size array by value
+
+crABI supports passing a fixed-size array by value (as opposed to by
+reference). To represent this via the C ABI, crABI treats this equivalently to
+a C array of the same type passed by value.
+
+Note that this means C code can use an array directly in a context where it
+will be interpreted by-value (such as in a struct field), but needs to use a
+structure with a single array field in contexts where it would otherwise be
+interpreted as a pointer (such as in a function argument or return value).
+
+For instance:
+
+```rust
+extern "crabi" fn func(rgb: [u16; 3])
+```
+
+is equivalent to:
+
+```c
+struct func_rgb_arg {
+    uint16_t array[3];
+};
+extern void func(struct func_rgb_arg rgb);
+```
+
+Note that crABI does *not* pass the length, since it's a compile-time constant;
+the recipient must also know the correct size. (Use one of the slice-based
+types for a type with a runtime-determined length.)


I seem to remember issues about zero-sized types in C; what's the expected behaviour if N is 0? I'd expect it to act like the other zero-sized types, i.e. the function parameter in its entirety would just not be present in the C ABI, but that's not entirely clear from the example.

I think the [T; 0] case might depend on whether the array is passed by reference (address and length) or value (just address). It seems like passing by value is ambiguous here.

Also, what's the expected behaviour for arrays containing zero-sized types?
Is it different for [(); N] and [(); 0]?
Is it different passing by value and by reference?

For the Rust ABI, all ZSTs are ignored. Most C ABIs follow this as well, but on a couple of architectures ZSTs still consume a register (see https://github.com/rust-lang/rust/blob/6ef7d16be0fb9d6ecf300c27990f4bff49d22d46/compiler/rustc_ty_utils/src/abi.rs#L419-L421). I think we should ignore them on all architectures; the C side can simply always omit those arguments as necessary. It feels wrong for x86_64-pc-windows-gnu and x86_64-pc-windows-msvc to have incompatible crABIs, given that it is possible to call MSVC libraries from MinGW just fine as long as you have the right import library; in fact, MinGW depends on this to call system libraries.
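For the non-empty case, the RFC's C example can be mirrored in Rust to check the wrapper-struct convention. The `func` body (summing the channels) is made up so the call has a checkable result; the const assertions also record that `[u16; 0]` and `[(); N]` are zero-sized in Rust, which is why they fall under the same ZST question discussed here.

```rust
use std::mem::size_of;

/// Rust mirror of the RFC's `struct func_rgb_arg { uint16_t array[3]; }`:
/// wrapping the array in a struct is what lets C pass it by value.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct FuncRgbArg {
    pub array: [u16; 3],
}

/// Stand-in for `func` with a C ABI; summing the channels is made up
/// just to give the call an observable result.
pub extern "C" fn func(rgb: FuncRgbArg) -> u32 {
    rgb.array.iter().map(|&c| u32::from(c)).sum()
}

// The wrapper adds no padding; zero-length and ZST arrays are zero-sized.
const _: () = assert!(size_of::<FuncRgbArg>() == 6);
const _: () = assert!(size_of::<[u16; 0]>() == 0 && size_of::<[(); 5]>() == 0);
```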

madsmtm · 2023-08-18T10:17:26Z

text/3470-crabi-v1.md

+structures, unless the newer side restricts itself to features understood by an
+older version.
+
+This RFC defines crABI 1.0.


I know that we feel pretty confident in most of the decisions in this RFC, but since crABI has not seen actual usage, especially regarding interop with other languages, it feels prudent to me to start at 0.1.0, allow ourselves breaking changes, and only move to 1.0 once enough (different) eyes have been laid on the problem.

nortti0 · 2023-08-20T21:21:47Z

One way to reduce the complexity of the niche optimizations would be to only use two values for the niche value:

> Option<&T>, Option<&mut T>, Option<NonNull<T>>, Option<Box<T>>, and
> Option of any function pointer type are all passed using a null pointer to
> represent None.
>
> Option of any of the NonZero* types is passed using a value of the
> underlying numeric type with 0 as None.

These can continue to use the zero value for None.

> Option<bool> is passed using a single u8, where 0 is Some(false), 1 is
> Some(true), and 2 is None.
>
> Option<char> is passed using a single u32, where 0 through 0xD7FF and
> 0xE000 through 0x10FFFF are possible char values, and 0x110000 is
> None.
>
> Option<OwnedFd> and Option<BorrowedFd> are passed using -1 to represent
> None

These could all use the all bits set (i.e. 0.wrapping_sub(1)) value for None, as is already used for the *Fd types.

A second, further way to simplify the specification here and allow users to enable the niche optimization for their own types would be to define a NonNegativeI* (other spellings are possible, e.g. u31) series of types. Types that are currently called out here explicitly would instead be defined to be passed using those types, and the niche optimizations section could be changed to read:

Option<&T>, Option<&mut T>, Option<NonNull<T>>, Option<Box<T>>, and Option of any function pointer type are all passed using a null pointer to represent None.
Option of any of the NonZero* types is passed using a value of the underlying numeric type with 0 as None.
Option of any of the NonNegative* types is passed using a value of the underlying numeric type with all bits set as None.
Option of a repr(transparent) type containing one of the above as its only non-zero-sized field will use the same representation.
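The encoding side of the NonNegative* idea is small. A sketch with a hypothetical `NonNegativeI32` (not an existing std type): `new` enforces the invariant, and the `Option` is passed as a plain `i32` with all bits set (-1) as `None`. Unlike std's `NonZero*` types, rustc doesn't know about this niche, so this only illustrates the wire format, not an in-memory layout optimization.

```rust
/// Hypothetical `i32` that is never negative, leaving all-bits-set free.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct NonNegativeI32(i32);

impl NonNegativeI32 {
    pub fn new(v: i32) -> Option<Self> {
        if v >= 0 {
            Some(NonNegativeI32(v))
        } else {
            None
        }
    }

    pub fn get(self) -> i32 {
        self.0
    }
}

/// All bits set (`-1`, i.e. `0i32.wrapping_sub(1)`) represents `None`.
pub fn encode(v: Option<NonNegativeI32>) -> i32 {
    v.map_or(-1, NonNegativeI32::get)
}

/// Any other negative value is invalid and rejected, not treated as `None`.
pub fn decode(raw: i32) -> Result<Option<NonNegativeI32>, i32> {
    match raw {
        -1 => Ok(None),
        v if v >= 0 => Ok(Some(NonNegativeI32(v))),
        v => Err(v),
    }
}
```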

the8472 · 2023-08-21T16:44:44Z

text/3470-crabi-v1.md

+Slices translate to a by-value struct containing two fields: a pointer to the
+element type, and a `size_t` number of elements, in that order.


Does this properly translate to the same limitation as rust slices? I.e. at most isize::MAX bytes but at most usize::MAX items (for ZSTs)?
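Whatever the RFC ends up saying, a Rust receiver has to check those limits before turning the pair back into a slice. A sketch, assuming a hypothetical `RawSlice<T>` mirroring the pointer/`size_t` struct from the RFC text, and assuming that a C-side `(NULL, 0)` should map to an empty slice (Rust slices need a non-null pointer even when empty):

```rust
use std::mem::{align_of, size_of};
use std::slice;

/// Hypothetical by-value slice struct: pointer first, then element count.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct RawSlice<T> {
    pub data: *const T,
    pub len: usize,
}

impl<T> RawSlice<T> {
    pub fn from_slice(s: &[T]) -> Self {
        RawSlice { data: s.as_ptr(), len: s.len() }
    }

    /// Checks the invariants a Rust slice needs before building one.
    ///
    /// # Safety
    /// If this returns `Some`, `data` must point to `len` initialized `T`s
    /// that stay valid and unmutated for `'a`.
    pub unsafe fn to_slice<'a>(self) -> Option<&'a [T]> {
        // At most isize::MAX bytes; for ZSTs this allows up to usize::MAX items.
        let bytes = self.len.checked_mul(size_of::<T>())?;
        if bytes > isize::MAX as usize {
            return None;
        }
        if self.data.is_null() {
            let empty: &'a [T] = &[];
            return if self.len == 0 { Some(empty) } else { None };
        }
        if (self.data as usize) % align_of::<T>() != 0 {
            return None;
        }
        Some(unsafe { slice::from_raw_parts(self.data, self.len) })
    }
}
```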

the8472 · 2023-08-21T16:59:26Z

text/3470-crabi-v1.md

+As a special case, if an `enum` using `repr(crabi)` has exactly two variants,
+one of which has no fields and the other of which has a single field, and the
+single field type has a specific type (defined below) with a "niche" value,


What about non_exhaustive?

Is any non_exhaustive type eligible for a stable ABI like this? Or would it somehow be limited to extensions in ways that don't break the ABI? Then again, do we have restrictions on adding fields or other layout-affecting changes being made to extant ABI-declared types?

I believe non_exhaustive is orthogonal with "stable ABI".

For example, a linux syscall struct must have a stable layout, but should be planned with later extension in mind, as it may gain additional variants/options in future kernel versions, but won't change the layout of existing syscalls.
e.g. by including some reserved fields or leaving space in a bitset for to-be-added flags.

non_exhaustive would then remind consumers to defensively program against "timetravelers from the future" (e.g. return -EINVAL; in the linux syscall example)

Another example is io_uring, which keeps adding new opcodes as the kernel implementation grows (i.e. additionally accepted values for the opcode discriminant).

#[non_exhaustive] doesn't have any effect for the crate that defines the type. Instead, it has an effect on users. This means that if the user tried to use a newer version of the type while using the older version of the code that defined it, the result would be UB, without that code being able to report an error for the invalid discriminant. Also, extensions to #[non_exhaustive] types can only possibly work for the memory layout; the calling convention can drastically change even when adding a single field or enum variant.

programmerjake · 2023-08-21T17:24:19Z

> A second, further way to simplify the specification here and allow users to enable the niche optimization for their own types would be to define a NonNegativeI* (other spellings are possible, e.g. u31) series of types. Types that are currently called out here explicitly would instead be defined to be passed using those types, and the niche optimizations section could be changed to read:

that wouldn't work for (some of) the Win32 handle types, because iirc negative handles are perfectly valid, it's just -1 that's invalid.

ChrisDenton · 2023-08-21T17:35:04Z

> that wouldn't work for (some of) the Win32 handle types, because iirc negative handles are perfectly valid, it's just -1 that's invalid.

-1 is perfectly valid. It's a pseudo handle for the current process. But yes, negative handles are valid; they're static values with special meaning. The only common niche between real handles and pseudo handles is 0.

programmerjake · 2023-08-21T19:53:22Z

>> that wouldn't work for (some of) the Win32 handle types, because iirc negative handles are perfectly valid, it's just -1 that's invalid.
>
> -1 is perfectly valid. It's a pseudo handle for the current process. But yes, negative handles are valid; they're static values with special meaning. The only common niche between real handles and pseudo handles is 0.

I'm referring to OwnedSocket which has a niche for INVALID_SOCKET aka. -1 because that is not a valid socket handle.

juleskers · 2023-08-22T07:17:43Z

text/3470-crabi-v1.md

+*nightly-only* implementations of that version of crABI. Versions of crABI
+should not be considered stable until available in stable Rust.
+
+Future versions of crABI may also establish allow-by-default lints for the use


As a (wildly excited) outsider, this raises a question for me: will the deny-activation of such lints be enough info for the compiler to restrict itself to a lower subset of crABI?

Or must we reserve some kind of "version" specifier in the repr attribute? Something like repr(crABI, u8, v=2.1). Perhaps a global annotation at the module level, to avoid repetition?

I'm thinking of cases like an enum that would have different possibilities of niche-optimisation under crABI v1, v2, v3..
Does the compiler have access to the active lint selection in the enum layout code?
Are lints even supposed to act like compiler flags to this extent?

matklad · 2023-08-24T09:19:12Z

Hm, I am quite a bit surprised that the RFC doesn’t talk about aliasing at all. Consider this crabi function declaration:

struct u8_slice {
    uint8_t *data;
    size_t len;
};
extern void copy(struct u8_slice dst, struct u8_slice src);

is the code calling the function required to ensure that src and dst do not overlap? Is the code implementing the function allowed to assume that the slices do not overlap?

Do we just use Type Based Alias Analysis rules here by virtue of just deferring to C ABI?

bjorn3 · 2023-08-24T09:26:25Z

> is the code calling the function required to ensure that src and dst do not overlap?

If the Rust side is &mut [u8], then yes. If it is *mut [u8], then no. In general, I would expect the regular Rust memory model rules to apply.

> Do we just use Type Based Alias Analysis rules here by virtue of just deferring to C ABI?

TBAA is entirely incompatible with rust.
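To illustrate the difference on the Rust side (a sketch; the function names are made up): with reference arguments the borrow checker already rules out overlap, while a raw-pointer signature leaves overlap possible, so the implementation has to use `ptr::copy` (memmove semantics) rather than `ptr::copy_nonoverlapping`.

```rust
use std::ptr;

/// `&mut [u8]` and `&[u8]` can never overlap, so a non-overlapping copy is sound.
pub fn copy_refs(dst: &mut [u8], src: &[u8]) {
    let n = dst.len().min(src.len());
    dst[..n].copy_from_slice(&src[..n]);
}

/// Raw pointers may overlap, so this uses memmove-style `ptr::copy`.
///
/// # Safety
/// `src` must be valid for reads and `dst` valid for writes of `len` bytes.
pub unsafe fn copy_raw(dst: *mut u8, src: *const u8, len: usize) {
    unsafe { ptr::copy(src, dst, len) }
}
```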

matklad · 2023-08-24T10:24:54Z

> TBAA is entirely incompatible with rust.

Yes, but, if I understand this right, that's what the current RFC implicitly proposes to use, by saying that "we lower to C ABI".

Taking example from https://stefansf.de/post/type-based-alias-analysis/, the following crabi declaration:

extern void foo(int *x, short *y);

also carries implicit constraint that x and y do not alias (level of confidence: 0.7).

We definitely don't want to have that in crabi, because that's not how the Rust works, and not how an ideal ABI would work. But that means we need to explicitly define aliasing rules for crabi, as otherwise we are inheriting those from C.

jonathanpallant · 2023-08-24T12:58:25Z

If I have some crabi function which takes, say, core::crabi::Option<u32> as an argument, can I pass Some(42) and have it auto-convert (or auto-infer), or do I have to say core::crabi::Option::Some(42) (or Some(42).into())?

I found the manual conversions was one of the annoying parts about using the C ABI in https://github.com/Neotron-Compute/Neotron-FFI/blob/develop/src/option.rs (e.g. here or here)

But also, yay, I could basically delete the neotron-ffi crate.
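Until that is decided, the explicit-conversion pattern can at least be kept short with `From` impls. A sketch with a hypothetical `FfiOption<T>` standing in for `core::crabi::Option` (its layout here is plain `repr(C, u8)`, not the RFC's niche-optimized one): callers write `Some(42).into()` rather than spelling out the variant.

```rust
/// Hypothetical stand-in for `core::crabi::Option`.
#[repr(C, u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FfiOption<T> {
    None,
    Some(T),
}

impl<T> From<Option<T>> for FfiOption<T> {
    fn from(o: Option<T>) -> Self {
        match o {
            Some(v) => FfiOption::Some(v),
            None => FfiOption::None,
        }
    }
}

impl<T> From<FfiOption<T>> for Option<T> {
    fn from(o: FfiOption<T>) -> Self {
        match o {
            FfiOption::Some(v) => Some(v),
            FfiOption::None => None,
        }
    }
}

/// A C-ABI function taking the FFI option by value.
pub extern "C" fn double_or_zero(x: FfiOption<u32>) -> u32 {
    // Annotating the target type picks our `From` impl, not `From<T> for Option<T>`.
    let opt: Option<u32> = x.into();
    opt.map_or(0, |v| v * 2)
}
```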

comex · 2023-08-24T16:26:13Z

> extern void foo(int *x, short *y)
> also carries implicit constraint that x and y do not alias (level of confidence: 0.7).

You don't get UB in C just by having pointers that alias, only if you actually dereference them with incompatible types.

So the full example from the post you linked has UB if x == y:

void foo(int *x, short *y) {
    *x  = 40;
    *y  = 0;
    *x += 2;
}

But this version would not, as long as the pointee was originally either a dynamic allocation or a variable of declared type int:

void foo(int *x, short *y) {
    *x  = 40;
    memset(y, 0, sizeof(short));
    *x += 2;
}

As a result, there's no need for crABI to, say, munge the types to void * at the boundary. As such, I think the issue is largely out of scope for crABI.

A potential exception is that if a pointer to a Rust local or global variable is sent to C, the C side might want to know what the "declared type" is for the purpose of C's aliasing model. The right answer should be that the Rust variable is like a dynamic allocation and doesn't have a declared type. But I'm not sure if that works in all cases in the current implementation, or if it should be guaranteed for all Rust implementations. Still, it seems somewhat out of scope…

Maix0 · 2023-11-23T14:03:09Z

I would actually like to see a deny-by-default lint and compiler flag (which I think would be a new thing) that, when allowed or set to warn, would enable transparent conversion between crabi::Option and std::Option (and between the Result types as well).

As this would be deny-by-default, you wouldn't come to rely on it by accident, but it would still allow someone to quickly prototype with, or get started using, an FFI API.

I do feel strongly against having this in everyday programs, as it is indeed basically a deep memory copy, but it could allow some leeway when trying things out.

It is true that having to duplicate those types is sad, but as said in the RFC, there seems to be no other way.

jeffparsons · 2023-12-17T02:49:52Z

> It is true that having to duplicate those types is sad, but as said in the RFC, there seems to be no other way.

I know I'm just one data point, but as a Rust user who is excited about crABI: I get it. I'll live. Better a bit of minor friction than lurking dragons.

And it seems to me that it's pretty easy to leave the door open to creative solutions in future, so at least to me it doesn't seem like solving it now should be considered a blocker.

Jules-Bertholet · 2024-03-13T03:11:45Z

text/3470-crabi-v1.md

+The translation of slices and similar uses structs containing pointer/length
+pairs, rather than inlining the pointer and length as separate arguments.
+[As noted above][types], this is typically passed and returned in an efficient
+fashion on major targets. However, in some languages, such as C, this will
+require separately defining a structure and then using that structure. This
+still seems preferable, though, as combining the two into one struct allows for
+uniform handling between arguments, return values, and fields, as well as
+keeping the pointer and length more strongly associated.


Passing slices as two arguments (ptr then len) would allow Rust bindings to existing C code that does this to use slices directly. For example, POSIX ssize_t send(int sockfd, const void *buf, size_t len, int flags) could be exposed in Rust as

extern "crABI" { // or perhaps even "C"?
    fn send(socket: c_int, buf: *const [u8], flags: c_int) -> ssize_t;
}

passing a structure of two fields is generally different than passing two arguments though, no?

Yes, it is generally different. My suggestion is to make them the same for slices with extern "crABI"/extern "C", as a special case only for that specific combination.
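For comparison, this is what a binding has to do by hand without such a special case: a C-style function taking a pointer and a length as separate arguments, plus a safe wrapper that splits a `&[u8]`. The `sum_bytes` function is a hypothetical stand-in (defined in Rust so the sketch runs), not `send` itself.

```rust
/// Hypothetical C-style function with a separate (pointer, length) pair;
/// it just sums the bytes so the result is checkable.
///
/// # Safety
/// `buf` must be valid for reads of `len` bytes.
pub unsafe extern "C" fn sum_bytes(buf: *const u8, len: usize) -> u64 {
    let bytes = unsafe { std::slice::from_raw_parts(buf, len) };
    bytes.iter().map(|&b| u64::from(b)).sum()
}

/// Safe wrapper doing by hand what the suggested special case would do
/// automatically: split a Rust slice into (pointer, length).
pub fn sum(buf: &[u8]) -> u64 {
    // SAFETY: `buf` is valid for `buf.len()` bytes for the whole call,
    // and `as_ptr` is non-null even for an empty slice.
    unsafe { sum_bytes(buf.as_ptr(), buf.len()) }
}
```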

tgross35 · 2024-03-13T04:33:32Z

text/3470-crabi-v1.md

+If an `enum` specifies `repr(crabi)` but does not specify a discriminant type,
+the `enum` is guaranteed to use the smallest discriminant type that holds the
+maximum discriminant value used by a variant in the `enum`.
+
+If the `enum` has no fields, or no fields with a non-zero size, crABI will
+represent the `enum` as only its discriminant.


Suggested change

If an `enum` specifies `repr(crabi)` but does not specify a discriminant type,
the `enum` is guaranteed to use the smallest discriminant type that holds the
maximum discriminant value used by a variant in the `enum`. Enums with zero
or one variants have a zero-sized discriminant.

If the `enum` has no fields with a non-zero size, crABI will represent the
`enum` as only its discriminant.

Clarify how zero-sized enums work
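The suggested wording boils down to a small rule. A sketch of it as a function (a hypothetical helper; it assumes non-negative discriminants, since the text doesn't say how negative ones are handled):

```rust
/// Discriminant size in bytes under the suggested wording: zero for enums
/// with zero or one variants, otherwise the smallest unsigned integer type
/// that holds the largest discriminant.
pub fn discriminant_size(variant_count: usize, max_discriminant: u64) -> usize {
    if variant_count <= 1 {
        0
    } else if max_discriminant <= u64::from(u8::MAX) {
        1
    } else if max_discriminant <= u64::from(u16::MAX) {
        2
    } else if max_discriminant <= u64::from(u32::MAX) {
        4
    } else {
        8
    }
}
```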

tgross35 · 2024-03-13T04:35:36Z

text/3470-crabi-v1.md

+crABI supports arbitrary `enum` types, if declared with `repr(crabi)`. These
+are always passed using the same layout that Rust uses for enums with `repr(C)`
+and a specified discriminant type:
+<https://doc.rust-lang.org/reference/type-layout.html#combining-primitive-representations-of-enums-with-fields-and-reprc>


Suggested change

crABI supports arbitrary `enum` types, representing discriminated unions, if
declared with `repr(crabi)`. These are always passed using the same layout that Rust
uses for enums with `repr(C)` and a specified discriminant type:
<https://doc.rust-lang.org/reference/type-layout.html#combining-primitive-representations-of-enums-with-fields-and-reprc>

Clarify what enum as a type is
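To make the linked layout concrete, here is a sketch using Rust's existing `repr(C, u8)` (standing in for `repr(crabi)`): per the Rust reference, such an enum is laid out as a `repr(C)` struct holding a `repr(u8)` tag followed by a `repr(C)` union of one `repr(C)` struct per variant. The type names `Shape`, `ShapeTag`, `ShapePayload`, and `ShapeRepr` are illustrative only.

```rust
use std::mem::{align_of, size_of};

// Enum with `repr(C)` plus an explicit discriminant type.
#[allow(dead_code)]
#[repr(C, u8)]
enum Shape {
    Circle(f32),
    Rect(f32, f32),
}

// The equivalent explicit layout: tag enum, per-variant structs, a union of
// those structs, and a struct combining tag and union.
#[allow(dead_code)]
#[repr(u8)]
#[derive(Clone, Copy)]
enum ShapeTag {
    Circle,
    Rect,
}

#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(C)]
struct CircleFields(f32);

#[derive(Clone, Copy)]
#[repr(C)]
struct RectFields(f32, f32);

#[allow(dead_code)]
#[repr(C)]
union ShapePayload {
    circle: CircleFields,
    rect: RectFields,
}

#[repr(C)]
struct ShapeRepr {
    tag: ShapeTag,
    payload: ShapePayload,
}

fn main() {
    assert_eq!(size_of::<Shape>(), size_of::<ShapeRepr>());
    assert_eq!(align_of::<Shape>(), align_of::<ShapeRepr>());

    // Reinterpret an enum value through the documented layout.
    let shape = Shape::Rect(1.0, 2.0);
    let repr = unsafe { &*(&shape as *const Shape as *const ShapeRepr) };
    assert!(matches!(repr.tag, ShapeTag::Rect));
    let rect = unsafe { repr.payload.rect };
    assert_eq!((rect.0, rect.1), (1.0, 2.0));
}
```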

tgross35 · 2024-03-13T04:48:50Z

text/3470-crabi-v1.md

+Provide the initial version of a new ABI and in-memory representation
+supporting interoperability between high-level programming languages that have
+safe data types.


Suggested change

-Provide the initial version of a new ABI and in-memory representation
-supporting interoperability between high-level programming languages that have
-safe data types.
+Provide the initial version of a new C-Rust ABI (`crABI`) and in-memory
+representation supporting interoperability between high-level programming languages that have
+safe data types.

Introduce the origin of the term somewhere in the summary

tgross35 · 2024-03-13T04:51:47Z

text/3470-crabi-v1.md

+- A repr for laying out data structures (`struct`, `union`, `enum`) compatible
+  with crABI: `repr(crabi)`.


`union` is mentioned here, but its ABI/layout is not described anywhere in the doc; it could use a short section.
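The RFC text does not specify union layout. As an illustration only, assuming a `repr(crabi)` union would follow Rust's `repr(C)` union rules (an assumption, not something the RFC states), its alignment is the largest field alignment and its size is the largest field size rounded up to that alignment. `WordOrBytes` and `round_up` are hypothetical names.

```rust
use std::mem::{align_of, size_of};

// Assumption: crABI unions would be laid out like `repr(C)` unions.
#[allow(dead_code)]
#[repr(C)]
union WordOrBytes {
    word: u32,
    bytes: [u8; 6],
}

/// Rounds `n` up to the next multiple of `align` (a power of two).
fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) / align * align
}

fn main() {
    // Alignment is the largest field alignment (here, that of `u32`)...
    assert_eq!(align_of::<WordOrBytes>(), align_of::<u32>());
    // ...and size is the largest field size (6) rounded up to that alignment.
    assert_eq!(size_of::<WordOrBytes>(), round_up(6, align_of::<u32>()));
}
```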

tgross35 · 2024-03-13T05:07:12Z

text/3470-crabi-v1.md

+
+An implementation of crABI should document which version of crABI it
+implements, which compactly conveys supported and unsupported functionality.
+


I can imagine some cases where the ABI version ends up in the binary as a way to verify compatibility. It may not hurt to specify a preferred representation:

Suggested change

+If ABI version needs to be encoded in a binary for any reason, it should be
+stored as a struct representing major and minor versions as 8-bit integers. If
+applicable, the symbol name should be `__crabi_version`.
+```c
+struct crabi_version {
+    uint8_t major;
+    uint8_t minor;
+};
+struct crabi_version __crabi_version = { .major = 1, .minor = 0 };
+```

Having a global `__crabi_version` symbol would cause conflicts if multiple static libraries that export a crABI interface are linked together. Using COMDAT to deduplicate it wouldn't work either if the static libraries use different crABI versions.
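A Rust mirror of the suggested `struct crabi_version`, together with a version check, as a sketch. `CrabiVersion` and `is_compatible` are hypothetical names, and the compatibility rule (same major version, provided minor version at least the required one) is an assumption based on the RFC's "similar to semver" wording rather than something it spells out here. Given the symbol-conflict concern above, the sketch does not export a global symbol.

```rust
use std::mem::size_of;

/// Rust mirror of the suggested C `struct crabi_version`.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct CrabiVersion {
    major: u8,
    minor: u8,
}

/// Hypothetical semver-style check: an implementation of `provided` can serve
/// a consumer built against `required` if the major versions match and the
/// provided minor version is not older.
fn is_compatible(provided: CrabiVersion, required: CrabiVersion) -> bool {
    provided.major == required.major && provided.minor >= required.minor
}

fn main() {
    // Two 8-bit fields, no padding.
    assert_eq!(size_of::<CrabiVersion>(), 2);

    let v1_0 = CrabiVersion { major: 1, minor: 0 };
    let v1_2 = CrabiVersion { major: 1, minor: 2 };
    let v2_0 = CrabiVersion { major: 2, minor: 0 };
    assert!(is_compatible(v1_2, v1_0));
    assert!(!is_compatible(v1_0, v1_2));
    assert!(!is_compatible(v2_0, v1_0));
}
```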

tgross35 · 2024-03-13T05:52:11Z

text/3470-crabi-v1.md

+outside that range are passed or returned, and in particular the compiler may
+generate code that does not check this assumption, or may optionally include
+validation assertions when debugging.
+


Maybe add a section `## T - owned values` and just specify that anything `#[repr(C)]` or `#[repr(crabi)]` can be passed by value. I know this is mentioned elsewhere, but a dedicated section would make this more in line with the other types listed.

tgross35 · 2024-03-13T05:53:51Z

text/3470-crabi-v1.md

+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+Guaranteed niche optimization is the most uncertain part of the proposed crABI


"Rationale and alternatives" is big enough that it could use subsections, e.g. `## Guaranteed niche optimization`.

tgross35 · 2024-03-13T05:55:20Z

text/3470-crabi-v1.md

+@programmerjake made a
+[proposal](https://github.com/rust-lang/rfcs/pull/3470#issuecomment-1674249638)
+([sample usage](https://github.com/rust-lang/rfcs/pull/3470#issuecomment-1674265515))
+to modify the standard impl of `Drop` for `Box` to allow plugging in an
+arbitrary function (via a `BoxDrop` trait), to drop the `Box` as a whole. This
+would be generally useful (e.g. for object pooling), and would then permit
+crABI to define a `box_drop` function that calls an FFI function to free the
+object. If we accepted that proposal, it would make sense to use it to
+represent crABI boxes.


Could this go in "Future possibilities"?

tgross35 · 2024-03-13T05:57:36Z

text/3470-crabi-v1.md

+- Swift's stable ABI
+- The `abi_stable` crate (which aims for Rust-to-Rust stability, not
+  cross-language interoperation, but it still serves as a useful reference)
+- `stabby`
+- UniFFI
+- Diplomat
+- C++'s various ABIs (and the history of its ABI changes). crABI does not,
+  however, aim for compatibility with or supersetting of any particular C++
+  ABI.
+- Many, many interface description languages (IDLs).


Do Swift, C++ or any others define an ABI for a slice?

The wasm component model has a `list` type, but it currently only allows owned values. C++ has `string_view` as a counterpart to `&str` and `span<T>` as a counterpart to `&[T]`.

Go does: https://go.googlesource.com/go/+/refs/heads/dev.regabi/src/cmd/compile/internal-abi.md

The slice type `[]T` is a sequence of a `*[cap]T` pointer to the slice backing store, an `int` giving the `len` of the slice, and an `int` giving the `cap` of the slice.

That sounds like it may be storage-compatible, for its first two fields, with the `struct slice { uint8_t *data; size_t len; }` defined here, which is kind of nice. It seems like C compatibility isn't necessarily a goal of Go's ABI, though, so that may be as far as it goes.

For what it's worth, `string_view` seems to use the opposite order, `(len, pointer)`, on both GCC and Clang

(thanks Miguel for the correction)

I guess you meant `(size, pointer)`, but while libstdc++ uses that one, libc++ (i.e. LLVM's) and Microsoft's use `(pointer, size)` instead.

For `span` (dynamic extent), all three use `(pointer, size)`.

Well, at least the versions I looked at.
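A minimal Rust mirror of the two-field slice struct discussed in this thread (pointer first, then element count), as a sketch. It assumes `usize` has the same size and representation as `size_t`, which holds on mainstream targets; `CrabiSlice`, `from_slice`, and `as_slice` are hypothetical names for illustration.

```rust
use std::mem::size_of;

/// Hypothetical by-value slice struct: pointer to the elements, then count.
#[repr(C)]
struct CrabiSlice<T> {
    data: *const T,
    len: usize, // assumed to match `size_t`
}

impl<T> CrabiSlice<T> {
    /// Borrows a Rust slice as a (pointer, length) pair.
    fn from_slice(s: &[T]) -> Self {
        CrabiSlice { data: s.as_ptr(), len: s.len() }
    }

    /// # Safety
    /// `data` must point to `len` initialized, properly aligned `T`s that
    /// stay valid and unmodified for the lifetime `'a`.
    unsafe fn as_slice<'a>(&self) -> &'a [T] {
        unsafe { std::slice::from_raw_parts(self.data, self.len) }
    }
}

fn main() {
    // Two pointer-sized fields, in that order.
    assert_eq!(size_of::<CrabiSlice<u16>>(), 2 * size_of::<usize>());

    let rgb = [10u16, 20, 30];
    let ffi = CrabiSlice::from_slice(&rgb);
    // Round-trip back to a Rust slice on the receiving side.
    let back = unsafe { ffi.as_slice() };
    assert_eq!(back, &rgb[..]);
}
```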

tgross35 · 2024-03-13T06:00:02Z

text/3470-crabi-v1.md

+- Should we provide *fewer* niche optimizations? Those for `NonZero` and
+  reference types provide obvious value; are those for `bool` and `char` really
+  useful enough to justify the special case in languages that will have to
+  handle them explicitly rather than automatically?


Just loosely, a GitHub code search (https://github.com/search?q=%22Option%3Cbool%3E%22&type=code) shows 199k results for `Option<bool>` but only 24.6k for `Option<char>` (https://github.com/search?q=%22Option%3Cchar%3E%22&type=code).
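For reference, this is how current rustc lays out the niches in question. The `&T` and `NonZeroU32` cases are documented Rust guarantees. The `bool` and `char` cases reflect what rustc does today but are not formally guaranteed by Rust, so crABI would need to pin them down itself if it adopted them.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

fn main() {
    // Guaranteed by Rust: `None` uses the null / zero niche.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());

    // Current rustc behavior, not a formal Rust guarantee:
    // `bool` has 254 invalid byte values, and `char` has invalid values
    // above 0x10FFFF, so `None` fits without growing the type.
    assert_eq!(size_of::<Option<bool>>(), size_of::<bool>());
    assert_eq!(size_of::<Option<char>>(), size_of::<char>());
}
```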

Co-authored-by: Jeff Parsons <jeff@parsons.io>

daira · 2024-03-27T05:35:29Z

text/3470-crabi-v1.md

+
+Today, developers building projects incorporating multiple languages, or
+calling a library written in one language from another, often have to use the C
+ABI as a lowest-common-denominator for cross-language function calls. As a


Suggested change

-ABI as a lowest-common-denominator for cross-language function calls. As a
+ABI as a lowest common denominator for cross-language function calls. As a

("Lowest common denominator" doesn't have any compound adjectives.)

daira · 2024-03-27T05:40:49Z

text/3470-crabi-v1.md

+both languages have a safe type for counted UTF-8 strings.
+
+For popular pairs of languages, developers sometimes create higher-level
+binding layers for combining those languages. However, the creation of such


Suggested change

-binding layers for combining those languages. However, the creation of such
+binding layers to support communication between those languages. However, the creation of such

daira · 2024-03-27T05:58:25Z

text/3470-crabi-v1.md

+Any type whose ABI is already defined by C will be passed through crABI
+identically. Types defined by crABI that the C ABI does not support will be
+translated into a representation using types the C ABI supports (potentially
+indirectly via other crABI-supported types).


In that case, has extending the C ABI (as supported by Rust) been considered, rather than defining a new ABI? After all, the two would coincide for anything that is currently valid.

I can think of the following potential reasons not to:

1. If the platform's "official" C ABI were to newly define something differently from Rust's anticipation of it (say, defining a different way of passing `[u]int128_t` parameters), that would introduce an incompatibility.

   Note that if that happened, the guarantee in the above paragraph couldn't hold either. A new ABI version would probably need to be created to restore it.

2. A new ABI might be able to make weaker stability guarantees initially.

(1) does not seem convincing, but (2) could be.

daira · 2024-03-27T11:45:42Z

text/3470-crabi-v1.md

+specify a discriminant type, the enum is guaranteed to use the smallest
+discriminant type that holds the maximum discriminant value used by a variant
+in the enum. (This differs from the behavior of `repr(C)` enums without a
+discriminant type.)


I don't think this should differ. For a start, it contradicts the paragraph about C ABI compatibility at lines 69-72 above. The representation of an enum in C is implementation-defined as far as the standard is concerned, but in practice you can assume it is a well-defined function, for each platform, of the minimum and maximum enumeration values. (Typically, it has the same representation as the smallest standard integer type that can represent all the enumeration values, potentially subject to some minimum width.)

I don't think users will expect `repr(crabi)` to differ from `repr(C)` here. It is better to write `repr(crabi, u8)`, for example, if that's what you mean. If it does differ, then that must be explicitly called out in the paragraph at lines 69-72.

We do need to specify what the niche value is for `Option` of an enum type, if we are supporting niche-value optimization for enums (and I think we should). The obvious answer is "0 if possible, otherwise the all-ones value of the chosen integer type if possible, otherwise there is no niche-value optimization for this type".
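A sketch of the niche-selection rule proposed above ("0 if possible, otherwise all-ones, otherwise none"), assuming unsigned discriminant types of 8, 16, 32, or 64 bits. `pick_niche` is a hypothetical helper; the RFC does not currently specify this rule.

```rust
/// Hypothetical helper: picks the niche value for `Option<E>`, where `E` is an
/// enum whose variants use the discriminant values in `used`, stored in an
/// unsigned integer `bits` wide (8, 16, 32, or 64).
fn pick_niche(used: &[u64], bits: u32) -> Option<u64> {
    assert!(matches!(bits, 8 | 16 | 32 | 64), "unsupported width");
    // All-ones value of the chosen integer type (avoid shifting by 64).
    let all_ones = if bits == 64 { u64::MAX } else { (1u64 << bits) - 1 };
    if !used.contains(&0) {
        Some(0) // first preference: zero
    } else if !used.contains(&all_ones) {
        Some(all_ones) // second preference: all-ones
    } else {
        None // no niche-value optimization for this type
    }
}

fn main() {
    assert_eq!(pick_niche(&[1, 2, 3], 8), Some(0));
    assert_eq!(pick_niche(&[0, 1, 2], 8), Some(0xFF));
    assert_eq!(pick_niche(&[0, 1], 16), Some(0xFFFF));
    assert_eq!(pick_niche(&[0, 0xFF], 8), None);
}
```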

daira · 2024-03-27T11:59:34Z

text/3470-crabi-v1.md

+larger than 0x10FFFF, or a value in the range 0xD800 to 0xDFFF inclusive.
+
+Note that there is no special handling for an array of values of this type,
+which is not equivalent to a string (unless using a UCS-4 encoding).


It's not equivalent to a UCS-4 string either, whether NUL-terminated or counted.

Suggested change

-which is not equivalent to a string (unless using a UCS-4 encoding).
+which is not equivalent to a string.
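A short Rust sketch of the `char` validity rule quoted above, cross-checked against `char::from_u32`, plus an illustration of why an array of `char` is not a string. `is_valid_char_value` is a hypothetical helper name.

```rust
use std::mem::size_of;

/// Hypothetical helper: true if `v` is a valid `char` value under the quoted
/// rule (not above 0x10FFFF and not a surrogate in 0xD800..=0xDFFF).
fn is_valid_char_value(v: u32) -> bool {
    v <= 0x10FFFF && !(0xD800..=0xDFFF).contains(&v)
}

fn main() {
    // Agrees with Rust's own check at the range boundaries.
    for v in [0x41, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000] {
        assert_eq!(is_valid_char_value(v), char::from_u32(v).is_some());
    }

    // An array of chars is fixed-width (4 bytes each); a UTF-8 string is not.
    let chars = ['h', 'é', '€'];
    let s: String = chars.iter().collect();
    assert_eq!(chars.len() * size_of::<char>(), 12);
    assert_eq!(s.len(), 6); // 1 + 2 + 3 bytes of UTF-8
}
```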

MolotovCherry · 2024-04-07T18:12:41Z

text/3470-crabi-v1.md

+# crABI versioning and evolution
+[crabi-versioning-and-evolution]: #crabi-versioning-and-evolution
+
+crABI has has a major and minor version number, similar to semver.


Just a small typo here: "has has".

Humm42 · 2024-07-03T00:49:49Z

text/3470-crabi-v1.md

+that already implies passing by pointer (such as a function argument or return
+value), or translate it explicitly to a pointer (e.g `uint16_t (*rgb)[3]`) in
+contexts where just writing the array would imply by-value (such as a struct
+field).


The C types `uint16_t *` and `uint16_t (*)[3]` are distinct types. The wording here clearly specifies that the former be used, yet an example uses the latter.

It should be made clear which one to use. And I believe `&[T; N]` should always be lowered to `T (*)[N]`.

> C can represent this as an array (e.g. `uint16_t rgb[3]`) in contexts where
> that already implies passing by pointer (such as a function argument or return
> value)

This can simply be removed. It is true that function parameters declared as arrays are pointers and that returning arrays is illegal, but that doesn’t concern crABI.
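A sketch of the Rust side of this lowering: `&[T; N]` is a thin pointer whose address equals that of the first element, so `T *` and `T (*)[N]` differ only in the pointee type written in the C declaration. `sum_rgb` is a hypothetical function; under the lowering suggested above, its C declaration would presumably be `uint32_t sum_rgb(const uint16_t (*rgb)[3]);`.

```rust
use std::mem::size_of;

/// Hypothetical FFI-facing function taking a reference to a fixed-size array.
extern "C" fn sum_rgb(rgb: &[u16; 3]) -> u32 {
    rgb.iter().map(|&c| u32::from(c)).sum()
}

fn main() {
    // `&[T; N]` is a thin pointer, unlike the two-word `&[T]`.
    assert_eq!(size_of::<&[u16; 3]>(), size_of::<*const u16>());
    assert_eq!(size_of::<&[u16]>(), 2 * size_of::<*const u16>());

    let rgb = [1u16, 2, 3];
    // Pointer to the array and pointer to its first element share an address.
    assert_eq!(&rgb as *const [u16; 3] as usize, rgb.as_ptr() as usize);
    assert_eq!(sum_rgb(&rgb), 6);
}
```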

crABI v1

7717d03

joshtriplett added the T-lang label Aug 11, 2023

RFC 3470

d18beea

joshtriplett force-pushed the crabi-v1 branch from cec3dbd to d18beea August 11, 2023 05:24

Partially summarize the BoxDrop proposal in the alternatives section

d6225f7

Add links to BoxDrop proposal

849585b

EdorianDark mentioned this pull request Aug 12, 2023

Experimental feature gate proposal crabi rust-lang/rust#105586

Open

tmccombs reviewed Aug 13, 2023

omentic mentioned this pull request Aug 13, 2023

ABI freedom nim-lang/RFCs#506

Open

jeffparsons reviewed Aug 14, 2023

teor2345 reviewed Aug 18, 2023

madsmtm reviewed Aug 18, 2023

the8472 reviewed Aug 21, 2023

juleskers reviewed Aug 22, 2023

jeffparsons mentioned this pull request Oct 26, 2023

Develop a plugin interface for the CLI bytecodealliance/wasmtime#7348

Open

ssokolow mentioned this pull request Jan 10, 2024

PoC: add Rust submodule as libbasic_rust systemd/systemd#19598

Open

Jules-Bertholet reviewed Mar 13, 2024

tgross35 reviewed Mar 13, 2024

Fix typo

337e891

Co-authored-by: Jeff Parsons <jeff@parsons.io>

daira reviewed Mar 27, 2024

programmerjake mentioned this pull request Apr 3, 2024

RFC: Trait for !Sized thin pointers #3536

Closed

MolotovCherry reviewed Apr 7, 2024

programmerjake mentioned this pull request Apr 24, 2024

Tracking issue for RFC 1861: Extern types rust-lang/rust#43467

Open

Humm42 reviewed Jul 3, 2024

Slices translate to a by-value struct containing two fields: a pointer to the
element type, and a `size_t` number of elements, in that order.

joshtriplett commented Aug 11, 2023

programmerjake commented Aug 11, 2023

joshtriplett commented Aug 11, 2023

programmerjake commented Aug 11, 2023

programmerjake commented Aug 11, 2023

programmerjake commented Aug 11, 2023

joshtriplett commented Aug 11, 2023

EdorianDark commented Aug 12, 2023

Araq commented Aug 12, 2023

joshtriplett commented Aug 12, 2023

Araq commented Aug 13, 2023

tgross35 Mar 13, 2024

tmccombs Aug 13, 2023

daira Mar 27, 2024

tmccombs commented Aug 13, 2023

comex commented Aug 13, 2023

Araq commented Aug 14, 2023

bjorn3 Aug 19, 2023

nortti0 commented Aug 20, 2023

programmerjake commented Aug 21, 2023

ChrisDenton commented Aug 21, 2023

programmerjake commented Aug 21, 2023

matklad commented Aug 24, 2023

bjorn3 commented Aug 24, 2023

matklad commented Aug 24, 2023

jonathanpallant commented Aug 24, 2023

comex commented Aug 24, 2023

Maix0 commented Nov 23, 2023

jeffparsons commented Dec 17, 2023

tgross35 Mar 27, 2024

daira Mar 27, 2024