Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support defining C-compatible variadic functions in Rust #2137

Merged
merged 25 commits into from
Sep 29, 2017

Conversation

joshtriplett
Copy link
Member

@joshtriplett joshtriplett commented Sep 3, 2017

Support defining C-compatible variadic functions in Rust, via new intrinsics.
Rust currently supports declaring external variadic functions and calling them
from unsafe code, but does not support writing such functions directly in Rust.
Adding such support will allow Rust to replace a larger variety of C libraries,
avoid requiring C stubs and error-prone reimplementation of platform-specific
code, improve incremental translation of C codebases to Rust, and allow
implementation of variadic callbacks.

This RFC does not propose an interface intended for native Rust code to pass
variable numbers of arguments to a native Rust function, nor an interface that
provides any kind of type safety. This proposal exists primarily to allow Rust
to provide interfaces callable from C code.

Rendered

@joshtriplett
Copy link
Member Author

A bit of background: this was inspired by a co-worker who wanted to define a debugging library to intercept certain C calls, for use with $LD_PRELOAD. He wanted to try writing that library in Rust. However, some of the calls he wanted to intercept included variadic functions; that was the only thing stopping him from writing that code in Rust, and forcing him to write it in C instead.

@comex
Copy link

comex commented Sep 3, 2017

/// Obtain the variable arguments of the current function. Produces a
/// compile-time error if called from a non-variadic function. The compiler
/// will supply the appropriate lifetime when called, and prevent that
/// lifetime from outliving the variadic function.

Having the compiler enforce special lifetime rules for one function sounds really dubious. Either don't enforce lifetimes or make it work with the normal rules - either with a API that takes impl for<'a> FnOnce(VaList<'a>), or with the …args option. (There's no rule that the extra argument has to be a VaList itself, as opposed to something that can be turned into one.)

@joshtriplett
Copy link
Member Author

@comex I didn't intend it to be a "special lifetime rule", so much as the compiler simply defining and passing the appropriate lifetime.

I don't really think an API based around calling a closure would work very smoothly, but I'd be open to switching to the ...args option. And having that be an object that can produce a VaList (borrowed from that object), rather than the VaList itself, seems plausible. I might experiment with that and see how it turns out. Suggestions for the name of such a type?

@joshtriplett joshtriplett added T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC. labels Sep 3, 2017
@joshtriplett
Copy link
Member Author

joshtriplett commented Sep 3, 2017

I'd also appreciate feedback from appropriate compiler folks, regarding what solution might make the most sense from a compiler-internals point of view, in terms of preventing the type from outliving the variadic function associated with it.

Cc @arielb1 @eddyb @petrochenkov

@canndrew
Copy link
Contributor

canndrew commented Sep 3, 2017

Either don't enforce lifetimes or make it work with the normal rules

People have talked about introducing a special 'fn lifetime which represents the lifetime of the current function. This could be a good usecase for that.

ie.

impl<'a> VaList<'a> {
    pub unsafe fn start() -> VaList<'fn>
}

impl<'a> Drop for VaList<'a>;

/// The type of arguments extractable from VaList
trait VaArg;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be an unsafe trait?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

the raw C types corresponding to the Rust integer and float types above:

```rust
impl VaArg for c_char;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless these type aliases became concrete types, I don't think the std and libc crates need to provide any additional impls.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

let args = VaList::start();
let x: u8 = args::arg();
let y: u16 = args::arg();
let z: u32 = args::arg();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean

let mut args = VaList::start();
let x: u8 = args.arg();
let y: u16 = args.arg();
let z: u32 = args.arg();

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thank you.

```rust
let closure = |arg, arg2, ...| {
// implementation
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't cast a closure to extern "C" fn, so I don't think supporting variadic closure is necessary in this RFC.

fn main() {
    unsafe {
        let _x = (|| {}) as extern fn(); //~ ERROR E0605
    }
}

Copy link
Contributor

@jethrogb jethrogb Sep 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus, I think extern "C" variadic functions should always be unsafe functions, and closures currently can't be unsafe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize this; I'd assumed that RFC 1558 made this possible, but digging through it, it doesn't allow coercing to extern fn, only to fn. I filed rust-lang/rust#44291 to allow coercing to extern fn, but in the meantime, I'll drop this for now and move it to an open for the future. Would have been nice for more convenient callbacks.

@eddyb
Copy link
Member

eddyb commented Sep 3, 2017

The only safe way that comes to mind is requiring the function take a VaList<'a> lang item type and not use the same lifetime passed to it (anonymous or not, elision would still see it) anywhere else in the signature, and we synthesize the necessary intrinsic calls to create the VaList.

EDIT: there would need to be a way to mark the function as variadic, args: VaList... maybe?

@mark-i-m
Copy link
Member

mark-i-m commented Sep 3, 2017

The RFC says that only libc would be allowed to provide impls of VaArg. Does this mean that it is an unstable intrinsic or language feature? Or does it mean that libc becomes explicitly magical among creates?

@joshtriplett
Copy link
Member Author

@mark-i-m Fixed now, since libc's types are just type aliases.

…times

This allows the compiler to pass an appropriate lifetime without as much
magic.
@joshtriplett
Copy link
Member Author

@eddyb OK, based on your comment, I've tried reworking the RFC so that it accepts a VaList<'a> as an argument, with the syntax args: ... (not mentioning the type or its lifetime, so you can't reuse it elsewhere in the signature); the compiler can then pass an appropriate lifetime that prevents the VaList from outliving the variadic function. Does that look reasonable to you?

use std::intrinsics::VaArg;

#[no_mangle]
pub unsafe extern "C" fn func(fixed: u32, args: ...) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be mut args: ... then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! Fixed.

Such a declaration looks like this:

```rust
pub unsafe extern "C" fn func(arg: T, arg2: T2, args: ...) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For symmetry, I think we should also allow

extern {
    fn func(arg: u8, arg2: u16, args: ...);
//                              ^~~~~
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, for external variadic functions with the current mechanism?

I'm not sure that makes sense to add, since the argument isn't named in C. What would be the advantage of this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshtriplett Just for consistency, the args: has no meaning here 😄. This is really outside of the scope of this RFC, feel free to ignore it.


Note that extracting an argument from a `VaList` follows the platform-specific
rules for argument passing and promotion. In particular, many platforms promote
any argument smaller than a C `int` to an `int`. On such platforms, extracting

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true for all platforms, it's required by the C standard.

If the argument has integral or enumeration type that is subject to the integral promotions, or a floating-point type that is subject to the floating-point promotion, the value of the argument is converted to the promoted type before the call.

Basically, any integral type smaller than an int is converted to int (and any floating point type smaller than a double (basically float or short float, for implementations that provide the latter, is converted to double)).

Side note, this paragraph points out something important:

After these conversions, if the argument does not have arithmetic, enumeration, pointer, pointer to member, or class type, the program is ill-formed.

emphasis on class type and enumeration - this means that it should be valid for programs to impl VaArg for their #[repr(C|uN|iN)] structure and simple enumeration types.

I'd also argue that VaArg should be implemented for &T, &mut T, Option<&T>, and Option<&mut T>. This is all unsafe anyways :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll phrase the bit about promotions more carefully, and put some thought into the structure and enum cases.

I'd prefer not to implement VaArg for references; there are plenty of ways to convert a pointer to a reference in unsafe code, if you really want a reference, and those ways allow you to inject the appropriate lifetime of that reference more easily.

Copy link

@strega-nil strega-nil Sep 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not treating references specially - they are just pointers after all. And it just makes sense sometimes - if you're implementing printf:

#[no_mangle]
unsafe extern "C" printf(fmt: &CStr, args: ...) -> c_int {
  for ch in fmt {
    // ...
    // found a %s:
    {
      print!("{}", args.next::<&CStr>());
    }
  }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except &CStr is a fat pointer.

Copy link

@strega-nil strega-nil Sep 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jethrogb for now ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ubsan If, at some point in the future, there's a reasonable possibility of extracting a reference directly from a variadic function argument and getting a reasonable result, then we can add such a mechanism at that time. Until then, though, I'd like to stick with only allowing raw pointers.

Copy link
Member Author

@joshtriplett joshtriplett Sep 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ubsan I clarified the RFC's details about promotion to take your explanation into account, and to avoid suggesting "platform-specific".

Copy link

@strega-nil strega-nil Sep 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshtriplett I disagree with the reasoning there. There is no reason not to allow references. You almost never actually want raw pointers when dealing with C; you want Option<&T> (and given that references are valid in C functions...)


impl<'a> VaList<'a> {
/// Extract the next argument from the argument list.
pub unsafe fn arg<T: VaArg>(&mut self) -> T;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this name - I'd prefer next.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bikeshed: next_unchecked?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think arg was chosen to match the C name: va_arg... If we are going to deviate from this, we could change the interface altogether...

fn my_vararg_fn(args: VaList<c_int>) {...}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but you can pass any type through variable arguments, not just one type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was chosen to match va_arg, yes. I intentionally didn't use next, because I don't want to give the impression that this works anything like an iterator, rather than a razor-sharp ball of unsafety.

impl<'a> Drop for VaList<'a>;

/// The type of arguments extractable from VaList
unsafe trait VaArg;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the point of the VaArg trait anyways? No Rust types should be invalid to pass/take in variadic functions (although perhaps non-#[repr(C)] types shouldn't be valid? there's no technical reason though).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for replacing the trait with a #[repr(C)] check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ubsan I don't think it makes sense to accept arbitrary Rust types, and in particular not references, or arbitrary structures. But accepting #[repr(C)] structures makes sense (that's possible in C), as does accepting C-style enums with a declared size.

@eddyb
Copy link
Member

eddyb commented Sep 3, 2017

@joshtriplett Not bad, I don't have any more suggestions at this time, as I believe that solves the problem of the short-lived data escaping the function.

[reference-level-explanation]: #reference-level-explanation

LLVM already provides a set of intrinsics, implementing `va_start`, `va_arg`,
`va_end`, and `va_copy`. The implementation of `VaList::start` will call the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you've changed the RFC to args: ..., the VaList::start function should no longer be mentioned. Just say va_start will be automatically inserted at the start of the function.

name along with the `...`:

```rust
pub unsafe extern "C" fn func(fixed: u32, ...args) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VaList::start should now be an alternative itself. This is at best an "alternative syntax".

@kennytm
Copy link
Member

kennytm commented Sep 3, 2017

This is getting long, so I write it as the main comment here.

You mentioned

A function declared with extern "C" may accept a VaList parameter, corresponding to a va_list parameter in the corresponding C function. For instance, the libc crate could define the va_list variants of printf as follows:

unsafe extern "C" { pub fn vprintf(format: *const c_char, ap: VaList) -> c_int; }

This will conflict with impl Drop for VaList which associated va_end to it. Consider

unsafe extern fn forward_to_printf(fmt: *const c_char, ap: ...) -> c_int {
    vprintf(fmt, ap)
}

Here ap is moved into vprintf, which means vprintf should be responsible for dropping ap. However, in C, it is the caller's responsibility to call va_end:

int forward_to_printf(const char* fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    vprintf(fmt, ap);
    va_end(ap);
}

That means vprintf cannot take VaList as-is. Perhaps it could provide a method:

impl VaList {
    unsafe fn as_va_list(&mut self) -> va_list;
}

so the caller still retains the ownership of VaList:

unsafe extern fn forward_to_printf(fmt: *const c_char, ap: ...) -> c_int {
    vprintf(fmt, ap.as_va_list())
}
(Some non-standard stuff)

Uniquely-borrowing the VaList instead of moving it will allow you to translate weird things like this into Rust:

void just_print_one(va_list ap) {
    printf("%d\n", va_arg(ap, int));
}
void print_nums(size_t n, ...) {
    va_list ap;
    va_start(ap, n);
    for (size_t i = 0; i < n; ++ i) {
        just_print_one(ap);
    }
    va_end(ap);
}
int main() {
    print_nums(4, 1, 2, 3, 4);
}

Note that this violates the C standard §7.16/3, so it is actually invalid. But this works on my machine™.

The object ap may be passed as an argument to another function; if that function invokes the va_arg macro with parameter ap, the value of ap in the calling function is indeterminate and shall be passed to the va_end macro prior to any further reference to ap253).

253) It is permitted to create a pointer to a va_list and pass that pointer to another function, in which case the original function may make further use of the original list after the other function returns.

Actually, would it be even better if args: ... means the type of args is &mut VaList instead of VaList? And va_list is an alias of *mut VaList.

@aturon aturon merged commit 09e34be into rust-lang:master Sep 29, 2017
@aturon
Copy link
Member

aturon commented Sep 29, 2017

This RFC has been merged! The tracking issue keeps the syntax explicitly opened as a not-completely-resolved question; further discussion on that point should move over to the tracking issue.

Thanks @joshtriplett!

@joshtriplett
Copy link
Member Author

The RFC as accepted proposes putting VaList in core::intrinsics. However, in rust-lang/rust#49878 , @dlrobertson observed that core::intrinsics has a module-level #![unstable] attached to it, requiring the core_intrinsics feature to use. On that basis, he suggested putting it in core::ffi instead. If that seems reasonable, I could update the RFC to change the proposed location of the API.

(I'd like to avoid bikeshedding this if possible. It definitely has to go in core so that it works without std. The exact location in core doesn't matter much. core::ffi seems like the appropriate place to me.)

@strega-nil
Copy link

@joshtriplett that seems reasonable, although I want to make sure - it's still on unstable, right?

core::intrinsics seems like an incredibly weird place for VaList and associated things.

@joshtriplett
Copy link
Member Author

@ubsan It's still going to be unstable until stabilized, yes. The issue was that it'd be non-trivial to ever stabilize bits of core::intrinsics.

Also, I think core::intrinsics still makes sense for the underlying functions like va_start and va_arg, which we likely won't stabilize, unlike the higher-level VaList interface.

@dlrobertson
Copy link

dlrobertson commented May 16, 2018

@ubsan The VaList implementation will be unstable under the feature c_variadic

@strega-nil
Copy link

@joshtriplett right, that makes sense. We really don't want to stabilize bits of core::intrinsics - we did it once with transmute, and it's... eergh

@dlrobertson
Copy link

I've started implementing variadic functions in rust, but some people have suggested changing the syntax to pub unsafe extern "C" fn func(fixed: u32, mut args: ...VaList). The benefit of this being, this will be closer to the syntax used for variadic generics. If we go this route, this will also mean I will not have to add another ast::TyKind and hir::TyKind, which will drastically simplify the code required.

@joshtriplett
Copy link
Member Author

joshtriplett commented Dec 11, 2018

@dlrobertson Writing VaList gets into questions of "but what about lifetimes". We discussed that alternative as part of the RFC and the syntax we settled on didn't have that. Could we please stick with just mut args: ... if possible?

Could you explain more about the complexity that would create?

@dlrobertson
Copy link

dlrobertson commented Dec 11, 2018

@joshtriplett I have git branches with each implemented, just not generating the va_start/va_end yet. My main concern is the diverging syntaxes for variadic generics and variadic arguments.

Writing VaList gets into questions of "but what about lifetimes".

I'm not sure what is meant by this? The VaList is generated inside the function, so it must not be able to escape. As a result, the type we generate in typeck must be subst with a scoped region for the given function. Then you get a lifetime error when you run something like the following.

fn test(fixed: i32, ap: ...VaList) -> VaList { ap }
// or even
fn test<'a>(fixed: i32, ap: ...VaList<'a>) -> VaList<'a> { ap }

Could you explain more about the complexity that would create?

The complexity is just a side point and mainly has to do with the internals of parsing in the compiler. Either way we've converged after typeck.

  • Using ... syntax
    • Add a new TyKind to the ast and hir (not TyS)
    • Parse the new type in parse_ty_common
    • set variadic for the function if TyKind::VarArgs was found
  • Using ...VaList
    • Add variadic to the ast and hir Arg
    • If we find ... eat it and set variadic for the arg
    • set variadic for the function if arg.variadic == true

Adding a new TyKind is fairly invasive, but like I said, I have both implemented, so I'll be okay with either.

@jethrogb
Copy link
Contributor

jethrogb commented Dec 11, 2018

I'm not sure what is meant by this? The VaList is generated inside the function, so it must not be able to escape. As a result, the type we generate in typeck must be subst with a scoped region for the given function. Then you get a lifetime error when you run something like the following.

fn test(fixed: i32, ap: ...VaList) -> VaList { ap }
// or even
fn test<'a>(fixed: i32, ap: ...VaList<'a>) -> VaList<'a> { ap }

I believe the issue is that you're writing a type that has a lifetime parameter, except it's not possible to write the correct lifetime as there is no name for it. By writing mut args: ... you're sidestepping the issue because you're not naming the type.

@dlrobertson
Copy link

I believe the issue is that you're writing a type that has a lifetime, except it's not possible to write the correct lifetime.

Good point. That could indeed be confusing. For now I'll continue with just ... then.

@joshtriplett
Copy link
Member Author

joshtriplett commented Dec 11, 2018 via email

@Ixrec
Copy link
Contributor

Ixrec commented Dec 11, 2018

I also see it as a good thing that C variadics get a syntax which is similar enough to variadic generics to be unsurprising, but still distinct enough that it's hard to get confused about which feature you're looking at. Because they are two very different and fundamentally incompatible features.

@eddyb
Copy link
Member

eddyb commented Dec 11, 2018

@dlrobertson Writing VaList gets into questions of "but what about lifetimes"

This is easily solved by checking that the lifetime parameter of VaList doesn't show up anywhere else in the signature.

It needs to be an "universally quantified" lifetime anyway, just like VaList::copy, so that the body of the function is checked with 0 assumptions, and without the type being explicitly spelled out, we would have to synthesize that ourselves somewhere later in the compiler.


@dlrobertson got a bit into the weeds of the implementation, but this is my perspective on why I think it's better for the user to write out VaList:

With e.g. args: VaList... you're explicit that the variadics you're using are C ones, and what the mechanism you interact with to extract arguments, is (i.e. that you have a VaList value).
I'd rather have args: VaList... (C) vs args: A... (VG) than args: ... vs args: A....

We could even rename VaList to make it clearer it's C variadics (CVaList?).

(Also, a random note: Pat: ...Type or Pat...: Type are suboptimal choices of syntax when you take into account Pat: Type becoming Pat syntax)

@joshtriplett
Copy link
Member Author

That seems like revisiting the same argument that was made and explored during the RFC. Is there a new argument here that wasn't previously explored?

@dlrobertson
Copy link

dlrobertson commented Feb 25, 2019

Added a comment on rust-lang/rust#57760, but realized I should probably post here too. We need a library feature-gate for VaList and a language feature-gate for C-variadic definitions. In addition, a language feature gate cannot have the same name as a library feature gate. As a result, I think we're forced to use two feature gates:

  • c_va_list: library feature gate forVaList
  • c_variadic: language feature gate for C-variadic function definitions

@Centril
Copy link
Contributor

Centril commented Feb 25, 2019

@dlrobertson Sounds good to me, go for it. :)

@dlrobertson
Copy link

Just FYI @varkor fixed the issue with feature-gates so the c_variadic feature enables both core::ffi::VaList and Rust defined C-compatible variadic functions in rust-lang/rust#57760

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ffi FFI related proposals. A-syntax Syntax related proposals & ideas A-typesystem Type system related proposals & ideas final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.