Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Opaque Data structs for FFI #1993

Closed
wants to merge 1 commit into from
Closed

Conversation

mystor
Copy link

@mystor mystor commented May 6, 2017

This RFC is an offshoot of (and hopes to supersede) #1861 as mentioned in this comment.

This is a rewrite of the original RFC with a focus on the simpler syntactically Opaque unsized struct, and tries to serve as a document exploring both the motivation for implementing this feature in rust today, as well as the different approaches for handling the size_of_val and type_of_val features.

I made the decision to focus the core text of the RFC on the DynSized approach to the size_of_val and type_of_val problem, as it is the solution to the problem which causes the largest change to the language, and needs the most exploration. The other solutions are mentioned in the Alternatives section, and are simpler in concept and implementation, so need less text discussing them. However, I have tried to keep the pieces of the RFC which discuss this separated, so that it is easy to tell which parts of the RFC are related to the DynSized trait, and which are related to the Opaque struct.

rendered

@mystor mystor mentioned this pull request May 6, 2017
@kennytm
Copy link
Member

kennytm commented May 6, 2017

Reading the RFC makes me think, could impl !Sized for Opaque {} be an alternative to a lang-item + DynSized? It seems not because:

  • size_of_val would still work on T: !Sized unless the compiler special-case it for slices and traits
  • There is no guarantee that impl !Sized would still ensure &Opaque is thin
  • Supporting impl !Sized is more work than a special lang item
  • The general semantic is unclear


Finally, many trait parmaeters would be updated in the standard library to relax
their trait bounds from `?Sized` to `?DynSized` which should be a
backwards-compatible change.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit: parmaeters ↦ parameters)

size_of_val is used indirectly whenever the type needs to manage allocation itself, so types like Box and Rc need to remain T: ?Sized.

Traits like Borrow and AsRef should be loosened to ?DynSized. UnsafeCell should also be relaxed to ?DynSized provided that it is safe to transmute an &mut Opaque to &UnsafeCell<Opaque>.

It may not be an easy change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could fairly easily be implemented, and then trait bounds relaxed on various places where it is safe in a follow up. I agree that types like Rc shouldn't have their bound relaxed from T: ?Sized to T: ?DynSized.

I can look through the standard library and determine which bounds should be relaxed, adding that to the RFC if we want. Actually relaxing the bounds shouldn't be too hard.

@mark-i-m
Copy link
Member

mark-i-m commented May 7, 2017

The syntax Trait + ?DynSized could also be written which would relax this DynSized requirement, and not include the vtable entries for size_of_val and align_of_val

What does it mean to create a trait object of an opaque type?


Perhaps NonOpaque or Transparent would be a better name than DynSized.

Strictly speaking, we cannot say that the type can be figured out at runtime, right?


What about using libc::c_void?


Also, I generally don't know how I feel about this... it is definitely an important problem, and I like this solution's concept, but it seems to touch rather a lot of stuff across the language. That makes me nervous. I feel like there are possibly a lot of places where bounds would change and unexpected behavior might happen, like the point that @kennytm makes here.

@mystor
Copy link
Author

mystor commented May 7, 2017

What does it mean to create a trait object of an opaque type?

A reference to a trait, or a trait object &Trait is represented something like this:

// A trait like `Trait` with a method:
trait Trait {
    fn method(&self);
}

// Is represented something like this (I haven't actually looked 
// into Rust's exact VTable layout, so this might be wrong...)
struct TraitVTable {
    size: usize,
    align: usize,
    methods: TraitVTableMethods,
}

struct TraitVTableMethods {
    method: fn(this_ptr: *const u8),
}

struct &Trait {
    data: *const u8,
    vtable: *const TraitVTable,
}

If you have an opaque type:

struct Mine(Opaque);

impl Trait for Mine {
    fn method(&self) { println!("Hello!"); }
}

Would then be cast to a trait object by taking the pointer to the Mine object and putting it in the data slot, and taking a pointer to the vtable and putting it in the vtable slot. A trait object which references an Opaque object would be written as &Trait + ?DynSized and would look something like:

struct &Trait + ?DynSized {
    data: *const u8,
    vtable: *const TraitVTableMethods,
}

That way it doesn't need a size and align field in its VTable.

Perhaps NonOpaque or Transparent would be a better name than DynSized.

Strictly speaking, we cannot say that the type can be figured out at runtime, right?

We do know the type, and if we have a trait object of the opaque type it can be figured out at runtime, we just cannot know the layout of the type. That's what Opaque means (just like forward-declared types in C/C++).

What about using libc::c_void?

That type is not safe to put behind a & or &mut pointer as it is represented as struct c_void(u8). I cover that possibility in the Motivations section.

If we do land this type, it would be awesome to change the definition of libc::c_void to use an Opaque definition, but that would be a backwards-incompatible change so we would need a major version bump of libc, which would be unfortunate.

Also, I generally don't know how I feel about this... it is definitely an important problem, and I like this solution's concept, but it seems to touch rather a lot of stuff across the language. That makes me nervous. I feel like there are possibly a lot of places where bounds would change and unexpected behavior might happen, like the point that @kennytm makes here.

I agree that this is an unfortunately large change to the language, which is why I try to also mention the other, much smaller, options which don't involve introducing the DynSized trait.

This was designed such that no bounds would change without them being explicitly annotated, so no unexpected behavior should happen. If some behavior would change anywhere, that's a bug in the RFC's design. This RFC should be 100% backwards-compatible.

The main reason why DynSized was used in the main body of the RFC is because it needs the most explanation, as it is the most complex option, and I didn't want to elaborate on all of this in the Alternatives section. It also needs the most exploration of its impacts on the rest of the language.

@mystor
Copy link
Author

mystor commented May 7, 2017

Reading the RFC makes me think, could impl !Sized for Opaque {} be an alternative to a lang-item + DynSized?

Nope, I think we would need both impl !Sized and DynSized. impl !Sized could be used as a replacement for lang-item. The other option with impl !Sized would be to make it so each implementation has to choose its own size_of_val and align_of_val, and people writing individual bindings would have to write specific code for each type that they wrap which calculates the correct values.

@kennytm
Copy link
Member

kennytm commented May 7, 2017

@mystor Well RFC #19 will need to be heavily modified to allow actual implementation in an impl !Sized. I guess a lang-item is good enough 😆.

// wat.
impl !Sized for CStr {
    fn size_of_val(&self) -> usize { strlen(...) + 1 }
}

@mystor
Copy link
Author

mystor commented May 7, 2017

@kennytm I do mention how this would be implemented under this RFC - It would be done by writing impl DynSized for CStr.

@nrc nrc added the T-lang Relevant to the language team, which will review and decide on the RFC. label May 11, 2017
@aturon aturon self-assigned this May 11, 2017
@mark-i-m
Copy link
Member

@mystor

Ah, got it. So self in these methods would then only be usable for calling other methods? Since that data is opaque, what is the point of that? Why not just write a function?

Strictly speaking, we cannot say that the type can be figured out at runtime, right?

Sorry, this is a typo. It should say "we cannot say that the size can be figured out at runtime"... that's why I proposed a name that does not include the word "sized".

That type is not safe to put behind a & or &mut pointer as it is represented as struct c_void(u8). I cover that possibility in the Motivations section.

If we do land this type, it would be awesome to change the definition of libc::c_void to use an Opaque definition, but that would be a backwards-incompatible change so we would need a major version bump of libc, which would be unfortunate.

Got it 👍

It also needs the most exploration of its impacts on the rest of the language.

Mostly, I am thinking of weird corner cases that might impact soundness or something... like what is the interaction with ! or PhantomData or closures or lifetimes? For example,

  • Is Fn() -> impl Opaque not opaque? What about an FnMut() -> () closure that contains an Opaque (e.g. through capturing)?
  • Is PhantomData<(!, T)> where T: Opaque a ZST or an Opaque? or does it matter? can this cause weirdness with dropck?

Part of me wonders if there is a more formal way to define Opaqueness in the type system. i.e. is there any type theory exploring this issue which might show us where bugs might show up? (btw, I don't know that much about type theory, so I might be totally wrong).

Also, I do want to underscore that I like the RFC in general... I just have vague uneasiness, too...

@mystor
Copy link
Author

mystor commented May 12, 2017

@mark-i-m

So the first thing I want to point out is that Opaque is not a Trait, rather
it is a concrete type like [u8] or str, which is unsized. You wrote <T: Opaque> in your comment, which is not legal syntax, and would be like writing
<T: [u8]>.

Just like in rust today, you cannot have a value of type Opaque in the same
way you cannot have a value of type [u8], you can only hold a value of these
types behind a reference, like &Opaque or &[u8].

Another value is "made opaque" by including a field of an "opaque" data type as
its last member, for example, the CStr data type would be "made opaque" by
defining it with a single Opaque field.

Ah, got it. So self in these methods would then only be usable for calling other methods? Since that data is opaque, what is the point of that? Why not just write a function?

You would not be able to implement a trait which requires passing around self
by value on an opaque type, just like how you cannot implement it on [u8].
However, just like [u8], you would be able to implement traits which only
require passing around values of the type by reference. For example, struct CStr(Opaque) would be able to implement the trait Deref<Target=[u8]>, as:

struct CStr(Opaque);

impl Deref for CStr {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        unsafe {
            slice::from_raw_parts(self as *const _ as *const u8, self.len())
        }
    }
}

The reason why I want to support implementing inherit methods and traits on
opaque objects is for consistency with the rest of the types in the language. We
should treat these types the same way as other unsized types as much as
possible. There is no reason to require you to write:

let x: &CStr = ...;
cstr_len(x);
// -- instead of --
x.len();

Sorry, this is a typo. It should say "we cannot say that the size can be figured out at runtime"... that's why I proposed a name that does not include the word "sized".

Values which are Opaque do not implement either Sized or DynSized, which
is what makes them unique. It would be possible with the full extensions of this
RFC to unsafely implement DynSized on one of these values, which would allow
defining methods which calculate size.

Mostly, I am thinking of weird corner cases that might impact soundness or something... like what is the interaction with ! or PhantomData or closures or lifetimes? For example,

  • Is Fn() -> impl Opaque not opaque? What about an FnMut() -> () closure that contains an Opaque (e.g. through capturing)?

You cannot write this expression, as Opaque is not a trait. In addition, it is
not possible to capture a value of type Opaque in a closure, as you cannot
have a value, you can only capture references to a value containing Opaque.

  • Is PhantomData<(!, T)> where T: Opaque a ZST or an Opaque? or does it matter? can this cause weirdness with dropck?

It would be a ZST, just like PhantomData<(!, [u8])>. It would have the same
implications as raw slices do on dropck today.

Part of me wonders if there is a more formal way to define Opaqueness in the type system. i.e. is there any type theory exploring this issue which might show us where bugs might show up? (btw, I don't know that much about type theory, so I might be totally wrong).

This is what we're doing here. We're defining Opaqueness as being !Sized and
!DynSized, and thus having no way of defining what the layout or alignment of
the data behind the reference is, nor ability to gain a value of this type.

Also, I do want to underscore that I like the RFC in general... I just have vague uneasiness, too...

My main comfort with this RFC is that it is very similar to code which we
already have today, namely it is very similar to our existing !Sized types
like [u8] and str. Most of the problems which might come up with adding such
a feature to the type system have already been worked out for those types, with
the small exception of the size_of_val and align_of_val methods which are
dealt with in this RFC.

IMO the main risk here doesn't really come from its interactions with the type system, so
much as the problem of someone implementing it, and how to handle the
size_of_val and align_of_val options. The biggest problem with the RFC as
written is that it adds a new magical built-in trait, like ?Sized, to handle
size_of_val and align_of_val. This is definitely possible and correct, but
may be a change which we don't want to make to the language due to its scale.


unsafe impl DynSized for MyStruct {
fn size_of_val(&self) -> usize {
somehow_determine_size()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing user to provide these functions means calling size_of_val(&x) may take unknown run time (e.g. the implementation for CStr would be O(n)), while the size_of_val(&x) of all existing types are O(1). Is this an acceptable trade-off to make?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a matter of taste I've wondered about myself, but since custom DSTs is a very much optional followup project (I do want them but opaque data is quite worthwhile on its own), I'm not inclined to think the resolution of this question blocks this RFC.

I suppose it wouldn't hurt to still mention that though :).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly mention the impl DynSized construct because it would be required in order to migrate CStr to using Opaque, rather than because it is something which I think is essential to the core of this RFC. I'd imagine that we would stabilize impl DynSized separately from the Opaque work, and mention as much in the RFC text.

@liigo
Copy link
Contributor

liigo commented Jun 7, 2017

surprised to find that there is no definition of semantic of the DynSized in this rfc

@mystor
Copy link
Author

mystor commented Jun 7, 2017

@liigo what do you think is missing? There is an entire section of the rfc dedicated to describing the dynsized trait, and I tried my best to be thorough.

@liigo
Copy link
Contributor

liigo commented Jun 9, 2017

@mystor Maybe there lacks a summary line of docs of DynSized.

@aturon
Copy link
Member

aturon commented Jul 19, 2017

Given that the original extern type RFC is in FCP, I'm going to close this one out for the time being. We can re-open later if needed.

Thanks, @mystor, for whipping this up!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants