Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interoperability With C++ Destruction Order #135

Closed
ssbr opened this issue Jan 5, 2022 · 6 comments
Closed

Interoperability With C++ Destruction Order #135

ssbr opened this issue Jan 5, 2022 · 6 comments
Labels
final-comment-period The FCP has started, most (if not all) team members are in agreement major-change Major change proposal T-lang to-announce Not yet announced MCP proposals

Comments

@ssbr
Copy link

ssbr commented Jan 5, 2022

Proposal

Summary and problem statement

Correct C++/Rust interop involves extensive use of ManuallyDrop to control Rust destruction semantics and match with C++, creating unpleasant API surfaces.

Motivation, use-cases, and solution sketches

I'm working on "high-fidelity" C++/Rust interop. In particular, if C++ can work with values, I would like Rust code to be able to do the same, so as to allow seamless migration and interop between the two languages without unexpected design or performance changes. This is possible, even simple, using a library similar to moveit.

However, the semantics around object destruction complicate this, forcing pervasive use of ManuallyDrop.

What's so bad about ManuallyDrop?

It makes the structs difficult to work with. For example, here is normal Rust or C++ code:

// Rust
let mut foo = Foo { x };
foo.x = y;
// C++
auto foo = Foo { .x = x };
foo.x = y;

But with ManuallyDrop, it becomes:

let mut foo = Foo {x: ManuallyDrop::new(x)};
*foo.x = y;

This ends up applying to every non-Copy field in every C++ struct.

We would not accept that kind of user experience for native Rust types, and so, I believe, we should not accept it for FFI C++ types. Ideally, the user experience should be the same, even if there is some increased complexity behind the scenes when automatically generating C++ bindings (e.g. automatically generated Drop implementations.) The more difficult it is to use from Rust, the harder it is to port existing C++ code to Rust, or justify writing new Rust code in an existing C++ codebase.

Destruction Order

C++, unlike Rust, has a fixed initialization and destruction order: fields are initialized in declaration order, and destroyed in the reverse order. This causes the lifetimes of each field to be properly nested: for a struct S {T1 x; T2 y;};, the lifetime of y is within the lifetime of x. This allows self-referential structs to have fields that contain references to other fields above them, without any risk of undefined behavior.

In Rust, initialization order is not fixed (fields are initialized in the order they are specified in the expression that creates the struct), and so the destruction order can be anything. Rust chooses to destroy in declaration order, which means that for self-referential structs, you must actually place self-referential fields above the fields they refer to, and initialize in the reverse order.

For example, consider the following C++ struct:

struct S {
  T1 x;
  T2 y;  // important that this is destroyed before x, for whatever reason (e.g. holds a reference to x).
};

The following equivalently laid out Rust struct would destroy in the incorrect order:

// WRONG
#[repr(C)]
struct S {
  x: T1,
  y: T2,
}

Correct destruction order, today

Today, the following Rust binding would work, because while S is nontrivial, it does not have a user-defined destructor. (That's a common case: for example, 99.8% of structs in my (large) codebase do not have a user-defined destructor, due to the way the Google C++ Style Guide pushes type design.)

#[repr(C)]
struct S {
  x: ManuallyDrop<T1>,
  y: ManuallyDrop<T2>,
}

impl Drop for S {
  fn drop(&mut self) {
    unsafe {
      y.drop();
      x.drop();
    }
  }
}

For as long as this type is present in both C++ and Rust, one of the two languages will need to override field destruction order using the language's equivalent of Drop, where every non-trivially-destructible (C++) or non-Copy (Rust) field is wrapped in that language's equivalent of ManuallyDrop. This presents an ongoing API friction for users.

Note: there is no other way to work around this, today. Any C++/Rust binding which works by calling destructors of the C++ subobjects individually must manually drop in order to drop in the correct order.

Invoking foreign destructors

In both C++ and Rust, destruction occurs in two parts:

  1. The user-defined destructor (Drop in Rust)
  2. The sub-object destruction ("drop glue" in Rust)

The only customization hook is #1: you can define Drop in Rust or the destructor in C++, and this is run in addition to standard destruction logic of #2.

But when destroying an object, you don't get to pick and choose: in both C++ and Rust, you always destroy the entire object, including all subobjects/fields, rather than being allowed to only run part of the destruction logic and leave the rest for later. (This is true both for standard C++, as well as for specific C++ ABIs like the Itanium ABI.)

Therein lies the problem:

struct Foo {
  std::string string;

  ~Foo() {std::cout << "Hello, world."; }
};

If we tried to create an equivalent Rust struct which invokes the destructor via FFI, we'd be stuck:

#[repr(C)]
struct Foo {
  string: CxxString,
}

impl Drop for Foo {
  fn drop(&mut self) {
    // what do we do here?
  }
}

We can't fill in Drop::drop correctly: there is no way to call the destructor that doesn't also result in destroying the string field, which would be a double-destroy, because that field will also be destroyed in Rust by the drop glue.

Correct foreign destruction, today

The only way around this is to wrap every non-Copy field in a ManuallyDrop, so that it is not destroyed by the implicit drop glue:

#[repr(C)]
struct Foo {
  string: ManuallyDrop<CxxString>,
}

impl Drop for Foo {
  fn drop(&mut self) {
    // call C++ std::destroy_at(self) via FFI
    ffi_destroy_Foo(self);
  }
}

Note: there is no other way to work around this, today. Any C++/Rust binding which allows calling the C++ destructor must manually (not) drop, to avoid double-destruction.

Alternatives

Suppress the drop glue via an attribute

We could add an attribute, such as #[manually_drop] (or #[manually_drop_fields]), which suppresses the drop glue from dropping the whole struct: the fields will not be automatically dropped, but must be dropped by hand using e.g. std::ptr::drop_in_place. The field is still dropped when directly assigned to, unlike ManuallyDrop -- from a user's point of view, it looks like any other field, except that it isn't dropped automatically in the drop glue. (Since there's no sensible way to handle it, we should either forbid partial destructuring, or forbid destructuring entirely.)

This is the most general proposal, which solves all the issues above.

In this case, ManuallyDrop would just be another type, defined as:

#[manually_drop]
struct ManuallyDrop<T>{value: T}  // no Drop impl

To have C++-like field destruction order, one can mark the struct #[manually_drop], and manually drop each field in a Drop impl:

#[manually_drop]
#[repr(C)]
struct Foo {
  x: T1,
  y: T2,
}

impl Drop for Foo {
  fn drop(&mut self) {
    unsafe {
      std::mem::drop(&mut self.y);
      std::mem::drop(&mut self.x);
    }
  }
}

And one can invoke a foreign destructor without double-dropping, by suppressing the drop glue with #[manually_drop]:

#[manually_drop]
#[repr(C)]
struct Foo {
  string: CxxString,
}

impl Drop for Foo {
  fn drop(&mut self) {
    // call C++ std::destroy_at(self) via FFI
    ffi_destroy_Foo(self);
  }
}

(This could also improve union ergonomics: we could permit #[manually drop] union U { x: T } in place of union U { x: ManuallyDrop<T> }. These features would also presumably interact in the obvious way: Unions would allow, in addition to Copy types, any #[manually_drop] type which has no Drop impl. (Including, of course, ManuallyDrop itself.))

Reverse field drop order via an attribute

For codebases following the Google C++ Style Guide, the normal case for a struct (a class with public fields) is that it does not have a user-defined destructor. Therefore, it would suffice for the majority of types whose fields are accessible from Rust to simply adopt C++ destruction order, via an attribute that specifies C++ destruction order:

#[drop_order(from_end)]
#[repr(C)]
struct S {
  x: T1,
  y: T2,
}
// no Drop impl needed

(I do not propose making this the default, now or ever -- it would only be useful for repr(C) structs, and only those which are representing C++ types.)

The structure of the attribute allows for expansion: presumably from_end and from_start are present by default. Other drop orders could be:

  • dont_care or optimal or similar: choose the drop order based on what would be most efficient (e.g. drop in the order of the fields as they are laid out in memory).
  • do_not_drop or none: the same thing as #[manually_drop] above.
  • explicit (contentious): one could e.g. write #[drop_order(explicit)] struct S { #[drop_order(1)] a: T1, #[drop_order(0)] b: T2}.

Links and related work

Initial people involved

Changelog

2022-01-06 (major)

  • Changed problem statement to just be about the large amount of ManuallyDrop in C++ interop generally.
  • Changed from DropGlue trait to #[manually_drop] attribute -- much simpler! Also made this the preferred alternativel
  • Added section on calling the C++ destructor.
  • Added section on pinned unions.
  • Added another helpful alternative: mark ManuallyDrop as structurally pinned. (This solves the pinned unions issue only.)

2022-01-10 (minor)

  • Added field assignment examples to "What's so bad about ManuallyDrop?"
  • #[manually_drop]: assignment still drops the field, should forbid (partial) destructuring.

2022-01-11

  • Fixed incorrect ManuallyDrop assignment example
  • Added links to earlier discussions of drop order.

2022-01-20

  • Removed discussion of ManuallyDrop structural pin projection, unions, etc., as they are solvable without language changes.

Summary of discussion in Zulip

<=2022-01-06

  • Aaron Hill: "I think it would be useful to make the attribute name be less ambiguous for someone who doesn't know the 'standard' drop order. Maybe something like #[drop_fields_reversed] or #[drop_from_end]?"
  • nagisa: Proposal insufficiently justified, doesn't explain why ManuallyDrop, which is designed for this, isn't sufficient.
  • Devin Jeanpierre (it me): ManuallyDrop docs say not to use it for this, but to use implicit drop order. Also presents an API surface we wouldn't consider acceptable in non-FFI code.
  • Nick Cameron: "I'd rather see a more general version, something like a manually_drop attribute which means that all
    fields are implicitly manually dropped and then it is the responsibility of the author to drop them in the destructor."
  • Jacob Lifshay: "this would also allow converting ManuallyDrop to a pure library type, no lang item needed"
  • Devin Jeanpierre: That would also solve issues where ManuallyDrop doesn't guarantee it's structurally pinned -- users could write their own ManuallyDrop type which is, and it would be usable in union.
  • Daniel Henry-Mantilla: Ideally would be able to specify e.g. fn drop (bar: Pin<&move Bar>, baz: Pin<&move Baz>), to avoid all the unsafe code that comes with #[manually_drop]

<= 2022-01-10

  • Jake Goulding: not clear on why a proc macro isn't sufficient. Also, does this forbid destructuring?
  • tm: Do fields still get implicitly dropped on assignment? What about with incomplete destructuring?
    • bjorn3: Not very useful without a Drop impl, which suppresses destructuring, so the destructuring question isn't critical.
    • Connor Herman: forbid assignment, defer question to later.
    • Devin Jeanpierre: making e.g. assignment ergonomic is the whole point, let's allow it and make it implicitly drop. #[manually_drop] only suppresses drop glue, not assignment etc. (Also, should probably suppress destructuring, because that would be unsafe.)

<= 2022-01-11

  • Example is wrong; don't need to manually drop on assignment with ManuallyDrop, can use DerefMut instead.
  • scottmcm: at least two drop orders: forward backward. Could be better to use drop_order(backward) (or drop_order(from_end)) rather than drop_from_end. Needs a third order to justify -- perhaps "do_not_drop_fields", which is what #[manually_drop] is? (Note: "dont_care" is suggested later.)
  • Jake Goulding: also potentially manually list out order -- e.g. #[drop_order(explicit)] struct S { #[drop_order(1)] a: T1, #[drop_order(0)] b: T2}

<= 2022-01-17

  • Nick Cameron: Prefer that if you want that level of customization, you do it entirely by hand, not with attributes. Also lets you set drop order dynamically.
  • Some side discussion on C++ trivial ABI vs non-trivial-ABI types. See also https://reviews.llvm.org/D114732.
  • Jake Goulding: "My problem is that implementing Drop also means that I no longer can destructure. That’s the angle I’d love to see addressed." (TODO: address in this proposal?)
  • cuviper: In addition to forward/backward could have drop_order(dont_care) which drops in the most efficient order.
  • Devin Jeanpierre: from_end/from_start rather than forward/backward?

<= 2022-01-20

  • Offline discussion (ACL'd link) with Alice Ryhl.
    • ManuallyDrop cannot be (safely) structurally pinned due to the drop guarantee -- you'd need to guarantee the pinned interior is not forgotten.
    • Regardless of choice of structural pinning, if the ManuallyDrop is private, then you can pin-project it manually and uphold drop guarantee and other pin invariants. So e.g. to create a binding for a union, you would instead expose a newtype wrapper around a union, with accessor methods that do not directly expose Pin<ManuallyDrop<T>> for the fields.
    • Conclusion overall: this is not an unsolvable problem / doesn't need rust changes. (And so I've removed discussion of this from the proposal.)
@ssbr ssbr added major-change Major change proposal T-lang labels Jan 5, 2022
@rustbot
Copy link
Collaborator

rustbot commented Jan 5, 2022

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

@rustbot rustbot added the to-announce Not yet announced MCP proposals label Jan 5, 2022
@cramertj
Copy link
Member

cramertj commented Jan 5, 2022

@rustbot second

I'm happy to be the liason for this.

@rustbot rustbot added the final-comment-period The FCP has started, most (if not all) team members are in agreement label Jan 5, 2022
@pnkfelix
Copy link
Member

pnkfelix commented Jan 11, 2022

old but relevant discussion can be found here: An issue regarding the default drop order rust-lang/rfcs#744, which was resolved by this RFC: rust-lang/rfcs#1857

@cramertj cramertj changed the title Attribute to reverse field destruction order in the drop glue Interoperability With C++ Destruction Order Jan 11, 2022
@nikomatsakis
Copy link
Contributor

It looks like this was seconded, and we just need to create an issue for it -- I"m going to do this!

@pnkfelix
Copy link
Member

pnkfelix commented Aug 9, 2022

I created a tracking issue as described on the initiatives process.

Based on the amount of engagement we saw on the zulip topic for the MCP, I am inclined to create a dedicated Zulip stream for this project.

@nikomatsakis
Copy link
Contributor

I'm closing this proposal as accepted. I do want to note that we've adopted a revised process around this kind of work. This particular work seems to qualify as an "experiment" with @cramertj as the liaison.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final-comment-period The FCP has started, most (if not all) team members are in agreement major-change Major change proposal T-lang to-announce Not yet announced MCP proposals
Projects
None yet
Development

No branches or pull requests

5 participants