Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Amend Rc/Arc::from_raw() docs regarding unsafety #68099

Merged
merged 3 commits into from
Mar 22, 2020

Conversation

lukaslueg
Copy link
Contributor

@lukaslueg lukaslueg commented Jan 10, 2020

This question on SO boils down to "is it safe to ::from_raw() a Rc<T>/Arc<T> using a dummy T even if T is never dereferenced via the new Rc/Arc?". It almost never is.

This PR amends the docs of from_raw() regarding this point.

Constructing an Rc/Arc is unsafe even if the wrapped `T`
is never dereferenced.
@rust-highfive
Copy link
Collaborator

r? @shepmaster

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 10, 2020
@Dylan-DPC-zz
Copy link

I'm not sure if it applies to this PR as well, but worth noting this comment by @Mark-Simulacrum regarding something similar on another PR that added documentation about something being unsafe:
#67722 (comment)

/// This function is unsafe because improper use may lead to memory problems. For example, a
/// double-free may occur if the function is called twice on the same raw pointer.
/// This function is unsafe because improper use may lead to memory unsafety,
/// even if `T` is never accessed. For example, a double-free may occur if the function is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this means "if Rc<T> is never accessed"`?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True as well, although the pitfall here is to assume accessing Rc<T> itself was safe if T is never touched.

even if Rc<T> or T is never accessed.

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole paragraph seems a bit strange to me. Underspecified in a way. So what are the conditions for this method to be safe?

  • What's specified in the first paragraph (from a previous into_raw call)
  • Only call from_raw once per raw pointer.
  • T must (still) be initialized so that it can be dropped.

That's it, right? Shouldn't we explicitly document that? We can still mention the double free later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth noting (in my opinion) that the from_raw once per raw pointer makes relatively little sense, and IMO represents more hassle than it's worth. It means you can't technically store a pointer and then clone off copies of it, i.e., something like this would not be valid, but I think should be.

Other than that, I don't see any issues with the other points.

unsafe fn clone_arc(ptr: *const T) -> Arc<T> {
    let a = Arc::from_raw(ptr);
    let copy = a.clone();
    mem::forget(a);
    copy
}

let ptr = Arc::into_raw(some_arc);
clone_arc(ptr);
clone_arc(ptr);
clone_arc(ptr);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I guess we can just say "the user of from_raw has to make sure a specific T is only dropped once". Then it's the users responsibility.

@@ -573,10 +573,11 @@ impl<T: ?Sized> Rc<T> {
/// Constructs an `Rc` from a raw pointer.
///
/// The raw pointer must have been previously returned by a call to a
/// [`Rc::into_raw`][into_raw].
/// [`Rc::into_raw`][into_raw] using the same `T`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not quite true, or at least is likely over-constraining. For example, I believe it is fine to go between the unsized representation and the sized representation via pointer casts (e.g,. Rc<T> unsizes to Rc<dyn Debug> and then you into_raw, cast to T, and from_raw).

I'm not sure what exact wording we want to guarantee here, though, maybe just same T is in practice both sufficient and good enough for most use cases. Maybe something like "layout and alignment equivalent to T; note that this is essentially a reference to reference transmute if the T differs from the one used by into_raw, and so the same conditions apply"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought about that as well, it could be done. My thinking, however, is that it's too flaky to try to document under which conditions this is safe (and therefore binds all future changes to Rc/Arc).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this? (basically the same what @Mark-Simulacrum suggested)

The raw pointer must have been previously returned by a call to a Rc<U>::into_raw, where U must have the same alignment and memory layout as T (which is trivially satisfied if T == U). Note that if U is not T, this is basically like transmuting references of different types. See mem::transmute for more information on what restrictions apply in that case.

I usually wouldn't mind over-constraining, but if we document this exactly, this PR is just a doc bug fix.

@Mark-Simulacrum
Copy link
Member

I'm not sure if it applies to this PR as well, but worth noting this comment by @Mark-Simulacrum regarding something similar on another PR that added documentation about something being unsafe:
#67722 (comment)

No, that's specifically because we have not yet firmly decided whether Any should be an unsafe trait or not, so I didn't want to go adding documentation to that effect before we did so.

There's open questions here as well but I don't think they're "some other PR is changing things here as well" at this point.

I'm going to r? @LukasKalbertodt -- we'll likely want to nail down some proposed wording and then possibly FCP merge (since it's specifying, and likely narrowing in some sense, the allowed uses of into/from_raw).

@LukasKalbertodt
Copy link
Member

then possibly FCP merge (since it's specifying, and likely narrowing in some sense, the allowed uses of into/from_raw).

But if this PR just specifies things that were true all along, this is just a doc bug fix, right? In that case an FCP shouldn't be necessary, should it? Not entirely familiar with the procedures yet.

@Mark-Simulacrum
Copy link
Member

I believe so.

However, I sort of expect we'll be moving from "unclear specification" to "fairly clear specification" -- in that case I would get sign off, personally. But if we end up with just general commentary without specification, then it seems fine to forgo FCP.

@JohnCSimon
Copy link
Member

JohnCSimon commented Jan 26, 2020

Ping from triage: @LukasKalbertodt any updates on this review?

@LukasKalbertodt LukasKalbertodt added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 26, 2020
@lukaslueg
Copy link
Contributor Author

lukaslueg commented Jan 27, 2020

New proposal for from_raw, including the comments above.

Constructs an Rc<T> from a raw pointer.

The raw pointer must have been previously returned by a call to Rc<U>::into_raw where U must have the same size and alignment as T. This is trivially true if T == U.
Note that if U is not T but has the same size and alignment, this is basically like transmuting references of different types. See mem::transmute for more information on what restrictions apply in this case.

The user of from_raw has to make sure the specific value of T is only dropped once.

This function is unsafe because improper use may lead to memory unsafety, even if the returned Rc<T> is never accessed.

The same would apply to Arc<T>::from_raw.

Notice that it says "... the same size and alignment ...", not "... the same alignment and memory layout ...", because the inner layout of T. does not need to be the same for from_raw to return a valid Rc. There may still be problems if it is dereferenced to T but there had been ample warning.

I'm unsure about the single sentence regarding "... only dropped once". Are people aware how to meet this requirement if they tinker with raw pointers?

@LukasKalbertodt
Copy link
Member

@lukaslueg That sounds good to me!

I'm unsure about the single sentence regarding "... only dropped once". Are people aware how to meet this requirement if they tinker with raw pointers?

I think this should be sufficient. It's an unsafe method after all, people are expected to know what they are doing and research any unclear points.

If you push that as a commit, I'd be fine with starting an FCP merge (as was suggested above).

@lukaslueg
Copy link
Contributor Author

Pushed.

Quick recap: The point of this PR is to amend the docs regarding the behavior of [Arc/Rc]::from_raw; memory unsafety can occur if the inner type in from_raw is different from the original type, even if the resulting Arc/Rc is never accessed.

@LukasKalbertodt LukasKalbertodt added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jan 29, 2020
@LukasKalbertodt
Copy link
Member

Let's see if I already have the rights to...

@rfcbot fcp merge

This is a rather minor documentation fix regarding the safety of {Arc, Rc}::from_raw. But since the new documentation is pretty specific and these are important types, @Mark-Simulacrum suggested going through an FCP with this.

@sfackler
Copy link
Member

@rfcbot fcp merge

@rfcbot
Copy link

rfcbot commented Jan 29, 2020

Team member @sfackler has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Jan 29, 2020
@LukasKalbertodt LukasKalbertodt removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jan 29, 2020
@hdhoang hdhoang added the S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). label Jan 30, 2020
Comment on lines +575 to +577
/// The raw pointer must have been previously returned by a call to
/// [`Rc<U>::into_raw`][into_raw] where `U` must have the same size
/// and alignment as `T`. This is trivially true if `U` is `T`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is same size and align strong enough? What stops rustc from ordering the strong and weak fields of RcBox and ArcInner in a different order for different T?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, nothing, but we know today that rustc will not do so I believe. I continue to believe that it would be a good idea to add #[repr(C)] or something like it to RcBox and ArcInner, since there should be no reason to not guarantee (internally at least) that the field layout is consistent.

Notably, the data field must be last, due to unsizing, and reodering two usize fields is not currently ever going to happen anyway, I believe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reordering two usize fields is not currently ever going to happen anyway, I believe

Currently it doesn't, but rust-lang/unsafe-code-guidelines#36 is about whether to make this a guarantee, and I don't think it's decided yet. For example PGO-based field layout would absolutely reorder fields with the same size and align to keep frequently read data together and frequently written data in a different cache line.

I don't necessarily intend on blocking this PR on this, but I would feel more comfortable with #[repr(C)] on those types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be in favor of adding #[repr(C)] as well. Doing so in this PR makes sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amended RcBox and ArcInner with repr(C)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a comment to those repr(C) annotations that notes that they're not for C-compat, but strictly so that the layout is equivalent for transmutable types (or so)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

force-pushed

Future-proof these types in case rustc reorders
the inner fields. As per discussion in PR rust-lang#68099.
/// This function is unsafe because improper use may lead to memory problems. For example, a
/// double-free may occur if the function is called twice on the same raw pointer.
/// The user of `from_raw` has to make sure a specific value of `T` is only
/// dropped once.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? Isn’t Rc::drop the one responsible for dropping T when the refcount reaches zero?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user is responsible for not calling from_raw (and allowing the returned value to drop) "too many times" is what this is getting at. I want to avoid saying something like "only once," since it can be incredibly useful to keep around a single pointer in FFI code and "clone out" Rc's from it, for example.

Copy link
Contributor

@SimonSapin SimonSapin Feb 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see what you’re getting at but I feel this one sentence does not express it well. Maybe we could talk about how each Rc or Arc “owns” one strong reference, and into_raw/from_raw transfer ownership? I think that’s a better mental model, and double-drop is just one of the possible (bad) consequences.

(By the way I think there should be an alternative to from_raw that does not transfer ownership of the strong reference as a less fragile alternative to ManuallyDrop::new(Rc::from_raw(ptr)), but that’s out of scope for this PR. Maybe &*mut T -> &Rc<T>?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on both points. I'm not sure of the exact wording, but maybe @lukaslueg can come up with some draft wording?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little hesitant on this point, for fear of convolution.

Calling into_raw() / from_raw() does not modify the internal reference counter, cloning and dropping the resulting Rc<T> does. One can think of these functions as temporarily lending exactly one "unit of ownership" to the raw pointer. The compiler can't check unique ownership through raw pointers, though, especially as all raw pointers are Copy. The user has to manually ensure T is always dropped exactly once, which is not the case if (among other things) T is dropped through the raw pointer or if from_raw() is called multiple times on inadvertently made copies of the raw pointer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mark-Simulacrum any thoughts on the previous comment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's certainly more detailed :)

However, I think it should clarify that even if you know T is safe to drop lots of times (e.g., () unit type), it is still not okay to let Rc drop more than once for the "last" time.

@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Mar 12, 2020
@rfcbot
Copy link

rfcbot commented Mar 12, 2020

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Mar 22, 2020
@rfcbot
Copy link

rfcbot commented Mar 22, 2020

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

@LukasKalbertodt
Copy link
Member

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Mar 22, 2020

📌 Commit 586c7e3 has been approved by LukasKalbertodt

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Mar 22, 2020
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 22, 2020
Rollup of 10 pull requests

Successful merges:

 - rust-lang#68099 (Amend Rc/Arc::from_raw() docs regarding unsafety)
 - rust-lang#70172 (parse/lexer: support `StringReader::retokenize` called on external files.)
 - rust-lang#70209 (parser: recover on `for<'a> |...| body` closures)
 - rust-lang#70223 (fix type of const params in associated types.)
 - rust-lang#70229 (more clippy fixes)
 - rust-lang#70240 (Return NonZeroU64 from ThreadId::as_u64.)
 - rust-lang#70250 (Remove wrong entry from RELEASES.md)
 - rust-lang#70253 (Remove another wrong entry from RELEASES.md)
 - rust-lang#70254 (couple more clippy fixes (let_and_return, if_same_then_else))
 - rust-lang#70266 (proc_macro_harness: Use item header spans for errors)

Failed merges:

r? @ghost
@bors bors merged commit 0bc5fc9 into rust-lang:master Mar 22, 2020
@lukaslueg lukaslueg deleted the into_raw_unsafe branch March 22, 2020 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.