-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Commit to safety rules for dyn trait upcasting #101336
Comments
This comment was marked as outdated.
This comment was marked as outdated.
also cc @crlf0710, the person driving this initiative |
OK, I've adjusted the proposal slightly so that the validity invariant is now word-aligned, non-null. This is consistent with other pointer-like values and offers us more room for future change if needed. As such, I am ready to ... @rfcbot fcp merge |
Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
I am not sure if that is true -- there could be code relying on this safety invariant. Extreme case -- this fn is sound under the rules as specified above: pub fn evil(p: *const dyn Debug) {
let parts: [usize; 2] = unsafe { mem::transmute(p) };
if parts[1] == 0 {
// A compiler-allocated vtable can never be at address 0.
unsafe { hint::unreachable_unchecked() };
}
} This relates to that question about "type system invariants with wiggle room" that you keep getting back to. :) |
I just want to note that with It is perhaps also worth drawing analogy to slices, which stably have a metadata of unburdened With There's an interesting secondary question, though: do reborrowing rules apply to the pointer's metadata? Given the vtable reference is shared and Because either semantic can be implemented, this needn't block on figuring out One final note: it's possible (if annoying) to provide a constant initializer for even a valid vtable pointer. References to There are enough annoyances and roadblocks there that just treating it as a no is reasonable, but it's at least theoretically possible. Footnotes
|
FWIW, for soundness of those functions, it is enough for the safety invariants to be the same. They don't have to have the same validity invariants.
At least on raw pointers I don't think we should have So the raw ptr vtable type would be more like |
Ah, yes, you're correct, but ... hmm. This is equally true of the validity and safety invariants, right? The example you gave was code assuming that the vtable was not NULL, but that's actually guaranteed to be true because of the validity invariant (the safety invariant further refines that to "vtable must be valid" (which we don't precisely define, but anyway). On the topic of what valid means, I realized that the definition of a valid vtable for a |
To concur and state explicitly: if we want to allow dangling (Definitely not an immediate question, but I wonder if we can't make |
Agreed, that is a meaningless definition. There's no such thing as a "hidden type" in the op.sem. What I proposed is to say that it must be a valid vtable for the given trait, for an arbitrary type. We need it to be for the right trait so that vtable lookups for upcasts and dyn method dispatch will work. |
Validity invariants are an op.sem / UB thing, which means they are whole-program questions: does this program (when interacting with a given environment for I/O) have UB, yes or no? On that level, weakening the validity invariant is always backwards-compatible since it makes fewer programs UB. Weakening becomes a problem when considering individual functions, which usually only makes sense for considering safe functions and asking whether they are sound -- hence this is primarily a concern for safety invariants. If this is an unsafe function, then it must anyway document what the preconditions are. A problem only arises if the precondition is explicitly (or implicitly) documented as "can be an arbitrary valid raw ptr", i.e., if it refers to the validity invariant. I don't know how common of a precondition that is for unsafe functions. It is usually not very useful; I would expect the precondition to be the safety invariant unless explicitly stated otherwise, and if it is explicitly stated I think it will rarely reference the validity invariant and instead just spell out what it requires. A function that says "vtable must be aligned and non-null" remains correct even if validity later allows null vtables. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
This isn't quite my experience. While this is a common (and probably better) style, I've also seen code which instead of stating the preconditions, states the weakenings. So instead of "the pointer's vtable pointer must be aligned and non-null," something along the lines of "the pointer's vtable pointer is not required to point to a valid vtable." An example might be /// Extract the vtable pointer from a trait object pointer.
///
/// The validity of the resulting pointer is unchanged from that of the input's vtable pointer.
fn get_vtable(p: *const dyn Trait) -> ptr::NonNull<()> {
let meta = ptr::metadata(p);
// elided, since this is more strongly typed it's difficult
} This kind of weakening is more commonly seen with something like This probably still allows weakening the validity requirement, as "safety minus specified" doesn't change, but I'd worry about people substituting "safety minus (current safety additions to validity)" as just "validity". |
Hmm, this is probably off-topic, but I'm not sure I buy this distinction you are trying to draw. It seems like unsafe code authors and the compiler are equally capable of optimizing layouts and other things based on validity invariants. As a silly example, maybe we're doing some kind of hand-rolled type that we want to exploit the niche just like macro_rules! my_option {
($t:path) => {
union MyOption {
data: *const dyn $t,
raw_bytes: [usize; 2]
}
...
}
} Now I create I guess this comes back to what you said that you expect people saying "any value that meets the validity invariant" to be unusual. Maybe! I'm not sure. |
If by "any such value" you mean "anything that satisfies the validity invariant", then that is exactly the case I mentioned where the documentation explicitly or implicitly references the validity invariant. If it instead said "anything that satisfies the validity invariant as it is defined in Rust 1.65", then that would be unambiguous and weakening the validity invariant later would not be a problem. |
@RalfJung I updated the text to read:
|
Can we explicitly leave the validity invariant as something to be specified in the future? i.e. producers of fat raw pointers would be required to satisfy the entire safety invariant as the validity invariant, but consumers of them should not rely on it. |
I feel fairly strongly that such documentation is unconditionally wrong and needs to be fixed. This would include @nikomatsakis 's:
Yes. I also feel very strongly that this needs to be the case. As a matter of fact, I'd even consider "the assumptions an API may make when nothing is explicitly stated" a good candidate for the definition of "safety invariant."
The documentation should imo avoid using "validity" here. The most natural interpretation of this text documentation would imply that the input pointer does not have to satisfy it's validity requirements, which is of course not the case. Better alternatives might be "the vtable pointer is only required to be non-null" or "the safety conditions are the same ones as for
"The input must satisfy it's validity requirements" is always a useless thing to document, because code that violates this necessarily has language UB anyway. Wrt @nikomatsakis 's suggestion:
I'm not sure what this was supposed to say, possibly "doesn't require a valid vtable"? This phrasing is slightly ambiguous. If it is meant to mean "doesn't require a valid vtable pointer," see my point above. If it is meant to mean "doesn't require a pointer pointing to a valid vtable," then the function has disclaimed all safety invariants. I do think we should decide that validity invariants are not stable (in the one direction); this would mean that this documentation is also wrong, and I'm okay with that. It should instead explicity state the assumptions it is making. |
Hmm, I think we should move the discussion about whether people can/will say "accept anything that meets the validity invariant" somewhere else, as I think it's independent of this particular issue. With respect to the question of leaving the validity invariant unspecified: I'd prefer not to. I think we need to make decisions at some point. However, we could make this a "recommendation" to be finalized by the (under discussion) opsem team. I'd be happy with that, since I think this is a good example of the kinds of decisions that team will make. The other point is that the validity invariant I am currently recommending (aligned, non-null) is made in part to be consistent with other validity invariants, so it makes sense to consider the group as a whole. My main immediate interest is unblocking the stabilization of dyn upcasts. We have plenty of accrued debt in the form of validity/safety invariants that have not been finalized. |
Removing nomination as this is in FCP and was discussed at prior lang team meetings. |
I am wondering if this solution was not dismissed too early. For reference, C++, another zero-cost abstraction language, does null check in Having a working |
And what should dyn calls (of safe functions) do when the vtable is null? If we want to say "UB" we'd have to make them unsafe somehow; otherwise the only option I can think of is to make them panic and that sounds bad.
|
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. This will be merged soon. |
OK, so this FCP is complete. The question is, what do we do now to close the issue? :) Option A. Edit the reference or perhaps necronomicon. Option B. Edit API documentation -- somewhere? I'm not sure if we have an "API page" for "dyn types" in general. Option C. Open an issue on unsafe-code-guidelines repository and link to this? @RalfJung or @JakobDegen -- what do people typically do? Have we stabilized decisions about validity/safety conditions before and how did we document them? The other question was whether to make this a 'strong recommendation' for an opsem-like team, which I would still be ok with. I think the crucial ingredients are...
This leads you to a few options, and the one we ultimately chose was based on Ralf's recommendation beacuse it led to more consistency, so I'd be ok with saying "the opsem team will define the precise conditions, this is our recommendation", though I suppose that's a weakening of what we just FCP'd. :) |
We have #98919 but that doesn't really help here. https://doc.rust-lang.org/reference/behavior-considered-undefined.html is where we have all validity invariants, so that would be a Reference PR. We don't really have any place to put safety invariants though. Even specifying safety invariants precisely requires something like Iris so it's not like we can even hope to do that anywhere official... |
…ckh726 Mark `trait_upcasting` feature no longer incomplete. This marks the `trait_upcasting` feature no longer incomplete since rust-lang#101336 has been settled for a little while. r? `@jackh726`
…ckh726 Mark `trait_upcasting` feature no longer incomplete. This marks the `trait_upcasting` feature no longer incomplete since rust-lang#101336 has been settled for a little while. r? ``@jackh726``
…ckh726 Mark `trait_upcasting` feature no longer incomplete. This marks the `trait_upcasting` feature no longer incomplete since rust-lang#101336 has been settled for a little while. r? ``@jackh726``
…ckh726 Mark `trait_upcasting` feature no longer incomplete. This marks the `trait_upcasting` feature no longer incomplete since rust-lang#101336 has been settled for a little while. r? ```@jackh726```
…ckh726 Mark `trait_upcasting` feature no longer incomplete. This marks the `trait_upcasting` feature no longer incomplete since rust-lang#101336 has been settled for a little while. r? ````@jackh726````
…ckh726 Mark `trait_upcasting` feature no longer incomplete. This marks the `trait_upcasting` feature no longer incomplete since rust-lang#101336 has been settled for a little while. r? `````@jackh726`````
…ckh726 Mark `trait_upcasting` feature no longer incomplete. This marks the `trait_upcasting` feature no longer incomplete since rust-lang#101336 has been settled for a little while. r? ``````@jackh726``````
FWIW #117534 is documenting the safety invariant of We probably want a fixed term. However I am not sure if "safety invariant" is the best term -- we've been using it for a long time now, but I regret coining it; "library invariant" would be my preferred choice these days. (And "validity invariant" -> "language invariant". That goes well with "language UB" and "library UB". Though maybe those are not terms we want to use either?) Also I am not sure where to best define such a term. That would have to go together with an introduction to how we think about unsafe code wrapped behind safe abstractions and things like that. |
Closing as FCP completed. cc #134367 |
This issue proposes a resolution to the last outstanding question blocking
dyn
upcast. It is also available on the initiative repository.Background
We are trying to enable "upcasts" from a
dyn Trait
to its supertraits:The key factor for these upcasts is that they require adjusting the vtable. The current implementation strategy is that the vtables for the
Bar
trait embed pointers to aFoo
vtable within them:This way, given a
&dyn Bar
object, we convert itsBar
vtable to the appropriateFoo
vtable by loading the appropriate field.Although we don't want to commit to a particular implementation strategy, we do want to leave room for this strategy. One implication is that performing an upcast may require loading from the vtable, which implies that the vtable must be a valid pointer to an actual Rust vtable. Although
&dyn Bar
references would always contain a valid vtable, the same is not necessarily true for a raw pointer like*const dyn Bar
or*mut dyn Bar
.In the language today, we only support "noop upcasts" that don't affect the vtable, and these are safe (e.g., converting from
*const dyn Foo + Send
to*const dyn Foo
). If we extend the set of upcasts to permit vtable-adjusting upcasts, like*const dyn Bar
to*const dyn Foo
, this implies that, for safe code at least, all*const dyn Trait
values must have a valid vtable, so that we know we can safely load the required field and perform the upcast.On the other hand, we do not generally require raw
*mut T
pointers to point to valid data. In fact, we explicitly permit them to have any value, including null, and only require that they point to valid data when they are dereferenced. Because dereferencing a raw pointer is an unsafe operation, it has always been considered safe to expose an arbitrary raw pointer to unsafe code -- the unsafety arises when you take a raw pointer from an unknown source and dereference it, since unless you can trace the origin of that pointer you can't possible guarantee that it is valid to dereference.This brings us to the conflict:
*const dyn
values (e.g.,*const dyn Foo + Send
to*const dyn Foo
).*const dyn Foo + Bar
to*const dyn Foo
) that would require adjusting the vtable much like the upcasts currently being stabilized.Related future consideration: virtual method calls on raw pointers
There have been requests to extend traits with the option to include raw pointer methods:
These methods would be useful when writing unsafe code because having an
&self
method requires satisfying the validity conditions of an&
-reference, which may not be possible. If we did have such methods, however, it raises the question of whether it would be safe to invokeis_null
on a*const dyn PtrLike
reference. Just as with upcasting, invoking a method from the vtable requires loading from the vtable, and hence requires a valid vtable generated by the compiler.The solution we propose in this document also resolves this future dilemma.
Definitions: validity vs safety invariant
We adopt the terms validity and safety invariant from the unsafe code guidelines:
In short, the validity invariant defines a condition that must always be true, even in unsafe code, and the safety invariant defines an invariant that unsafe code must guarantee before a value can be released to be used by arbitrary code.
Contours of the solution space
We can fix this conflict in one of two basic ways:
First, we could make vtable-adjusting upcasts casts (and
*Self
method calls) unsafe. This is difficult to implement and would require changes to theCoerce
trait, which is already excessively complicated. In exchange, it offers at best marginal benefit: raw*dyn
pointers can be released to safe code, but safe code can't do anything interesting with them. For this reason, we do not recommend this option.If vtable-adjusting casts (and
*Self
method calls) are safe, then the safety invariant for*dyn
types must be that their metadata points to a fully valid vtable (i.e., a vtable created by the compiler). This ensures safe code can perform upcasts or dynamic dispatch. This also implies thatstd::ptr::null
(which is safe) cannot be extended toT
whereT: ?Sized
unless further changes are made, since we would need to provide a valid vtable (it would be possible to permit a sentinel value, like null, to be used, but that would imply that upcasting must have a branch, making it less efficient).There are, however, various options for the validity invariant, ranging from no invariant to requiring a fully valid vtable at all times. The strict invariants offer some benefits, such as the ability to have a niche for
*dyn
pointers. We survey the options here:*dyn
metadatastd::mem::zeroed
Explanations for the column titles:
*dyn
metadata -- describes the invariant that applies to the metadata for a*dyn
value.sizeof(Option<*const dyn Foo>) == sizeof(*const dyn Foo)
.std::mem::zeroed
-- true ifstd::mem::zeroed
can be used to create a valid value (very convenient). This makes it trivial to innitialize a*const dyn
with a dummy value, though the value cannot be released to safe code.*const dyn Foo
that satisfies the validity invariant, no matter the traitFoo
. This makes it easy to initialize a*const dyn
with a dummy value, though the value cannot be released to safe code.Other points:
*dyn
values currently have a niche.fn
and&
-references) are expected to have a validity invariant of word-aligned, non-null.Proposal
The proposal is as follows:
*dyn
metadata to be word-aligned and non-null.Vtable-adjusting upcasts are defined as:
dyn Bar
todyn Foo
from the introduction). In particular, upcasts that simply add or remove auto-traits are not vtable-adjusting (e.g.,dyn Debug + Send
todyn Debug
).This approach...
*const dyn
;fn
;The rules also imply that...
*dyn
pointer when*dyn
pointer is upcast (or invoke methods);*dyn
pointer is released to arbitrary code, because that code may upcast (or invoke methods).std::ptr::null
to permitT: ?Sized
would not be safe.Possible future changes
It may be possible to weaken the validity or safety invariants later, but we risk finding that people have written unsafe code that relies on them. For example, people could build data structures using unions that assume that the vtable pointer is non-null. If we then later permitted a null value, this code could create UB. This is particularly problematic for changes to the safety invariant, since the assumption is that one can take a value which meets the safety invariant and give it to arbitrary code without creating UB (if the value only meets the validity invariant, and not the safety invariant, then you are supposed to audit and control all the code which uses that value, so this is less of a problem).
Prior links
The text was updated successfully, but these errors were encountered: