-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Never patterns #3719
base: master
Are you sure you want to change the base?
RFC: Never patterns #3719
Conversation
2011585
to
a71b873
Compare
a71b873
to
c489261
Compare
it might be a good idea to instead require an empty block (or maybe |
# Motivation | ||
[motivation]: #motivation | ||
|
||
Rust's unsafe semantics are access-based: the validity of data only matters when reading from/writing to a place. When pattern-matching, patterns only access the data (discriminants/values) they need to choose the right arm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust's unsafe semantics are actually mostly not access-based. Constructing an invalid value is UB.
Only behind raw pointers and inside union fields do we use access-based semantics. So the text here is quite misleading as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Niko's blog post from year 2018, linked in the RFC text, says that
While the details of our model around unsafe code are still being worked out (in part by this post!), there is a general consensus that we want an “access-based” model
...mentioning about &!
being possibly a valid value. Just to clarify, I presume this is now stale information and having a type &!
is considered UB? Also, the same article presents following code:
unsafe {
let x: Uninit<(u32, T)> = Uninit { uninit: () };
x.value.0 = 22; // initialize first part of the tuple
...
match x.value {
(v, _) => {
// access only first part of the tuple
}
}
...
}
Is that accepted by modern standards? Seems a bit sketchy that the union field is already accessed in the match expression, so the fields are not matched against inside the match. By extension, one should be able to do the following, which, by modern standards, is surely UB, right? (To be sure, the RFC presents code that matches against the union fields inside the match.)
unsafe {
let x: Uninit<(u32, T)> = Uninit { uninit: () };
x.value.0 = 22; // initialize first part of the tuple
...
let tmp = x.value; // move from x.value!
match tmp {
(v, _) => {
// access only first part of the tuple
}
}
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A match expression works on places, not values. So the two examples you show are not equivalent! The first never turns x.value
into a value, it only ever accesses its first field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But values don't exist in memory, they only exist while reading or writing, right? And matches don't access compound values wholesale, they only inspect memory piecewise. In particular I can mess up some bytes of a place and still match on it if the match only inspects the non-messed-up bytes.
Is that not "access-based"? Am I incorrect in the above understanding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In an access-based semantics, it is generally expected that e.g.
let b: bool = transmute(13u8);
is fine if we never "access" b
. So I think it is misleading to call Rust's semantics access-based.
But what you say about places is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Is there a term of art for the particular flavor of "validity only matters when reading/writing" that rust has?
Otherwise I don't can do without this general statement. Would this be ok: "The semantics of pattern-matching around unsafe values are access-based: patterns only access the data (discriminants/values) they need to choose the right arm. The accessed data must be valid, but the rest may be invalid or uninitialized"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not aware of a specific term. I would summarize it as "enforcing language invariants on every typed copy".
"The semantics of pattern-matching around unsafe values are access-based: patterns only access the data (discriminants/values) they need to choose the right arm. The accessed data must be valid, but the rest may be invalid or uninitialized"
"unsafe value" isn't a standard term. I've used "unsafe places" in the past but it is not standard either; the RFC should define it if it wants to use it.
I would say something like: Rust's pattern match semantics are access-based. Only the parts of the underlying value representation that actually matter for the pattern have to be valid.
} | ||
``` | ||
|
||
Here the absence of an arm plays the role of a "read discriminant" kind of operation. This is fine in this case, but around unsafe code that interacts with possibly-uninitialized data, accesses should be explicitly visible in the pattern. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the absence of an arm plays the role of a "read discriminant" kind of operation. This is fine in this case, but around unsafe code that interacts with possibly-uninitialized data, accesses should be explicitly visible in the pattern. | |
Here the absence of an arm plays the role of a "read discriminant" kind of operation. | |
(If, instead, there was an arm with pattern `_`, then there would be no "read discriminant" operation.) | |
This is fine in this case, but around unsafe code that interacts with possibly-uninitialized data, accesses should be explicitly visible in the pattern. |
_ => unreachable_unchecked(), | ||
// or | ||
(_, x) => match x {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_ => unreachable_unchecked(), | |
// or | |
(_, x) => match x {} | |
(_, x) => match x {} |
The first example seems odd and orthogonal to the RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe _ => unreachable!()
is better? That's just what people commonly write in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be what people commonly write, but the intent of this PR is better described by match x {}
so IMO we should have only that.
|
||
Here the absence of an arm plays the role of a "read discriminant" kind of operation. This is fine in this case, but around unsafe code that interacts with possibly-uninitialized data, accesses should be explicitly visible in the pattern. | ||
|
||
Today, when matching empty types inside places that may contain uninitialized data, rust requires a dummy match arm: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today, when matching empty types inside places that may contain uninitialized data, rust requires a dummy match arm: | |
Today, when matching empty types inside places that may contain uninitialized data, rust requires a match arm that explicitly reads the empty type: |
} | ||
impl<T: Copy> MyUnion<T> { | ||
unsafe fn assume_init(self) -> ! { | ||
match self { MyUnion { value: (_, !) } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
match self { MyUnion { value: (_, !) } } | |
match self { MyUnion { value: ! } } |
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
Patterns can be used on a partially uninitialized place; the basic rule is that a pattern only accesses data that is directly mentioned in the pattern: an `Ok(..)` pattern requires a discriminant read, a binding requires a full read, a wildcard `_` accesses nothing etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Patterns can be used on a partially uninitialized place; the basic rule is that a pattern only accesses data that is directly mentioned in the pattern: an `Ok(..)` pattern requires a discriminant read, a binding requires a full read, a wildcard `_` accesses nothing etc. | |
Patterns can be used on a partially uninitialized place; the basic rule is that a pattern only accesses data that is directly mentioned in the pattern: an `Ok(..)` pattern requires a discriminant read, a by-value binding requires a full read, a wildcard `_` accesses nothing etc. |
There are also ref
bindings, after all...
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
We add `!` to the syntax of patterns. A `!` pattern is accepted for any type that is visibly uninhabited (i.e. uninhabited taking into account private fields and `#[non_exhaustive]` annotations) from the item containing the match expression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many possible definitions of "uninhabited" so it'd be good to spell this out. E.g., is &!
uninhabited?
What I would expect is that !
is allowed on all types whose library invariant ("safety invariant") cannot be satisfied (it is an explicit operation after all), so we can treat &!
as uninhabited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, I had not considered the safety version of uninhabitedness. This does feel like the appropriate notion to use, I'll change that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify, if we use this wider definition of "uninhabited" for when the !
pattern can apply (rather than a definition such as that of "empty" in rust-lang/rust#119612 (comment)), does this mean the following would be acceptable?
Such syntax was used in this comment from Ralf so I assume this might be the intention?
fn abc(value: Option<&!>) {
match value {
None => {},
!
}
}
I rather like this, but just to flag that it is a change of behaviour from:
- The current
never_patterns
feature which would require a more explicitSome(!)
pattern (playground link) - The syntax in the auto-never blog post which I believe would have required a
Some(&!)
pattern to draw attention that it accesses the&!
(and so that it asserts validity - if we replace references with pointers)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More generally, would the !
pattern be able to match multiple uninhabited types in a wildcard manner?
To create an explicit example, would the following compile, on the justification that:
MultiNever::A(&MultiNever::B(Empty))
,MultiNever::A(&MultiNever::C(Empty))
,MultiNever::B(Empty)
andMultiNever::C(Empty)
are all uninhabited based on the above rules.- We may allow
!
to wildcard over all uninhabited types.
enum Empty {}
enum MultiNever<T> {
A(T),
B(Empty),
C(Empty),
}
fn abc(value: MultiNever<&MultiNever<u32>>) -> u32 {
match value {
MultiNever::A(MultiNever::A(value)) => *value,
!
}
}
If this does compile, I'm not sure if this necessarily fits with the motivation that "accesses should be explicitly visible in the pattern." - but possibly I've got the wrong end of the stick?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just revisiting my musing in the last comment, I missed that the motivation in the RFC actually says more fully that "around unsafe code that interacts with possibly-uninitialized data, accesses should be explicitly visible in the pattern" - which seems right to me. The motivation only cares about accesses which could cause UB being explicitly visible in the pattern, or slightly more loosely that "accesses which could cause UB without breaking a safety invariant should be explicit".
And so Ralf's justification is that, references have a safety invariant to point at valid data, and as such have a recursively-safe safety invariant. Therefore it should be okay to effectively view a never pattern as:
A never pattern acts as an explicit compile-time assertion that all unmatched types have an unsatisfiable safety invariant; and also as an explicit compile-time assertion that all those types are invalid at this position.
By contrast, e.g. pointers or union fields do not have such a safety invariant, so would not satisfy such a recursive safety-based definition of uninhabitedness, and would need to be handled explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer that the user write Some(!) explicitly because I find it more explicit and easier to specify/implement. This is open for discussion of course, that's the aim of the RFC.
Good point! Well, in the spirit of discussion, let me propose for debate the "wildcard !" or more precisely the "safety-uninhabitable binding" definition of the never pattern. Such a pattern would:
Apply against all unmatched branches (similarly to
_
or a variable bindinga
), and, both:
- Assert (on pain of a compile error) that the "leaf type" of each of these branches is safety-uninhabited. (Where by leaf type I'm being quite wooly, I mean the type you'd get if you re-interpret the leaf enum variant of the branch as a struct type)
- Assert (on pain of UB) that all such branches are not reachable (i.e. all such leaf types are validity-uninhabited in this position)
If this is well defined (and, to be fair, that's a big if), it appears to me that it's a strictly more powerful extension to this proposal - in the sense that it is allowed in a superset of places.
I was also wondering it it were too powerful, but I think on balance that it's likely OK. A !
at the root of a nested match is a declaration that "all unspecified branches are not possible (on pain of UB)".
It has benefits that it can mostly solve the ergonomics issues in this issue without auto-never, by only requiring a user to add a !
to their matches. Macros could do this in a blanket fashion, and pair this with something like an #[allow(unneeded_never_pattern)]
.
Users in unsafe land could be more careful, and use !
more explicitly, in place of individual types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So just to clarify, I believe that under this RFC, the first 3 compile, but the fourth one does not:
correct!
If this is well defined (and, to be fair, that's a big if), it appears to me that it's a strictly more powerful extension to this proposal - in the sense that it is allowed in a superset of places.
I believe this is well-defined, and agree that this would be allowed in strictly more places than the RFC currently proposes.
It has benefits that it can mostly solve the ergonomics issues in rust-lang/rust#131452 without auto-never
If that's the main motivation I'd much rather have auto-never, because I'd much rather avoid safe users needing to add !
patterns in some matches and not others for hard-to-figure-out reasons (namely the presence of &
on the path to an empty type).
I tried to separate this RFC from auto-never discussions but maybe they're too entangled for that 🤔.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's the main motivation I'd much rather have auto-never, because I'd much rather avoid safe users needing to add ! patterns in some matches and not others for hard-to-figure-out reasons (namely the presence of & on the path to an empty type).
I agree that we need an auto-never equivalent... As you've said in the comment below, maybe we need to explore that first.
But I think a more flexible never pattern could theoretically be beneficial to people writing unsafe too. For example, in these quite hypothetical scenarios:
- Maybe in their current circumstance they know that all branches that are unsafe are invalid, and want to avoid writing out lots of distinct branches. (e.g. they're using something like an unsafe Trees That Grow pattern, if such a thing would even make sense... in which case, they could use
!
as a wildcard, like Ralf spelled out here. Although I'm not sure if that was intentional on Ralf's part, or accidental 🤷) - Or they know that some sub-enum should always be never, and want to assert that at compile time.
Whilst I'm naturally drawn to neat generalisations, I accept that the use case argument isn't super strong.
That said, (ignoring the implementation difficulty argument for now) I feel a good argument against a more general / flexible pattern should be motivated in why a restriction makes sense. For example:
- Do we feel that allowing wildcard
!
is a foot gun for a user to write the wrong thing? That they put!
as an easy fix because they don't understand the implications about it matching more variants than expected, or future new variants? (I'm not sure to what extent unsafe developers would benefit from such hand-holding) - Do we feel that allowing wildcard ! might be "less easy to understand" or less readable at code review time? Does a lone
!
look weird or eat up too much strangeness budget?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I didn't realize that we'd need more than just this RFC for that case.
My immediate gut feeling is that the conservative choice here is to accept less code. It's east to accept more code in the future, it's impossible to take back accepting wildcard !
if we add it now. So I'd be inclined to first require the !
pattern to be used in a position where the type is actually uninhabited, and possibly change the "automatic never pattern" rules for match
es on entirely safe places, and then revisit whether we really need wildcard !
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a good argument. Start conservative, without precluding expanding in future if it's still useful 👍
|
||
In terms of both semantics and exhaustiveness checking, a `!` pattern behaves like a binding. E.g. a `Some(!),` arm (absent match ergonomics) is semantically equivalent to `Some(x) => match x {}`. Indeed, they both indicate a load at a place of an uninhabited type, followed by an unreachable body. | ||
|
||
Never patterns interact with match ergonomics as expected: a `!` pattern is allowed on e.g. `&(T, !)`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have expected &(T, !)
to be an uninhabited type (it is library-invariant-uninhabited, after all), so this sentence is odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do go with the "safety invariant" version of inhabitedness then indeed this sentence isn't useful anymore. Will update when I get a moment.
# Rationale and alternatives | ||
[rationale-and-alternatives]: #rationale-and-alternatives | ||
|
||
Status quo alternative: we could do nothing, and consider that writing `Err(_) => unreachable!()` or `Err(x) => match x {}` is good enough. The drawback is more noticeable for or-patterns, e.g. `let (Ok(x) | Err(!)) = ...;` would have to become a `let .. else` or a full `match`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first time the RFC mentions that let (Ok(x) | Err(!)) = ...
is even possible. If that is part of the RFC it should be spelled out above. I guess never patters are exempt from the usual rule that all variants of an or-pattern need to bind the same variables?
|
||
The only other language I (Nadrieril) am aware of that has the intersection of features that would make this necessary is Zig. They are in the process of clarifying the semantics of their empty types (https://github.com/ziglang/zig/issues/15909); they may or may not end up encountering the same explicitness problem as us. | ||
|
||
Ocaml and Adga have respectively [refutation cases](https://ocaml.org/manual/5.2/gadts-tutorial.html#s:gadt-refutation-cases) and [absurd patterns](https://agda.readthedocs.io/en/latest/language/function-definitions.html#absurd-patterns), both of which give a way to say "this pattern is impossible and I shouldn't have to write an arm for it". Their flavor of impossibility is related to types however, not runtime validity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Their flavor of impossibility is related to types however, not runtime validity.
I don't understand this remark. Ours is also all about types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm yes this is badly phrased. I'm trying to express something like "in their case, the absurd pattern is purely a type-system hint, whereas in our case changing Err(_)
to Err(!)
can change the runtime semantics".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key difference is that Adga doesn't have unsafe code and OCaml doesn't talk about the unsafe code it has.
} | ||
``` | ||
|
||
This RFC proposes a new `!` pattern that works as an explicit "access data" or "assert validity" operation for uninhabited types. The examples above become: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line is the key point. It defines what the programmer means when they write !
.
However, in safe code (eg, when the scrutinee is a safe reference), this fact - "this is actually sane" - is something that should always be true and shouldn't need to be explained by the programmer.
ISTM that this is also a pedagogical problem. If this is needed in safe Rust, auithors of that safe Rust will need an explanation what this thing is for and what it means, but we definitley don't want to burden them thoughts about validity/safety invariants etc. Nor do we want to suiggest to them in discussion of match
that values might not be valid. But there is no way to explain this !
without supposing that &T
might not reference a proper T
. Authors of safe Rust are entitled to assume that &T
always does reference a valid T
.
Therefore, !
should not be required in safe Rust. What does that mean precisely? One possible answer:
!
is not required when dereferencing a safe reference, field, etc. (recursively)- Except, within an
unsafe
block, we emit a lint saying "something something uninhabited types validity something something requirement something"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to make syntax depend on whether something is syntactically inside or outside an unsafe block. That kind of context-dependence can become quite surprising.
So if we do anything like this, IMO it should be fully based on the type of the scrutinee place (including the types of its subexpressions, e.g. whether a union field projection is involved).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But just to check, would you consider a possible path forward looking something like:
- Never patterns match types with a certain property, independent of location (this property could be "safety-based uninhabitness").
- When creating a
match
statement on a type with a different property, the compiler inserts a never pattern at the bottom of the match, independent of location (this property could be, e.g. "doesn't contain pointers or union field projections or partially uninhabited tuples" - roughly as per the auto-never blog post)). - We add a lint for when the auto-never might cause issues - which could be dependent on location?
I think such a lint being location-sensitive would be justifiable in that it doesn't change syntax, but just adds extra checks? Or at least Niko was happy with that in 2018.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW the path forward I intend to propose is simply to allow omitting arms for &!
and similar types (more generally, if the type is safely uninhabited). My understanding is that making unsafe references is a massive footgun anyway (because of e.g. auto(de)ref).
Unsafe code authors that do that anyway can write never patterns for such cases (they will not be marked as unreachable), and we may add a lint that requires them if users find this useful.
I have moved away from explaining things in terms of "auto-never" out of personal preference, but you could explain this by saying that we always auto-never except inside pointers and unions.
Never patterns match types with a certain property, independent of location
This is indeed precisely the intent of this here RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given ptr: *const &!
, I don't think match *ptr {}
should ever build. That should require match *ptr { ! }
.
IOW, this should depend on whether the matched-on place is a safe place or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I include this in the exception that "we don't auto-never inside pointers". By "inside pointers" I mean in terms of places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given ptr: *const &!, I don't think match *ptr {} should ever build. That should require match *ptr { ! }.
IOW, this should depend on whether the matched-on place is a safe place or not.
Okay, understood - so in your view auto-never behaviour (if/when we have it or an equivalent) would need to vary be place, and shouldn't just be a static property of the matched type.
But never patterns, as syntax, should be place invariant. This makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I may now stop discussion of auto-never and similar considerations: I'd like to keep the discussions on-topic of the never pattern syntax proposal, and discuss changes to which arms can be omitted etc to some other place. For example rust-lang/rust#131452.
# Unresolved questions | ||
[unresolved-questions]: #unresolved-questions | ||
|
||
- Does this cause problems with macros-by-example? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It also causes problems for derive macros. As I write,
[It] sets macro authors up for failure. Every macro-generated match statement needs to have a spurious ! arm, or the macro will fail to compile, sometimes, with an empty enum.
This isn't just a problem for macro_rules macros. Derive macros can suffer from it too. (Users of derive-deftly or other approaches for making derive macros more easily are quite likely to run into this.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two separate issues here: rust-lang/rust#131452 is entirely independent from this RFC, since this RFC is purely about adding a new pattern syntax (and not about changing anything else in the language). The other issue with macros is that a !
pattern doesn't take an arm; this is the point that's relevant to this RFC. That one I don't know how bad it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other issue with macros is that a ! pattern doesn't take an arm; this is the point that's relevant to this RFC. That one I don't know how bad it is.
I think it would be a breaking change for libraries like syn, in the sense that all procedural macro crates would need to update to the latest syn
with a new AST to handle match patterns without arms - and this may require a major version bump of syn
...
Given that syn
is only on major version 2, it looks like such changes aren't very frequent - but maybe this isn't that much of a problem in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, they'd have to majorly change their Arm
struct :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is already causing problems like:
fn h(a: Option<&!>) -> bool {
matches!(a, Some(!))
// error: a never pattern is always unreachable
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
syn
has to have some way of dealing with the Rust syntax evolving, I don't think its API choices should dictate our grammar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matches!(a, Some(!))
is kind of silly so arguably we should at the very least lint against it. Why not just write false
instead?
|
||
Never patterns interact with match ergonomics as expected: a `!` pattern is allowed on e.g. `&(T, !)`. | ||
|
||
A never pattern may be linted as "unreachable" if there is no well-behaved execution that can reach it. For example, if it was guaranteed that reading the discriminant of `Result<T, !>` could never return `Err`, the `Err(!)` pattern would be linted as unreachable. Never patterns are not otherwise linted as unreachable even when removing them would be accepted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we view this as necessary? Under the psuedo-definition of ! in this comment, a never pattern against 0 types is trivially okay.
Some benefits of permitting an uneeded !
:
- It would be a win for macros (as per this comment), they could always include it without punishment.
- It would make auto-never a trivial sugaring where we choose to accept it (just stick
!
at the bottom of every match, which effectively operates as both a check that all remaining unmatched types have unsatisfiable safety invariants, and an assertion that they are all invalid)
A drawback could be that in this case, if we see !
, we can't assume that there exists at least 1 leftover variant/type in the match with an unsatisfiable safety invariant. But I'm not sure to what extent that is actually useful information, worthy of linting for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is purely a convenient lint for users and helps learnability, exactly like today's "unreachable arm" lint. Macros can (and often do) silence lints to get more flexibility, so there'd be no win here. Similarly, the auto-never desugaring you have in mind would work the same; we wouldn't emit a lint there since that would make no sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense.
My comment was mostly applying to a world where we had wildcard !
... Unfortunately without wildcard !
, it's not valid to place a !
next to other variants, so a macro which outputs a match &self { /* ... */ }
still needs to handle/branch on there being 0 variants, to decide whether to insert a !
or not. This is rather unergonomic - particularly in declarative macros.
All that said, I think it's fine for this RFC to not cover that, and to need to wait for something equivalent to auto-never for this to be a "nice" story for macro writers.
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
The main drawback is that this new pattern is unlike other patterns (e.g. it doesn't require an arm body) which adds some maintenance burden. It's otherwise a simple feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How bad would it be to this RFC if the "feature" of skipping the arm body be removed 🤷 Save some major changes to the syntax that messes with macros and syn
etc.
Just type-check that the arm body always has type const !
to replace that "a never pattern is always unreachable" error.
match x {
// note: using `const { unreachable!() }` instead of only `unreachable!()`
// to tell the compile actually enforce that the arm is never invoked
Some(!) => const { unreachable!() },
None => "ok",
}
TBH it is very strange that only the match
syntax has this omission feature. You can't do it in an if let
for instance. Given how #![feature(never_patterns)]
works today you need to write:
if let Some(!) = x {
// you still need a body here! otherwise you get "E0308 mismatched types"!
const { unreachable!() }
} else {
"ok"
}
(I don't think the "empty block" proposal in #3719 (comment) works because an empty block does not indicate the block is actually unreachable. The !
expression is cute but I think it may conflict with unary-not !(false)
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other syntax that has an equivalent is or-patterns: Ok(x) | Err(!)
is a valid pattern because the never case "doesn't count" for the purpose of ensuring that all alternatives have the same binding. Otherwise you're right, match arms are special.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't want the "bodiless arm" syntax, I'd propose that we allow this:
match x {
Some(!) => {}
None => "ok",
}
this would consider the whole arm body to diverge for the purposes of type-checking, similar to how we typecheck => { return; () }
or => { panic!(); () }
. In fact I already implemented something like this for function arguments:
enum Void {}
fn foo(!: Void) -> u32 {} // typechecks because the body diverges
There's one case where the "bodiless arm" is particularly nice though: the empty match. Today match *foo {}
is allowed for foo: *const !
, which violates the intent that matching on empty types inside pointers should be explicit. My intent after this RFC is to deprecate this case and eventually force users to write match *foo { ! }
instead. This is less nice if they have to write match *foo { ! => {} }
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specialty of Ok(x) | Err(!)
manifests only after parsing which you have built the AST which then you can check for "E0408 variable is not bound in all patterns". The impact is not at the same level as the proposed syntax match expr { pat(!) }
that changes the grammar and also the resulting AST/HIR structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
!
expression is cute but it think it may conflict with unary-not!(false)
.
how so? if you're trying to have a !
expression, you can just say it can't be followed by a function call or other ambiguous tokens, if you really want to call it, you'd have to write (!)(false)
(not that that would type check...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for this rather obscure feature is not a place we should spend the weirdness budget on.
ok, since {}
can be made to type check, I think using ! => {}
as the match arm is good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the need for never-patterns is common enough to justify changing the match syntax just to save that
=> {}
😅
Fair. I was hoping we could allow it just for this special case but idk if that's worth it.
I suppose that means the if let example not working for {} is a bug?
Yep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My slight concern with a syntax like ! => {}
is that it reads to me like "types which match ! are valid in this position, please ignore them and return ()
" rather than "types matching !
are unreachable/UB".
I haven't thought too much about this, but would something like ! => !
type check? (if we temporarily ignore a dependency on the never type).
And would it possible to reject alternative arms somehow at compile time? e.g. require ! => !
or ! => { ! }
, but prevent e.g. ! => { x = x + 1; panic!("Side effect!"); }
which I would assume also has a type of !
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dhedey => !
can certainly type-check but then we need to introduce the new "!
" expression in additional to the pattern and type.
but prevent e.g. ! => { x = x + 1; panic!("Side effect!"); } which I would assume also has a type of !?
A non-empty block (or non-!
value) should raise the unreachable_code lint like now:
// warning: unreachable statement
fn f(ptr: &mut u8, !: &!) {
// - any code following a never pattern is unreachable
*ptr += 1;
// ^^^^^^^^^^ unreachable statement
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have to have an arm, IMO => !
makes most sense. It does still look quite funny, though...
EDIT: Oh, and it was already said above that it causes some parser issues. Hm.
I don't think I agree that these are separate. The RFC says that My own concern is primarily the possibility (which afaict is not explicitly excluded by this RFC text) that people writing safe code might be required to write Avoiding that means we need a distinction between safe and unsafe places. That distinction is also the thing that solves #131452 (which is about safe code). With that distinction this RFC will become clearer. It will also be more obvious what the alternatives are to never patterns (generally, I think, additional explicit dereferencing of the unsafe place?) In any case, before we have fixed #131452, the handling of So I think we must deal with #131452 first. |
I was trying to keep the conversations separate to make discussion easier, but it's becoming clear that this is not helping x) I'll go ahead with my plan for rust-lang/rust#131452 and come back to this afterwards. |
The most confusing bit of this RFC for me is exactly how it interacts with safety. Specifically, it doesn't clarify when the need would arise for I get that the goal here is to just provide a syntax, but there are too many unanswered questions about the motivation to really feel like the syntax is justified. As folks have mentioned, the lack of an arm means that this now has to be covered separately in macros, although I think that Additionally, would you be expected to mix uninhabited patterns with inhabited patterns? What does this mean? How does this play into where the unsafe block is located? Doesn't this mean that the unsafe block is now placed farther away from the unsafety, not localised to the specific unsafe bit (the never pattern) but the entire match block? I just don't really see why adding a specific never pattern is a good solution to any problem, and I also don't understand what problems require them. |
A never pattern is not an unsafe operation, since you can't cause UB by using it on safe values. It's exactly as safe as the The part that requires unsafe in these examples is the match scrutinee. We can't write
I don't know what you mean. You use patterns as usual, depending on the type you want to match on. It's just that for empty types there's now a new pattern you can use. You can mix and match as usual.
To be fair this is a rather subtle point of rust operational semantics. To rephrase the "motivation" section in the RFC: it solves the problem of having an explicit way to assert validity of data at a place of uninhabited type using a pattern. |
I guess I should have clarified on the mixing: specifically, imagine the case like this: match x {
Ok(x) => /* ... */,
Err(!),
} and this: match x {
Err(!),
Ok(x) => /* ... */,
} And you can imagine more than two arms for more complicated examples. If we allow more than one never pattern, then our macro-matching situation gets more complicated, and you have to explicitly use tt-matchers to handle both cases, which can get a lot more complicated than just having multiple arm matchers. However, I can see value in allowing multiple never patterns separated by commas, compared to one never pattern separated by pipes, for convenience and to allow sorting patterns better. Like, that is a rather sizeable complication to the syntax which needs to be explained, and should probably have style guidelines too. Adding an additional example for clarity: match x {
Ok(None) => /* ... */,
Ok(Some(!)),
Err(None) => /* ... */,
Err(Some(!)),
} Note that since it's not just all inhabited patterns followed by an optional never pattern, and instead perfectly allowed mixing, you can't use a single-rule matcher for macros any more: it has to be a tt-matcher with separate rules for the different arm cases. |
Adding a relevant bit here after discussing a bit in rust-lang/rust#131452: I really don't think that motivation is properly described in this RFC, at all. I understand the desire to remain brief, but there are a lot of very, very subtle details that do not feel clarified at all in the RFC. Even the "guide-level" explanation, which is supposed to be for people who don't understand these subtle details, uses very specific terminology like "place" that isn't properly described to the person reading. Like, my understanding is that the opsem team would prefer to define pattern-matching such that reads happen on a per-arm basis, meaning that you can always look at the arms of a match to determine what reads are happening, and thus the points at which UB can occur if a reference were not well-formed. This avoids the need for a "discriminant read" operation, since you can rely on pattern-matching instead. While syntactically matches of uninhabited types would generally be allowed to be left empty, with never patterns being added "automatically" by the compiler, semantically, the idea is that they would still be there, and still able to cause accesses to memory per the memory model. To help ensure that people don't get caught up by these "automatically added" arms, people can opt into using these patterns, either by denying a lint that checks for them or via a separate lint which will only fire when they are omitted in the presence of unsafe code, to be worked out later. This RFC is attempting to sidestep the operational semantics and simply propose a new syntax but I, genuinely do not think that's a good idea. Because, well, the semantics do matter: Like, now that I actually understand the motivation for this change, I really do get it. I just don't really know if the proposed solution is the correct one: since the semantics aren't really negotiable, that means either we add an explicit syntax for never patterns and recommend that they sometimes be used in unsafe code, or, we keep the semantics for these patterns being added implicitly and don't add any special syntax for them. I'm not going to argue whether the syntax is worth the downsides it brings, but I do think that the semantic side plays a lot into making that justification, and I don't think it's adequately stated here. |
Summary
A
!
pattern indicates a type with no valid values. It is used to indicate an impossible case when matching an empty type in unsafe code.Note: this RFC is purely about a new pattern syntax. It does not propose changes to exhaustiveness checking or to the operational semantics of uninhabited types.
Rendered