Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extensible enums #757

Closed

Conversation

nikomatsakis
Copy link
Contributor

A mechanism for declaring an enum to be extensible, which prevents exhausting matching in downstream crates.

Rendered text.

@sfackler
Copy link
Member

👍

@emberian
Copy link
Member

Easy, solves a real problem, I've wanted this before. 👍

@Kimundi
Copy link
Member

Kimundi commented Jan 28, 2015

It seems weird to have pattern matching behave differently depending on whether the enum is external to the crate or not - afaik apart from coherence we don't have such behavior anywhere else, and it would hinder refactoring of code that moves an enum into its own crate.

So I'd be for consistently treating such an enum as non-exhaustivly matchable even in the crate that defines it.

@seanmonstar
Copy link
Contributor

:+9000:

@DiamondLovesYou
Copy link

Ditto @seanmonstar

@reem
Copy link

reem commented Jan 28, 2015

Sounds great.

I wonder if this could be extended to allow an enum which can be extended by its users, i.e. they can add variants to an existing enum. I'm unsure how this would work in terms of representation, but it would be really useful for generic error types, where the only real solution today is using Any and downcasting.

# Detailed design

Enum grammar is extended to permit a list of variants to be terminated
with `..`. An enum declared with `..` is considered *extensible*.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excuse the bikeshedding, but I'd like to propose ... rather than .. because a) .. is much more heavily overloaded in Rust already than ... and b) ... actually means 'elided things' in English writing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess .. is consistent with elided fields in struct patterns (fwiw, I don't like that either, especially now we have .. for ranges).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for ...
On Jan 29, 2015 7:50 AM, "Nick Cameron" notifications@github.com wrote:

In text/0000-extensible-enums-and-structs.md
#757 (comment):

  • _ => ...,
    +}
    +```

+Due to the presence of this wildcard, you can safely add variants
+without fear of breaking source compatibility with downstream clients.
+
+Note that extensibility is ignored within the current
+crate. Therefore, your library may internally write a match that
+exhaustively covers all sources of error. You don't need to worry
+about source compatibility with yourself, after all.
+
+# Detailed design
+
+Enum grammar is extended to permit a list of variants to be terminated
+with... An enum declared with.. is considered extensible.

Excuse the bikeshedding, but I'd like to propose ... rather than ..
because a) .. is much more heavily overloaded in Rust already than ...
and b) ... actually means 'elided things' in English writing.


Reply to this email directly or view it on GitHub
https://github.com/rust-lang/rfcs/pull/757/files#r23735568.

@gsingh93
Copy link

I don't completely understand. If a library adds more errors to an enum, I don't want my code to compile until I handle all of the errors, right? Isn't putting a catch all _ -> ... dangerous?

@flaper87
Copy link

👍 It doesn't sound invasive and it provides more flexibility to one of the types that's broadly used.

@kennytm
Copy link
Member

kennytm commented Jan 29, 2015

Maybe the downstream crate really wants to match every variant. Using _ => {} makes it easy to miss one match when updating code. Perhaps it should be something like .. => {}, and emits a warning if this .. arm is reachable.

@nikomatsakis
Copy link
Contributor Author

@kennytm hmm, that's an interesting thought. I do agree that it'd be nice to give downstream crates the option to try and exhaustively match everything, even if they must provide a backup.

@nikomatsakis
Copy link
Contributor Author

@reem I think that enums which are extensible by end-users is kind of a separate feature. That is typically the role of a trait object. There might be some overlap with virtual structs. Certainly, though, to support user-defined variants, we would need to guarantee that the base type is considered unsized so that it is only manipulated by pointer (which was a key ingredient in most of the virtual struct designs). Those designs however were typically targeted at non-extensible use cases, though it seems that there is demand for extensible cases as well.

@nikomatsakis
Copy link
Contributor Author

@gsingh93 there is certainly some tension between the desire for stability -- your code keeps compiling even with newer versions of my library -- and your desire for completeness. I think perhaps something like what @kennytm is suggesting offers a compromise path, but we'd need to find an acceptable syntax.

@nikomatsakis
Copy link
Contributor Author

(Using .. as a substitute for _ is clearly ambiguous, e.g. Foo(..) vs Foo(_) have distinct meanings today.)

@nikomatsakis
Copy link
Contributor Author

@nick29581 the reason I chose .. is just that it's what we use everywhere else (e.g., Foo(..) in patterns). I think using ... feels like a kind of random inconsistency. I'd rather restrict the use of ... to inclusive ranges of various kinds (or, alternatively, restrict the use .. to exclusive ranges, but that ship has sailed). Point is, I don't want to have to guess how many dots!

@nikomatsakis
Copy link
Contributor Author

Updated the RFC to include an alternative design (no exhaustive matching) and unresolved question (can we have a nice syntax for "wildcard that should never match").

Regarding the first point -- whether to draw a distinction between crate-local matches and external ones -- I am on the fence. I like the extra expressive power of being able to do exhaustive matches locally, but I am not sure if real-world use cases (like IoError) will ever take advantage of it.

@glaebhoerl
Copy link
Contributor

In ye olde draft RFC (under the more sophisticated variant) abstract enum types would have this behavior (though per-mod, not per-crate).

@nikomatsakis
Copy link
Contributor Author

@glaebhoerl yes, I considered per-module, but per-crate seems superior to me, since the enum is public and hence it seems likely that matches against that enum will occur outside the module where it was declared (moreover, per-crate is the largest scope you can get without affecting semver guarantees what-so-ever). There is probably a use case for one module within a crate "defending itself" against another adding variants, but it seems less important, and is counterbalanced by the fact that being able to write exhaustive matches is legitimately useful for maintaining overall consistency within a package (it happens somewhat regularly that I find bugs due to over-eager use of _).

@jfager
Copy link

jfager commented Jan 29, 2015

There's still source incompatibility when I forget to make an enum extensible and then break downstream when I realize I might need to add variants in the future, right? So people should always make enums extensible unless they're really really sure the current set of variants is exhaustive? What's the perception of how frequently one or the other is the case? Should 'extensible' be the default and something like 'sealed' get added instead?

Another alternative is to allow downstream consumers to explicitly protect themselves by not throwing an 'unreachable pattern' error for terminal _ match arms. This allows consumers who want to break when their match is no longer exhaustive to just not provide that catch-all.

@sfackler
Copy link
Member

@jfager It seems to me that most enums would probably end up "closed" or sealed or whatever it'd be called. For example, we're never going to want to add any more variants to Option, Result, Ordering, etc. The most common use case for extensible enums that I can see is for error handling like IoError or SqlState. Having to break backwards compatibility to "extensify" an enum seems like not that much of a problem in practice, since you're presumably doing it because you want to add more variants, so you can just fold the extensification change in with the addition of more variants.

@jfager
Copy link

jfager commented Jan 29, 2015

@sfackler Option, Result, Ordering, and many others in the std lib are basic, core enums that I don't think are representative of the sort of enums people usually create in their own code. I don't think that sealed enums would be rare by any stretch, but I think it's questionable whether they're more common than extensible ones:

  • Errors are going to get created in user code all the time.
  • Standards, like http status codes, which grow over time.
  • Games might decide to add new monster types or whatever.
  • Supported machine architectures, operating systems, browser types, etc.

To your point about just folding it into adding more variants: sure, but that's still part of a breaking change. The point being that there's no way to make an enum extensible w/o breaking downstream, so maybe the default needs to get thought through more.

@Diggsey
Copy link
Contributor

Diggsey commented Jan 29, 2015

@nikomatsakis Would having an arm such as this work:

_ => static_assert!(false)

When the _ case is unreachable it could disable asserts within it. I'm not sure if there's an equivalent macro for warnings.

@mahkoh
Copy link
Contributor

mahkoh commented Jan 29, 2015

As it's written right now this RFC turns a compile-time error into a potential run-time bug. -1 for this reason. Making it a warning doesn't solve anything. Downstream often does not know if a warning is serious or not and upstream will have to fix the warning for the same reason.

Any change that aims to make Rust more semver compatible should be part of a comprehensive solution that addresses all points in rust-lang/rust#17152.

@gsingh93
Copy link

My point is the same as @mahkoh's, which is why I'm still against this.

@gsingh93
Copy link

@Diggsey, if you did that it wouldn't compile, and you'd be right back where you started.

@codyps
Copy link

codyps commented Jan 29, 2015

While I'd really like extensibility, I'm not seeing how we can safely allow the crate defining the enum to have exhaustive matches:

If another crate B extends an enum in A, and then some code in B passes an instance of one of the new variants in that enum into the original defining crate A, A will need to have wild card matches.

@reem
Copy link

reem commented Jan 29, 2015

@jmesmon That's a different feature that is mostly unrelated - my bad for bringing it up, really. This feature only allows crate A to add new variants backwards compatibly.

@Manishearth
Copy link
Member

I implemented this as a plugin.

It's not perfect yet, but that can be fixed.

Of course, this has the obvious flaw that it only works if the user imports the plugin. Library authors can reregister the lint pass, however this still requires the library to be imported with #[plugin]

@ghost
Copy link

ghost commented Feb 5, 2015

I am in favour. I was a little hesitant about this feature being baked-in to the language but I do see the appeal.

I am not sure I agree we should be satisfied with how structs would (not) be affected by the same problem - I think for data types such as models in Web applications, the similar issue will prove itself to be as significant. Using a convention such as an extra private field is not something I would expect user code to follow easily.

@nielsle
Copy link

nielsle commented Feb 7, 2015

Perhaps you could introduce enum inheritance. That allows you to distinguish between Foo and ExtendedFoo, but I am not sure if it covers all use cases.

enum Foo {A, B}

// ExtendedFoo inherits from  Foo
fn bar<ExtendedFoo:  Foo>(x: ExtendedFoo) {

    match x {
        Foo::A => ...,
        Foo::B => ...,
        _ => ...,
    }
}

@bombless
Copy link

bombless commented Feb 7, 2015

I think there would be pretty strong pressure against making enums extensible without good
reason.

Agree with that.

-1 for enum inheritance, if you really need to check another type, just check it.

For example when I handle C language data structrues:

struct Struct(HashMap<String, (usize, usize)>);
struct Union(HashMap<String, (usize, usize)>);
enum Type { 
    Struct(Struct), 
    Union(Union), 
    Primitive(usize), 
    Pointer(Rc<Type>), 
    Unknown(TypeName) 
} 

This is a scenario where you think inheritance is needed at first glance. And if Rust allow enum inheritance I'll probably do that, which is wrong.

When you modify an enum's definition, almost everywhere it is used should be pointed out by compiler, this behavior is significant and should not be changed.

@pnkfelix
Copy link
Member

pnkfelix commented Feb 8, 2015

@jfager I don't understand: With your proposal, how would a library author force its clients to write the extra _ => ... match arm?

(I am not opposed to your proposal, but it does not seem to address the full problem here.)

@pnkfelix
Copy link
Member

pnkfelix commented Feb 8, 2015

@bombless FYI: you left some important context out of your quotation.

At least to me, the way you have written that quotation, it could be interpreted as "there will be strong pressure against the language change from this RFC without better motivation."

But the author of the quote is the author of the RFC, and the sentence preceding that quote was "Nobody is proposing removing exhausting matching."; I believe @nikomatsakis 's intention was to say that "there will be strong pressure against a library making one of its enum definitions extensible." I.e., you're only supposed to do it in scenarios like the one laid out in the RFC itself.

(I do not think @bombless was actively trying to distort niko's words; the quote just struck me as a difficult to understand, and so I am trying to fix that.)

@jfager
Copy link

jfager commented Feb 8, 2015

@pnkfelix In current Rust, an enum client can either exhaustively match the current set of variants and break when new variants are added, or partially match the current set of variants and not break when new variants are added. There is no way to exhaustively match the current set of variants and not break when new variants are added. I'm saying we should add that ability.

I'm further claiming that if we add that ability, the need for this rfc and its attendant complexity is reduced, as clients then have a way to make themselves source compatible with upstream changes if that's what they want. While it's true that doesn't give library authors a way to force clients to stay source compatible, I think that's fine, as it's completely reasonable for clients to want to break regardless of the library author's intent.

If some way for the library author to signal intent is still desired, I think an attribute and a lint as @mahkoh suggested would be preferable to a full-blown language feature, especially if that language feature is going to come with a bunch of extra baggage like different behavior around crate boundaries and new counter-features to continue allowing people to do exhaustive matching.

@nrc
Copy link
Member

nrc commented Feb 11, 2015

Having chewed this over in my sub-concious a little more, I'm thinking that an attribute + lint (as suggested by @markoh) would be a good approach, rather than adding new syntax. To make things more concrete: adding #[extensible] to an enum would allow match statements which match it to have a wildcard arm, even if the other arms are exhaustive. A lint, error by default, would check that all uses in a different crate have such an arm.

I like this because the extensible-ness doesn't affect the dynamic semantics of the enum and the only way it affects the static semantics (by forcing a wildcard arm) can't cause a soundness issues, i.e., if type checking is wrong, there still won't be an error. Together these signal to me that the feature should not have first class syntax.

I like a lint because it seems a reasonable position to take to value exhaustiveness more highly than avoiding future errors (although I believe the other preference is more reasonable), So, it seems justifiable to allow this switch to be flicked. I believe we can justify any future backwards compatibility issues by including error-by-default-lints as errors for the sake of compatibility.

@emberian
Copy link
Member

Making it a lint would make it near-useless for the stdlib unless
disabling this lint required a feature gate, because otherwise you'd be
able to subvert our stability guarantees.

One thing I've become increasingly disturbed about with this RFC is the
inability to exhaustively match against an extensible enum. I want the
ability to opt-in to instabilty for the sake of robustness.

On Wed, Feb 11, 2015, at 00:20, Nick Cameron wrote:

Having chewed this over in my sub-concious a little more, I'm thinking
that an attribute + lint (as suggested by @markoh[1]) would be a good
approach, rather than adding new syntax. To make things more concrete:
adding #[extensible] to an enum would allow match statements which
match it to have a wildcard arm, even if the other arms are
exhaustive. A lint, error by default, would check that all uses in a
different crate have such an arm.

I like this because the extensible-ness doesn't affect the dynamic
semantics of the enum and the only way it affects the static semantics
(by forcing a wildcard arm) can't cause a soundness issues, i.e., if
type checking is wrong, there still won't be an error. Together these
signal to me that the feature should not have first class syntax.

I like a lint because it seems a reasonable position to take to value
exhaustiveness more highly than avoiding future errors (although I
believe the other preference is more reasonable), So, it seems
justifiable to allow this switch to be flicked. I believe we can
justify any future backwards compatibility issues by including
error-by-default-lints as errors for the sake of compatibility.

— Reply to this email directly or view it on GitHub[2].

Links:

  1. https://github.com/markoh
  2. Extensible enums #757 (comment)

@sfackler
Copy link
Member

@cmr Stable Rust releases will ship with unstable features disabled explicitly to avoid allowing people to "opt into instability" because if you allow people to opt in, they'll do it. How is this case materially different?

@nrc
Copy link
Member

nrc commented Feb 11, 2015

@cmr, being able to subvert the stability guarantees doesn't seem like an issue to me - you can always do it if you are determined enough. The fact that you have to opt in, seems like its enough of a disclaimer to me.

I'm not sure I understand your second point, that seems to be asking for exactly the lint that is proposed.

@aturon
Copy link
Member

aturon commented Mar 5, 2015

Closing as postponed; RFCs issue

@aturon aturon closed this Mar 5, 2015
@aturon aturon added the postponed RFCs that have been postponed and may be revisited at a later time. label Mar 5, 2015
@nikomatsakis
Copy link
Contributor Author

@cmr we decided to postpone this for the time being, but let me leave just one quick note. I agree making this into a lint has the potential to make it near useless, but this is already a concern we should address. One of the changes I've been meaning to propose to address this particular point is that rustc should have a command line flag to override and cap all lints at "warn" status so that a lint can never block compilation. cargo can then use this for dependencies. This would ensure that while you get the full benefit of the lints in the current crate you are working on, you are not blocked from using other people's work. Note also that the way the lint was proposed, you would still be required to put a _ case, so there would be some defined fallback and behavior if additional variants are added, even if you also get a compilation warning.

UPDATE: Edited slightly for clarity.

@Virtlink
Copy link

Enums being non-extensible by default, with opt-in extensibility, is a good idea.

However, for structs it's the wrong way round. Adding an implementation detail such as a private field to an all public struct mustn't cause third-party code to break.

Instead, by default structs should be extensible and not allow public instantiation. Then adding a private field to a struct is never a breaking change.

Of course, there would be an opt-in that makes the struct non-extensible and allows public instantiation. This opt-in also requires all fields to be public. Adding a private field would cause a compile-time error.

One way to implement this: the compiler by default adds a zero-sized hidden dummy field to every struct. When the user opts-in, the dummy field is not added and all fields must be public.

@pnkfelix
Copy link
Member

@Virtlink would you also make destructuring bind and FRU illegal without the opt-in? See e.g.

rust-lang/rust#21407

Seems like a deal-breaker to me. But maybe I am missing something

An analogous #[extensible] attribute for (all pub) structs that enforces your rule via a lint (and lint-warns on destructuring bind and FRU too) seems like a way around that, and is is a non-breaking change to boot.

@bombless
Copy link

It's funny that in Rust we already can build a enum that always has potential to grow.

mod some_module {
    enum _Enum { A, B, _Dummy }
    pub type Enum = self::_Enum;
    pub use self::_Enum::{A, B};
}
fn to_string(e: some_module::Enum) -> String {
    match e {
        some_module::A => 'A'.to_string(),
        some_module::B => 'B'.to_string(),
        _ => "_not_supported".to_string()
    }
}

I think it's related to rust-lang/rust#22261 though.

@Virtlink
Copy link

@pnkfelix Indeed, function record updates outside the module would only be allowed with the opt-in, as public FRU would break as soon as a private field is added. If the author wants to add a private field to the struct later, then it wasn't his intention to allow FRU in the first place. Or if it was, removing the opt-in makes the author aware he's now making a breaking change.

Destructuring outside the module is still possible with a wildcard, as this won't break if a private field is added:

pub struct Foo { pub value: u32, }

// Somewhere outside the module:
let Foo { value: v, .. } = Foo::new();

Again, only with the opt-in has the author stated that no private fields will ever be added and is it safe to destructure without the wildcard.


In short my proposal is to treat every struct as if it contains private fields (even empty structs and all-public structs), unless explicitly stated (opt-in) that the struct will never contain private fields. Everything else (destructuring, FRU, instantiation) follows from that.

sgrif added a commit to diesel-rs/diesel that referenced this pull request Nov 29, 2015
This is going to be the "generic thing which can fail in a myriad of
ways" type. I want to be able to have specific useful cases which you
can recover from (e.g. no `Box<std::error::Error>`), but I don't want
people to exhaustively match on it.

Techincally adding a variant to this is still a violation of
[RFC #1105](https://github.com/rust-lang/rfcs/blob/master/text/1105-api-evolution.md),
but if you're actually matching against `__Nonexhaustive` and expected
it not to break eventually, you knew what was coming.

If some form of [RFC #757](rust-lang/rfcs#757)
ever lands, we'll switch to using that instead.
@petrochenkov petrochenkov added T-lang Relevant to the language team, which will review and decide on the RFC. and removed postponed RFCs that have been postponed and may be revisited at a later time. labels Feb 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.