RFC for anonymous variant types, a minimal ad-hoc sum type #2587

Closed
wants to merge 8 commits into from

eaglgenes101

Add anonymous variant types, a natural anonymous parallel to enums much like tuples are an anonymous parallel to structs.

This RFC is intentionally minimal to simplify implementation and reasoning about interactions, while remaining amenable to extensions through the ecosystem or through future proposals.

Rendered

Thanks to everyone that helped me identify points that I may have missed in the Internals thread and on Reddit.

We will go to space today!
@Centril Centril added the T-lang Relevant to the language team, which will review and decide on the RFC. label Nov 5, 2018
@newpavlov
Contributor

newpavlov commented Nov 5, 2018

UPD: See this Pre-RFC.

Not to undermine the work put into this RFC, but I think that the proposed solution is quite sub-optimal and that we should pursue proper "anonymous union types" (i.e. ones for which (u64 | u32 | u64) produces the same type as (u32 | u64) and (u64 | u32)), with proper, ergonomic additions to the matching syntax. Of course, that will require presenting a solution to the generic code matching problem.

One possibility for how this functionality could look is:

struct Err1;
struct Err2(u32);
fn foo() -> (u32 | () | A | B | C) { .. }
fn bar() -> Result<(), (Err1 | Err2)> { .. }

match foo() {
    // if result has type A, the value will be stored in the `a`
    a: A => { .. }
    // we can match with a value as well
    1: u32 => { .. }
    // we probably can allow omitting explicit type if it can be inferred
    () => { .. }
    // `b` will have type (A | B | C), probably we don't want to diverge too much
    // from how matching works today
    b => { .. }
}

match bar() {
    Ok(()) => { .. },
    Err(Err1) => { .. },
    Err(e: Err2) => { .. },
}

For the generic matching problem, I think we should just specify that match arms are tested in order. So if, on monomorphization of a match on (U | V), both types turn out to be the same (say u32), then we will get the following code:

match uv_enum {
    v: u32 => { .. },
    v: u32 => { .. },
}

In other words, only the first match arm will be executed, and the second arm will be removed (though the compiler should probably emit an unreachable_patterns warning). Yes, it could lead to some bugs, but I think this behaviour will be easy to understand, and the bugs easy to find thanks to the warning.

Regarding memory representation, the type (A | B | C) could desugar to something like this:

union __AnonUnionPayloadABC { f1: A, f2: B, f3: C }
struct __AnonUnionABC {
    discriminant: TypeId,
    payload: __AnonUnionPayloadABC,
}

This would make converting from (u32 | f32) to (u32 | f32 | f64) quite easy for the compiler (make sure that the destination is the same size or bigger and just copy the bytes), and matching would desugar to a comparison of TypeIds. The drawback is that you would always have a 64-bit discriminant, while in most cases 8 bits would be more than enough.

Regarding how types like ((A | B) | (C | D)) should be handled, from the user perspective ideally it should be equivalent to (A | B | C | D), but I am not sure how exactly it should be implemented in compiler.
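
As a rough illustration of the desugaring above, here is a minimal sketch in today's Rust for one fixed instance, (u32 | f32 | f64). The names `Payload`, `AnonUnion`, the `from_*` constructors and `describe` are made up for this example and are not part of any proposal; the ordered `if` chain stands in for "match arms are tested in order".

```rust
use std::any::TypeId;

// Hypothetical payload for (u32 | f32 | f64). All fields are Copy, so the
// union has no drop concerns.
#[derive(Clone, Copy)]
union Payload {
    as_u32: u32,
    as_f32: f32,
    as_f64: f64,
}

// Hypothetical desugared form: a TypeId discriminant plus the payload.
#[derive(Clone, Copy)]
struct AnonUnion {
    discriminant: TypeId,
    payload: Payload,
}

impl AnonUnion {
    fn from_u32(v: u32) -> Self {
        AnonUnion { discriminant: TypeId::of::<u32>(), payload: Payload { as_u32: v } }
    }

    fn from_f32(v: f32) -> Self {
        AnonUnion { discriminant: TypeId::of::<f32>(), payload: Payload { as_f32: v } }
    }

    fn from_f64(v: f64) -> Self {
        AnonUnion { discriminant: TypeId::of::<f64>(), payload: Payload { as_f64: v } }
    }

    // "Matching" compares TypeIds, testing the arms in order.
    fn describe(&self) -> String {
        if self.discriminant == TypeId::of::<u32>() {
            // SAFETY: the discriminant records which field was written.
            format!("u32: {}", unsafe { self.payload.as_u32 })
        } else if self.discriminant == TypeId::of::<f32>() {
            // SAFETY: as above.
            format!("f32: {}", unsafe { self.payload.as_f32 })
        } else {
            // SAFETY: the constructors only ever store u32, f32 or f64.
            format!("f64: {}", unsafe { self.payload.as_f64 })
        }
    }
}
```

For example, `AnonUnion::from_f32(1.5).describe()` takes the second branch. If two arms named the same type, only the first could ever be reached, which is the behaviour described for the generic case.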


// And then match on it
match x {
(_ | _)::0(val) => assert_eq!(val, 1_i32),
Member

nit: (_) => is already stabilized as a valid (though warned-about) pattern in nightly, so I suspect this needs to be <(_ | _)>::0(val) => for the same reason you can't do ()::clone(&()).

Author

@eaglgenes101 eaglgenes101 Nov 5, 2018

Changed, especially since the angled brackets are also required for tuple associated items. I'm not particularly fond of the kirby-boss syntax, but consistency helps ease implementation, which is my primary concern.

Author

Changed back, it's now an unresolved question, with me leaning towards the syntax depicted above.

@Centril
Contributor

Centril commented Nov 5, 2018

Assuming that we do want to provide structural coproducts...

As I noted on the internals thread I think that the most natural way to provide structurally typed coproducts is to take enums and then try to move minimally from them. To me that seems beneficial both for reuse of compiler machinery and to make the learning distance smaller. What does that entail practically? Keep the existing syntax and type system notions of variants but allow them to be summed together in an ad-hoc structural fashion. Two candidate syntaxes for that are (1):

type Foo = Bar | Baz(u8, f32) | Quux { field: String };
// This variant is inspired by or-patterns to give a sense of disjunction.
// Though this syntax is likely ambiguous if we have `pat ::= ... | pat ":" type ;`
// so we will instead need:
type Foo = (Bar | Baz(u8, f32) | Quux { field: String });

and (2):

type Foo = enum { Bar, Baz(u8, f32), Quux { field: String } };

Here (2) is the syntax that differs least from nominally typed enums. The only difference between (2) and the enums we have today is that the name has been dropped. For those who are concerned about the syntactic complication of Rust, this should be appealing, or at least the least bad option.

Another benefit of both (1) and (2) is that only the type grammar changes. Zero changes need to happen to the expression and pattern grammars; this is beneficial for making the language easier to learn. To pattern match and construct things, you simply write:

let a_foo: Foo = Baz(1, 1.0);

match a_foo {
    Bar => expr,
    Baz(x, y) => expr,
    Quux { field } => expr,
}

Both of (1) and (2) also have nice properties:

  1. The variants commute. This is nice in a mathematical beauty way... but what does it buy us practically? If you can reorder the variants freely, then the type is more refactoring-friendly.

  2. Because the variants have names, you can have different unit variants, e.g.

fn foo() -> enum { A, B } { 
    ...
}

This is useful for the structural nature because it allows you to invent new "flag conditions" freely.

  3. It allows easy refactoring towards a nominal type. An IDE should easily be able to take enum { A, B } and make it into a normal enum because everything besides the name of the type is already there, so there's very little to fill in.
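
To make the refactoring point concrete, this is roughly what the nominal counterpart looks like in today's Rust. It is only a sketch: `describe` is a hypothetical helper, and the one visible difference from the construction and matching code above is that each variant must be qualified with the type name.

```rust
// Nominal counterpart of the hypothetical
// `enum { Bar, Baz(u8, f32), Quux { field: String } }`.
#[derive(Debug, Clone, PartialEq)]
enum Foo {
    Bar,
    Baz(u8, f32),
    Quux { field: String },
}

// Hypothetical helper: the match has the same shape as in the structural
// version, only with `Foo::` prefixes.
fn describe(foo: &Foo) -> String {
    match foo {
        Foo::Bar => "Bar".to_string(),
        Foo::Baz(x, y) => format!("Baz({}, {})", x, y),
        Foo::Quux { field } => format!("Quux {{ field: {} }}", field),
    }
}
```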

@H2CO3

H2CO3 commented Nov 5, 2018

@newpavlov I think that should be a different proposal on its own. I also think that sum types are way less problematic than union types exactly because they don't try to second guess the user and filter out duplicates structurally. That filtering quickly becomes a highly nontrivial issue as soon as you start adding e.g. generics (which was pointed out on the internals forum as a possibility).

@H2CO3

H2CO3 commented Nov 5, 2018

That said, even for anonymous variants / anonymous sum types, I don't find the motivation convincing enough and the gains in convenience sufficient compared to the extent to which it grows the language. I have already explained why in the internals thread. However, it seems that for opinions to be counted in an RFC at all, they have to be re-iterated here, so there we go.

@eaglgenes101
Author

eaglgenes101 commented Nov 5, 2018

A large factor in my decision to use positions rather than identifiers to identify variants was that Rust currently lacks the ability to have placeholders for names or to be generic over names, and adding that would require extra groundwork. And similar proposals which had just a few more conveniences than this proposal have been shot down for complexity!

Here, if you don't believe me: Similar proposal, which is almost the same as this one besides being less detailed and using a more ergonomic syntax

@mcy

mcy commented Nov 5, 2018

A quick read of the grammar indicates that parentheses are not necessary. Is this correct?

Furthermore, we should be clear that a sum can contain unnameable summands, yes?

fn foo() -> impl Copy {
  if cond {
    <_>::0(|| 0)
  } else {
    <_>::1(1)
  }
}

@eaglgenes101
Author

eaglgenes101 commented Nov 5, 2018

The parentheses are not necessary, but I have an eye on the possibility of later extensions: if multi-field variants become a thing and commas are used to separate the fields of a variant, I don't want ambiguity to result from things like this:

(f32 | i32, i32 | f64); //  Tuple of two anonymous variant types, or an anonymous variant type whose second variant has multiple fields? 

And I'm pretty sure that it should work if you make the number of variants clear and the type of each variant unambiguous. Your example as written wouldn't, but this would, as one of the branches specifies that there are two variants:

fn foo() -> impl Copy {
  if cond {
    <(_|_)>::0(|| 0)
  } else {
    <_>::1(1)
  }
}
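
For comparison, the same shape can be approximated in today's Rust with an ordinary generic enum whose variants are named by position. This is a sketch only: `Sum2`, `V0`, `V1`, and the `cond` parameter are invented for illustration, and the RFC's `<(_|_)>::0` syntax is not used. Inference fills in the unnameable closure type as the first parameter, and `impl Copy` holds because a non-capturing closure is `Copy`.

```rust
// Hypothetical two-variant positional sum, standing in for `(A | B)`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Sum2<A, B> {
    V0(A),
    V1(B),
}

// The first type parameter is inferred to be the closure's unnameable type,
// the second to be i32.
fn foo(cond: bool) -> impl Copy {
    if cond {
        Sum2::V0(|| 0)
    } else {
        Sum2::V1(1)
    }
}

// With nameable summands, the type can be written out in full.
fn bar(cond: bool) -> Sum2<u8, &'static str> {
    if cond {
        Sum2::V0(7)
    } else {
        Sum2::V1("seven")
    }
}
```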

@solarretrace

And similar proposals which had just a few more conveniences than this proposal have been shot down for complexity!

Here, if you don't believe me: Similar proposal, which is almost the same as this one besides being less detailed and using a more ergonomic syntax

I don't think that the previous proposals were rejected on the grounds of complexity per se, but on the grounds that the complexity was too high with respect to the advantages offered. Personally, I don't think this proposal changes that in any significant way. (Certainly not if the intention is for follow-up proposals to put things into roughly the same state. If you can't afford the car, offering to purchase the parts and assembly separately does not make it cheaper.)

@eaglgenes101
Author

eaglgenes101 commented Nov 6, 2018

If you can't afford the car, offering to purchase the parts and assembly separately does not make it cheaper.

That analogy kind of implies that the proposal will be useless unless all the doodads are in place, which is not the case here. I counter-propose the analogy of a car loan, where the total payment might be somewhat higher in the long run, but spreading out the costs makes it more affordable than paying for it all at once. And this proposal is designed to be easily extensible by the ecosystem even in the short term, so the doodads can be hashed out by competing ecosystem solutions rather than being stuck in RFC discussions for months at a time and risking non-acceptance.

@eddyb
Member

eddyb commented Nov 6, 2018

I think the problem with this feature, and a few others, including the more union-y ones, is that it tries to preserve pattern-matching as a primitive even for unknown types.

A + B strictly monotonically increases information.
That is, if we use |A| to denote the number of possible values of a type, with |!| = 0 and |()| = 1, then |A + B| = |A| + |B|, which is greater than either |A| or |B| if neither is 0 (uninhabited).
This is because you can always tell them apart, e.g.: |T + T| = |(bool, T)| = 2 * |T|.


What I want to use an anonymous "choice of one type from several" for, is not pattern-matching, but static trait dispatch - which would be done automatically by the compiler, with an enum-like tag.
Note that it's still possible to allow pattern-matching if the types are known to be disjoint.

That is, A | B where |(A | B)| = |A ∪ B| = |{ x | x ∈ A ∨ x ∈ B }|
(with the {...} in that part being set notation).
For T | T, that's just T, each value shows up once, there's no semantic duplication or space waste.
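
The difference in cardinality can be checked by brute force for T = bool. In this sketch, `Sum`, `sum_values` and `union_values` are hypothetical names; a tagged enum plays the role of T + T and a plain set of values plays the role of T | T.

```rust
use std::collections::HashSet;

// Tagged sum: the tag keeps the two copies of the type apart.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Sum<A, B> {
    Left(A),
    Right(B),
}

// Every value of `bool + bool`.
fn sum_values() -> HashSet<Sum<bool, bool>> {
    let mut set = HashSet::new();
    for b in [false, true] {
        set.insert(Sum::Left(b));
        set.insert(Sum::Right(b));
    }
    set
}

// Every value of the set union `bool | bool`: inserting from "both sides"
// adds nothing new, so each value shows up once.
fn union_values() -> HashSet<bool> {
    let mut set = HashSet::new();
    for b in [false, true] {
        set.insert(b); // contributed by the left operand
        set.insert(b); // contributed by the right operand, already present
    }
    set
}
```

Here `sum_values()` contains 2 * |bool| distinct values, while `union_values()` contains only |bool|.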

We can probably even make e.g. if x { y } else { z } have type typeof(y) | typeof(z) that collapses for all the code that compiles today, into just one type (the current one).

EDIT: to give a concrete example of the usecase I'm talking about:

fn foo() -> impl Iterator<Item = X> {
    if cond() {
        bar()
    } else {
        baz()
    }
}

I want this to work without using Either (which doesn't scale beyond two cases).
The impl Trait examples in this thread look unergonomic to the point where it might be easier to come up with a library-based solution that encodes a number into a tree of Eithers.
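
For reference, the two-case workaround looks roughly like the following. It is a hand-rolled sketch rather than the `either` crate's actual code, and the item type and the iterators in `foo` are placeholders for `X`, `bar()` and `baz()`.

```rust
// Minimal two-case wrapper that forwards `Iterator` to whichever side is active.
enum Either<L, R> {
    Left(L),
    Right(R),
}

impl<L, R> Iterator for Either<L, R>
where
    L: Iterator,
    R: Iterator<Item = L::Item>,
{
    type Item = L::Item;

    fn next(&mut self) -> Option<Self::Item> {
        match self {
            Either::Left(l) => l.next(),
            Either::Right(r) => r.next(),
        }
    }

    // Forwarding `size_hint` keeps the inner iterator's length information.
    fn size_hint(&self) -> (usize, Option<usize>) {
        match self {
            Either::Left(l) => l.size_hint(),
            Either::Right(r) => r.size_hint(),
        }
    }
}

// Both branches now have the single type `Either<Range<u32>, vec::IntoIter<u32>>`.
fn foo(cond: bool) -> impl Iterator<Item = u32> {
    if cond {
        Either::Left(0..3u32)
    } else {
        Either::Right(vec![10u32, 20].into_iter())
    }
}
```

A third branch would need `Either<A, Either<B, C>>` or a new three-variant type, which is the scaling problem in question.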

@eaglgenes101
Author

eaglgenes101 commented Nov 6, 2018

Yes, I admit that this proposal doesn't reach into that area. It's similar to the problem of not being able to automatically unsize an enum into a dynamic trait object when each of its variants has a single field and all of those fields implement a particular trait. (At least not without the help of macros of some kind, the most recent reasonably popular one of which was implemented via macro_rules! and last updated over two years ago. Yes, the fact that there aren't more recent proc macro alternatives still confounds me.)

Anonymous variant types, by virtue of their similar semantics, should also benefit from any features added to enums. In particular, unsizing on enums into a trait object implemented by all of its variant fields will also help anonymous variant types resolve their most noticeable deficiency: the inability to dispatch over an anonymous variant type as a whole.

Perhaps this might be a feature that is worth the extra complexity to implement initially, but as of now, I'm not convinced that it won't sink the rest of the proposal under its own weight.

@eddyb
Member

eddyb commented Nov 6, 2018

Yes, I admit as much that this proposal doesn't reach into that area.

Fair enough.

I just don't see the point of a related RFC that doesn't tackle -> impl Trait ergonomics, I guess.

@burdges

burdges commented Nov 6, 2018

I don't know if I understand @eddyb, but yes, traits sound key here: we could have enum Trait behave exactly like dyn Trait except that (a) all variants must be known at compile time and (b) std does not expose the vtable pointer. As a result, rustc could eventually implement enum Trait as variants, not vtables. Once that works, one could explore weakening trait object restrictions for enum Trait.

@eddyb
Member

eddyb commented Nov 6, 2018

FWIW, I never meant vtables, you'd still have tags but only where needed.

@eaglgenes101
Author

eaglgenes101 commented Nov 6, 2018

I've changed the syntax of variants (for both calling and matching) from stuff like (_|_)::0 to stuff like <(_|_)>::0 for consistency with type-associated identifiers and to simplify the grammar, but I've since been told that the second would require extra work to support inferred type placeholders. Should I change it back? (I never felt good about the kirby-boss syntax anyway, and I liked the first one better for its consistency with enums. However, it would entail a few extra rules in the grammar, or so I've been told. Perhaps a technical trait is in order?)

@H2CO3

H2CO3 commented Nov 6, 2018

What I want to use an anonymous "choice of one type from several" for, is not pattern-matching, but static trait dispatch - which would be done automatically by the compiler, with an enum-like tag.
Note that it's still possible to allow pattern-matching if the types are known to be disjoint.

That is, A | B where |(A | B)| = |A ∪ B| = |{ x | x ∈ A ∨ x ∈ B }|

I.e., you want union types, not sum types. That is fine. They are problematic in a language with real, parametric generics though, for a number of reasons, and even in the absence of generics, they can carry surprises.

We can probably even make e.g. if x { y } else { z } have type typeof(y) | typeof(z) that collapses for all the code that compiles today, into just one type (the current one).

And then basically every if expression with two arbitrary (different) types in its two cases would typecheck? That sounds Bad™.

@Centril
Contributor

Centril commented Nov 6, 2018

@H2CO3 so to clarify, I believe @eddyb didn't want to add these union types to the surface language but rather as an implementation strategy behind the scenes for -> impl Trait.

@H2CO3

H2CO3 commented Nov 6, 2018

Aaah, okay – sorry, misunderstood that. That, I would support.

@eddyb
Member

eddyb commented Nov 6, 2018

@H2CO3 I'm actually curious what you mean by interactions between union types and parametricity if you can't pattern-match on them (unless you have a proof of disjointness)?

In fact, we can rely on lifetime parametricity to even do Invariant<'a> | Invariant<'static> (since no code that's generated could possibly depend on either of the cases being "active").
It's then effectively the existential type exists 'x. Invariant<'x>, which you could pass to some function/closure F: for<'a> FnOnce(Invariant<'a>).

@H2CO3

H2CO3 commented Nov 6, 2018

@eddyb I'm not sure what you mean about pattern matching. I've explained the problem in the post I linked. What should the following code do?

fn foo<T>(x: T | bool) {}

foo::<bool>(false);

I.e., the question is: if bool | bool == bool, then how can the compiler determine that and generate correct code (or an appropriate diagnostic) without performing additional type checking after monomorphization?

(I believe the codegen issue is much less serious if these union types are not exposed at the language level, because I could then imagine just "not doing anything", i.e. duplicating dispatch logic for every variant of the same type, which could take up more space but it would be otherwise 100% correct wrt. semantics, and probably let MIR optimization get rid of it.

But if you mean that the language should actually expose these types in a manner that they can be spelled out, other than behind an impl Trait like eg. closures, then the question of e.g. diagnostics and type checking in general still stands.)
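
The timing issue can be illustrated in today's Rust with a check that, like the hypothetical "is T | bool just bool?" question, has a different answer for each instantiation. This is a sketch under assumptions: `collapses_with_bool` is an invented helper, and `TypeId` is used only as a stand-in for the compiler's own type-equality test.

```rust
use std::any::TypeId;

// Hypothetical helper: would `T | bool` collapse to plain `bool`?
// Inside the generic body nothing is known about `T`, so the answer only
// exists once a concrete type has been substituted for it.
fn collapses_with_bool<T: 'static>() -> bool {
    TypeId::of::<T>() == TypeId::of::<bool>()
}
```

`collapses_with_bool::<bool>()` and `collapses_with_bool::<u8>()` give different answers even though the generic body is the same, which is why a rule that depends on the answer cannot be enforced when the generic function is first type-checked.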

@eddyb
Member

eddyb commented Nov 6, 2018

@H2CO3 foo::<bool> takes a bool because (T | bool)[bool/T] = (bool | bool) = bool.
Further checking isn't needed as long as you've already checked foo under the assumption that T could be anything, including bool.
So, e.g. you can't allow foo to check whether x is T. You can, however, let foo call x.clone() if you add a T: Clone bound because (T | bool): Clone can automatically be implemented.

The conditions for <A as Trait<B>> to be auto-implemented when e.g. B = T | U are:

  • <A as Trait<T>>::X = <A as Trait<U>>::X for all associated consts/types X
  • all the required methods of Trait have a signature with:
    • exactly one occurrence of B in args' types, and at most one in the return type
    • the argument is of type B, &B, &mut B or Pin<&mut B>
    • the return type (if it includes B) is exactly B
      • longer-term, it would also be possible to support by-value wrappers e.g. Option<B>, similar to how we could have Option<T> coerce to Option<U> if T can coerce to U
  • default methods can still have a conforming signature, in which case the compiler can ignore the default (e.g. Iterator::size_hint, which would be useful for performance)

I believe that includes Iterator and Future, which are the primary usecases here.
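
As a rough picture of what such an automatic implementation amounts to, here is the per-case delegation written by hand for Clone, whose only required method takes &B and returns exactly B. This is a sketch: `Choice` is an explicitly tagged stand-in (stable Rust has no `T | U`), and `duplicate` is a hypothetical generic function that never inspects which case is active.

```rust
// Explicitly tagged stand-in for the hypothetical `T | U`.
#[derive(Debug, PartialEq)]
enum Choice<T, U> {
    First(T),
    Second(U),
}

// What an auto-implemented `(T | U): Clone` would do: dispatch on the active
// case and rewrap the result in that same case.
impl<T: Clone, U: Clone> Clone for Choice<T, U> {
    fn clone(&self) -> Self {
        match self {
            Choice::First(t) => Choice::First(t.clone()),
            Choice::Second(u) => Choice::Second(u.clone()),
        }
    }
}

// Checked once for all `T`; it only relies on the `Clone` bound.
fn duplicate<T: Clone>(x: &Choice<T, bool>) -> Choice<T, bool> {
    x.clone()
}
```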

@H2CO3

H2CO3 commented Nov 6, 2018

foo::<bool> takes a bool because (T | bool)[bool/T] = (bool | bool) = bool

Certainly. I know that and you know that. But how does the union operation itself manifest at the language surface if you are not allowed to look at the type signature after instantiating foo::<T> with T = bool?

As mentioned before, I'm not concerned about trait implementations alone. The low-level details can certainly be implemented in a number of ways, including the naive approach I described above. The problem is with situations which were, again, mentioned in the internals thread, for example if spelling out the union of a type with itself is to be disallowed. Enforcing that condition can only happen after generic instantiation.

@eaglgenes101
Author

eaglgenes101 commented Nov 13, 2018

It seems like the quadratic text space required, while worse than it could be, shouldn't be a concern in practice. For small numbers of cases, typing the correct number of underscores and vertical bars shouldn't take up much time and space. For larger numbers of cases, one can write, with relative ease, a proc macro that generates an anonymous variant type placeholder from the number of variants desired, which brings the text space usage of a full match down to linearithmic. If that isn't enough to keep type declaration macros from flooding the screen, then I say you're getting to the point where you should probably rethink what in the world you're doing with so many variants, and maybe consider refactoring to a proper named enum, a trait, or a struct/tuple of orthogonal enumerable parts.

@glaebhoerl
Contributor

Yeah. Another thing that occurred to me right after I posted my previous comment was that there already is a quadratic, or at least n*k, blowup, which exerts downward pressure on the number of components in tuples as well as anonymous sums: namely, you have to write out all of the components each time in type signatures, unlike with named types. Now, you could introduce a type synonym for it, but if you're writing a type definition then you may as well write a struct or enum definition.

@Centril Centril added A-typesystem Type system related proposals & ideas A-structural-typing Proposals relating to structural typing. A-sum-types Sum types related proposals. A-syntax Syntax related proposals & ideas A-patterns Pattern matching related proposals & ideas A-expressions Term language related proposals & ideas labels Nov 22, 2018
@Centril Centril assigned Centril and nikomatsakis and unassigned Centril Jan 3, 2019
@graydon

graydon commented Jan 12, 2019

Opposed. High cost addition -- implementation and cognitive load -- minimal win over the sums we have. Also we already had anonymous sums early on and removed them. This is revisiting a reduction intentionally made in the past.

@eaglgenes101
Author

Also we already had anonymous sums early on and removed them.

Can you show me these early Rust anonymous sums? Nothing of this sort came up when I dug into previous proposals for anonymous sum types in Rust.

@nikomatsakis
Contributor

@rfcbot fcp postpone

I'm going to move to postpone this proposal. While I do think that there is potential utility to this feature, I also think that the time is not ripe. Although the roadmap is not set, I think it very likely that our focus is going to be on "closing out" many of the language additions that are already in flight (e.g., specialization and so forth) and not on adding a whole new base form of type. Moreover, we already have troubles with the lack of variadic generics for things like tuples, and I am reluctant to add another "open ended" form that might bring on similar complications.

@rfcbot
Collaborator

rfcbot commented Jan 24, 2019

Team member @nikomatsakis has proposed to postpone this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-postpone This RFC is in PFCP or FCP with a disposition to postpone it. labels Jan 24, 2019
@eaglgenes101
Author

eaglgenes101 commented Jan 25, 2019

Fair enough. I'll bide my time, and see if I can help move RFCs that might make swallowing this RFC later easier in the meantime. I'll aim for after the 2021 edition release and see what's changed from now.

@rfcbot
Collaborator

rfcbot commented Jan 26, 2019

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Jan 26, 2019
@rfcbot
Collaborator

rfcbot commented Feb 5, 2019

The final comment period, with a disposition to postpone, as per the review above, is now complete.

By the power vested in me by Rust, I hereby postpone this RFC.

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. postponed RFCs that have been postponed and may be revisited at a later time. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. disposition-postpone This RFC is in PFCP or FCP with a disposition to postpone it. labels Feb 5, 2019
@rfcbot rfcbot closed this Feb 5, 2019
Labels
A-data-types RFCs about data-types A-expressions Term language related proposals & ideas A-patterns Pattern matching related proposals & ideas A-structural-typing Proposals relating to structural typing. A-sum-types Sum types related proposals. A-syntax Syntax related proposals & ideas A-typesystem Type system related proposals & ideas finished-final-comment-period The final comment period is finished for this RFC. postponed RFCs that have been postponed and may be revisited at a later time. T-lang Relevant to the language team, which will review and decide on the RFC.