Proposal: Add completeness checking to pattern matching draft specification #486

gafter · 2017-04-21T22:45:12Z


@agocke commented on Fri Jan 30 2015

Background

As noted in issue dotnet/roslyn#180, many modern programs are data-focused, especially distributed applications, which tend to store, manipulate, and move sets of data between different storage and computation points. One solution proposed to deal with this issue is a combination of records and pattern matching. Records provide a simple way to declare and structure the data types, and pattern matching provides a way to destructure and manipulate the data.

Problem

Records provide a great way to represent the data and pattern matching provides a great way to manipulate the data, but there is currently no mechanism in the dotnet/roslyn#180 proposal to ensure that the data and the logic remain in sync. The nature of records and pattern matching is that the data declaration code is often far from the data consumption code. In a distributed system it's even more likely that a single data structure will be consumed and manipulated in various parts of the code base. If the data structure is ever modified, there is currently no mechanism in the draft to alert the programmer that all instances of manipulation logic must be updated.

Solution

Add completeness checking to certain switch statements on certain record types. The core of this proposal is to provide a warning when a switch statement does not handle every possible match on a type hierarchy. This proposal features two possible designs for this idea, presented in order of increasingly intrusive modification to the language.

Design 1

This design features no new syntax or semantics beyond that of proposal dotnet/roslyn#180. The suggestion is to create a C# type hierarchy which can be guaranteed 'complete' with existing language features. In this case, complete means that it is not possible to declare a new subclass of the root type of the hierarchy outside the current compilation, so the compiler can be sure that any and all subclasses of the chosen switching type are visible in the current compilation.

We can construct this type hierarchy in existing C# with the following rules:

- All subclasses of the root type must be sealed, preventing any subclassing of existing leaf types in the hierarchy.
- All constructors of the root type must be private, preventing any subclassing of the root type from outside its own declaration.
  - As a consequence, all subclasses must be nested classes and thus must be in source in the current compilation.

Here's an example of the structure of this type hierarchy:

abstract class C
{
  private C() {}
  public sealed class C_1 : C {}
  public sealed class C_2 : C {}
      ...
  public sealed class C_n : C {}
}

This guarantees that switching on an instance of type C which explicitly matches C_1...C_n has matched against every possible instance of C. The only thing which changes about the language specification is a requirement that the compiler produce a warning when not all cases are matched.
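As a rough sketch of how the Design 1 structure might be consumed, the following uses the type-pattern `switch` statement that later shipped in C#. The names `Shape`, `Circle`, `Square`, and `ShapeDemo` are hypothetical and not part of the proposal. Note that no completeness warning exists for this shape of code today; the `default` arm stands in for the check the proposal asks the compiler to perform.

```csharp
using System;

// Closed hierarchy following the Design 1 rules: the root has only a private
// constructor, and every subclass is a sealed nested class.
// Shape, Circle and Square are hypothetical names used for illustration.
abstract class Shape
{
    private Shape() {}
    public sealed class Circle : Shape { public double Radius; }
    public sealed class Square : Shape { public double Side; }
}

static class ShapeDemo
{
    public static double Area(Shape s)
    {
        switch (s)
        {
            case Shape.Circle c:
                return Math.PI * c.Radius * c.Radius;
            case Shape.Square q:
                return q.Side * q.Side;
            default:
                // Only reachable for null, or if a new nested subclass is
                // added without a matching case. The proposal would turn the
                // second situation into a compile-time warning.
                throw new InvalidOperationException("Unhandled Shape case");
        }
    }
}
```

If a third nested class were added to `Shape`, this code would still compile silently and fail only at run time, which is the gap the proposal describes.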

Design 2

There are a few disadvantages to Design 1:

- The mandated structure is complicated and brittle. Forgetting to mark any of the subclasses as sealed or adding any public constructors won't produce a compiler error or warning, but the compiler will now silently skip the completeness check.
- The structure is verbose: most of the sealed or private markers are part of the 'incantation' of completeness and are not directly related to the task at hand.
- The relevant record instances are all nested classes, so all references require an extra layer of naming indirection.

Design 2 attempts to fix these problems by replacing much of the boilerplate with a new combination of modifiers on a type: abstract + sealed. Under Design 2, marking the root type of a hierarchy as abstract sealed will cause the structure from Design 1 to be generated by the compiler in lowering. The following example demonstrates what the structure from Design 1 looks like with an abstract sealed type:

abstract sealed class C {}
public class C_1 : C {}
public class C_2 : C {}
   ...
public class C_n : C {}

In this case, most of the problems with Design 1 are solved, but new semantics are required to be added to the language.

@alrz commented on Wed Sep 30 2015

This can be another option as well (based on Design 2):

public abstract case class C {}
public case class C_1 : C {}
public case class C_2 : C {}
   ...
public case class C_n : C {}

So case classes can only inherit from other case classes. Also, this helps to distinguish between case classes and regular ones, if they were defined in different files.

@gafter commented on Wed Sep 30 2015

@alrz I don't get it. Would it be an error to extend C_1 in another assembly? Or is there some other rule that would allow the compiler to know that it sees all of the cases?

@alrz commented on Wed Sep 30 2015

@gafter Quoting yourself, "It's a closed hierarchy of types" (like discriminated unions in F# but more flexible since you can inherit from other cases) so why should it be extendable in another assembly?

@gafter commented on Wed Sep 30 2015

@alrz so case would mean "not extendable in another assembly"? Sort of a strange choice of keyword for that meaning.

@alrz commented on Wed Sep 30 2015

@gafter Same keyword is used in Scala for exactly this purpose — implying that each case class corresponds to a case statement, I guess.

@gafter commented on Wed Sep 30 2015

@alrz no, case in Scala does not restrict inheritance.

@alrz commented on Sun Oct 04 2015

@gafter Yes, case has nothing to do with inheritance restriction in Scala, but my suggestion is not exactly what Scala is offering. Actually it's something in between its syntax:

sealed abstract class C
case object C_1 extends C
case object C_2 extends C

and the proposed one:

sealed abstract class C {}
class C_1 : C {}
class C_2 : C {}

I'm trying to say that I think this is a more expressive syntax compared to above examples, for this specific use case:

abstract case class C {} // or sealed abstract case class?
case class C_1 : C {}
case class C_2 : C {}

Marking all classes with a unified keyword, indicating that these classes belong to a closed hierarchy.

PS: Although, this is just another option to consider. Except for this little concern, sealed abstract looks good to me.

@orthoxerox commented on Tue Oct 06 2015

Will this support multiple levels of inheritance?

abstract sealed class C {}
abstract sealed class D : C {}
public class C_1 : C {}
public class C_2 : C {}
public class D_1 : D {}
public class D_2 : D {}

So now the compiler will look for either C, (C_1 + C_2 + D) or (C_1 + C_2 + D_1 + D_2) when checking for completeness.

@gafter commented on Tue Oct 06 2015

@orthoxerox Yes, we do not plan to make one level of inheritance any more special than a second level of inheritance. The compiler will have to build a decision tree and ensure that every path reachable from the root is handled.
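The multi-level check can be sketched as a small recursive walk. This is only an illustration under stated assumptions, not the compiler's decision-tree algorithm: types are represented by name, `children` is a hypothetical map from each closed (abstract sealed) type to its direct subclasses, and every type that has an entry in the map is assumed to be abstract.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Completeness
{
    // A type is covered if a case handles it directly, or if it is a closed
    // abstract type whose direct subclasses are all covered.
    // Assumption: any type with an entry in `children` is abstract, so it has
    // no instances of its own that would need a separate case.
    public static bool IsCovered(
        string type,
        IReadOnlyDictionary<string, string[]> children,
        ISet<string> handled)
    {
        if (handled.Contains(type)) return true;
        return children.TryGetValue(type, out var subs)
            && subs.Length > 0
            && subs.All(sub => IsCovered(sub, children, handled));
    }
}
```

With `C -> {C_1, C_2, D}` and `D -> {D_1, D_2}`, the handled sets `{C}`, `{C_1, C_2, D}`, and `{C_1, C_2, D_1, D_2}` are all reported as covering `C`, while `{C_1, C_2, D_1}` is not, because `D_2` is left unhandled.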

@orthoxerox commented on Tue Oct 06 2015

@gafter great, I missed that in F#.

@gafter commented on Tue Nov 03 2015

See the excellent comparison of different languages by @jonschoning in dotnet/roslyn#5154 (comment)

@dsaf commented on Sun Nov 29 2015

Would C# 7/8 promote having multiple types per file, or will the IDE be capable of grouping them automatically?

@danfma commented on Fri Jan 15 2016

Hey guys,

I'm not a language designer or an expert on the subject, but as a user, I didn't like the super verbose spec in:

abstract sealed class C {}
abstract sealed class D : C {}
public class C_1 : C {}
public class C_2 : C {}
public class D_1 : D {}
public class D_2 : D {}

Maybe some existing syntax, like in F#, or just a more friendly keyword like "sealed group" or something like that...

"abstract sealed" does not tell me anything about the class, because a new programmer could just assume that nothing can inherit from that class, just because it is "sealed" (and classes that are both abstract and sealed are the actual "static" classes, if I'm not wrong).

How many levels of inheritance can I have? Just one? Because, if it is just one, we could use something similar to the enum declaration, like:

public group class C {
  public class One(string Name);
  public class Two(string Name, string LastName);
  public class Three(bool youGotIt);
}

@danfma commented on Fri Jan 15 2016

Or with multiple levels:

public group class C {
  public class One(string Name);
  public class Two(string Name, string LastName);
  public class Three(bool youGotIt);
  public group class Option {
    public class Some(object Value);
    public class None;
  }
}

With this syntax, the new record types and syntax will fit perfectly.

@danfma commented on Fri Jan 15 2016

Exactly like in that, dotnet/roslyn#6739!

@Shiney commented on Tue Jan 19 2016

Is there any reason why this proposal only talks about classes? Couldn't this be useful for interfaces? That would allow you to choose one of the contracts in the set of types to implement, and then you would be hooked into a pattern matching infrastructure rather than having to create a copy of your type as an instance of a given ADT.

Also given that inheritance is often used for code sharing, it might be good to be able to decouple the implementation from the interface.

@agocke commented on Tue Jan 19 2016

@Shiney Interfaces do not provide any way of restricting inheritance, so the compiler would be unable to statically verify that all cases are checked.

@Shiney commented on Tue Jan 19 2016

Is that restriction just a CLR restriction? Could the compiler throw an error if you tried to only implement the base interface if it is abstract? Then you could still get all the nice compile time checking as long as someone isn't doing something to try to get around it.

@Shiney commented on Tue Jan 19 2016

Also adding support for interfaces would allow covariance and contravariance for generic ADTs.

@agocke commented on Tue Jan 19 2016

@Shiney There are no abstract or concrete interfaces, just interfaces. I'm not sure what you're asking. Can you provide an example?

@Shiney commented on Tue Jan 19 2016

Something a bit like this (copying one of the syntaxes from above).

abstract sealed interface IA {}
public interface I_1 : IA { int Foo();}
public interface I_2 : IA { string Bar();}

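public class C1 : I_1
{
  // Interface members must be implemented as public, with the declared return type.
  public int Foo() => 3;
}

public class C2 : I_2
{
  public string Bar() => "hello";
}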

//Should not compile as the interface is "abstract"
public class BadClass: IA
{
}

Maybe calling the interface abstract is the wrong word for it, but there would be some sort of concept of an interface base type that you wouldn't be allowed to implement directly; you would have to choose one of its sub-interfaces.

@agocke commented on Wed Jan 20 2016

@Shiney That's an example of the source code, but not of the emitted assembly. What would that compile to in existing C# code? The problem that I see with interfaces is that there is no way to restrict their implementation -- if you can reference it you can implement it.

@Shiney commented on Wed Jan 20 2016

The interface IA would have a special attribute [DoNotImplementDirectlyAttribute] added to it. A compiler that sees this attribute would only let you implement that interface if you are implementing a sub-interface in the same assembly. If a new version of the CLR were to be released, it would enforce this restriction natively.

@agocke commented on Wed Jan 20 2016

@Shiney That's not a very good restriction though, because anyone using the current compiler could implement that interface and your code would blow up at runtime, right?

@Shiney commented on Wed Jan 20 2016

Yes it would blow up at run time.

I'd argue that it wouldn't be that bad for it to blow up at runtime, given that someone using the current compiler and making that error is doing something just as bad as

var a = new int[100];
var b = a[-1]; // Run time error that could have been caught at compile time with enough cleverness.

In that they are using the array API wrongly, and someone implementing that interface would be using the API to the new algebraic data type wrongly.

@agocke commented on Wed Jan 20 2016

@Shiney I don't think we should do it with this big of a hole. The problem isn't that the person who uses an old compiler has their app blow up, it's that someone who does everything right but references a bad library will have it blow up. The person who does everything right should have the guarantee.

@Shiney commented on Thu Jan 21 2016

Wouldn't it be possible for the compiler to check if a library is bad (in this specific way) before using it? This wouldn't break any existing code as the attribute didn't exist before.

Of course this wouldn't protect against runtime creation of types, so if that is an important use case to protect against this sort of runtime error then it shouldn't be done.

Also aren't there bigger issues to worry about if you are referencing bad libraries, you are putting that code into your process after all.

Even if this isn't implemented in the next version is it possible to ensure that the ADT syntax could be sensibly extended to be applied to interfaces?

@qrli commented on Wed Jan 27 2016

For Design 2: I think the syntax is worth some more thought.
Currently each subclass has its own accessibility modifier, so it is possible to create some strange cases, e.g.:

abstract sealed class C {}
public class C_1 : C {}
internal class C_2 : C {}

Is this case meaningful or not? I cannot find a use case for it. So I think it may be better to enforce a single access modifier for all related classes, e.g. only allow an access modifier on class C but not on C_n.

@qrli commented on Wed Jan 27 2016

Another question: does Design 2 require C_n classes in the same source file as class C or not?
Its syntax does not enforce this. So it looks like I can have one source file for each class, which is the typical way to organize code.

If it does not require the same source file, then it feels like there is a similar solution with an existing feature: just mark the constructor of base class C as internal.

If it does require the same source file, C# also has the partial class feature, which allows a class to be split into multiple files. So it looks like it is still possible to use the above solution...

@DavidArno commented on Wed Feb 10 2016

As I understand this proposal, it would allow the creation of discriminated unions, such as

public abstract sealed class Option<T>
{
    class None();
    class Some<T>(T value);
}

And, because the compiler knows this is a complete hierarchy, I could pattern match as follows:

public string SomeFunction(Option<int> optionalValue) =>
    optionalValue switch
    {
        case None(): "It's a None!",
        case Some(var value): $"It's a {value}"
    };

In other words, the default (or * if we stick with that notation) case wouldn't be required.

Have I understood this correctly?

@gafter commented on Wed Feb 10 2016

@DavidArno Yes, that is precisely correct.
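For comparison, here is a hedged sketch of how the same Option example can be approximated in C# as it eventually shipped (C# 9 or later), using the Design 1 structure and a switch expression. The member layout (`Value` property, nested `None` and `Some` classes, `OptionDemo`) is an assumption made for illustration. Because the compiler does not treat this hierarchy as closed, the discard arm is still needed to avoid an exhaustiveness warning, which is exactly the arm the proposal would make unnecessary.

```csharp
using System;

// Design 1 structure applied to Option<T>: private root constructor,
// sealed nested cases. The member layout is assumed for illustration.
public abstract class Option<T>
{
    private Option() {}

    public sealed class None : Option<T> {}

    public sealed class Some : Option<T>
    {
        public T Value { get; }
        public Some(T value) { Value = value; }
    }
}

public static class OptionDemo
{
    public static string SomeFunction(Option<int> optionalValue) =>
        optionalValue switch
        {
            Option<int>.None => "It's a None!",
            Option<int>.Some { Value: var value } => $"It's a {value}",
            // Needed today: the compiler cannot prove the two arms above are
            // complete. Also catches a null input.
            _ => throw new InvalidOperationException("Unhandled Option case"),
        };
}
```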

@DavidArno commented on Sun Apr 03 2016

@gafter,

Is this functionality implemented in any of the feature branches yet?

@KalitaAlexey commented on Sat Apr 02 2016

I suggest ADT with the following syntax:

void Process(int | List<int> v)
{
    switch (v)
    {
        case int i:
            ProcessInt(i);
            break;
        case List<int> l:
            ProcessList(l);
            break;
    }
}

And to make ADT like type the following syntax:

// For structs
struct Type = int | bool;
// For classes or both structs and classes
class Type = int | List<int>;

Roslyn may enforce a user to write class when one of specified types is not a struct.

@svick commented on Sun Apr 03 2016

@KalitaAlexey I don't think you can call that an ADT; it's more like a union type. In ADTs, the cases have names and can be recursive (I can't tell if your proposal would allow that or not).

@KalitaAlexey commented on Sun Apr 03 2016

I like how it is done in Rust. I think we could inherit their enum ADT.
In Haskell

data Expression = Number Int
                | Add Expression Expression
                | Minus Expression
                | Mult Expression Expression
                | Divide Expression Expression

I'd like to have in C#

enum Expression {
    Number(int Number),
    Add(Expression LeftOperand, Expression RightOperand),
    Minus(Expression Expression),
    Mult(Expression LeftOperand, Expression RightOperand),
    Divide(Expression LeftOperand, Expression RightOperand),
}

And pattern matching like

void ProcessExpression(Expression expression)
{
    switch (expression)
    {
    case Expression::Number n:
        ProcessNumber(n);
        break;
    case Expression::Add a:
        ProcessAdd(a);
        break;
    case Expression::Minus m:
        ProcessMinus(m);
        break;
    case Expression::Mult m:
        ProcessMult(m);
        break;
    case Expression::Divide d:
        ProcessDivide(d);
        break;
    }
}

@HaloFour commented on Sun Apr 03 2016

@KalitaAlexey #6739

@KalitaAlexey commented on Sun Apr 03 2016

@HaloFour Thanks. What's the difference then?

@wekempf commented on Tue Apr 12 2016

The discussion around this concept is scattered everywhere, so forgive me if I just repeat something said elsewhere. Option 1 has serious problems but is on the right track. Option 2 I don't care for because it scatters the declarations and muddles the concept. What we're modeling here is a discriminated union. @KalitaAlexey gets close to the syntax I'd prefer, but just using "enum" is at least confusing, if it doesn't actually cause parsing problems. I'd suggest (and saw others do so as well in other threads) "enum class".

public enum class Expression {
    Number(int Number),
    Add(Expression LeftOperand, Expression RightOperand),
    Minus(Expression Expression),
    Mult(Expression LeftOperand, Expression RightOperand),
    Divide(Expression LeftOperand, Expression RightOperand),
}

There are still lots of open questions after deciding on rough syntax like this, however. For instance, in this thread it was suggested a DU could have a type that's also a DU. I'm not sure that makes sense and would suggest not allowing that, knowing you can always add this feature in the future if that turns out to be the wrong decision, but you can't remove a feature, ever. I just don't know when such a feature would actually be useful, and without a compelling use case it seems best to err on the conservative side.

I've seen other posts where the syntax is very similar to the above but "abstract sealed" is used instead of "enum class". Frankly, I think "abstract sealed" is highly confusing and gives no indication that one is building a DU, while "enum class" is intuitive.

fgfmichael · 2021-03-31T02:42:27Z


Would love to see this in C# 10 or C# 11. This is my number one missing feature when I switch between Rust and C#.

6 replies

iam3yal Apr 10, 2024

@Phyyl They are working on it.

p.s. I understand it's your opinion but I really don't understand how these types of comments help anything... yes DUs is important concept but when you say that you have "trouble going back to C#" and I understand it's a figure of speech it still doesn't make sense without providing the required context, it's like saying I have trouble going back to taking the train after getting a car but then it's an option not an obligation and sometimes taking the train is actually better.

Phyyl Apr 10, 2024

@iam3yal What I meant is that DUs help safety and expressiveness of code, and it's something that I find is lacking in C#. I think it's important to add your voice to features you care about, and that's what I'm doing here. I truly love C#, but it's not perfect and I want to see it improve in meaningful ways. This is an issue I have had for years (not the lack of DUs but the completeness check).

saint4eva Apr 11, 2024

@Phyyl Are there features or productivity conveniences in C# that you miss while writing Rust?

Phyyl Apr 11, 2024

@saint4eva That's a really good question that I'm having a hard time answering to be completely honest. They both have a lot of advantages depending on what you need to accomplish.

For me specifically, having dotnet at my fingertips is actually the most convenient thing. That's obviously not a language feature, but the stdlib in rust feels a bit lacking in some areas. The ability to write an "unsafe" program without having to .unwrap() everywhere is quite convenient as well. I know this is a fundamental difference in the language and I wouldn't want Rust to change. It is its own thing and has its own use cases. I also much prefer C# syntax, but again, that's very subjective.

There are many features I would like C# to include in the coming years, most of them inspired from Rust or C/C++:

- Unions
- Discriminated unions
- Generic size arguments + fixed size arrays: struct Data<T, int size> { public T[size] Array; }
- Traits just like in Rust
- TSelf, allowing to reference the Self type statically: class Person : ICloneable<TSelf> //
- Shapes, allowing a callee to receive an object that has a certain shape, even if there's a cost involved via a jump table or something: shape Clonable { TSelf Clone(); }
- Unbound generic delegate types (not even sure this would be possible, but I encountered this "need" before): delegate void Walker<T>(T value);
- More power to the constructor generic constraint: where T : new(string)
- Macros?! Will probably never happen.
- Many more things that are probably not feasible or useful but sound good to my smooth brain

BreyerW Apr 13, 2024

@Phyyl Fixed size arrays technically are available already via the InlineArray attribute, but it has some unfortunate warts that prevent widespread adoption (they are good enough for internal data transfer though). Also there's a working prototype of const generics in the dotnet runtime, like A<int Size>, using existing metadata, so there's a chance it will happen in the not too distant future.

And more powerful constructor generics can be emulated using static abstract methods on interfaces

agocke · 2024-04-10T17:48:50Z

agocke
Apr 10, 2024
Collaborator

If anyone wants a solution that works right now, I have written a blog post about emulating DUs with completeness checking in current C# using abstract records and an analyzer library that I wrote. https://www.commentout.com/closed-records.html
