Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: pattern matching versus switch statement #4944

Closed
gafter opened this issue Sep 2, 2015 · 40 comments
Closed

Discussion: pattern matching versus switch statement #4944

gafter opened this issue Sep 2, 2015 · 40 comments

Comments

@gafter
Copy link
Member

gafter commented Sep 2, 2015

We had really hoped that we could extend the existing switch statement to support pattern-matching (#206) in an upward compatible way. Most of the potential semantic mismatches between the existing switch statement and its extension to pattern matching can be erased, in the sense that the pattern-matching version can reasonably be defined to match the existing switch statement syntax and semantics precisely. However there is one issue I’m not sure what to do about.

In a switch statement today

switch (expression)

the expression is required to be of one of a particular set of types (sbyte, byte, short, ushort, int, uint, long, ulong, bool, char, string, or an enum type) or its corresponding nullable type. Alternately it may be of a user-defined type that can be uniquely converted to one of these types via a user-defined implicit conversion.

If we remove this user-defined conversion condition completely, existing code that relies on the user-defined conversion will break.

If we do not remove this condition, types that have such a conversion won’t really be capable of being the governing expression of a switch statement with any nontrivial pattern-matching semantics, because the expression would immediately be converted to one of these governing types. And it would not be very intuitive that simply adding the conversion to a type breaks your ability to do pattern-matching (via switch) on values of that type.

I'm not sure what to do about this, exactly, other than possibly leave the existing switch alone and instead invent a new match statement just for use with pattern-matching.

/cc @AnthonyDGreen @terrajobst @jaredpar @Pilchie @ljw1004 @MadsTorgersen @mattwar @gafter @stephentoub @vancem @semihokur

@gafter
Copy link
Member Author

gafter commented Sep 2, 2015

If we do have to add a new statement form there would be a tension between (a) making as much like switch as possible to make it easy to remember how to use it, and (b) making it different from switch to improve it. Improvements could include addressing the things people don't like about switch (e.g. fall-through) or because of the value that we could add for pattern matching. We could require, for example, that all match statements handle every possible value of the control expression in some arm of the match.

Before we "give up" and define a new statement form, is there something we can do to address the problem described above?

@orthoxerox
Copy link
Contributor

Don't the case labels have to be a constant value of a particular set of types (sbyte, byte, short, ushort, int, uint, long, ulong, bool, char, string, or an enum type) or its corresponding nullable type as well? So as soon as you see that one of the labels isn't, you can be sure that this switch is a pattern-matching switch. If all of them are, it's a regular switch with an implicit conversion.

@gafter
Copy link
Member Author

gafter commented Sep 2, 2015

In other words adding a switch case changes the meaning the the control expression? Yecch.

@orthoxerox
Copy link
Contributor

The two use cases are literally incompatible. If you have class Foo that converts to int, your case labels must be constant ints in a regular switch, and you can't mix ints with patterns of Foo in a pattern switch, you can't just add a new switch case.

P.S. I still think new match statement and expression are the better option.

@MrJul
Copy link
Contributor

MrJul commented Sep 2, 2015

I think this problem can be solved with the following rules:

  • The type of the control expression is never implicitly converted.
  • The implicit conversion becomes an implicit is operator.
class Foo
{
  public int Value { get; set; }
  public implicit operator int(Foo foo) => foo.Value;
}
class Bar : Foo { }

Foo fooInstance = ...;
switch (fooInstance /* expression of type Foo, not int */)
{
  case Bar: ... // ok, type pattern matching
  case 2: ... // ok, implicitly 'matched' to int and verified that it equals 2
}

Decomposition:

implicit operator Target(Source source) => source.X;

becomes the same as

bool operator is(Source source, out Target target) {
  target = source.X;
  return true;
}

To get back to the example, case 2 is rewritten as:
case int i when i == 2

Then, the optimizer can optimize a switch containing only cases of this type to the good old switch.

Now, what should happen if you define an implicit operator for a type and a matching operator is? I think this should be a compile-time error.

(Note: I personally would like a new match expression introduced. As a statement? It's too close to switch and doesn't add much value IMHO.)

@dsaf
Copy link

dsaf commented Sep 2, 2015

@MrJul

(Note: I personally would like a new match expression introduced. As a statement? It's too close to switch and doesn't add much value IMHO.)

Surely both are useful? How do I match to different actions (not functions) without having to return some void or unit type (which is not available at the moment)?

@MrJul
Copy link
Contributor

MrJul commented Sep 2, 2015

@dsaf With switch :) It's using match for both statements and expressions that I find confusing, but maybe I misunderstood something here.

@dsaf
Copy link

dsaf commented Sep 2, 2015

Also wouldn't existing switch code already potentially break due to new exhaustiveness checks?

@AdamSpeight2008
Copy link
Contributor

I believe that match expression would be better than modifying the switch statement,
C#'s switch is different to VB's Select Case as it associated with switch-ing execution based on a value,.
Where as Select Case has an association with select-ing what to do next, depending on the case (expression) being true. I.e. it is more suited to being used for pattern-matching. True the most common usage is choosing based on a value, so justifies the use of a reduced syntax.

For example this is currently possible in vb.net but not C#

Select Case True
  Case a < 0
  Case a = 0
  Case a > 0
End Select

Select Case a
  Case Is < 0
  Case Is = 0
  Case Is > 0
End Select

Select Case a
  Case Integer.MinValue To -1
  Case 0
  Case 1 To Integer.MaxValue
End Select

and thus better suited to being extended to include pattern-matching capabilities.

Select Case (t1, t2)
  Case (a As Integer, b As Integer) When a < b
  Case (a As Double , b As Double ) When a < b
  Case Else
End Select

If I was to add pattern-matching to C# I would use a new construct (match), so I could us new syntax without affecting existing code.
Currently (IFAIK) c# doesn't require that default(switch case) must be specified for usage of a switch statement. Whereas a match expression is required to always have default.

match ( (t1,t2) )
{
  ¦: (int a, int b)       when ( a <= b ) => return ;
  ¦: (double a, double b) when ( a <= b ) => return ;
  ¦: __ => 
}

Remove the fall-through capability of the "case's", by supporting multiple patterns per a match. eg

match ( (t1,t2) )
{
  ¦: (int a, int b)       when ( a <= b ),
     (double a, double b) when ( a >  b ) => return ;
  ¦: __ => 
}

I'm basing the syntax on an existing .net language that have pattern matching F# and Nemerle. but use a different symbol ¦: instead g | so it isn't to be confused with the logical / bitwise or operator.

@gafter
Copy link
Member Author

gafter commented Sep 2, 2015

@MrJul

I think this problem can be solved with the following rules:

  • The type of the control expression is never implicitly converted.
  • The implicit conversion becomes an implicit is operator.

That's not the way that operator is is used. It is only applied in the case that one uses the pattern syntax Type ( Pattern, ... )

But I like the direction you're taking. Here is my take on it:

  • Depending on the pattern in the case either the original (unconverted) value is used or the converted value is used:
    • If the unconverted type could match the pattern, use the original value.
    • If the converted type could match the pattern, use the converted value.
    • Otherwise an error

I think this is nicely compatible because the existing "patterns" (switch cases) could never match a value of a user-defined type except by a converted value. And the compiler is already required to decide if a value of a given static type "could" match a pattern or not (3 is string s is a compile-time error).

For example, for a control expression of type X that could be converted to int:

  • the pattern 3 uses the converted value
  • the pattern int x uses the converted value
  • the pattern object x uses the unconverted value

This should retain the existing semantics for all existing switch statements, but still enable pattern-matching to be used in all its glory even for types that have such a conversion.

There remains a separate question about whether or not we want an expression-based match - I think we do - but that question really should be kept separate.

@gafter
Copy link
Member Author

gafter commented Sep 2, 2015

I am pretty satisfied that the above works as a solution. @MadsTorgersen suggested the alternate solution that user-defined implicit conversions be allowed on the value in all pattern matching contexts as part of the matching operation. I'm concerned about how that may interact with user-defined pattern-matching operators, but it is worth exploring. We have at least one or two solutions.

Here's another problem. We want a pattern-matching-based switch statement to give an error when some case arm is subsumed by the totality of the previous case arms given the actual value. Unfortunately that turns a current warning into an error:

        switch (1)
        {
            case 1:
                break;
            case 2: // error: impossible to match
                break;
        }

worse, it turns some code currently accepted with no diagnostics into an error:

        switch (1)
        {
            case 1:
            case 2: // error: impossible to match
                break;
        }

Do we give up on this checking, take a breaking change, or do something else?

@orthoxerox
Copy link
Contributor

I guess this check could be downgraded to a warning. This way legacy code will compile for everyone except those who treat warnings as errors, and this kind of people will probably gladly fix their code.

@dsaf
Copy link

dsaf commented Sep 2, 2015

ReSharper generates warnings in such situations:

Warnings
In addition to compiler errors and warnings, ReSharper displays its own warnings that don't prevent your code from compiling but may nevertheless represent serious coding inefficiencies. For example, ReSharper informs you about redundant casts, incorrect format strings, declared but never used local variables, etc. Constructs that have associated warnings are emphasized with either grayed text or a blue curly underline.

Incorrect format string will cause more problems than redundant or ignored case statements.

@dsaf
Copy link

dsaf commented Sep 2, 2015

Another route is taking a breaking change and shipping it with opt-in nullability as a C#7 - Super Strong Edition that I would personally gladly use :).

@gafter
Copy link
Member Author

gafter commented Sep 2, 2015

Even a new warning is a breaking change for enough customers (who use warnings-as-errors) that we must treat it as a breaking change.

@orthoxerox
Copy link
Contributor

Is it possible to link this check to the target .net version or some option that will be set for new projects by default? So, for example, if I continue to compile my program against 4.6 or lower, I won't get this error, but if I upgrade to 4.7 or 5.0 or whatever the new framework version is, I will?

Or we can just all agree that a separate match statement is a better option.

@KrisVandermotten
Copy link
Contributor

There seems to be the assumption that a code construct must be either an expression, either a statement, which is false. For example, x = 1 is an expression (with a side effect), that can be used as a statement.

Similarly, lambda's allow for expression bodies, as well as statement bodies.

So, if a match expression is being considered, can it be made in such a way that it can also be used as a statement? Or is there some fundamental reason that this would be impossible?

@HaloFour
Copy link

HaloFour commented Sep 3, 2015

@KrisVandermotten C# inherits that from C/C++ where not all code constructs are expressions, such as control statements like switch.

I'd agree that a match expression would be very useful for the pattern matching proposal, both because it avoids the issues of trying to expand on switch as well as because it avoids the legacy requirements of switch such as scoping and fall-through. Whereas switch probably couldn't be converted into an expression-producing form it would be trivial to keep that in consideration for a new match construct by simply requiring that none of the cases "return" a value. That would probably eliminate the usefulness of switch supporting pattern matching, which if that would introduce other problems or breaking changes probably isn't a bad thing.

@gafter
Copy link
Member Author

gafter commented Sep 3, 2015

@orthoxerox No, we do not use the platform target to change the language we're accepting. We have a language switch for that. We're not going to publish a language specification that says that different versions of "the platform" have different required diagnostics.

@KrisVandermotten Of course we could make the new switch one of the expression forms allowed in an expression-statement, but I suspect you would not find it as useful as you think. Do you find yourself wishing the ? :, ??, &&, or || expression forms were allowed as statements? I suspect not.

@HaloFour I do not believe that the existence of an expression form of switch would reduce the value of a statement form at all.

@HaloFour
Copy link

HaloFour commented Sep 3, 2015

@gafter I completely agree. What I meant was that it's pretty unlikely to see switch further extended to support an expression form, but if a new construct such as match were introduced that could support being used as an expression that there isn't much that would be needed to have it also support a statement form. Given that a new construct would not be burdened with the legacy behavior of switch or any of the ambiguity or breaking-changes concerns brought up in this thread that it might be worth not enhancing switch and moving to that new construct instead.

To note, I'm not advocating abandoning the changes to switch. I like the idea of switch supporting pattern matching as I think the syntax is pretty attractive and should be intuitive for existing C# developers. It gives me pause to think that there may be concerns with existing code by doing so, although it sounds like you have them largely worked out. But even if switch gets these enhancements there is still likely value in a statement form that isn't burdened by the legacy syntax of switch.

@dsaf
Copy link

dsaf commented Sep 3, 2015

What use will be switch if it doesn't change at all by the way? Auto-generated high-performance C# code? http://stackoverflow.com/questions/11617091/in-a-switch-vs-dictionary-for-a-value-of-func-which-is-faster-and-why

@gafter
Copy link
Member Author

gafter commented Sep 3, 2015

@HaloFour I don't think you would find a new match expression very useful as a statement-expression. You would not be able to throw, return, try-finally, or do any other statementy things in it. That's why I think we need to both extend the switch statement and add an expression form. The topic of this issue is how to do the former in the most useful way with the least disruption.

@dsaf Are you suggesting the (existing) switch statement isn't useful? 😉

@dsaf
Copy link

dsaf commented Sep 3, 2015

@gafter No, I am thinking that existing switch statement will become redundant if there will be a new match construct supporting a statement form. The statement form of whatever-it-is-called is definitely useful.

@gafter
Copy link
Member Author

gafter commented Sep 3, 2015

@dsaf I'm hoping we can extend the existing switch statement to support pattern-matching instead of introducing a new statement form for that.

@HaloFour
Copy link

HaloFour commented Sep 3, 2015

@gafter Well I assume that depends on the exact syntax. Granted the match ideas tossed around would at least make returning from the enclosing method difficult/impossible, but I don't see why you couldn't throw or use try/finally?

string result = match (exp) {
    case Foo(x) => "foo",
    case Bar(x) => "bar",
    case Baz(x) => {
        SomeClass c = new SomeClass();
        try {
            c.SomeMethod();
            return "baz";
        }
        finally {
            c.Close();
        }
    }
    default => throw InvalidOperationException()
};

This probably belongs in another proposal, though.

If an expression match construct is added it might be confusing that it couldn't be used in a statement form:

match (exp) {
    case Foo(x) => Console.WriteLine("foo"),
    case Bar(x) => Console.WriteLine("bar")
};

@dsaf
Copy link

dsaf commented Sep 3, 2015

@gafter Understood.

Do we give up on this checking, take a breaking change, or do something else?

What if another level of inspection severity is added? It could be intended either to address this sort of situations specifically or for general use. Additionally there could be several pre-configured IDE/compiler profiles e.g. "Max Compatibility" and "Max Strictness". The latter would treat hints as errors and allow breaking changes in future.

(ReSharper as inspiration again)
image

@gafter
Copy link
Member Author

gafter commented Sep 3, 2015

@HaloFour

If an expression match construct is added it might be confusing that it couldn't be used in a statement form:

Yes, and it also might be confusing that the branches are restricted to expressions, and the match is required to be complete (thus your example is probably an error).

@dsaf

What if another level of inspection severity is added?

You can configure individual warnings something like this in VS2015. But you can't turn off errors that are required by the language specification because - well, they are required in order for the compiler to be implementing the C# programming language.

@MgSam
Copy link

MgSam commented Sep 4, 2015

Although extending switch is attractive from a compatibility perspective, I agree with the sentiment others have expressed in this thread that a statement and expression form of match is a better option. Not being burdened with the having to write break, and being able to have the compiler enforce completeness checks would both be big wins for a new match statement. You'd also get rid of the additional problem switch has for strings, which is that it secretly allocates a Dictionary, completely invisible to the user.

Given the diagnostics and code fixes Roslyn now allows I also think you could make it pretty easy for people to convert their switch statements to match in a new version of the language.

switch is an ugly hold-over from C, let's leave it in the past and design a feature for a modern language.

@HaloFour
Copy link

HaloFour commented Sep 4, 2015

@gafter Yes, I realize it would be an error. I guess I'd need a default or wildcard case in there, or at least Baz if that would represent the complete discriminated union.

In an expression form each branch would be required to return the same type anyway, right? So would it be that confusing if that type could be void and in those cases match would return no value?

@gafter
Copy link
Member Author

gafter commented Sep 4, 2015

@MgSam If we added a statement form of match I believe we would make it as similar as possible to switch to make it easy for users to use both statements or to switch between the two without gratuitous differences. Those differences would grate more than the annoyance of having to break from your case blocks.

@HaloFour In a match expression the compiler would be required to compute the common type, the same way it does for a lambda expression with multiple return statements and also for the :? expression.

@dsaf
Copy link

dsaf commented Sep 4, 2015

@gafter

...error when some case arm is subsumed by the totality of the previous case arms given the actual value... Do we give up on this checking, take a breaking change, or do something else?

Let's imagine you go for the breaking change, wouldn't it mean that the following should start giving an error as well - for parity?

if (true)
{
}
else //C#6 - OK, C#7 - error, impossible to reach given the actual value.
{
}

@MgSam
Copy link

MgSam commented Sep 4, 2015

@MgSam If we added a statement form of match I believe we would make it as similar as possible to switch to make it easy for users to use both statements or to switch between the two without gratuitous differences. Those differences would grate more than the annoyance of having to break from your case blocks.

@gafter I strongly disagree. As a smart man recently said in a comment on a different thread about the same topic:

Yes, when you start seeing a new syntax you have to take time to learn it instead of forging ahead without understanding. Hopefully you would not have to learn a new syntax every time you see it for the rest of your life.

This comment applies equally well to a new statement form of match without the need for break.

It would be a very poor choice to constrain new features just so that it slightly reduces the learning curve. If you always were limited by such constraints, lambda expressions would require parameters to be explicitly typed so that they look more like methods/anonymous delegates. I think we can all agree that we're glad that didn't happen.

And again, you have the ability to have much better tooling with code assistance now. If someone accidentally writes break in a match statement that doesn't require it, you can give it a squiggle and offer a quick fix.

@HaloFour
Copy link

HaloFour commented Sep 4, 2015

I agree, a statement form of match should not inherit the legacy baggage from switch, and that includes the scoping rules and fall-through (with the requirement of break.) That's why I like the lambda-like syntax, it sets the expectation differently but in a way that C# developers should be used to by now.

As mentioned before perhaps this conversation belongs in a different thread/proposal.

@gafter
Copy link
Member Author

gafter commented Sep 4, 2015

@dsaf No, the specification for the if statement has special dispensation allowing "unreachable" code without a diagnostic.

@MgSam The differences you suggest for a match statement are gratuitous, and have nothing to do with the expressiveness added by the statement. The problem is not just the learning curve - the two different syntax forms serve nearly identical purposes (dispatch on input data) and (you suggest) would have very different syntax, and would appear side-by-side in code bases, but refactoring from one to the other or vice versa would be nontrivial. It would be an ongoing pain point.

@dsaf
Copy link

dsaf commented Sep 4, 2015

@gafter

...given the actual value...

What is the real-world use case for such code? I can only imagine two cases: exotic infinite loop using goto and testing some code (where such compilation error would be annoying).

@axel-habermaier
Copy link
Contributor

@gafter: I'm not sure I get your point. What is the problem with automatically refactoring between the old switch and the proposed match? If match doesn't allow break, not all switch statements can be converted, of course, but I'd argue that over time most code would move to using match anyway. I really don't see why that would be a compelling reason to burden a new match with all of the problems of switch. In particular, your argument would also hold for expression-bodied members, yet they were added to the language and they have quickly become a feature that I really like and use often.

@gafter
Copy link
Member Author

gafter commented Sep 4, 2015

@dsaf

What is the real-world use case for such code?

What code? My words you quoted are intended to require an error when you try to match an expression like "foo" to a pattern like int x or 3 (it can never match), or when you match an expression of type string and have cases null and string s (thus "consuming" all possible inputs); any following pattern should be an error.

@MgSam
Copy link

MgSam commented Sep 4, 2015

@gafter I would say not having to write lots of extra verbiage certainly increases expressiveness. Also, why would it be an on-going pain point? A new keyword would indicate that there is a new concept people have to learn. Again, as you said in your own words; once you learn the new syntax it is no longer a pain point because presumably you don't have to keep re-learning it over and over again.

I think the situation is highly analogous to the introduction of lambdas. I'm certain they co-existed with anonymous delegates in the same codebases and probably confused people at first and caused bugs when they were misused. Did that mean that the feature was a mistake to add?

In fact, I rarely ever see switch in real-word C# code or use it myself, precisely because it is so unnecessarily verbose. It takes up a lot of screen real estate and adds little expressiveness over an equivalent set of if statements, IMHO. The match syntax HaloFour gave an example of would be more more pleasant to both write and read, and therefore would be much more likely to be used.

@gafter
Copy link
Member Author

gafter commented Sep 4, 2015

@axel-habermaier Adding support for pattern matching hasn't yet proven any need for a new statement form. The corner cases that we've run into integrating them into switch don't seem too troublesome. Adding a new statement form is, so far, a largely orthogonal feature request. You're welcome to discuss it, of course, but I created this issue to discuss integrating pattern matching with switch, and I intend to continue to use this issue/thread that way. If the issues integrating them seem too hard for you to contribute realistic solutions, you're welcome to make your contributions in any other way that works for you. Either way, I'm continuing to work on switch until and unless we reach a point where that proves unworkable.

@gafter
Copy link
Member Author

gafter commented Sep 4, 2015

@MgSam

You'd also get rid of the additional problem switch has for strings, which is that it secretly allocates a Dictionary, completely invisible to the user.

No, starting in C# 6 (VS2015) it does not do that any longer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants