Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Positional deconstruction based on existing constructors and properties #8415

Closed
MadsTorgersen opened this issue Feb 5, 2016 · 23 comments
Assignees
Labels
Area-Language Design Feature Request Language-C# New Language Feature - Pattern Matching Pattern Matching Resolution-Won't Fix A real bug, but Triage feels that the issue is not impactful enough to spend time on.
Milestone

Comments

@MadsTorgersen
Copy link
Contributor

Positional deconstruction based on existing constructors and properties

Proposals for pattern matching such as this (previously #206) allow a positional form of recursive pattern Type(p1,...,pn), as in

if (o is Person("Alan", var last)) { WriteLine(last); }

So far we have assumed that some special declaration around Person is needed to allow this: either Person is a special kind of type (a "record"), or it has some form of "extractor" declared on it, e.g. a method of a certain shape, or a special new kind of is operator declaration.

This comes from the need to get at the values of the object by order instead of by names of its properties. An extractor-based approach would interpret the positional pattern match somewhat like this:

if (o is Person p)
{
  var values = p.#extractor#;
  if (values.Item1 = "Alan")
  {
    var last = values.Item2;
    WriteLine(last);
  }
}

The need for a special declaration means that no existing types would benefit from this out of the gate. This seems a shame, in that most existing "record-like" types do specify an order - in the form of their constructor parameters:

public class Person
{
  public Person(string firstName, string lastName) 
  { 
    FirstName = firstName; 
    LastName = lastName; 
  }
  public string FirstName { get; }
  public string LastName { get; }
}

Ignoring the different casing imposed by naming conventions, it should be pretty obvious that the property FirstName corresponds to the constructor parameter firstName and LastName to lastName. If we assume that the order of constructor parameters is what the type author thought of as "the natural order" of the type's constituent values, it would be fair to expect a positional match to just use the same order (FirstName, LastName). The code

if (o is Person("Alan", var last)) { WriteLine(last); }

would then have the much simpler translation

if (o is Person p && p.FirstName == "Alan") { WriteLine(p.LastName); }

Not only simpler, this is also how any human would have written the code if they didn't have positional matching available.

Proposal

The common case for this would be really simple, but here's a set of more general rules, in case there are multiple constructors. This is just a first whack at the rules; I'd love to hear feedback on the general idea even more so than the details:

1. Identify the set of "record constructors"

A constructor is a "record constructor" if each of its parameters has a corresponding property with the same type and a name that is "similar enough". The precise meaning of that is up for discussion; here are some options:

  1. exact match only. This would exclude the above example, and any that use the common naming conventions.
  2. exact match modulo casing of first letter (FirstName would match firstName)
  3. exact match modulo casing (FirstName would match firstname)
  4. exact match modulo casing and interspersed _ (FirstName would match first_name)

2. Identify the set of "eligible constructors"

A record constructor is "eligible" for a positional match if it is accessible and the subpattern in each position is applicable to the type of the corresponding constructor parameter (by the same rules that would lead us to flag a match as illegal at compile time because it could never succeed)

3. Identify the best constructor

Use the rules of betterness in overload resolution to decide which of the eligible constructors is the best. If none, error.

4. Encode the match

Encode the pattern matching logic such that a subpattern in a given position is matched against the property corresponding to the parameter in the equivalent position of the selected constructor.

Issues

  • This proposal would use a more lax rule for name matching than elsewhere in the language, in order to compensate for naming conventions that have hitherto not affected the language itself.
@gunndabad
Copy link

Personally I'm not a fan of the fuzzy name matching and would prefer an explicit method like unapply() in Scala

@asik
Copy link

asik commented Feb 5, 2016

I am not sure I like positional pattern matching for records. For tuples, yes.

Consider the Person example above; if the order of properties change, the pattern-matching code still compiles but doesn't do the right thing anymore. I would expect using records instead of tuples to shield me from this kind of accidental mistake.

F# names record properties in its pattern matching syntax:

 match record with | { First="Alan" Last=last } -> ...

@HaloFour
Copy link

HaloFour commented Feb 5, 2016

@asik It would be based on the order of constructor arguments, not properties. If the constructor arguments were to be reordered you'd have lots of other problems as well. I do agree that trying to fuzzy match property names to argument names sounds like it could be an issue. I figure that this is to avoid having to write a custom is operator on "record-like" classes.

You would still have explicit property patterns:

switch (record) {
    case Person { First is "Alan", Last is var last }: ...
}

Perhaps they could be shorted to omit the type name when the type is known at compile time?

@stepanbenes
Copy link

This proposed language feature in fact relies on the brittle assumption, that the author of the class Person was sane and polite and assigned in the constructor the proper parameters to corresponding properties.

@alexpolozov
Copy link

What about the following:

A constructor is a "record constructor" if it definitely assigns (directly or indirectly) each of its parameters to exactly 1 property of the same type. In other words, at all non-exceptional exit points from the constructor each of the constructor parameters has been definitely assigned to exactly 1 matching property.

Names/casing shouldn't matter - we will never design a fuzzy matching scheme that will satisfy everybody. For example, what about C++-inspired m_first naming?

@HaloFour
Copy link

HaloFour commented Feb 5, 2016

@apskim The compiler wouldn't be able to verify that with existing types in other assemblies, short of analyzing the raw IL of the constructors and all of the properties to try to match field assignments.

@stepanbenes
Copy link

I agree with @apskim The compiler could perform some sort of flow analysis of constructor, find which parameter is assigned to which property and write this info into special attribute added to the constructor. This attribute could be then used in pattern matching to pair the arguments with properties. This could solve the naming issue. This of course requires recompilation of existing assemblies.

@asik
Copy link

asik commented Feb 5, 2016

@HaloFour Sorry, I misread the proposal. I assumed this was based on new record declaration syntax that didn't include an explicit constructor, something like class Person(string First, string Last); as proposed here #206 . If that idea has been temporarily shelved it's still something to think about, in case it ever happens in a future version.

@HaloFour
Copy link

HaloFour commented Feb 5, 2016

@asik I think that this is specific to classes that aren't "records" but are really close to them. I assume that "record" classes defined with primary constructors would define a proper is operator making this kind of fuzzy matching unnecessary. But for existing types like Point it might be nice to have the same constructor pattern syntax since it is so much less verbose than the property matching syntax.

@svick
Copy link
Contributor

svick commented Feb 5, 2016

Some questions about the details:

  1. What culture would be used for case-insensitive name matching?

    I don't think the compiler should depend on the culture of the system it's running on, but would than mean non-English identifiers are now "less supported" by the compiler?

    A similar issue was considered in Encapsulate Field refactoring can result in incorrect case conversions depending on the user's locale #5524, where I believe the answer was "always use en-US".

  2. Does the type have to match exactly? I'm not sure if this is common in record-like types, but I can imagine e.g. the constructor using int? while the property is int. Or constructor using IEnumerable<T> and property List<T>.

@CyrusNajmabadi
Copy link
Member

I would propose we'd use the same comparison operator VB uses for names. Which you can see here:
https://github.com/dotnet/roslyn/blob/master/src/Compilers/Core/Portable/CaseInsensitiveComparison.cs

I personally like this proposal though not in isolation. Specifically, i think this is a sensible proposal that will get pattern matching working for a huge swath of existing code out there. On top of this i'd like some system to be made available for the types that this pattern doesn't work on. If we have both, then i think that's going to make pattern matching very viable with very low friction.

@alrz
Copy link
Member

alrz commented Feb 6, 2016

I believe this might be ok for VB, but while the result is something that would worth it, it doesn't really fit into C# strictness and case-sensitivity.

I would suggest support "extension is operators" (#6136) and move the fuzzy name matching to an analyzer that could generate is operators for existing types based on those rules.

Also, for record-like types like Expression and others, with this you still depend on the type checking, while there is an ExpressionType that could be used. With extension is operators that wouldn't be an issue.

@MgSam
Copy link

MgSam commented Feb 6, 2016

I agree with @asik. I don't like positional matching, it is brittle. We shouldn't be adding new ways to write brittle code into the language.

Named parameters were a god-send for C# - they let you write self-documenting code that you know would always work correctly regardless if someone reordered parameters in a method. This feels like a step backwards from that.

@BreyerW
Copy link

BreyerW commented Feb 6, 2016

I like idea to make existing but non-record assemblies compatible with pattern match but definitely dont like current proposed implementation which is fragile to rename and assume sane programmer or correct pattern in class definition. I like short code but as soon as short code mean have error prone code then to me this is no-no (to me being fragile is acceptable only if feature bring significant value or performance boost but there is just syntax sugar).

EDIT "removed" later comment since to me seems that proposed pattern is already on table with if anyway

To avoid this maybe use named parameter like this if (o is Person(FirstName: "Alan",LastName: var last)) { WriteLine(last); } //i think var isnt needed there since we directly point to property which have explicit definition anyway, so optionally this could look like this: ``` if (o is Person(FirstName: "Alan",LastName: last)) { WriteLine(last); } //the only problem might be that could confuse in some situation, like if (o is Person(FirstName: "Alan",LastName: last)) { WriteLine(last); } //here 'last' is new variable var last="Wake"; if (o is Person(FirstName: "Alan",LastName: last)) { WriteLine(last); }//here 'last' is variable declared ealier or new variable? ``` Looks longer but no more problem with renaming since refactor will take care for us to rename all fields/props and everything is explicit. Additionally, could work with any class that have public props/fields but completely dont look like record class; optional syntax inline with pattern match is using `{}` obviously, like ``` if (o is Person{FirstName is "Alan",LastName is var last}) { WriteLine(last); } ``` but as far as i understand this exist already but only in switch/match context?

@alrz
Copy link
Member

alrz commented Feb 6, 2016

@BreyerW I think the point is to be able to use positional deconstruction, property patterns already would work with any type.

@paulomorgado
Copy link

I like the idea of trying to make pattern matching work with "legacy" types.

The constructor parameter to Property matching proposal seems reasonable, however they aren't base on any existing C# specs. Just conventions not followed by every library's developer.

I guess this fuzzy matching is outweighed by the usefulness of the feature.

@orthoxerox
Copy link
Contributor

I don't think I can like this. Enabling pattern matching on existing types could be done using extension is operators #5165, this would be a more robust solution than reversing constructor logic. Of course, reversing constructor logic could be extracted into a lightbulb action that generated a matching is operator.

@chrisaut
Copy link

chrisaut commented Feb 8, 2016

I don't like this. I'd rather have things be explicit than the compiler guessing based on some ordering.

In the example
"would then have the much simpler translation"

if (o is Person p && p.FirstName == "Alan") { WriteLine(p.LastName); }

What's wrong with writing this as is? It's much much clearer IMO.

I really feel like this doesn't fit for C#

@Richiban
Copy link

Richiban commented Feb 8, 2016

While I'd love the idea of existing types getting use in pattern matching, isn't this very dangerous, though? I'm referring to the idea of assuming that a property and a constructor argument with the same name actually refer to the same data. Can the compiler really ensure that the data passed in has not been transformed in any way, and actually has been assigned to the property? This could create huge developer surprise; consider the following object:

public class XCoordinate
{
    public XCoordinate(int value)
    {
        if (value < 0) Value = 0;
        else Value = value;
    }

    public Value { get; }
}

Let's ignore the fact that this class violates a few design pricipals; it's still valid C#. You still have the crazy situation where (new XCoordinate(-1)) is XCoordinate(-1) evaluates to false.

You can imagine many other things that might be done to an object before it's assigned (strings may be trimmed or have their case changed) or some might only be assigned given a guard condition is met. How could the compiler fix these cases? I'm sure their are examples in the core library that would suffer from this.

@paulomorgado
Copy link

@Richiban, just for the record, what design principals are being violated?

This proposal does not intend to mean that is XCoordinate(-1) should match an XCoordinate instance that was constructed with the value -1 for the value parameter but one that, according to the proposal, has the value -1 for the value of the Value property. And, depending on the logic of the constructor and the property, they might not be the same.

@Richiban
Copy link

Richiban commented Feb 8, 2016

Principally (a ha.) the principal of least surprise. I wouldn't like to work in a domain where (new Foo { X = "hello world" }).X == "Goodbye, cruel world".

I think there's a real issue with giving ordinary classes a record-like syntax without enforcing record-like behaviour (i.e. you get out what you put in). Can't we have if (o is Person p { FirstName = "Alan" }) { WriteLine(p.LastName); } for normal property access instead? You've really not lost much and you gain a lot in language consistency.

@HaloFour
Copy link

HaloFour commented Feb 8, 2016

@Richiban Agreed. If the property pattern syntax was a little less verbose and ... wonky, I don't think there'd be much issue using that instead of the record pattern or is operator syntax. I had tossed out the following comment with a very similar syntax to yours but I don't think it got much traction:

#206 (comment)

@Richiban
Copy link

Richiban commented Feb 8, 2016

@HaloFour Eugh, yeah; "is var" is certainly wonky.

One of the things that's both pleasing and helpful about F#'s pattern matching is the syntactical symmetry you get when constructing and destructuring. If we could do the same in C# then we'd get the easy-to-understand let Person { Name = name } = Person { Name = "Alan" } (pseudo C#).

It's a shame your comment on #206 seemed to go by unnoticed.

@gafter gafter modified the milestone: 2.0 (Preview) Feb 18, 2016
gafter added a commit to gafter/roslyn that referenced this issue Feb 23, 2016
reverse-engineering the relationship between positions and properties
by examining the names of constructor parameters.
See also dotnet#8415
@gafter gafter modified the milestones: 2.0 (RTM), 2.0 (Preview) Feb 25, 2016
@gafter gafter added the Resolution-Won't Fix A real bug, but Triage feels that the issue is not impactful enough to spend time on. label May 7, 2016
@gafter gafter closed this as completed May 7, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Language Design Feature Request Language-C# New Language Feature - Pattern Matching Pattern Matching Resolution-Won't Fix A real bug, but Triage feels that the issue is not impactful enough to spend time on.
Projects
None yet
Development

No branches or pull requests