C# 7.x: Patterns #61

Closed
wants to merge 13 commits into from
153 changes: 153 additions & 0 deletions standard/expressions.md
@@ -37,6 +37,155 @@ Most of the constructs that involve an expression ultimately require the express
- The value of a property access expression is obtained by invoking the *get_accessor* of the property. If the property has no *get_accessor*, a compile-time error occurs. Otherwise, a function member invocation ([§11.6.6](expressions.md#1166-function-member-invocation)) is performed, and the result of the invocation becomes the value of the property access expression.
- The value of an indexer access expression is obtained by invoking the *get_accessor* of the indexer. If the indexer has no *get_accessor*, a compile-time error occurs. Otherwise, a function member invocation ([§11.6.6](expressions.md#1166-function-member-invocation)) is performed with the argument list associated with the indexer access expression, and the result of the invocation becomes the value of the indexer access expression.

### §patterns-new-clause Patterns and pattern matching
Contributor Author

Currently, this subclause (and its subordinate subclauses) is under §11.2 Expression classifications. Perhaps it belongs elsewhere, maybe as a new 11.x topic, somewhere before §11.7 Primary expressions.

Member

This section doesn't belong under 11. Expressions at all. It should be a new heading at the same level as expressions. It should probably be moved to a new chapter after expressions.


#### §patterns-new-clause-general General

A ***pattern*** is a syntactic form that can be used with the `is` operator ([§11.11.11](expressions.md#111111-the-is-operator)) and in a *switch_statement* ([§12.8.3](statements.md#1283-the-switch-statement)) to express the shape of data against which incoming data is to be compared. A pattern is tested in the context of a switch expression or a *relational_expression* that is on the left-hand side of an `is` operator. Let us call this a ***pattern input value***.
Member

@gafter May 11, 2022

Regarding "A pattern is tested in the context of a switch expression"... this is correct for this version of the specification, but as soon as we start specifying nested patterns it will have to be revised because the value being tested doesn't appear as an expression in the program.


A pattern may have one of the following forms:

```ANTLR
pattern
    : declaration_pattern
    | constant_pattern
    | var_pattern
    ;
```

A *declaration_pattern* and a *var_pattern* can result in the declaration of a local variable. The scope of such a variable is as follows:
Contributor

This should probably be lifted out to where scopes are generally described, and shared with the out var and deconstruction specs.

Member

I believe we've agreed that such a thing should happen. Therefore I'm not reviewing this part very deeply.


- If the pattern is a case label ([§12.8.3](statements.md#1283-the-switch-statement)), then the scope of the variable is the associated *case block*.
- Otherwise, the variable is part of the pattern that is the right-hand operand of the `is` operator ([§11.11.11](expressions.md#111111-the-is-operator)), and the variable’s scope is based on the construct immediately enclosing the `is` expression containing the pattern, as follows:
  - If the expression is in an expression-bodied lambda, the variable's scope is the body of the lambda.
  - If the expression is in an expression-bodied method or property, the variable's scope is the body of the method or property.
  - If the expression is in a `when` clause of a `catch` clause, the variable's scope is that `catch` clause.
  - If the expression is in an *iteration_statement*, the variable's scope is just that statement.
  - If the expression is in a *constructor_initializer*, the variable's scope is the part of that *constructor_initializer* following the expression, and the body of the constructor.
  - If the expression is in a *variable_initializer* of a field, the variable's scope is the part of that *variable_initializer* following the expression.
  - If the expression is in a *query_expression* that is translated into the body of a lambda, the variable's scope is that *query_expression*.
  - Otherwise, if the expression is in some other statement form, the variable's scope is the scope containing the statement.

For the purpose of determining the scope, an *embedded_statement* is considered to be in its own scope.
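
The scope rules above can be illustrated with a small sketch. The class and method names (`PatternScopeDemo`, `AddOneIfInt`, `CountLeadingInts`) are hypothetical and appear nowhere in the PR. The sketch only exercises the "some other statement form" and *iteration_statement* cases as they are worded in this draft, which the reviewers note is still being revised.

```csharp
static class PatternScopeDemo
{
    // Hypothetical helper. The pattern variable `i` is declared in the `if`
    // condition, but its scope is the scope containing the `if` statement,
    // so `i` can still be used after the statement ends.
    public static int AddOneIfInt(object o)
    {
        if (!(o is int i))
        {
            return 0;
        }
        return i + 1; // `i` is in scope and definitely assigned here
    }

    // Hypothetical helper. In an iteration statement the variable's scope
    // is just that statement.
    public static int CountLeadingInts(object[] items)
    {
        int index = 0;
        while (index < items.Length && items[index] is int n)
        {
            index++; // `n` is in scope inside the loop
        }
        // `n` is not in scope here.
        return index;
    }
}
```

The first method relies on the variable outliving the `if` statement; the second shows the narrower scope that applies to a `while` loop.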

Member

I don't see the following reflected in the specification:

In C# 7.3 we added the following contexts in which a pattern variable may be declared:

  • If the expression is in a constructor initializer, its scope is the constructor initializer and the constructor's body.
  • If the expression is in a field initializer, its scope is the equals_value_clause in which it appears.
  • If the expression is in a query clause that is specified to be translated into the body of a lambda, its scope is just that expression.

Contributor Author

@gafter Lines 64-66 cover those 7.3 additions.

Member

All three are incorrect. I believe what I wrote in the comment above is correct. But there is another PR for that stuff.

Neither this (nor the other PR) handles variables declared in a case_guard.

#### §declaration-pattern-new-clause Declaration pattern

A *declaration_pattern* is used to test that a value has a given type and if the test succeeds, to cast that value to that type.
Member

We should introduce the terminology of a pattern possibly matching a value, and use it consistently.

Member

Maybe matching is trying, and it succeeds or fails?

Member

Suggested change
A *declaration_pattern* is used to test that a value has a given type and if the test succeeds, to cast that value to that type.
A *declaration_pattern* is used to test that a *pattern input value* ([§??.??.??](#patterns-new-clause-general)) has a given type. The match succeeds if the test succeeds, in which case at runtime the value is cast to that type and assigned to the declared variable.


```ANTLR
declaration_pattern
    : type simple_designation
    ;
simple_designation
    : single_variable_designation
    ;
single_variable_designation
    : identifier
    ;
```

The runtime type of the value is tested against the *type* in the pattern. If it is of that runtime type (or some subtype), the result of the `is` operator is `true`. A pattern input value with value `null` never tests true for this pattern.
Member

Rather than saying "the result of the is operator is true" ... because the pattern might not be part of an is operator ... we should say that the pattern matches. And for the is-pattern operator, we should say that it returns true if and only if the pattern matches on the input.

Member

Or maybe succeeds

Contributor

@gafter – Go with your first thought. I think “pattern matches” is the more widely used phrasing and we should stick with it. Though you might also see “if the pattern matching succeeds/is successful”… ;-)


Given a pattern context expression (§patterns-new-clause) *e*, if the *simple_designation* is the *identifier* `_`, it denotes a discard (§discards-new-clause), and the value of *e* is not bound to anything. (Although a declared variable with the name `_` may be in scope at that point, that named variable is not seen in this context.) If *simple_designation* is any other identifier, a local variable ([§9.2.8](variables.md#928-local-variables)) of the given type named by the given identifier is introduced, and that local variable is definitely assigned ([§9.4](variables.md#94-definite-assignment)) with the value of the pattern context expression when the result of the pattern-matching operation is true.
Member

The reference to discards-new-clause here is causing one of the two warnings.

Member

The phrase "pattern context expression" is no longer defined.

The problem is that there isn't a pattern context "expression" when there are nested patterns. There is a value, but not an expression. That was why we introduced the term "pattern input value". It isn't an expression. So I don't know what expression "e" corresponds to in this paragraph. It needs to be reworded to address that.


Certain combinations of static type of the pattern input value and the given type are considered incompatible and result in a compile-time error. A value of static type `E` is said to be ***pattern compatible*** with the type `T` if there exists an identity conversion, an implicit reference conversion, a boxing conversion, an explicit reference conversion, or an unboxing conversion from `E` to `T`, or if either `E` or `T` is an open type ([§8.4.3](types.md#843-open-and-closed-types)). It is a compile-time error if a value of type `E` is not pattern compatible with the type in a type pattern with which it is matched.

> *Note*: The support for open types can be most useful when checking types that may be either struct or class types, and boxing is to be avoided. *end note*
<!-- markdownlint-disable MD028 -->

<!-- markdownlint-enable MD028 -->
> *Example*: The declaration pattern is useful for performing run-time type tests of reference types, and replaces the idiom
>
> ```csharp
> var v = expr as Type;
> if (v != null) { /* code using v */ }
> ```
>
> with the slightly more concise
>
> ```csharp
> if (expr is Type v) { /* code using v */ }
> ```
>
> *end example*

It is an error if *type* is a nullable value type.

> *Example*: The declaration pattern can be used to test values of nullable types: a value of type `Nullable<T>` (or a boxed `T`) matches a type pattern `T2 id` if the value is non-null and the type of `T2` is `T`, or some base type or interface of `T`. For example, in the code fragment
>
> ```csharp
> int? x = 3;
> if (x is int v) { /* code using v */ }
> ```
>
> The condition of the `if` statement is `true` at runtime and the variable `v` holds the value `3` of type `int` inside the block. *end example*
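
As a rough illustration of the declaration-pattern rules above, the following sketch shows a type test on an `object`, the rule that `null` never matches, and a test against an open type. The names `DeclarationPatternDemo`, `Describe`, and `IsPositiveInt` are hypothetical, and the generic method assumes a compiler that implements the open-type rule as stated in this draft.

```csharp
static class DeclarationPatternDemo
{
    // Hypothetical helper: each `is` test matches only when the run-time
    // type of `o` is the given type; a null input never matches.
    public static string Describe(object o)
    {
        if (o is string s) return "string of length " + s.Length;
        if (o is int i) return "int " + i;
        return "no match";
    }

    // Hypothetical helper: `T` is an open type, so `T` is pattern compatible
    // with `int` even though no conversion is known at compile time.
    public static bool IsPositiveInt<T>(T value) => value is int i && i > 0;
}
```

Passing an `int?` holding `3` to `Describe` boxes it to an `int`, so the second test matches; an `int?` with no value boxes to `null` and falls through to `"no match"`.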

#### §constant-pattern-new-clause Constant pattern

A *constant_pattern* is used to test the value of a pattern input value (§patterns-new-clause) against the given constant value.

```ANTLR
constant_pattern
    : constant_expression
    ;
```

Member

This is not quite right. In an is-pattern-expression, the constant pattern is restricted to being a relational expression. So you can write `e is a && b` and the pattern is `a`, not the expression `a && b`. This needs to be spelled out, but I'm not quite sure how or where.

Member

An example is `a is b & c`. The pattern is `b`.

Given a pattern input value *e* and a *constant_expression* *c*, if *e* and *c* have integral types, the pattern is considered matched if the result of the expression `e == c` is `true`. Otherwise, the pattern is considered matched if `object.Equals(e, c)` returns `true`. In this case it is a compile-time error if the static type of *e* is not pattern compatible (§declaration-pattern-new-clause) with the type of the constant.

> *Example*:
>
> ```csharp
> public static decimal GetGroupTicketPrice(int visitorCount)
> {
>     switch (visitorCount) {
>         case 1: return 12.0m;
>         case 2: return 20.0m;
>         case 3: return 27.0m;
>         case 4: return 32.0m;
>         case 0: return 0.0m;
>         default: throw new ArgumentException(…);
>     }
> }
> ```
>
> *end example*
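
The two matching rules for a constant pattern (integral comparison with `==`, otherwise `object.Equals`) behave differently for boxed values. The sketch below is one way to see this; `ConstantPatternDemo` and its methods are hypothetical names introduced only for illustration.

```csharp
static class ConstantPatternDemo
{
    // Input and constant both have integral types, so the pattern matches
    // when `value == 42` is true (the constant is converted to long).
    public static bool IsFortyTwo(long value) => value is 42;

    // The input has static type object, so the match is decided as if by
    // object.Equals(value, 42): only a boxed int 42 matches.
    public static bool IsBoxedFortyTwo(object value) => value is 42;

    // `null` is also a constant expression and can be used as a pattern.
    public static bool IsNull(object value) => value is null;
}
```

With these definitions, a boxed `long` holding 42 does not match the constant `42` in `IsBoxedFortyTwo`, because `object.Equals` compares a boxed `long` with a boxed `int`.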

#### §var-pattern-new-clause Var pattern

A match of any pattern input value (§patterns-new-clause) to a *var_pattern* always succeeds.
Member

I think we decided to go with "matches" instead of "succeeds", in which case this should say that "A pattern input value (§patterns-new-clause) always matches a var_pattern."

Contributor

I also think we did, strange that :-)

Member

Or maybe this is better.


```ANTLR
var_pattern
    : 'var' designation
    ;
designation
    : simple_designation
    ;
```

Member

This formulation will cause a headache in the future, when recursive patterns are introduced. This should be

var_pattern
    : 'var' simple_designation
    ;

Member

designation should be introduced when needed for later (C# 8) pattern changes, rather than now, and should be used both in var_pattern and declaration_pattern in that future diff.

Given a pattern input value *e*, if *designation* is the *identifier* `_`, it denotes a discard (§discards-new-clause), and the value of *e* is not bound to anything. (Although a declared variable with that name may be in scope at that point, that named variable is not seen in this context.) If *designation* is any other identifier, at runtime the value of *e* is bound to a newly introduced local variable ([§9.2.8](variables.md#928-local-variables)) of that name whose type is the static type of *e*, and that local variable is definitely assigned ([§9.4](variables.md#94-definite-assignment)) with the value of the pattern input value.
Member

The link to discards-new-clause is causing the second warning.

Member

It is definitely assigned when true when the pattern matches. Not unconditionally definitely assigned as written here.

This really belongs in the definite assignment section of variables rather than here.


It is an error if the name `var` binds to a type.
Member

It is an error if the name var binds to a type where used in a var_pattern.


> *Example*:
>
> ```csharp
> static bool IsAcceptable(int id, int absLimit) =>
>     SimulateDataFetch(id) is var results
>     && results.Min() >= -absLimit
>     && results.Max() <= absLimit;
> static int[] SimulateDataFetch(int id)
> {
>     var rand = new Random();
>     return Enumerable
>         .Range(start: 0, count: 5)
>         .Select(s => rand.Next(minValue: -10, maxValue: 11))
>         .ToArray();
> }
> ```
>
> *end example*
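
Unlike a declaration pattern, a var pattern matches every input value, including `null`. The following sketch is a minimal illustration of that difference and of the discard designation; `VarPatternDemo` and its members are hypothetical names.

```csharp
static class VarPatternDemo
{
    // A null input still matches `var x`; `x` is then null.
    public static bool NullMatchesVar()
    {
        string s = null;
        return s is var x && x == null;
    }

    // Because the match always succeeds, a null check must be written
    // separately when the bound value is later dereferenced.
    public static int LengthOrMinusOne(string s) =>
        s is var t && t != null ? t.Length : -1;

    // With the designation `_` the value is discarded, not bound.
    public static bool DiscardMatches(object o) => o is var _;
}
```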

## 11.3 Static and Dynamic Binding

### 11.3.1 General
@@ -3624,6 +3773,7 @@ relational_expression
| relational_expression '<=' shift_expression
| relational_expression '>=' shift_expression
| relational_expression 'is' type
| relational_expression 'is' pattern
| relational_expression 'as' type
;

@@ -3952,6 +4102,8 @@ where `x` is an expression of a nullable value type, if operator overload resolu

### 11.11.11 The is operator

When the form `is` *pattern* is used, it is a compile-time error if the corresponding *relational_expression* does not designate a value or does not have a type.
Member

We need to say that the result is true iff the value matches the pattern.


The `is` operator is used to check if the run-time type of an object is compatible with a given type. The check is performed at runtime. The result of the operation `E is T`, where `E` is an expression and `T` is a type other than `dynamic`, is a Boolean value indicating whether `E` is non-null and can successfully be converted to type `T` by a reference conversion, a boxing conversion, an unboxing conversion, a wrapping conversion, or an unwrapping conversion.
Member

We also permit it to match when there is no conversion at compile-time but either the type of the expression or the type is an open type, and the input object is of the given type or is of a derived type.


The operation is evaluated as follows:
@@ -5948,6 +6100,7 @@ Constant expressions are required in the contexts listed below and this is indic
- `goto case` statements ([§12.10.4](statements.md#12104-the-goto-statement))
- Dimension lengths in an array creation expression ([§11.7.15.5](expressions.md#117155-array-creation-expressions)) that includes an initializer.
- Attributes ([§21](attributes.md#21-attributes))
- In a *constant_pattern* (§constant-pattern-new-clause)

An implicit constant expression conversion ([§10.2.11](conversions.md#10211-implicit-constant-expression-conversions)) permits a constant expression of type `int` to be converted to `sbyte`, `byte`, `short`, `ushort`, `uint`, or `ulong`, provided the value of the constant expression is within the range of the destination type.

15 changes: 9 additions & 6 deletions standard/lexical-structure.md
@@ -64,13 +64,14 @@ The productions for *simple_name* ([§11.7.4](expressions.md#1174-simple-names))
>
> *end example*

If a sequence of tokens can be parsed (in context) as a *simple_name* ([§11.7.4](expressions.md#1174-simple-names)), *member_access* ([§11.7.6](expressions.md#1176-member-access)), or *pointer_member_access* ([§22.6.3](unsafe-code.md#2263-pointer-member-access)) ending with a *type_argument_list* ([§8.4.2](types.md#842-type-arguments)), the token immediately following the closing `>` token is examined. If it is one of
If a sequence of tokens can be parsed (in context) as a *simple_name* ([§11.7.4](expressions.md#1174-simple-names)), *member_access* ([§11.7.6](expressions.md#1176-member-access)), or *pointer_member_access* ([§22.6.3](unsafe-code.md#2263-pointer-member-access)) ending with a *type_argument_list* ([§8.4.2](types.md#842-type-arguments)), the token immediately following the closing `>` token is examined, to see if it is
Member

This is correct and the edit to this text in #44 is not correct.

Member

@BillWagner Oct 2, 2022

From @RexJaeschke notes:

Note to TG2 members: for rationale on the following replacements, see "Changes to syntactic disambiguation" at https://github.com/dotnet/csharplang/blob/main/proposals/csharp-7.0/pattern-matching.md#proposed-change-to-the-disambiguation-rule.


```csharp
( ) ] : ; , . ? == !=
```
- One of `( ) ] } : ; , . ? == != | ^ && || & [`; or
- One of the relational operators `< > <= >= is as`; or
- A contextual query keyword appearing inside a query expression; or
- In certain contexts, we treat *identifier* as a disambiguating token. Those contexts are where the sequence of tokens being disambiguated is immediately preceded by one of the keywords `is`, `case` or `out`, or arises while parsing the first element of a tuple literal (in which case the tokens are preceded by `(` or `:` and the identifier is followed by a `,`) or a subsequent element of a tuple literal.
Contributor

“we” don’t treat identifier as anything, the Standard defines. I expect there will be more of this kind of thing due to the likely origin of the material for these PRs.


then the *type_argument_list* is retained as part of the *simple_name*, *member_access*, or *pointer_member_access* and any other possible parse of the sequence of tokens is discarded. Otherwise, the *type_argument_list* is not considered part of the *simple_name*, *member_access*, or *pointer_member_access*, even if there is no other possible parse of the sequence of tokens.
If the following token is among this list, or an identifier in such a context, then the *type_argument_list* is retained as part of the *simple_name*, *member_access* or *pointer_member_access* and any other possible parse of the sequence of tokens is discarded. Otherwise, the *type_argument_list* is not considered to be part of the *simple_name*, *member_access* or *pointer_member_access*, even if there is no other possible parse of the sequence of tokens. (These rules are not applied when parsing a *type_argument_list* in a *namespace_or_type_name* [§7.8](basic-concepts.md#78-namespace-and-type-names).)

> *Note*: These rules are not applied when parsing a *type_argument_list* in a *namespace_or_type_name* ([§7.8](basic-concepts.md#78-namespace-and-type-names)). *end note*
<!-- markdownlint-disable MD028 -->
@@ -101,10 +102,12 @@ then the *type_argument_list* is retained as part of the *simple_name*, *member_
> x = y is C<T> && z;
> ```
>
> the tokens `C<T>` are interpreted as a *namespace_or_type_name* with a *type_argument_list* due to being on the right-hand side of the `is` operator ([§11.11.1](expressions.md#11111-general)). Because `C<T>` parses as a *namespace_or_type_name*, not a *simple_name*, *member_access*, or *pointer_member_access*, the above rule does not apply, and it is considered to have a *type_argument_list* regardless of the token that follows.
> the tokens `C<T>` are interpreted as a *namespace_or_type_name* with a *type_argument_list* due to being on the right-hand side of the `is` operator ([§11.11.11](expressions.md#111111-the-is-operator)). Because `C<T>` parses as a *namespace_or_type_name*, not a *simple_name*, *member_access*, or *pointer_member_access*, the above rule does not apply, and it is considered to have a *type_argument_list* regardless of the token that follows.
Member

This is no longer correct. The right-hand-side of an is operator can now be either an expression (constant pattern) or a type, so it is indeed ambiguous. See HERE for a demonstration of this.

>
> *end example*
Member

@gafter May 11, 2022

The following examples are worth adding, I think:

The expression (A < B, C > D) is a tuple with two elements, each a comparison.

The expression (A<B,C> D, E) is a tuple with two elements, the first of which is a declaration expression.

The invocation M(A < B, C > D, E) has three arguments.

The invocation M(out A<B,C> D, E) has two arguments, the first of which is an out declaration.

The expression e is A<B> C uses a declaration pattern.

The case label case A<B> C: uses a declaration pattern.

Member

Spaces have no effect.


A *relational_expression* ([§11.11.1](expressions.md#11111-general)) can have the form "*relational_expression* `is` *type*" or "*relational_expression* `is` *constant_pattern*," either of which might be a valid parse of a qualified identifier. In this case, an attempt is made to bind to the type; however, if that fails, the first thing found (which must be either a constant or a type) is bound.
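
Two of the cases raised in the review discussion can be sketched in code: a tuple literal whose elements are comparisons, and an `is` expression where `<...>` followed by an identifier is kept as a *type_argument_list*. `DisambiguationDemo` and its methods are hypothetical names, and the sketch assumes a compiler that follows the disambiguation rule as proposed in this PR.

```csharp
using System.Collections.Generic;

static class DisambiguationDemo
{
    // `(A < B, C > D)` is a tuple of two comparisons: the identifier `D`
    // is followed by `)`, not `,`, so `<B, C>` is not a type_argument_list.
    public static bool TupleOfComparisons()
    {
        int A = 1, B = 2, C = 3, D = 4;
        var t = (A < B, C > D);
        return t.Item1 && !t.Item2;
    }

    // After `is`, an identifier following `>` is a disambiguating token,
    // so `List<int> list` is read as a declaration pattern.
    public static int CountIfList(object e)
    {
        if (e is List<int> list) return list.Count;
        return -1;
    }
}
```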

## 6.3 Lexical analysis

### 6.3.1 General