Revise expression parsing #75553

CyrusNajmabadi · 2024-10-17T21:21:13Z

Recent forays into fixing expression parsing issues have not been fun. The core expression precedence parser is written in a very convoluted and hard-to-follow way. In particular, it uses a lot of local state variables for different purposes, with little clarity about the state they represent. Portions of that operation also mutate the state of the parser (sometimes consuming only parts of the input, with later locations required to figure that out and do the right thing). There are often relationships between variables which are hard to track and reason about.

This is a lot of complexity for what is really just an iterative looping algorithm for precedence parsing. This PR keeps the iterative nature of parsing here (we do not want recursive descent for expressions, as we absolutely will blow the stack), while greatly simplifying the individual steps that parsing needs to perform.

Comments inline as to the changes and reasons for them.

CyrusNajmabadi · 2024-10-17T21:22:10Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

@@ -10937,9 +10937,6 @@ private ExpressionSyntax ParseSubExpression(Precedence precedence)

        private ExpressionSyntax ParseSubExpressionCore(Precedence precedence)
        {
-            ExpressionSyntax leftOperand;
-            Precedence newPrecedence = 0;


An example of very confusing local state. Inner blocks would set this, for use only within that block, but the value was never read afterwards. Removed entirely, as having multiple precedence values to understand in a precedence parser makes things much harder to reason about :)

CyrusNajmabadi · 2024-10-17T21:22:51Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

-                leftOperand = _syntaxFactory.PrefixUnaryExpression(opKind, opToken, operand);
-            }
-            else if (tk == SyntaxKind.DotDotToken)
+            return ParseExpressionContinued(parseUnaryOrPrimaryExpression(precedence), precedence);


Extracted out common patterns to make things much clearer. The core algorithm is now simple to reason about. We first parse out a unary/primary expression. Then, we parse out whatever can continue that expression at the current precedence.

CyrusNajmabadi · 2024-10-17T21:24:38Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

+                    return _syntaxFactory.PrefixUnaryExpression(
+                        opKind,
+                        this.EatToken(),
+                        this.ParseSubExpression(GetPrecedence(opKind)));


Moved to a model where, instead of assigning to a variable (and then having to see what happens to it), we have a local function which just returns the value in question. This makes it much easier to see that the function is responsible for parsing out that construct and nothing else.

Also moved to the much clearer parsing pattern we try to use in the rest of the parser, where the production A -> B C D is parsed as return A(ParseB(), ParseC(), ParseD()). This lowers the number of locals to keep track of and ensures we do not accidentally use them for an inappropriate purpose.
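As a rough illustration of that pattern (not code from this PR), here is a minimal sketch for a toy production Parenthesized -> '(' Identifier ')'. The token queue and the helpers Eat, ParseIdentifier, MakeParenthesized and ParseParenthesized are hypothetical names invented for the example. It relies on C# evaluating method arguments left to right, so the tokens are consumed in source order without intermediate locals.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token stream for the toy production: Parenthesized -> '(' Identifier ')'
var tokens = new Queue<string>(new[] { "(", "x", ")" });

// Consumes one token and checks that it is the expected one.
string Eat(string expected)
{
    var token = tokens.Dequeue();
    if (token != expected)
        throw new InvalidOperationException($"Expected '{expected}' but found '{token}'.");
    return token;
}

// Each helper consumes exactly the tokens of its own construct.
string ParseIdentifier() => tokens.Dequeue();

// Stand-in for a syntax factory method (assumption: nodes are just strings here).
string MakeParenthesized(string open, string inner, string close) => $"Paren[{open} {inner} {close}]";

// A -> B C D becomes "return A(ParseB(), ParseC(), ParseD())".
// Arguments are evaluated left to right, so no locals are needed to keep the order straight.
string ParseParenthesized() => MakeParenthesized(Eat("("), ParseIdentifier(), Eat(")"));

var parsed = ParseParenthesized();
Console.WriteLine(parsed);
```

Because every sub-parse is an argument expression, there is no named intermediate value that later code could reuse by mistake.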

CyrusNajmabadi · 2024-10-17T21:28:03Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

                // Not a unary operator - get a primary expression.
-                leftOperand = this.ParseTerm(precedence);
+                return this.ParsePrimaryExpression(precedence);


The base (primary) case now falls out at the bottom once the unary cases have been handled.

CyrusNajmabadi · 2024-10-17T21:29:01Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

+            if (this.CurrentToken.Kind == SyntaxKind.QuestionToken && precedence <= Precedence.Conditional)
+                return consumeConditionalExpression(expandedExpression);
+
+            return expandedExpression;


The entirety of the ParseExpressionContinued method is now much simpler. You start with the initial unary/primary expression passed in and your current precedence. It then tries to expand the expression for as long as it can, finally bottoming out with ? : parsing if allowed. Much simpler to understand, especially in comparison to our actual spec.
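The shape described here can be sketched roughly as follows. This is a toy model, not the Roslyn implementation: the precedence table, the string-based "nodes", the space-separated tokenizer and the treatment of every binary operator as left-associative are all assumptions made for the example. Only the names ParseSubExpression and ParseExpressionContinued echo the PR.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical precedence table; Roslyn's real Precedence enum is much larger.
var binaryPrecedence = new Dictionary<string, int> { ["+"] = 2, ["*"] = 3 };
const int ConditionalPrecedence = 1;

string[] tokens = Array.Empty<string>();
int position = 0;

string Current() => position < tokens.Length ? tokens[position] : "";
string EatToken() => tokens[position++];

// Step 1: parse the unary/primary expression, then hand it to the continuation step.
string ParseSubExpression(int precedence) =>
    ParseExpressionContinued(ParsePrimary(), precedence);

// Toy primary expression: a single identifier token.
string ParsePrimary() => EatToken();

// Step 2: keep expanding the left operand while the next operator binds tightly enough.
string ParseExpressionContinued(string left, int precedence)
{
    while (binaryPrecedence.TryGetValue(Current(), out var newPrecedence) && newPrecedence >= precedence)
    {
        var op = EatToken();
        // Left-associative in this sketch: the right operand must bind strictly tighter.
        left = $"({left} {op} {ParseSubExpression(newPrecedence + 1)})";
    }

    // Finally, try to end with a conditional expression if the precedence allows it.
    if (Current() == "?" && precedence <= ConditionalPrecedence)
    {
        EatToken(); // '?'
        var whenTrue = ParseSubExpression(ConditionalPrecedence);
        EatToken(); // ':'
        var whenFalse = ParseSubExpression(ConditionalPrecedence);
        return $"({left} ? {whenTrue} : {whenFalse})";
    }

    return left;
}

// Entry point for the sketch: tokens are separated by single spaces.
string Parse(string source)
{
    tokens = source.Split(' ');
    position = 0;
    return ParseSubExpression(ConditionalPrecedence);
}

Console.WriteLine(Parse("a + b * c ? x : y"));
```

The loop only ever grows the left operand, and the conditional check runs once after the loop gives up, which is the "expand as long as it can, then bottom out with ? :" structure in miniature.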

CyrusNajmabadi · 2024-10-17T21:29:27Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

                var tk = this.CurrentToken.ContextualKind;

-                bool isAssignmentOperator = false;


Example of ambient state that was hard to keep track of. We can trivially compute this on demand in the one later place that needs it.

CyrusNajmabadi · 2024-10-17T21:29:59Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

+                // check for >>, >>=, >>> or >>>=
+                if (tk == SyntaxKind.GreaterThanToken
+                    && this.PeekToken(1) is { Kind: SyntaxKind.GreaterThanToken or SyntaxKind.GreaterThanEqualsToken } token1
+                    && NoTriviaBetween(this.CurrentToken, token1)) // check to see if they really are adjacent


Instead of having > be handled by 'IsBinaryOperator' and then undoing its logic, we just check for the fun cases up front, and only handle it as a real > if it doesn't match any of them.
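To make the "check up front" idea concrete, here is a small sketch that models only the two-token case (>> and >>=). The tuple token shape (text plus start offset) and the ClassifyGreaterThan helper are hypothetical. The NoTriviaBetween below is a simplified stand-in that assumes adjacency can be decided from offsets alone, and is not the parser's real helper.

```csharp
using System;

// Hypothetical token shape: the token text plus its start offset in the source.
// Two tokens are adjacent when the first one ends exactly where the second begins.
bool NoTriviaBetween((string Text, int Start) first, (string Text, int Start) second) =>
    first.Start + first.Text.Length == second.Start;

// Returns the merged operator text for an adjacent pair, or just the first token otherwise.
string ClassifyGreaterThan((string Text, int Start) current, (string Text, int Start) next)
{
    // Check the special cases first: '>' immediately followed by '>' or '>='.
    if (current.Text == ">"
        && (next.Text is ">" or ">=")
        && NoTriviaBetween(current, next))
    {
        return current.Text + next.Text;
    }

    // Otherwise it is handled as a plain token.
    return current.Text;
}

var shift = ClassifyGreaterThan((">", 2), (">", 3));        // adjacent pair
var shiftAssign = ClassifyGreaterThan((">", 2), (">=", 3)); // adjacent pair
var separate = ClassifyGreaterThan((">", 2), (">", 4));     // a space sits between them
Console.WriteLine($"{shift} {shiftAssign} {separate}");
```

The adjacency test matters because a > > b with whitespace between the tokens must not be treated as a shift.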

CyrusNajmabadi · 2024-10-17T21:30:51Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

                }

                var newPrecedence = GetPrecedence(opKind);

-                // check for >>, >>=, >>> or >>>=
-                int tokensToCombine = 1;


Also a difficult value to reason about, with complex semantics about how it gets set. No need for it at all. We have opKind, so we know precisely what tokens we have and whether we have to merge them.

CyrusNajmabadi · 2024-10-17T21:31:26Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

@@ -11349,7 +11352,7 @@ private ExpressionSyntax ParseIsExpression(ExpressionSyntax leftOperand, SyntaxT
            };
        }

-        private ExpressionSyntax ParseTerm(Precedence precedence)
+        private ExpressionSyntax ParsePrimaryExpression(Precedence precedence)


clarifying name.

CyrusNajmabadi · 2024-10-17T21:42:02Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

-                    break;
+                    // Something that doesn't expand the current expression we're looking at.  Bail out and see if we
+                    // can end with a conditional expression.
+                    return false;
                }


Note: we could consider extracting this into a tryGetOperatorKind helper, to get similar benefits from single-exit logic. Not sure if it is needed though.

CyrusNajmabadi · 2024-10-17T21:42:51Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

+            //
+            // Note that postfix operators (like ++) are still primary expressions, even though their prefix equivalents (`++x`) are unary.
+
+            return parsePostFixExpression(parsePrimaryExpressionWithoutPostfix(precedence));


Inlined these methods since no one else should call them. Now it's clearer that parsing a primary is just parsing the initial part, and then the postfix portion.

CyrusNajmabadi · 2024-10-17T21:43:38Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs


-                        return expr;
+            ExpressionSyntax parsePostFixExpression(ExpressionSyntax expr)


inlined into ParsePrimaryExpression.

CyrusNajmabadi · 2024-10-17T21:43:48Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

                        }
+                }
+            }



Bad GitHub diff. ParseBaseExpression, IsPossibleDeconstructionLeft and IsPossibleAnonymousMethodExpression are entirely unchanged. They just moved below the code that was inlined into this method.

CyrusNajmabadi · 2024-10-17T21:45:11Z

@dotnet/roslyn-compiler This is ready for review. These changes really helped make it easier to reason about expression parsing while I was fixing the incremental parsing bug with .. expressions in #75549 and #75532. I have extracted them out so we can independently take the incremental parsing fix, while also getting the benefits of clearer and easier-to-maintain parsing.

RikkiGibson

Change LGTM but it would be nice to see this code pushed even farther toward making it easy to understand.

RikkiGibson · 2024-10-17T23:43:59Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

-                    break;
+                    // Something that doesn't expand the current expression we're looking at.  Bail out and see if we
+                    // can end with a conditional expression.
+                    return false;
                }


What I am finding as a reviewer is that the repeated "if-return-if-return-if-return"s are much easier to understand at a glance than the repeated "if-else-if-else-if-else-if". Both here and below, where the leftOperand is assigned.

This is in part because the mental compiler needs to locate where we are going to jump to after exiting a large if-else block, scroll to it, decide whether any additional work is being done, decide whether we are in a consistent state for doing that work, etc.

Also, I don't think there's any way to "fix" this, but I am finding that scrolling between corresponding places in the old and new versions, to find where logic has moved to, is quite challenging, to the point that I might suggest that the second reviewer just skim through the deleted lines, then review the new change practically as new code.

CyrusNajmabadi · 2024-10-18T17:18:08Z

Interested in finding out the result of this.

I was slightly incorrect. expr .. is picked up in the 'continuation' part. But we still need something to handle the base case of .. starting an expr.

Note: the logic is still a little incorrect. Specifically, we should only take the 'range' here if precedence allows it. That said, no one calls into ParseSubExpressionCore with a precedence tighter than 'Range', so it never matters in practice. I'm considering cleaning this up further in a future pass, but would like to leave this as is for now. I have added a comment to the code explaining this.

CyrusNajmabadi · 2024-10-18T18:13:59Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

-                {
-                    throw ExceptionUtilities.UnexpectedValue(tokensToCombine);
+                    return _syntaxFactory.BinaryExpression(
+                        operatorExpressionKind, leftOperand, operatorToken, this.ParseType(ParseTypeMode.AsExpression));


in a similar vein, these also just became simple returns, instead of assignments to track.

cston · 2024-10-23T19:01:56Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

-                        isAssignmentOperator = true;
-                        tokensToCombine = 2;
-                    }
+                var (operatorTokenKind, operatorExpressionKind) = getOperatorTokenAndExpressionKind();


operatorTokenKind, operatorExpressionKind

Consider calling these opToken and opKind to reduce the differences.

I intentionally changed these, as the old names were more confusing. opToken was not a token, but a kind. opKind was not the kind of the op token itself, but of the op expression.

The new names are much clearer (both are kinds) and indicate which domain they're describing (the token domain or the expression domain).

cston · 2024-10-23T19:32:10Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

                }
                else
                {
-                    Debug.Assert(IsExpectedBinaryOperator(tk));
-                    leftOperand = _syntaxFactory.BinaryExpression(opKind, leftOperand, opToken, this.ParseSubExpression(newPrecedence));
+                    // Normal operator.  Eat as a single token (converting cases contextual words 'with' to a keyword)


converting cases contextual words

Perhaps "converting contextual keyword".

CyrusNajmabadi added 4 commits October 17, 2024 13:29

Simplify expression parsing

e57e967

Extract helper

e6eb643

simplify

4f5e6fc

simplify

b8c14af

dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Oct 17, 2024

simplify

cc0efdb

inline

28fad18

CyrusNajmabadi marked this pull request as ready for review October 17, 2024 21:45

CyrusNajmabadi requested a review from a team as a code owner October 17, 2024 21:45

RikkiGibson self-assigned this Oct 17, 2024

RikkiGibson approved these changes Oct 17, 2024

CyrusNajmabadi added 2 commits October 18, 2024 11:00

Simplify token merging

16497d4

Fix

c5f6fb7

Simplify

1976c00

RikkiGibson reviewed Oct 19, 2024

RikkiGibson requested a review from a team October 21, 2024 17:28

RikkiGibson mentioned this pull request Oct 21, 2024

Do not lex .. as a single DotDotToken #75549

Merged

jaredpar requested a review from cston October 22, 2024 12:22

CyrusNajmabadi and others added 3 commits October 22, 2024 09:10

Update src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

7ff8920

Co-authored-by: Rikki Gibson <rikkigibson@gmail.com>

Merge remote-tracking branch 'upstream/main' into expressionParsing

b2145be

Merge branch 'expressionParsing' of https://github.com/CyrusNajmabadi…

7874ffb

…/roslyn into expressionParsing

cston reviewed Oct 23, 2024

cston reviewed Oct 23, 2024

CyrusNajmabadi added 3 commits October 23, 2024 11:49

Merge remote-tracking branch 'upstream/main' into expressionParsing

fe0c172

Tweak comment

804f7a6

Grammar

caf82c5

cston reviewed Oct 23, 2024

Update src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

52190a9

CyrusNajmabadi enabled auto-merge (squash) October 23, 2024 19:42

cston approved these changes Oct 23, 2024

Lower value slightly

cc4ce54

cston approved these changes Oct 23, 2024

CyrusNajmabadi merged commit a02225c into dotnet:main Oct 23, 2024
24 checks passed

CyrusNajmabadi deleted the expressionParsing branch October 23, 2024 20:54

dotnet-policy-service bot added this to the Next milestone Oct 23, 2024

akhera99 modified the milestones: Next, 17.13 P1 Oct 28, 2024

This was referenced Oct 29, 2024

[Automated] PRs inserted in VS build main-35428.98 #75658

Closed

[Automated] PRs inserted in VS build feature.debugger.main-35428.265 #75662

Closed

dotnet-bot mentioned this pull request Nov 19, 2024

[Automated] PRs inserted in VS build feature.debugger.shadowDebug-35518.109 #75965

Closed
