-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revise expression parsing #75553
Revise expression parsing #75553
Conversation
@@ -10937,9 +10937,6 @@ private ExpressionSyntax ParseSubExpression(Precedence precedence) | |||
|
|||
private ExpressionSyntax ParseSubExpressionCore(Precedence precedence) | |||
{ | |||
ExpressionSyntax leftOperand; | |||
Precedence newPrecedence = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an example of very confusing local state. inner blocks would set this, to use only within that block. But the value was never read afterwards. Removed entirely as having multiple precedence values to have to understand in a precedence parser makes things much harder to reason about :)
leftOperand = _syntaxFactory.PrefixUnaryExpression(opKind, opToken, operand); | ||
} | ||
else if (tk == SyntaxKind.DotDotToken) | ||
return ParseExpressionContinued(parseUnaryOrPrimaryExpression(precedence), precedence); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extracted out common patters to make things much clearer. The core algorithm is now simple to reason about. We first parse out a unary/primary expression. Then, we parse out what can continue that at the current precedence we're at.
return _syntaxFactory.PrefixUnaryExpression( | ||
opKind, | ||
this.EatToken(), | ||
this.ParseSubExpression(GetPrecedence(opKind))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to a model where instead of assinging to a variable (and then having to see what happens to it), we have a local function which just returns the value in question. Making it much easier to see that this is just responsible for parsing out that construct and nothing else.
also, moved to the much clearer parsing pattern we try to use int eh rest of hte parser where the production A -> B C D
is parsed as return A(ParseB(), ParseC(), ParseD())
. this lowers the number of locals to keep track of and ensures not accidentally using them for an innapropriate purpose.
// Not a unary operator - get a primary expression. | ||
leftOperand = this.ParseTerm(precedence); | ||
return this.ParsePrimaryExpression(precedence); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
base (primary) case now falls out at the bottom once unary cases
if (this.CurrentToken.Kind == SyntaxKind.QuestionToken && precedence <= Precedence.Conditional) | ||
return consumeConditionalExpression(expandedExpression); | ||
|
||
return expandedExpression; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the entirety of hte ParseExpressionCOntineud method is now much simpler. You start with the initial unary/primary expression passed in, and your current precedence. It then proceeds to try to expand the expression as long as it can, finally bottoming out with ? :
parsing if allowed. Much simpler to understand, esp. in comparison to our actual spec.
var tk = this.CurrentToken.ContextualKind; | ||
|
||
bool isAssignmentOperator = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
example of ambient state that was hard to keep track of. we can compute this on demand trivially in the one place later that needs this.
// check for >>, >>=, >>> or >>>= | ||
if (tk == SyntaxKind.GreaterThanToken | ||
&& this.PeekToken(1) is { Kind: SyntaxKind.GreaterThanToken or SyntaxKind.GreaterThanEqualsToken } token1 | ||
&& NoTriviaBetween(this.CurrentToken, token1)) // check to see if they really are adjacent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of having >
be handled by 'IsBinaryOperator' and then undoing its logic. we just check for the fun cases up front, and then only handle it as a real > if it doesn't match that.
} | ||
|
||
var newPrecedence = GetPrecedence(opKind); | ||
|
||
// check for >>, >>=, >>> or >>>= | ||
int tokensToCombine = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also a difficult value to reason about, with complex semantics about hwo it gets set. no need for it at all. We have opKind. So we know precisely what tokens we have and if we have to merge them.
@@ -11349,7 +11352,7 @@ private ExpressionSyntax ParseIsExpression(ExpressionSyntax leftOperand, SyntaxT | |||
}; | |||
} | |||
|
|||
private ExpressionSyntax ParseTerm(Precedence precedence) | |||
private ExpressionSyntax ParsePrimaryExpression(Precedence precedence) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clarifying name.
break; | ||
// Something that doesn't expand the current expression we're looking at. Bail out and see if we | ||
// can end with a conditional expression. | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: we could consider extracing this into a tryGetOperatorKind helper, to get similar benefits on single-exit logic. not sure if it is needed thoughl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I am finding as a reviewer is that the repeated "if-return-if-return-if-return"s are much easier to understand at a glance than the repeated "if-else-if-else-if-else-if". Both here and below, where the leftOperand
is assigned.
This is in part because the mental compiler needs to locate where we are going to jump to after exiting a large if-else block, scroll to it, decide whether any additional work is being done, decide whether we are in a consistent state for doing that work, etc.
Also, I don't think there's anyway to "fix" this, but, I am finding that scrolling between corresponding places in the old and new versions, to find where logic has moved to, is quite challenging, to the point that I might suggest to the second reviewer to just skim thru the deleted lines, then review the new change practically as new code.
// | ||
// Note that postfix operators (like ++) are still primary expressions, even though their prefix equivalents (`++x`) are unary. | ||
|
||
return parsePostFixExpression(parsePrimaryExpressionWithoutPostfix(precedence)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inlined these methods since no one else shoudl call them. now it's clearer that parsing a primary is just prsing the initial part, and the the postfix portion.
|
||
return expr; | ||
ExpressionSyntax parsePostFixExpression(ExpressionSyntax expr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inlined into ParsePrimaryExpression.
} | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bad github diff. ParseBaseExpression, IsPossibleDeconstructionLeft and IsPossibleAnonymousMethodExpression are entirely unchanged. they just moved below the code that was now inlined into this method.
@dotnet/roslyn-compiler This is ready for review. Changes that really helped make it easier to reason about expression parsing while i was fixing the incremental parsing bug with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change LGTM but it would be nice to see this code pushed even farther toward making it easy to understand.
break; | ||
// Something that doesn't expand the current expression we're looking at. Bail out and see if we | ||
// can end with a conditional expression. | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I am finding as a reviewer is that the repeated "if-return-if-return-if-return"s are much easier to understand at a glance than the repeated "if-else-if-else-if-else-if". Both here and below, where the leftOperand
is assigned.
This is in part because the mental compiler needs to locate where we are going to jump to after exiting a large if-else block, scroll to it, decide whether any additional work is being done, decide whether we are in a consistent state for doing that work, etc.
Also, I don't think there's anyway to "fix" this, but, I am finding that scrolling between corresponding places in the old and new versions, to find where logic has moved to, is quite challenging, to the point that I might suggest to the second reviewer to just skim thru the deleted lines, then review the new change practically as new code.
I was slightly incorrect. Note: the logic is still a little incorrect. Specifically, we should only take the 'range' here if precedence allows it. That said, no one calls into ParseSubExpressionCore with a precedence tighter than 'Range', so it never matters in practice. I'm considering cleaning this up further in a future pass, but would like to leave this as is for now. I have commented to the code explaining things. |
{ | ||
throw ExceptionUtilities.UnexpectedValue(tokensToCombine); | ||
return _syntaxFactory.BinaryExpression( | ||
operatorExpressionKind, leftOperand, operatorToken, this.ParseType(ParseTypeMode.AsExpression)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in a similar vein, these also just became simple returns, instead of assignments to track.
Co-authored-by: Rikki Gibson <rikkigibson@gmail.com>
…/roslyn into expressionParsing
isAssignmentOperator = true; | ||
tokensToCombine = 2; | ||
} | ||
var (operatorTokenKind, operatorExpressionKind) = getOperatorTokenAndExpressionKind(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i intentionally changed as the old names were more confising. opToken was not a token, but was a kind. opKind was not the kind of the op token itself, but of the op expression.
The new names are much clearer (both are kinds) and indicate which domain they're describing (the token domain or the expression domain).
} | ||
else | ||
{ | ||
Debug.Assert(IsExpectedBinaryOperator(tk)); | ||
leftOperand = _syntaxFactory.BinaryExpression(opKind, leftOperand, opToken, this.ParseSubExpression(newPrecedence)); | ||
// Normal operator. Eat as a single token (converting cases contextual words 'with' to a keyword) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recent forways into fixing expression parsing issues have not been fun. The core expression precedence parser is written in a very convoluted and hard to follow way. In particular, it tends to use a lot of local state variables for different purposes, with lack of clarity about the state tehy represent. Portions of that operation also mutate the state of the parser (sometimes consumign only parts of hte input, with the later locations required to figure that out and do the right thing). There are often relationships between variables which are hard to track and reason about.
This is a lot of complexity for what is really just an iterative looping algorithm for precedence parsing. This PR keeps the iterative nature of parsing here (we do not want recursive decent for expressions as we absolutely will blow the stack), while greatly simplifying the individual steps that parsing needs to perform.
Comments inline as to the changes and reasons for them.