-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parse & reject postfix operators after casts #68985
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
src/librustc_parse/parser/expr.rs
Outdated
ExprKind::Field(_, _) => "field access expressions", | ||
ExprKind::MethodCall(_, _) => "method call expressions", | ||
ExprKind::Await(_) => "awaits", | ||
_ => "expressions", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For someone who's more familiar with the parser codebase: is there a way to generate these en mass? We found some description functions for other AST types (like ItemKind
), but none for ExprKind
. Or if there isn't, should we add one?
r? @estebank |
Just force pushed to fix some whitespace errors in the .stderr test file; didn't want to do extra commits 3 minutes into the PR. |
This comment has been minimized.
This comment has been minimized.
src/librustc_parse/parser/expr.rs
Outdated
|
||
// The resulting parse tree for `&x as T[0]` has a precedence of `((&x) as T)[0]`. | ||
let with_postfix = self.parse_dot_or_call_expr_with_(expr, span)?; | ||
if !matches!(with_postfix.kind, ExprKind::Cast(_, _)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this condition for? It's either unnecessary (always true) or incorrect.
We must always report an error if we parsed something postfix after the cast/ascription expression, but here it looks like we may accept things like expr as Type1.field as Type2
without an error.
(Also ExprCast
and ExprType
should always do the same thing synctactically, e.g. if expr as Type1 as Type2
work, then expr as Type1: Type2
should also work.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we call too far down into the parse chain to ever parse another as
expression here. If we've read the code correctly, parse_dot_or_call_expr_with_
will only ever parse things starting with .
, [
or (
after an existing expression. I think we'd have had to call into parse_assoc_expr
in order to parse another as
expression.
When trying to understand the codebase, we came up with this model:
parse_expr
: parses outer attributes, forwards toparse_assoc_expr
parse_assoc_expr
: calls intoparse_prefix_expr
to parse the left hand side, then parses*,+,||,&&, as
and all other binary operators. Calls helpers likeparse_assoc_expr_*
when it encounters these.parse_assoc_expr_*
(assoc
,ascribe
,parse_range_expr
): these parse the thing they're named for, using the lhs parse fromparse_assoc_expr
. The ones with an expression on the right call back intoparse_assoc_expr
to get the right hand side.parse_prefix_expr
: this parses prefix expressions (!
,-
,*
,&
) then calls:parse_bottom_expr
andparse_dot_or_call_expr_with
(passing in argument fromparse_bottom_expr
)parse_bottom_expr
: parses single things like return expressions, parenthesized expressions, etc.parse_dot_or_call_expr_with
: parses function calls, index operators,?
operators and field operators, and applies those to the result passed in (usually fromparse_bottom_expr
)
By jumping directly to parse_dot_or_call_expr_with
, it shouldn't be able to parse any associative associative operators, nor as X
.
But this introduces new test failures, so we've clearly done something wrong. Maybe this is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's correct that parse_dot_or_call_expr_with_
cannot possibly result in ::Cast
or ::Type
again and will give back the expression passed to it in those cases. However, this is also relying on facts on the ground that may change with time. To prevent such accidents in the future, I think we should make some enhancements:
- Let's make sure there's a test for
expr :/as Type1 :/as Type2
(4 cases, test all of them) and that these are parse failures. - When
parse_dot_or_call_expr_with_
passes back the expression, it does not reallocate. This means that you should be able to hash the address before the call and after and they should be the same. So this can be added as an||
-ed condition after the!matches(...)
. This shouldn't change the semantics, but it does add a measure of robustness.
But this introduces new test failures, so we've clearly done something wrong. Maybe this is it?
The failures come from not anticipating that this function is also used for type ascriptions (as noted by @petrochenkov), e.g. foo: Vec<u8>
. (See the function right below this one for how that happens.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense that we want to make it more reliable- we've added in address hashing, or at least what I think is good address hashing.
As for expr :/as Type1 :/as Type2
, I've added tests, but they don't fail. Looking at how this behaves prior to these changes, I don't think these are expected to fail? x as T1 as T2
is valid today. This doesn't change that, it'll still just parse x as T1
, then fail to see any postfix, then go back up to parse_assoc_expr
and parse the second as T2
(or : T2
).
Given that we test the expr pointer hash now, if parse_dot_or_call_expr_with_
ever changes to start parsing type ascriptions or casts, then these tests should start to fail because we'll trigger our error message. I think this is OK?
The failures come from not anticipating that this function is also used for type ascriptions (as noted by @petrochenkov), e.g. foo: Vec. (See the function right below this one for how that happens.)
We've gone through and fixed this, I think.
src/librustc_parse/parser/expr.rs
Outdated
|
||
// The resulting parse tree for `&x as T[0]` has a precedence of `((&x) as T)[0]`. | ||
let with_postfix = self.parse_dot_or_call_expr_with_(expr, span)?; | ||
if !matches!(with_postfix.kind, ExprKind::Cast(_, _)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's correct that parse_dot_or_call_expr_with_
cannot possibly result in ::Cast
or ::Type
again and will give back the expression passed to it in those cases. However, this is also relying on facts on the ground that may change with time. To prevent such accidents in the future, I think we should make some enhancements:
- Let's make sure there's a test for
expr :/as Type1 :/as Type2
(4 cases, test all of them) and that these are parse failures. - When
parse_dot_or_call_expr_with_
passes back the expression, it does not reallocate. This means that you should be able to hash the address before the call and after and they should be the same. So this can be added as an||
-ed condition after the!matches(...)
. This shouldn't change the semantics, but it does add a measure of robustness.
But this introduces new test failures, so we've clearly done something wrong. Maybe this is it?
The failures come from not anticipating that this function is also used for type ascriptions (as noted by @petrochenkov), e.g. foo: Vec<u8>
. (See the function right below this one for how that happens.)
This comment has been minimized.
This comment has been minimized.
Thank you for the detailed review! We've updated this with suggestions & more test fixes. The checking that we've not reallocated sounds good, but like it might be more strict than this code really needs. I was thinking hashing the whole struct would be more reliable and more exactly what we want. Would that extra reliability be worth the expense, or do you think hashing just the pointer is better? Besides that, we've done all the improvements we can think of for this PR. |
This comment has been minimized.
This comment has been minimized.
This adds parsing for expressions like 'x as Ty[0]' which will immediately error out, but still give the rest of the parser a valid parse tree to continue.
This is almost entirely refactoring and message changing, with the single behavioral change of panicking for unexpected output.
Previously this just errored out on all usages of type ascription, which isn't helpful.
This is a modified version of estebank's suggestion, with a bit of extra cleanup now that we don't need the different cases for if we can turn a span into a string or not.
Added latest suggestions - thanks for those! |
Implementes suggeseted changes by Centril. This checks whether the memory location of the cast remains the same after atttempting to parse a postfix operator after a cast has been parsed. If the address is not the same, an illegal postfix operator was parsed. Previously the code generated a hash of the pointer, which was overly complex and inefficent. Casting the pointers and comparing them is simpler and more effcient.
Fixes for those last two suggestions applied! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good; I'll leave final approval to @estebank in case they have any other remarks; otherwise r=me
Ping @estebank :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, other than a couple of nitpicks.
r? @Centril r=Centril
error: casts cannot be followed by ? | ||
--> $DIR/issue-35813-postfix-after-cast.rs:121:5 | ||
| | ||
LL | Err(0u64): Result<u64,u64>?; | ||
| ^^^^^^^^^-^^^^^^^^^^^^^^^^ | ||
| | | ||
| help: maybe write a path separator here: `::` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case seems to be unhandled (but that might be ok).
// Save the memory location of expr before parsing any following postfix operators. | ||
// This will be compared with the memory location of the output expression. | ||
// If they different we can assume we parsed another expression because the existing expression is not reallocated. | ||
let addr_before = &*cast_expr as *const _ as usize; | ||
let span = cast_expr.span; | ||
let with_postfix = self.parse_dot_or_call_expr_with_(cast_expr, span)?; | ||
let changed = addr_before != &*with_postfix as *const _ as usize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels weird to me to use the memory addresses for this, but I don't see a reason for it to ever fail...
r? @Centril r=Centril |
@bors r=Centril |
📌 Commit 453c505 has been approved by |
🌲 The tree is currently closed for pull requests below priority 1000, this pull request will be tested once the tree is reopened |
Parse & reject postfix operators after casts This adds an explicit error messages for when parsing `x as Type[0]` or similar expressions. Our add an extra parse case for parsing any postfix operator (dot, indexing, method calls, await) that triggers directly after parsing `as` expressions. My friend and I worked on this together, but they're still deciding on a github username and thus I'm submitting this for both of us. It will immediately error out, but will also provide the rest of the parser with a useful parse tree to deal with. There's one decision we made in how this produces the parse tree. In the situation `&x as T[0]`, one could imagine this parsing as either `&((x as T)[0])` or `((&x) as T)[0]`. We chose the latter for ease of implementation, and as it seemed the most intuitive. Feedback welcome! This is our first change to the parser section, and it might be completely horrible. Fixes rust-lang#35813.
Rollup of 9 pull requests Successful merges: - #67741 (When encountering an Item in a pat context, point at the item def) - #68985 (Parse & reject postfix operators after casts) - #69656 (Use .next() instead of .nth(0) on iterators.) - #69680 (rustc_expand: Factor out `Annotatable::into_tokens` to a separate method) - #69690 (test(pattern): add tests for combinations of pattern features) - #69706 (Use subslice patterns in slice methods) - #69727 (Avoid using `unwrap()` in suggestions) - #69754 (Update deprecation version to 1.42 for Error::description) - #69782 (Don't redundantly repeat field names (clippy::redundant_field_names)) Failed merges: r? @ghost
…resses-in-parser-, r=compiler-errors Remove an address comparison from the parser Originally this check was added in rust-lang#68985, as suggested by rust-lang@940f657#r376850175. I don't think that this address check is a robust way of making parser more robust. This code is also extensively tested by [`ui/parser/issues/issue-35813-postfix-after-cast.rs`](https://github.com/rust-lang/rust/blob/57d3c58ed6e0faf89a62411f96c000ffc9fd3937/src/test/ui/parser/issues/issue-35813-postfix-after-cast.rs). _Replaces #103700_ r? `@compiler-errors`
This adds an explicit error messages for when parsing
x as Type[0]
or similar expressions. Our add an extra parse case for parsing any postfix operator (dot, indexing, method calls, await) that triggers directly after parsingas
expressions.My friend and I worked on this together, but they're still deciding on a github username and thus I'm submitting this for both of us.
It will immediately error out, but will also provide the rest of the parser with a useful parse tree to deal with.
There's one decision we made in how this produces the parse tree. In the situation
&x as T[0]
, one could imagine this parsing as either&((x as T)[0])
or((&x) as T)[0]
. We chose the latter for ease of implementation, and as it seemed the most intuitive.Feedback welcome! This is our first change to the parser section, and it might be completely horrible.
Fixes #35813.