Parse & reject postfix operators after casts #68985

daboross · 2020-02-09T05:51:11Z

This adds an explicit error messages for when parsing x as Type[0] or similar expressions. Our add an extra parse case for parsing any postfix operator (dot, indexing, method calls, await) that triggers directly after parsing as expressions.

My friend and I worked on this together, but they're still deciding on a github username and thus I'm submitting this for both of us.

It will immediately error out, but will also provide the rest of the parser with a useful parse tree to deal with.

There's one decision we made in how this produces the parse tree. In the situation &x as T[0], one could imagine this parsing as either &((x as T)[0]) or ((&x) as T)[0]. We chose the latter for ease of implementation, and as it seemed the most intuitive.

Feedback welcome! This is our first change to the parser section, and it might be completely horrible.

Fixes #35813.

rust-highfive · 2020-02-09T05:51:15Z

r? @ecstatic-morse

(rust_highfive has picked a reviewer for you, use r? to override)

daboross · 2020-02-09T05:53:46Z

src/librustc_parse/parser/expr.rs

+                    ExprKind::Field(_, _) => "field access expressions",
+                    ExprKind::MethodCall(_, _) => "method call expressions",
+                    ExprKind::Await(_) => "awaits",
+                    _ => "expressions",


For someone who's more familiar with the parser codebase: is there a way to generate these en mass? We found some description functions for other AST types (like ItemKind), but none for ExprKind. Or if there isn't, should we add one?

ecstatic-morse · 2020-02-09T05:58:00Z

r? @estebank

daboross · 2020-02-09T06:02:43Z

Just force pushed to fix some whitespace errors in the .stderr test file; didn't want to do extra commits 3 minutes into the PR.

petrochenkov · 2020-02-09T12:46:09Z

src/librustc_parse/parser/expr.rs

+
+        // The resulting parse tree for `&x as T[0]` has a precedence of `((&x) as T)[0]`.
+        let with_postfix = self.parse_dot_or_call_expr_with_(expr, span)?;
+        if !matches!(with_postfix.kind, ExprKind::Cast(_, _)) {


What is this condition for? It's either unnecessary (always true) or incorrect.
We must always report an error if we parsed something postfix after the cast/ascription expression, but here it looks like we may accept things like expr as Type1.field as Type2 without an error.

(Also ExprCast and ExprType should always do the same thing synctactically, e.g. if expr as Type1 as Type2 work, then expr as Type1: Type2 should also work.)

I believe we call too far down into the parse chain to ever parse another as expression here. If we've read the code correctly, parse_dot_or_call_expr_with_ will only ever parse things starting with ., [ or ( after an existing expression. I think we'd have had to call into parse_assoc_expr in order to parse another as expression.

When trying to understand the codebase, we came up with this model:

parse_expr: parses outer attributes, forwards to parse_assoc_expr

parse_assoc_expr: calls into parse_prefix_expr to parse the left hand side, then parses *,+,||,&&, as and all other binary operators. Calls helpers like parse_assoc_expr_* when it encounters these.

parse_assoc_expr_* (assoc, ascribe, parse_range_expr): these parse the thing they're named for, using the lhs parse from parse_assoc_expr. The ones with an expression on the right call back into parse_assoc_expr to get the right hand side.

parse_prefix_expr: this parses prefix expressions (!,-,*,&) then calls: parse_bottom_expr and parse_dot_or_call_expr_with (passing in argument from parse_bottom_expr)

parse_bottom_expr: parses single things like return expressions, parenthesized expressions, etc.

parse_dot_or_call_expr_with: parses function calls, index operators, ? operators and field operators, and applies those to the result passed in (usually from parse_bottom_expr)

By jumping directly to parse_dot_or_call_expr_with, it shouldn't be able to parse any associative associative operators, nor as X.

But this introduces new test failures, so we've clearly done something wrong. Maybe this is it?

It's correct that parse_dot_or_call_expr_with_ cannot possibly result in ::Cast or ::Type again and will give back the expression passed to it in those cases. However, this is also relying on facts on the ground that may change with time. To prevent such accidents in the future, I think we should make some enhancements:

Let's make sure there's a test for expr :/as Type1 :/as Type2 (4 cases, test all of them) and that these are parse failures.

When parse_dot_or_call_expr_with_ passes back the expression, it does not reallocate. This means that you should be able to hash the address before the call and after and they should be the same. So this can be added as an ||-ed condition after the !matches(...). This shouldn't change the semantics, but it does add a measure of robustness.

But this introduces new test failures, so we've clearly done something wrong. Maybe this is it?

The failures come from not anticipating that this function is also used for type ascriptions (as noted by @petrochenkov), e.g. foo: Vec<u8>. (See the function right below this one for how that happens.)

Makes sense that we want to make it more reliable- we've added in address hashing, or at least what I think is good address hashing.

As for expr :/as Type1 :/as Type2, I've added tests, but they don't fail. Looking at how this behaves prior to these changes, I don't think these are expected to fail? x as T1 as T2 is valid today. This doesn't change that, it'll still just parse x as T1, then fail to see any postfix, then go back up to parse_assoc_expr and parse the second as T2 (or : T2).

Given that we test the expr pointer hash now, if parse_dot_or_call_expr_with_ ever changes to start parsing type ascriptions or casts, then these tests should start to fail because we'll trigger our error message. I think this is OK?

The failures come from not anticipating that this function is also used for type ascriptions (as noted by @petrochenkov), e.g. foo: Vec. (See the function right below this one for how that happens.)

We've gone through and fixed this, I think.

src/librustc_parse/parser/expr.rs

src/test/ui/parser/issue-35813-postfix-after-cast.rs

src/librustc_parse/parser/expr.rs

Centril · 2020-02-10T02:53:01Z

src/librustc_parse/parser/expr.rs

+
+        // The resulting parse tree for `&x as T[0]` has a precedence of `((&x) as T)[0]`.
+        let with_postfix = self.parse_dot_or_call_expr_with_(expr, span)?;
+        if !matches!(with_postfix.kind, ExprKind::Cast(_, _)) {


It's correct that parse_dot_or_call_expr_with_ cannot possibly result in ::Cast or ::Type again and will give back the expression passed to it in those cases. However, this is also relying on facts on the ground that may change with time. To prevent such accidents in the future, I think we should make some enhancements:

Let's make sure there's a test for expr :/as Type1 :/as Type2 (4 cases, test all of them) and that these are parse failures.

When parse_dot_or_call_expr_with_ passes back the expression, it does not reallocate. This means that you should be able to hash the address before the call and after and they should be the same. So this can be added as an ||-ed condition after the !matches(...). This shouldn't change the semantics, but it does add a measure of robustness.

But this introduces new test failures, so we've clearly done something wrong. Maybe this is it?

The failures come from not anticipating that this function is also used for type ascriptions (as noted by @petrochenkov), e.g. foo: Vec<u8>. (See the function right below this one for how that happens.)

src/librustc_parse/parser/expr.rs

daboross · 2020-02-16T02:06:09Z

Thank you for the detailed review!

We've updated this with suggestions & more test fixes.

The checking that we've not reallocated sounds good, but like it might be more strict than this code really needs. I was thinking hashing the whole struct would be more reliable and more exactly what we want. Would that extra reliability be worth the expense, or do you think hashing just the pointer is better?

Besides that, we've done all the improvements we can think of for this PR.

This adds parsing for expressions like 'x as Ty[0]' which will immediately error out, but still give the rest of the parser a valid parse tree to continue.

This is almost entirely refactoring and message changing, with the single behavioral change of panicking for unexpected output.

Previously this just errored out on all usages of type ascription, which isn't helpful.

src/test/ui/parser/issue-35813-postfix-after-cast.rs

This is a modified version of estebank's suggestion, with a bit of extra cleanup now that we don't need the different cases for if we can turn a span into a string or not.

daboross · 2020-02-22T20:41:59Z

Added latest suggestions - thanks for those!

src/librustc_parse/parser/expr.rs

Implementes suggeseted changes by Centril. This checks whether the memory location of the cast remains the same after atttempting to parse a postfix operator after a cast has been parsed. If the address is not the same, an illegal postfix operator was parsed. Previously the code generated a hash of the pointer, which was overly complex and inefficent. Casting the pointers and comparing them is simpler and more effcient.

daboross · 2020-02-29T19:30:32Z

Fixes for those last two suggestions applied!

Centril

Looks good; I'll leave final approval to @estebank in case they have any other remarks; otherwise r=me

Centril · 2020-03-04T23:14:05Z

Ping @estebank :)

estebank

LGTM overall, other than a couple of nitpicks.

r? @Centril r=Centril

estebank · 2020-03-06T20:43:12Z

src/test/ui/parser/issue-35813-postfix-after-cast.stderr

+error: casts cannot be followed by ?
+  --> $DIR/issue-35813-postfix-after-cast.rs:121:5
+   |
+LL |     Err(0u64): Result<u64,u64>?;
+   |     ^^^^^^^^^-^^^^^^^^^^^^^^^^
+   |              |
+   |              help: maybe write a path separator here: `::`


This case seems to be unhandled (but that might be ok).

estebank · 2020-03-06T20:44:53Z

src/librustc_parse/parser/expr.rs

+        // Save the memory location of expr before parsing any following postfix operators.
+        // This will be compared with the memory location of the output expression.
+        // If they different we can assume we parsed another expression because the existing expression is not reallocated.
+        let addr_before = &*cast_expr as *const _ as usize;
+        let span = cast_expr.span;
+        let with_postfix = self.parse_dot_or_call_expr_with_(cast_expr, span)?;
+        let changed = addr_before != &*with_postfix as *const _ as usize;


It feels weird to me to use the memory addresses for this, but I don't see a reason for it to ever fail...

estebank · 2020-03-06T20:46:10Z

r? @Centril r=Centril

estebank · 2020-03-06T20:46:38Z

@bors r=Centril

bors · 2020-03-06T20:46:40Z

📌 Commit 453c505 has been approved by Centril

bors · 2020-03-06T20:46:40Z

🌲 The tree is currently closed for pull requests below priority 1000, this pull request will be tested once the tree is reopened

Parse & reject postfix operators after casts This adds an explicit error messages for when parsing `x as Type[0]` or similar expressions. Our add an extra parse case for parsing any postfix operator (dot, indexing, method calls, await) that triggers directly after parsing `as` expressions. My friend and I worked on this together, but they're still deciding on a github username and thus I'm submitting this for both of us. It will immediately error out, but will also provide the rest of the parser with a useful parse tree to deal with. There's one decision we made in how this produces the parse tree. In the situation `&x as T[0]`, one could imagine this parsing as either `&((x as T)[0])` or `((&x) as T)[0]`. We chose the latter for ease of implementation, and as it seemed the most intuitive. Feedback welcome! This is our first change to the parser section, and it might be completely horrible. Fixes rust-lang#35813.

@ghost

Rollup of 9 pull requests Successful merges: - #67741 (When encountering an Item in a pat context, point at the item def) - #68985 (Parse & reject postfix operators after casts) - #69656 (Use .next() instead of .nth(0) on iterators.) - #69680 (rustc_expand: Factor out `Annotatable::into_tokens` to a separate method) - #69690 (test(pattern): add tests for combinations of pattern features) - #69706 (Use subslice patterns in slice methods) - #69727 (Avoid using `unwrap()` in suggestions) - #69754 (Update deprecation version to 1.42 for Error::description) - #69782 (Don't redundantly repeat field names (clippy::redundant_field_names)) Failed merges: r? @ghost

…resses-in-parser-, r=compiler-errors Remove an address comparison from the parser Originally this check was added in rust-lang#68985, as suggested by rust-lang@940f657#r376850175. I don't think that this address check is a robust way of making parser more robust. This code is also extensively tested by [`ui/parser/issues/issue-35813-postfix-after-cast.rs`](https://github.com/rust-lang/rust/blob/57d3c58ed6e0faf89a62411f96c000ffc9fd3937/src/test/ui/parser/issues/issue-35813-postfix-after-cast.rs). _Replaces #103700_ r? `@compiler-errors`

rust-highfive assigned ecstatic-morse Feb 9, 2020

rust-highfive added the S-waiting-on-review label Feb 9, 2020

daboross mentioned this pull request Feb 9, 2020

Failure to parse "value as Type[index]" #35813

Closed

daboross commented Feb 9, 2020

View reviewed changes

daboross force-pushed the fix-35813 branch from 016a362 to c5641c5 Compare February 9, 2020 05:55

rust-highfive assigned estebank and unassigned ecstatic-morse Feb 9, 2020

daboross force-pushed the fix-35813 branch from c5641c5 to f2be78c Compare February 9, 2020 06:01

This comment has been minimized.

Sign in to view

petrochenkov self-assigned this Feb 9, 2020

petrochenkov reviewed Feb 9, 2020

View reviewed changes

petrochenkov removed their assignment Feb 9, 2020

Centril self-assigned this Feb 10, 2020

Centril reviewed Feb 10, 2020

View reviewed changes

Centril added S-waiting-on-author and removed S-waiting-on-review labels Feb 10, 2020

This comment has been minimized.

Sign in to view

daboross force-pushed the fix-35813 branch from db3fb78 to 4f48901 Compare February 16, 2020 03:43

daboross added 7 commits February 15, 2020 19:50

Refactor out error case & apply suggestions.

5ce9b80

This is almost entirely refactoring and message changing, with the single behavioral change of panicking for unexpected output.

Type ascription outputs a Type, not Cast

4fc0532

Previously this just errored out on all usages of type ascription, which isn't helpful.

Keep better fix suggestion if type ascription is likely unintended

0cf2049

Add more error cases to issue 35813 tests

f82ca8b

Fix related type ascription tests.

5dd6464

Remove extra debug print in unreachable!

e3eefe2

estebank reviewed Feb 17, 2020

View reviewed changes

src/test/ui/parser/issue-35813-postfix-after-cast.rs Show resolved Hide resolved

daboross added 2 commits February 22, 2020 11:36

Add more double cast + method call tests

fa1f547

Use multipart suggestion

f434c6e

This is a modified version of estebank's suggestion, with a bit of extra cleanup now that we don't need the different cases for if we can turn a span into a string or not.

Centril reviewed Feb 24, 2020

View reviewed changes

src/librustc_parse/parser/expr.rs Outdated Show resolved Hide resolved

src/librustc_parse/parser/expr.rs Outdated Show resolved Hide resolved

Centril approved these changes Mar 1, 2020

View reviewed changes

estebank approved these changes Mar 6, 2020

View reviewed changes

rust-highfive unassigned estebank Mar 6, 2020

bors added S-waiting-on-bors and removed S-waiting-on-author labels Mar 6, 2020

Centril mentioned this pull request Mar 7, 2020

Rollup of 12 pull requests #69795

Closed

Centril mentioned this pull request Mar 7, 2020

Rollup of 9 pull requests #69796

Merged

bors merged commit 111724f into rust-lang:master Mar 7, 2020

daboross deleted the fix-35813 branch March 14, 2020 19:53

Centril mentioned this pull request Mar 30, 2020

internal error: entered unreachable code: parse_dot_or_call_expr_with_ shouldn't produce this #70552

Closed

This was referenced Dec 30, 2021

|| () as ()() is no longer supposed to parse dtolnay/syn#1117

Closed

Range expressions: discrepancies between rustc and parser-lalr #28785

Open

WaffleLapkin mentioned this pull request Nov 8, 2022

Remove an address comparison from the parser #104147

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parse & reject postfix operators after casts #68985

Parse & reject postfix operators after casts #68985

daboross commented Feb 9, 2020

rust-highfive commented Feb 9, 2020

daboross Feb 9, 2020

ecstatic-morse commented Feb 9, 2020

daboross commented Feb 9, 2020 •

edited

Loading

This comment has been minimized.

petrochenkov Feb 9, 2020

daboross Feb 9, 2020 •

edited

Loading

Centril Feb 10, 2020

daboross Feb 16, 2020

Centril Feb 10, 2020

This comment has been minimized.

daboross commented Feb 16, 2020 •

edited

Loading

This comment has been minimized.

daboross commented Feb 22, 2020 •

edited

Loading

daboross commented Feb 29, 2020

Centril left a comment

Centril commented Mar 4, 2020

estebank left a comment

estebank Mar 6, 2020

estebank Mar 6, 2020

estebank commented Mar 6, 2020

estebank commented Mar 6, 2020

bors commented Mar 6, 2020

bors commented Mar 6, 2020

Parse & reject postfix operators after casts #68985

Parse & reject postfix operators after casts #68985

Conversation

daboross commented Feb 9, 2020

rust-highfive commented Feb 9, 2020

daboross Feb 9, 2020

Choose a reason for hiding this comment

ecstatic-morse commented Feb 9, 2020

daboross commented Feb 9, 2020 • edited Loading

This comment has been minimized.

petrochenkov Feb 9, 2020

Choose a reason for hiding this comment

daboross Feb 9, 2020 • edited Loading

Choose a reason for hiding this comment

Centril Feb 10, 2020

Choose a reason for hiding this comment

daboross Feb 16, 2020

Choose a reason for hiding this comment

Centril Feb 10, 2020

Choose a reason for hiding this comment

This comment has been minimized.

daboross commented Feb 16, 2020 • edited Loading

This comment has been minimized.

daboross commented Feb 22, 2020 • edited Loading

daboross commented Feb 29, 2020

Centril left a comment

Choose a reason for hiding this comment

Centril commented Mar 4, 2020

estebank left a comment

Choose a reason for hiding this comment

estebank Mar 6, 2020

Choose a reason for hiding this comment

estebank Mar 6, 2020

Choose a reason for hiding this comment

estebank commented Mar 6, 2020

estebank commented Mar 6, 2020

bors commented Mar 6, 2020

bors commented Mar 6, 2020

daboross commented Feb 9, 2020 •

edited

Loading

daboross Feb 9, 2020 •

edited

Loading

daboross commented Feb 16, 2020 •

edited

Loading

daboross commented Feb 22, 2020 •

edited

Loading