Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse & reject postfix operators after casts #68985

Merged
merged 12 commits into from
Mar 7, 2020
Merged

Conversation

daboross
Copy link
Contributor

@daboross daboross commented Feb 9, 2020

This adds an explicit error messages for when parsing x as Type[0] or similar expressions. Our add an extra parse case for parsing any postfix operator (dot, indexing, method calls, await) that triggers directly after parsing as expressions.

My friend and I worked on this together, but they're still deciding on a github username and thus I'm submitting this for both of us.

It will immediately error out, but will also provide the rest of the parser with a useful parse tree to deal with.

There's one decision we made in how this produces the parse tree. In the situation &x as T[0], one could imagine this parsing as either &((x as T)[0]) or ((&x) as T)[0]. We chose the latter for ease of implementation, and as it seemed the most intuitive.

Feedback welcome! This is our first change to the parser section, and it might be completely horrible.

Fixes #35813.

@rust-highfive
Copy link
Collaborator

r? @ecstatic-morse

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 9, 2020
ExprKind::Field(_, _) => "field access expressions",
ExprKind::MethodCall(_, _) => "method call expressions",
ExprKind::Await(_) => "awaits",
_ => "expressions",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For someone who's more familiar with the parser codebase: is there a way to generate these en mass? We found some description functions for other AST types (like ItemKind), but none for ExprKind. Or if there isn't, should we add one?

@ecstatic-morse
Copy link
Contributor

r? @estebank

@daboross
Copy link
Contributor Author

daboross commented Feb 9, 2020

Just force pushed to fix some whitespace errors in the .stderr test file; didn't want to do extra commits 3 minutes into the PR.

@rust-highfive

This comment has been minimized.

@petrochenkov petrochenkov self-assigned this Feb 9, 2020

// The resulting parse tree for `&x as T[0]` has a precedence of `((&x) as T)[0]`.
let with_postfix = self.parse_dot_or_call_expr_with_(expr, span)?;
if !matches!(with_postfix.kind, ExprKind::Cast(_, _)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this condition for? It's either unnecessary (always true) or incorrect.
We must always report an error if we parsed something postfix after the cast/ascription expression, but here it looks like we may accept things like expr as Type1.field as Type2 without an error.

(Also ExprCast and ExprType should always do the same thing synctactically, e.g. if expr as Type1 as Type2 work, then expr as Type1: Type2 should also work.)

Copy link
Contributor Author

@daboross daboross Feb 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we call too far down into the parse chain to ever parse another as expression here. If we've read the code correctly, parse_dot_or_call_expr_with_ will only ever parse things starting with ., [ or ( after an existing expression. I think we'd have had to call into parse_assoc_expr in order to parse another as expression.

When trying to understand the codebase, we came up with this model:

  • parse_expr: parses outer attributes, forwards to parse_assoc_expr
  • parse_assoc_expr: calls into parse_prefix_expr to parse the left hand side, then parses *,+,||,&&, as and all other binary operators. Calls helpers like parse_assoc_expr_* when it encounters these.
  • parse_assoc_expr_* (assoc, ascribe, parse_range_expr): these parse the thing they're named for, using the lhs parse from parse_assoc_expr. The ones with an expression on the right call back into parse_assoc_expr to get the right hand side.
  • parse_prefix_expr: this parses prefix expressions (!,-,*,&) then calls: parse_bottom_expr and parse_dot_or_call_expr_with (passing in argument from parse_bottom_expr)
  • parse_bottom_expr: parses single things like return expressions, parenthesized expressions, etc.
  • parse_dot_or_call_expr_with: parses function calls, index operators, ? operators and field operators, and applies those to the result passed in (usually from parse_bottom_expr)

By jumping directly to parse_dot_or_call_expr_with, it shouldn't be able to parse any associative associative operators, nor as X.

But this introduces new test failures, so we've clearly done something wrong. Maybe this is it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's correct that parse_dot_or_call_expr_with_ cannot possibly result in ::Cast or ::Type again and will give back the expression passed to it in those cases. However, this is also relying on facts on the ground that may change with time. To prevent such accidents in the future, I think we should make some enhancements:

  • Let's make sure there's a test for expr :/as Type1 :/as Type2 (4 cases, test all of them) and that these are parse failures.
  • When parse_dot_or_call_expr_with_ passes back the expression, it does not reallocate. This means that you should be able to hash the address before the call and after and they should be the same. So this can be added as an ||-ed condition after the !matches(...). This shouldn't change the semantics, but it does add a measure of robustness.

But this introduces new test failures, so we've clearly done something wrong. Maybe this is it?

The failures come from not anticipating that this function is also used for type ascriptions (as noted by @petrochenkov), e.g. foo: Vec<u8>. (See the function right below this one for how that happens.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense that we want to make it more reliable- we've added in address hashing, or at least what I think is good address hashing.

As for expr :/as Type1 :/as Type2, I've added tests, but they don't fail. Looking at how this behaves prior to these changes, I don't think these are expected to fail? x as T1 as T2 is valid today. This doesn't change that, it'll still just parse x as T1, then fail to see any postfix, then go back up to parse_assoc_expr and parse the second as T2 (or : T2).

Given that we test the expr pointer hash now, if parse_dot_or_call_expr_with_ ever changes to start parsing type ascriptions or casts, then these tests should start to fail because we'll trigger our error message. I think this is OK?


The failures come from not anticipating that this function is also used for type ascriptions (as noted by @petrochenkov), e.g. foo: Vec. (See the function right below this one for how that happens.)

We've gone through and fixed this, I think.

@petrochenkov petrochenkov removed their assignment Feb 9, 2020
@Centril Centril self-assigned this Feb 10, 2020
src/librustc_parse/parser/expr.rs Outdated Show resolved Hide resolved
src/librustc_parse/parser/expr.rs Outdated Show resolved Hide resolved
src/librustc_parse/parser/expr.rs Outdated Show resolved Hide resolved
src/librustc_parse/parser/expr.rs Outdated Show resolved Hide resolved
src/librustc_parse/parser/expr.rs Outdated Show resolved Hide resolved
src/librustc_parse/parser/expr.rs Outdated Show resolved Hide resolved
src/librustc_parse/parser/expr.rs Outdated Show resolved Hide resolved

// The resulting parse tree for `&x as T[0]` has a precedence of `((&x) as T)[0]`.
let with_postfix = self.parse_dot_or_call_expr_with_(expr, span)?;
if !matches!(with_postfix.kind, ExprKind::Cast(_, _)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's correct that parse_dot_or_call_expr_with_ cannot possibly result in ::Cast or ::Type again and will give back the expression passed to it in those cases. However, this is also relying on facts on the ground that may change with time. To prevent such accidents in the future, I think we should make some enhancements:

  • Let's make sure there's a test for expr :/as Type1 :/as Type2 (4 cases, test all of them) and that these are parse failures.
  • When parse_dot_or_call_expr_with_ passes back the expression, it does not reallocate. This means that you should be able to hash the address before the call and after and they should be the same. So this can be added as an ||-ed condition after the !matches(...). This shouldn't change the semantics, but it does add a measure of robustness.

But this introduces new test failures, so we've clearly done something wrong. Maybe this is it?

The failures come from not anticipating that this function is also used for type ascriptions (as noted by @petrochenkov), e.g. foo: Vec<u8>. (See the function right below this one for how that happens.)

src/librustc_parse/parser/expr.rs Outdated Show resolved Hide resolved
@Centril Centril added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 10, 2020
@rust-highfive

This comment has been minimized.

@daboross
Copy link
Contributor Author

daboross commented Feb 16, 2020

Thank you for the detailed review!

We've updated this with suggestions & more test fixes.

The checking that we've not reallocated sounds good, but like it might be more strict than this code really needs. I was thinking hashing the whole struct would be more reliable and more exactly what we want. Would that extra reliability be worth the expense, or do you think hashing just the pointer is better?

Besides that, we've done all the improvements we can think of for this PR.

@rust-highfive

This comment has been minimized.

This adds parsing for expressions like 'x as Ty[0]' which will
immediately error out, but still give the rest of the parser a valid
parse tree to continue.
This is almost entirely refactoring and message changing, with the
single behavioral change of panicking for unexpected output.
Previously this just errored out on all usages of type ascription,
which isn't helpful.
This is a modified version of estebank's suggestion, with a bit of
extra cleanup now that we don't need the different cases for if we can
turn a span into a string or not.
@daboross
Copy link
Contributor Author

daboross commented Feb 22, 2020

Added latest suggestions - thanks for those!

src/librustc_parse/parser/expr.rs Outdated Show resolved Hide resolved
src/librustc_parse/parser/expr.rs Outdated Show resolved Hide resolved
Implementes suggeseted changes by Centril.

This checks whether the memory location of the cast remains the same
after atttempting to parse a postfix operator after a cast has been
parsed. If the address is not the same, an illegal postfix operator
was parsed.

Previously the code generated a hash of the pointer, which was overly
complex and inefficent. Casting the pointers and comparing them
is simpler and more effcient.
@daboross
Copy link
Contributor Author

Fixes for those last two suggestions applied!

Copy link
Contributor

@Centril Centril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good; I'll leave final approval to @estebank in case they have any other remarks; otherwise r=me

@Centril
Copy link
Contributor

Centril commented Mar 4, 2020

Ping @estebank :)

Copy link
Contributor

@estebank estebank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, other than a couple of nitpicks.

r? @Centril r=Centril

Comment on lines +279 to +285
error: casts cannot be followed by ?
--> $DIR/issue-35813-postfix-after-cast.rs:121:5
|
LL | Err(0u64): Result<u64,u64>?;
| ^^^^^^^^^-^^^^^^^^^^^^^^^^
| |
| help: maybe write a path separator here: `::`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case seems to be unhandled (but that might be ok).

Comment on lines +631 to +637
// Save the memory location of expr before parsing any following postfix operators.
// This will be compared with the memory location of the output expression.
// If they different we can assume we parsed another expression because the existing expression is not reallocated.
let addr_before = &*cast_expr as *const _ as usize;
let span = cast_expr.span;
let with_postfix = self.parse_dot_or_call_expr_with_(cast_expr, span)?;
let changed = addr_before != &*with_postfix as *const _ as usize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels weird to me to use the memory addresses for this, but I don't see a reason for it to ever fail...

@estebank
Copy link
Contributor

estebank commented Mar 6, 2020

r? @Centril r=Centril

@estebank
Copy link
Contributor

estebank commented Mar 6, 2020

@bors r=Centril

@bors
Copy link
Contributor

bors commented Mar 6, 2020

📌 Commit 453c505 has been approved by Centril

@bors
Copy link
Contributor

bors commented Mar 6, 2020

🌲 The tree is currently closed for pull requests below priority 1000, this pull request will be tested once the tree is reopened

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 6, 2020
Centril added a commit to Centril/rust that referenced this pull request Mar 7, 2020
Parse & reject postfix operators after casts

This adds an explicit error messages for when parsing `x as Type[0]` or similar expressions. Our add an extra parse case for parsing any postfix operator (dot, indexing, method calls, await) that triggers directly after parsing `as` expressions.

My friend and I worked on this together, but they're still deciding on a github username and thus I'm submitting this for both of us.

It will immediately error out, but will also provide the rest of the parser with a useful parse tree to deal with.

There's one decision we made in how this produces the parse tree. In the situation `&x as T[0]`, one could imagine this parsing as either `&((x as T)[0])` or `((&x) as T)[0]`. We chose the latter for ease of implementation, and as it seemed the most intuitive.

Feedback welcome! This is our first change to the parser section, and it might be completely horrible.

Fixes rust-lang#35813.
bors added a commit that referenced this pull request Mar 7, 2020
Rollup of 9 pull requests

Successful merges:

 - #67741 (When encountering an Item in a pat context, point at the item def)
 - #68985 (Parse & reject postfix operators after casts)
 - #69656 (Use .next() instead of .nth(0) on iterators.)
 - #69680 (rustc_expand: Factor out `Annotatable::into_tokens` to a separate method)
 - #69690 (test(pattern): add tests for combinations of pattern features)
 - #69706 (Use subslice patterns in slice methods)
 - #69727 (Avoid using `unwrap()` in suggestions)
 - #69754 (Update deprecation version to 1.42 for Error::description)
 - #69782 (Don't redundantly repeat field names (clippy::redundant_field_names))

Failed merges:

r? @ghost
@bors bors merged commit 111724f into rust-lang:master Mar 7, 2020
@daboross daboross deleted the fix-35813 branch March 14, 2020 19:53
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Nov 8, 2022
…resses-in-parser-, r=compiler-errors

Remove an address comparison from the parser

Originally this check was added in rust-lang#68985, as suggested by rust-lang@940f657#r376850175. I don't think that this address check is a robust way of making parser more robust.

This code is also extensively tested by [`ui/parser/issues/issue-35813-postfix-after-cast.rs`](https://github.com/rust-lang/rust/blob/57d3c58ed6e0faf89a62411f96c000ffc9fd3937/src/test/ui/parser/issues/issue-35813-postfix-after-cast.rs).

_Replaces #103700_

r? `@compiler-errors`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failure to parse "value as Type[index]"
7 participants