Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing inconsistencies (lambda, proc, return) #28784

Closed
rprichard opened this issue Oct 1, 2015 · 17 comments
Closed

Parsing inconsistencies (lambda, proc, return) #28784

rprichard opened this issue Oct 1, 2015 · 17 comments
Labels
A-parser Area: The parsing of Rust source code to an AST. C-bug Category: This is a bug. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. P-low Low priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@rprichard
Copy link
Contributor

I found more inconsistencies between rustc and parser-lalr.

I also noticed that Rust allows return expressions and lambda expressions to end with a struct literal, even when they're in a nostruct context. This seems inconsistent to me.

Lambdas (the two parsers disagree):

struct A { a: i32 }
fn lambda_expr_nostruct() -> A {
    // rustc accepts this, but parser-lalr does not.
    match || A { a: 123 } {
        f => f()
    }
}

Return expressions (the two parsers agree):

struct A { a: i32 }
fn return_ambiguity_1() -> A {
    match A { a: 1 } { x => x } // rejected by rustc and parser-lalr
}
fn return_ambiguity_2() -> A {
    match return A { a: 1 } { _ => A { a: 1 } } // accepted by rustc and parser-lalr
}

The rustc and parser-lalr parsers disagree about whether a bare return expression can be cast:

fn cast_of_return() {
    // rustc rejects, parser-lalr accepts
    // error: expected identifier, found keyword `as`
    return as ();
    (return as ());

    return == (); // rustc accepts, parser-lalr accepts
    loop {
        continue as (); // rustc accepts, parser-lalr accepts
        continue == (); // rustc accepts, parser-lalr accepts
        break as ();    // rustc accepts, parser-lalr accepts
        break == ();    // rustc accepts, parser-lalr accepts
    };
}

Finally, I also noticed these two differences, which seem much less interesting to me. The grammar is probably just out-of-date or buggy:

lambda sometimes requires braces:

fn lambda_braces() {
    // parser-lalr accepts this, but rustc does not.  I think this is an
    // obvious bug in the parser-lalr.y grammar.  If there is a return type,
    // then curly braces are required.
    let _x = || -> i32 10;
}

proc is obsolete:

fn proc_syntax() {
    // parser-lalr also accepts this.  I think the proc syntax is obsolete, and
    // the {proc_expr, proc_expr_nostruct} non-terminals could be removed from
    // parser-lalr.y.
    let _x = proc() {};
}
@steveklabnik
Copy link
Member

/cc @rust-lang/lang , can we disambiguate what's intended here?

@nikomatsakis nikomatsakis added the A-parser Area: The parsing of Rust source code to an AST. label Oct 6, 2015
@nikomatsakis
Copy link
Contributor

I agree with @rprichard's take for the most part. I guess that return as i32 ought to parse...weird as it is.

triage: P-medium

@rust-highfive rust-highfive added the P-medium Medium priority label Oct 6, 2015
@nikomatsakis nikomatsakis added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 6, 2015
@brson brson added P-low Low priority E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. and removed P-medium Medium priority labels Jul 14, 2016
@brson
Copy link
Contributor

brson commented Jul 14, 2016

Probably a good beginner bug for someone familiar with parsing.

@brson
Copy link
Contributor

brson commented Jul 15, 2016

Make the rustc parser behavie like parser-lalr, per the op.

@neunenak
Copy link
Contributor

I'm reaaonably familiar with parsers and I'd like to take a crack at this bug.

@brson
Copy link
Contributor

brson commented Jul 20, 2016

@neunenak You got it!

@brson
Copy link
Contributor

brson commented Jul 20, 2016

@nikomatsakis @nrc @pnkfelix Can you confirm the right thing to do here is make libsyntax behave like the reference parser as described in the OP?

@nrc
Copy link
Member

nrc commented Jul 20, 2016

It's probably worth addressing any inconsistencies between the parser the reference on an individual basis. I don't have enough faith in the reference to say it is always right.

My opinions on the issues in the OP:

I think return starts a new context, so it is OK to accept a struct literal after it. I don't see a parsing ambiguity there, so I think it is OK. Since the parser and reference agree, I don't think there is anything to do here.

return expression - I thought return was a statement, so I am not qualified to offer an opinion here as my intuition is messed up. Could someone explain why it is an expression not a statement please?

agree on the last two points - bugs in the reference.

@brson
Copy link
Contributor

brson commented Jul 22, 2016

return is mostly an expression because most everything's an expression, but it does serve some purposes, like allowing it to appear in match arms.

@cramertj
Copy link
Member

@neunenak Are you still interested in this? If not, I'd like to take a stab at it.

@neunenak
Copy link
Contributor

neunenak commented Jan 24, 2017 via email

@cramertj
Copy link
Member

cramertj commented Jan 24, 2017

Thanks! I took a look at the parser, and I think the issue with return is this match. Many (keyword) identifiers cannot appear at the start of an expression, as being just one of these. Should I just cover as for now, or should I match out all identifiers which cannot be in the start of an expression?

Edit: for context, the can_begin_expr method is used by the parser to determine whether or not return is returning a value or whether it is just a bare return (see here).

@cramertj
Copy link
Member

cramertj commented Jan 25, 2017

I went ahead and opened #39303 fixing can_begin_expr for the as case. Let me know if I should fix it for any other keywords.

Moving forward, what's left to do for this issue? Are all the rest just parser-lalr fixes? I don't think the lambda issue can be fixed backward-compatibly, can it? (Since we'd be stopping something from parsing that used to parse.)

Edit: wound up making a new PR handling the remaining can_begin_expr cases on @petrochenkov's advice.

@nikomatsakis
Copy link
Contributor

@cramertj a good question. One of my long-standing "to do" items has been to pick up work on rustypop -- in particular adding a test harness -- as a replacement for parser-lalr. When I was porting the LALR grammar, I found a number of irregularities that struck me as wrong -- often holdovers from the early days of Rust -- as well as various things whose meaning were not obvious (e.g., precedence tricks). The LALRPOP port is free of those problems.

Separately, I think we have put off reaching a firm decision on a lot of these questions. This needs some organization and I think no one has had the time.

I would love to find someone who would be interested in collaborating with me on one or both aspects of this project. If you are interesting, please ping me on irc (nmatsakis) or drop me an e-mail (nmatsakis@mozilla.com).

@cramertj
Copy link
Member

@nikomatsakis Email'd you.

alexcrichton added a commit to alexcrichton/rust that referenced this issue Jan 28, 2017
…x, r=petrochenkov

Fix can_begin_expr keyword behavior

Partial fix for rust-lang#28784.
@steveklabnik steveklabnik added T-lang Relevant to the language team, which will review and decide on the PR/issue. and removed A-lang labels Mar 24, 2017
@Mark-Simulacrum Mark-Simulacrum added the C-bug Category: This is a bug. label Jul 24, 2017
@colinmarsh19
Copy link
Contributor

This seems like an interesting fix. I'll take a look at it and see what I can figure out. -- Colin

@jonas-schievink
Copy link
Contributor

parser-lalr has been removed in #64896. Closing in favor of the work done by the grammar working group in https://github.com/rust-lang/wg-grammar/.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parser Area: The parsing of Rust source code to an AST. C-bug Category: This is a bug. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. P-low Low priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests