Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: for in / of statements containing new lines #332

Merged
merged 7 commits into from
Aug 17, 2024

Conversation

jackschu
Copy link
Contributor

@jackschu jackschu commented Jul 8, 2024

The impact of this diff is that inputs like the following will parse correctly

for(let b
  of 
a);

Previously, this would result in this AST because automatic semicolons would be inserted

program [0, 0] - [3, 0]
  for_statement [0, 0] - [2, 3]
    initializer: lexical_declaration [0, 4] - [0, 9]
      variable_declarator [0, 8] - [0, 9]
        name: identifier [0, 8] - [0, 9]
    condition: expression_statement [1, 2] - [1, 4]
      identifier [1, 2] - [1, 4]
    increment: identifier [2, 0] - [2, 1]
    body: empty_statement [2, 2] - [2, 3]

Now this parses as

(program [0, 0] - [2, 3]
  (for_in_statement [0, 0] - [2, 3]
    left: (identifier [0, 8] - [0, 9])
    right: (identifier [2, 0] - [2, 1])
    body: (empty_statement [2, 2] - [2, 3])))

Note that before this patch this behavior could result in more severe parse failures as

for(let b
  of a);

becoming

program [0, 0] - [2, 0]
  for_statement [0, 0] - [1, 8]
    initializer: lexical_declaration [0, 4] - [0, 9]
      variable_declarator [0, 8] - [0, 9]
        name: identifier [0, 8] - [0, 9]
    condition: expression_statement [1, 2] - [1, 4]
      identifier [1, 2] - [1, 4]
      MISSING ; [1, 4] - [1, 4]
    increment: identifier [1, 5] - [1, 6]
    body: empty_statement [1, 7] - [1, 8]

This diff addresses an issue with the way the for_statement handles semicolons

Previously the for_statement did not require semicolons. Note that requiring the semicolon literal is not overly restrictive because automatic semicolons are disallowed by the spec within for statements:

However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted automatically if the semicolon would then be parsed as an empty statement or if that semicolon would become one of the two semicolons in the header of a for statement (see 14.7.4).

The for statement also previously considered itself composed of expression_statements but this is problematic for having real semicolons that arent consumed by $._semicolons
The spec's discussion of the for statement also considers the for_statement to be composed of expressions, not expression statements. So this PR alters that as well.

With this setup, I can add an an optional automatic semicolon to the _for_header used by the for_in_statement so that GLR can kick in and disambiguate rather than considering for(let x <newline> to be concretely a for_statement

grammar.js Show resolved Hide resolved
@amaanq amaanq merged commit d8f277f into tree-sitter:master Aug 17, 2024
4 checks passed
@amaanq
Copy link
Member

amaanq commented Aug 17, 2024

I rebased & regenerated to fix the conflicts, thanks again @jackschu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants