Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always emit non-logical newlines for 'empty' lines #27

Merged
merged 1 commit into from
May 15, 2023

Conversation

charliermarsh
Copy link
Contributor

Summary

Right now, if you have a comment like:

# foo

The lexer emits a comment, but no newline. It turns out that if the lexer encounters an "empty" line, we skip the newline emission, and a comment counts as an "empty" line (see: eat_indentation, where we eat indentation and comments).

This PR modifies the lexer to emit a NonLogicalNewline in such cases. As a result, we'll now always have either a newline or non-logical newline token at the end of a line (excepting continuations). I believe this is more consistent with CPython. For example, given this snippet:

# Some comment

def foo():
    return 99

CPython outputs:

TokenInfo(type=62 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=60 (COMMENT), string='# Some comment', start=(1, 0), end=(1, 14), line='# Some comment\n')
TokenInfo(type=61 (NL), string='\n', start=(1, 14), end=(1, 15), line='# Some comment\n')
TokenInfo(type=61 (NL), string='\n', start=(2, 0), end=(2, 1), line='\n')
TokenInfo(type=1 (NAME), string='def', start=(3, 0), end=(3, 3), line='def foo():\n')
TokenInfo(type=1 (NAME), string='foo', start=(3, 4), end=(3, 7), line='def foo():\n')
TokenInfo(type=54 (OP), string='(', start=(3, 7), end=(3, 8), line='def foo():\n')
TokenInfo(type=54 (OP), string=')', start=(3, 8), end=(3, 9), line='def foo():\n')
TokenInfo(type=54 (OP), string=':', start=(3, 9), end=(3, 10), line='def foo():\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 10), end=(3, 11), line='def foo():\n')
TokenInfo(type=5 (INDENT), string='    ', start=(4, 0), end=(4, 4), line='    return 99\n')
TokenInfo(type=1 (NAME), string='return', start=(4, 4), end=(4, 10), line='    return 99\n')
TokenInfo(type=2 (NUMBER), string='99', start=(4, 11), end=(4, 13), line='    return 99\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(4, 13), end=(4, 14), line='    return 99\n')
TokenInfo(type=61 (NL), string='\n', start=(5, 0), end=(5, 1), line='\n')
TokenInfo(type=6 (DEDENT), string='', start=(6, 0), end=(6, 0), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(6, 0), end=(6, 0), line='')

Note the NL tokens after the comment, and for the empty line, along with the NL token at the end prior to the dedent.

@charliermarsh
Copy link
Contributor Author

\cc @MichaReiser

@@ -625,7 +625,10 @@ where
}
Some('\n' | '\r') => {
// Empty line!
let tok_start = self.get_pos();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to your changes. I think this emits two newlines when using \r\n instead of one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this emits two newlines when using \r\n instead of one

yeah, that should be the case, could probably special case this by looking at the next char and just advancing but can't remember if that caused issues when I was tweaking this a while ago.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might actually work correctly? I think next_char already does the advancement:

// Helper function to go to the next character coming up.
fn next_char(&mut self) -> Option<char> {
    let mut c = self.window[0];
    self.window.slide();
    match c {
        Some('\r') => {
            if self.window[0] == Some('\n') {
                self.location += TextSize::from(1);
                self.window.slide();
            }

            self.location += TextSize::from(1);
            c = Some('\n');
        }
        #[allow(unused_variables)]
        Some(c) => {
            self.location += c.text_len();
        }
        _ => {}
    }
    c
}

Copy link
Member

@DimitrisJim DimitrisJim May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I had forgotten about that 😄 Test would also probably fail if this was the this case.

Copy link
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks reasonable.
Due to lack of tests of this repository,
You may want to make a ruff port before merging this PR.

Please feel free to merge it when you ready to go.

@youknowone
Copy link
Member

Ruff PR changed: astral-sh/ruff#4438

@charliermarsh charliermarsh force-pushed the charlie/newline branch 2 times, most recently from 4a738f0 to e1f408e Compare May 15, 2023 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants