-
-
Notifications
You must be signed in to change notification settings - Fork 30.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Token lines in the tokenize module always end in new line #104825
Comments
Yeah, I ran across that while working on #104798. Are you already working on a fix or shall I take it up? |
I am working on a fix, should be up in half an hour or so |
…ens emitted in the tokenize module
…itted in the tokenize module (#104846)
… in tokens emitted in the tokenize module (pythonGH-104846). (cherry picked from commit c8cf9b4) Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
#104846 now removes a newline, even when a newline exists in the original source. ❯ ./python.exe
Python 3.13.0a0 (heads/setuptools-peg-generator-dirty:c7503656c2, May 24 2023, 12:24:20) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tokenize
>>> import io
>>> file = io.StringIO('1 + 2 + 3\n')
>>> for token in tokenize.generate_tokens(file.readline):
... print(token)
...
TokenInfo(type=2 (NUMBER), string='1', start=(1, 0), end=(1, 1), line='1 + 2 + 3')
TokenInfo(type=55 (OP), string='+', start=(1, 2), end=(1, 3), line='1 + 2 + 3')
TokenInfo(type=2 (NUMBER), string='2', start=(1, 4), end=(1, 5), line='1 + 2 + 3')
TokenInfo(type=55 (OP), string='+', start=(1, 6), end=(1, 7), line='1 + 2 + 3')
TokenInfo(type=2 (NUMBER), string='3', start=(1, 8), end=(1, 9), line='1 + 2 + 3')
TokenInfo(type=4 (NEWLINE), string='\n', start=(1, 9), end=(1, 10), line='1 + 2 + 3')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='') |
@pablogsal Do you think the above should be fixed? Will you work on it? |
Hummmmm, not sure how easy is going to be for us to distinguish between implicit and explicit newlines at the end. I would say, leave it like is now unless you disagree, in which case let's see if we can fix it easily. |
My only concern is that tests that rely on that will fail. For example, def test_repeat_0_simple(self) -> None:
grammar = """
start: thing thing* NEWLINE
thing: NUMBER
"""
parser_class = make_parser(grammar)
node = parse_string("1 2 3\n", parser_class)
self.assertEqual(
node,
[
TokenInfo(NUMBER, string="1", start=(1, 0), end=(1, 1), line="1 2 3\n"),
[
TokenInfo(
NUMBER, string="2", start=(1, 2), end=(1, 3), line="1 2 3\n"
),
TokenInfo(
NUMBER, string="3", start=(1, 4), end=(1, 5), line="1 2 3\n"
),
],
TokenInfo(
NEWLINE, string="\n", start=(1, 5), end=(1, 6), line="1 2 3\n"
),
],
)
node = parse_string("1\n", parser_class)
self.assertEqual(
node,
[
TokenInfo(NUMBER, string="1", start=(1, 0), end=(1, 1), line="1\n"),
[],
TokenInfo(NEWLINE, string="\n", start=(1, 1), end=(1, 2), line="1\n"),
],
) The failure is:
The fix is very simple, so I'm okay with going with this as-is. I will fix |
Order of events: Terry merged new idlelib test into main. Ms. I. made a 3.12 backport; tests passed. Pablo merged the tokenize change with idlelib test fix into main. Pablo merged a 3.12 backport without the idle test fix as the backport of the latter had not yet been been merged. Terry merged the idlelib test backport. The new test failed on at least 4 3.12 buildbots because of the tokenize change. This PR backports the now needed idlelib test fix. (cherry picked from commit c8cf9b4)
Order of events: Terry merged new idlelib test into main. Ms. I. made a 3.12 backport; tests passed. Pablo merged the tokenize change with idlelib test fix into main. Pablo merged a 3.12 backport without the idle test fix as the backport of the latter had not yet been been merged. Terry merged the idlelib test backport. The new test failed on at least 4 3.12 buildbots because of the tokenize change. This PR backports the now needed idlelib test fix. (cherry picked from commit c8cf9b4)
You say:
But it looks to me like the lines used to always end in newline, except for ENDMARKER. This uses tokbug.py from #104972:
|
As a consequence of the changes in the Python tokenizer post PEP701, now the
line
attribute always ends in a new line character. We should remove this as it adds no information and wasn't there before.Linked PRs
The text was updated successfully, but these errors were encountered: