Consider emitting buffered DEDENT tokens on the last line #104976

pablogsal · 2023-05-26T13:29:30Z

In Python 3.12, porting the tokenizer to use the C tokenizer underneath (to support PEP 701) has resulted in a documented change, described in docs.python.org/3.12/whatsnew/3.12.html#changes-in-the-python-api:

Some final DEDENT tokens are now emitted within the bounds of the input. This means that for a file containing 3 lines, the old version of the tokenizer returned a DEDENT token in line 4 whilst the new version returns the token in line 3.

Apparently, this negatively affects some formatting tools (see PyCQA/pycodestyle#1142). Let's consider what options we have and see if we can fix this without adding a lot of maintenance burden to the C tokenizer or slowing everything down.
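To see which behavior a given interpreter has, one can list the DEDENT and ENDMARKER tokens that close a three-line input. This is a minimal sketch using the standard tokenize.generate_tokens API; trailing_tokens and SOURCE are illustrative names, not part of any library. The quoted What's New entry says the old tokenizer reports the final DEDENT on line 4 and the new one on line 3, so no particular output is claimed here: run it on the interpreter you care about.

```python
import io
import tokenize


def trailing_tokens(source: str) -> list[tuple[str, int, int]]:
    """Return (name, line, column) for the run of DEDENT/ENDMARKER tokens
    at the end of `source`, as reported by the running interpreter."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    tail = []
    # Walk backwards until the first token that is not DEDENT or ENDMARKER.
    for tok in reversed(tokens):
        if tok.type not in (tokenize.DEDENT, tokenize.ENDMARKER):
            break
        line, col = tok.start
        tail.append((tokenize.tok_name[tok.type], line, col))
    tail.reverse()  # restore source order
    return tail


# A three-line input whose two indentation levels are closed at end of file.
SOURCE = "def f():\n    if x:\n        return 1\n"
result = trailing_tokens(SOURCE)
for name, line, col in result:
    print(f"{name:<9} line {line}, col {col}")
```

A line-based checker that compares token lines against the number of lines in the file (as pycodestyle's end-of-file checks do) will see a different answer depending on whether the DEDENT lands on line 3 or line 4.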


pablogsal · 2023-05-26T13:40:20Z

We can do something like this:

diff --git a/Lib/tokenize.py b/Lib/tokenize.py
index 911f0f12f9..63dc44b0dc 100644
--- a/Lib/tokenize.py
+++ b/Lib/tokenize.py
@@ -452,7 +452,9 @@ def _tokenize(rl_gen, encoding):
         yield token
     if token is not None:
         last_line, _ = token.start
-        yield TokenInfo(ENDMARKER, '', (last_line + 1, 0), (last_line + 1, 0), '')
+        if token.type != DEDENT:
+            last_line += 1
+        yield TokenInfo(ENDMARKER, '', (last_line, 0), (last_line, 0), '')


 def generate_tokens(readline):
diff --git a/Python/Python-tokenize.c b/Python/Python-tokenize.c
index 88087c1256..2a09dfd94a 100644
--- a/Python/Python-tokenize.c
+++ b/Python/Python-tokenize.c
@@ -214,6 +214,10 @@ tokenizeriter_next(tokenizeriterobject *it)
     }

     if (it->tok->tok_extra_tokens) {
+        if (type == DEDENT && it->tok->done == E_EOF) {
+            lineno = end_lineno = lineno + 1;
+            col_offset = end_col_offset = 0;
+        }
         // Necessary adjustments to match the original Python tokenize
         // implementation
         if (type > DEDENT && type < OP) {

but I think this forces us to somehow handle the ENDMARKER internally. Maybe that's a possible solution but I fear this still has some side effects.
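The intended ENDMARKER rule in the Lib/tokenize.py hunk (increment last_line unless the final token is a DEDENT) can be checked by hand as a standalone function. This is a sketch, not the merged code: endmarker_for is a hypothetical name, and it assumes, as the C hunk proposes, that DEDENT tokens at EOF have already been moved one line past the input.

```python
from tokenize import DEDENT, ENDMARKER, NEWLINE, TokenInfo


def endmarker_for(last_token: TokenInfo) -> TokenInfo:
    """Hypothetical helper mirroring the proposed Lib/tokenize.py change."""
    last_line, _ = last_token.start
    # A trailing DEDENT is assumed to sit past the input already (the C hunk
    # moves it there), so ENDMARKER shares its line. Any other final token
    # is inside the input, so ENDMARKER goes on the following line.
    if last_token.type != DEDENT:
        last_line += 1
    return TokenInfo(ENDMARKER, '', (last_line, 0), (last_line, 0), '')


# 3-line file ending in an indented block: DEDENT already shifted to line 4.
after_dedent = endmarker_for(TokenInfo(DEDENT, '', (4, 0), (4, 0), ''))
# 3-line file with no indentation to close: last token is the NEWLINE on line 3.
after_newline = endmarker_for(
    TokenInfo(NEWLINE, '\n', (3, 5), (3, 6), 'y = 2\n'))
print(after_dedent.start, after_newline.start)
```

Both branches put ENDMARKER on line 4 of a 3-line file, which is the point of the rule: the C-side shift and the Python-side increment must not both apply to the same token.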

pablogsal · 2023-05-26T13:40:40Z

@lysnikolaou thoughts?


pablogsal · 2023-05-26T14:14:20Z

CC: @mgmacias95

pablogsal · 2023-05-26T14:14:49Z

Opened #104980 to test this idea

lysnikolaou · 2023-05-26T14:16:02Z

Yeah, I think that, if we want to support doing the same thing as 3.11, the only way is to special-case it in Python-tokenize.c and not in the C tokenizer itself.
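To illustrate special-casing at the wrapper level rather than in the tokenizer core, here is a pure-Python analogue that post-processes an already-produced token stream. It is a sketch, not the code from #104980: shift_trailing_dedents is a hypothetical name, and unlike the C hunk above (which adds 1 to the DEDENT's own line) it moves trailing DEDENTs to the ENDMARKER's line, column 0, so running it on a stream that already has 3.11-style positions changes nothing. The token positions in the usage example are made up for illustration.

```python
from typing import Iterable, Iterator
from tokenize import DEDENT, ENDMARKER, NAME, NEWLINE, TokenInfo, tok_name


def shift_trailing_dedents(tokens: Iterable[TokenInfo]) -> Iterator[TokenInfo]:
    """Hypothetical pass: move DEDENTs that close the input to the
    ENDMARKER's line, column 0; pass every other token through unchanged."""
    pending: list[TokenInfo] = []  # DEDENTs whose role is not known yet
    for tok in tokens:
        if tok.type == DEDENT:
            pending.append(tok)
            continue
        if tok.type == ENDMARKER:
            # Only DEDENTs immediately followed by ENDMARKER are trailing.
            pos = (tok.start[0], 0)
            for dedent in pending:
                yield dedent._replace(start=pos, end=pos, line='')
        else:
            # DEDENTs followed by more code keep their original position.
            yield from pending
        pending.clear()
        yield tok
    yield from pending  # stream ended without an ENDMARKER


# Synthetic stream for "def f():\n    return 1\n" with an in-bounds DEDENT.
stream = [
    TokenInfo(NEWLINE, '\n', (2, 12), (2, 13), '    return 1\n'),
    TokenInfo(DEDENT, '', (2, 13), (2, 13), '    return 1\n'),
    TokenInfo(ENDMARKER, '', (3, 0), (3, 0), ''),
]
fixed = list(shift_trailing_dedents(stream))
for t in fixed:
    print(tok_name[t.type], t.start, t.end)

# A DEDENT followed by more code is left alone.
inner = [
    TokenInfo(DEDENT, '', (3, 0), (3, 0), 'y = 2\n'),
    TokenInfo(NAME, 'y', (3, 0), (3, 1), 'y = 2\n'),
]
kept = list(shift_trailing_dedents(inner))
```

Buffering is needed because a DEDENT alone does not say whether it closes the file; only the token that follows it does.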


pablogsal · 2023-05-26T14:53:38Z

> Yeah, I think that, if we want to support doing the same thing as 3.11, the only way is to special-case it in Python-tokenize.c and not in the C tokenizer itself.

Ok, then check if you like #104980


pablogsal mentioned this issue May 26, 2023

W391: spurious warnings with python 3.12 beta PyCQA/pycodestyle#1142

Closed

bedevere-bot mentioned this issue May 26, 2023

gh-104976: Ensure trailing dedent tokens are emitted as the previous tokenizer #104980

Merged

pablogsal added a commit to pablogsal/cpython that referenced this issue May 26, 2023

pythongh-104976: Ensure trailing dedent tokens are emitted as the previous tokenizer

3bdac97

pablogsal self-assigned this May 26, 2023

pablogsal added seven more commits to pablogsal/cpython that referenced this issue May 26, 2023, each titled "pythongh-104976: Ensure trailing dedent tokens are emitted as the previous tokenizer"

709f577, 7c0113e, 6db032a, 8a23a18, 19a58c5, a1a0a48, 54fc1d5

asottile mentioned this issue May 26, 2023

Python 3.12 support pytest-dev/pytest#10894

Merged

bedevere-bot mentioned this issue May 26, 2023

[3.12] gh-104976: Ensure trailing dedent tokens are emitted as the previous tokenizer (GH-104980) #105000

Merged

pablogsal added a commit that referenced this issue May 26, 2023

gh-104976: Ensure trailing dedent tokens are emitted as the previous tokenizer (#104980)

46b52e6

pablogsal pushed a commit that referenced this issue May 26, 2023

[3.12] gh-104976: Ensure trailing dedent tokens are emitted as the previous tokenizer (GH-104980) (#105000)

2c02c68

pablogsal closed this as completed May 26, 2023

erlend-aasland mentioned this issue Jun 6, 2023

3.12 backport gh 105236 #105358

Closed

jayaddison mentioned this issue Jun 6, 2023

test_intl: resolve test flakiness by using integer nanosecond timestamps during file modification detection sphinx-doc/sphinx#11435

Merged

Erotemic mentioned this issue Oct 23, 2023

Tokenize generate_tokens regression in CPython 3.12 #111224

Closed
