gh-105069: Add a readline-like callable to the tokenizer to consume input iteratively #105070

pablogsal · 2023-05-29T19:14:04Z

Issue: Consider not consuming all the buffer in one go in the tokenizer module #105069

pablogsal · 2023-05-29T20:30:56Z

Lib/test/test_tokenize.py

@@ -2668,43 +2704,44 @@ def test_unicode(self):

    def test_invalid_syntax(self):
        def get_tokens(string):
-            return list(_generate_tokens_from_c_tokenizer(string))
-
-        self.assertRaises(SyntaxError, get_tokens, "(1+2]")


This was bothering me 😅

pablogsal · 2023-05-29T20:31:19Z

Lib/tokenize.py


 def generate_tokens(readline):
    """Tokenize a source reading Python code as unicode strings.

    This has the same API as tokenize(), except that it expects the *readline*
    callable to return str objects instead of bytes.
    """
-    def _gen():


Now that we are taking callables all of these can go :)
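
For context, here is a minimal sketch of the public API this comment refers to: `generate_tokens()` accepts a readline-style callable that returns `str` lines. The sample source and the use of `io.StringIO` are illustrative assumptions, not code from this PR.

```python
import io
import tokenize

# Illustrative source; any callable returning str lines could be used instead.
source = "x = 1\ny = x + 2\n"
readline = io.StringIO(source).readline

# generate_tokens() calls readline() to obtain the input line by line.
tokens = list(tokenize.generate_tokens(readline))
names = [tok.string for tok in tokens if tok.type == tokenize.NAME]
print(names)
```

Passing the bound `readline` method rather than the whole string is what lets the caller decide how input is produced, for example from a file or a generator.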

pablogsal · 2023-05-30T09:08:37Z

Lib/inspect.py

@@ -2203,7 +2203,7 @@ def _signature_strip_non_python_syntax(signature):
        add(string)
        if (string == ','):
            add(' ')
-    clean_signature = ''.join(text).strip()


This change is needed because the inspect module was relying on lines yielded by the generator being concatenated together when they do not end in \n. That is wrong: the contract says the callable "should yield one line at a time", so if a line doesn't end in a newline we now always add one.
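
A minimal sketch of a readline callable that honors the "one line at a time" contract mentioned above. `make_readline` is a hypothetical helper written for illustration only (it is not part of `tokenize`, `inspect`, or this PR): it appends a newline to any line that lacks one and returns an empty string at end of input.

```python
import tokenize

def make_readline(lines):
    """Hypothetical helper: wrap an iterable of str lines as a readline callable."""
    iterator = iter(lines)

    def readline():
        line = next(iterator, "")  # "" signals end of input
        if line and not line.endswith("\n"):
            line += "\n"  # make every call return one complete line
        return line

    return readline

tokens = list(tokenize.generate_tokens(make_readline(["a = 1", "b = 2"])))
name_rows = [(tok.string, tok.start[0]) for tok in tokens if tok.type == tokenize.NAME]
print(name_rows)
```

Normalizing the line endings on the caller's side means the result does not depend on how the tokenizer treats lines that lack a trailing newline.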

lysnikolaou

Great job! 💯

A couple of comments and it's good to go!

Parser/tokenizer.c

Lib/tokenize.py

pablogsal · 2023-05-30T16:14:24Z

Fixed the problems and added another test.

@lysnikolaou ready for another review!

pablogsal · 2023-05-30T16:14:36Z

CC: @mgmacias95 wanna make a review?

mgmacias95

LGTM

miss-islington · 2023-05-30T21:43:45Z

Thanks @pablogsal for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

miss-islington · 2023-05-30T21:43:47Z

Sorry @pablogsal, I had trouble checking out the 3.12 backport branch.
Please retry by removing and re-adding the "needs backport to 3.12" label.
Alternatively, you can backport using cherry_picker on the command line.
cherry_picker 9216e69a87d16d871625721ed5a8aa302511f367 3.12

miss-islington · 2023-05-30T21:44:01Z

Thanks @pablogsal for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

…sume input iteratively (pythonGH-105070) (cherry picked from commit 9216e69) Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>

bedevere-bot · 2023-05-30T21:44:10Z

GH-105119 is a backport of this pull request to the 3.12 branch.

…nsume input iteratively (GH-105070) (#105119) gh-105069: Add a readline-like callable to the tokenizer to consume input iteratively (GH-105070) (cherry picked from commit 9216e69) Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>

pablogsal force-pushed the tokenizer_iter branch from 9089ece to ff4d45c May 29, 2023 19:14

bedevere-bot mentioned this pull request May 29, 2023

Consider not consuming all the buffer in one go in the tokenizer module #105069

Closed

pablogsal force-pushed the tokenizer_iter branch from ff4d45c to bc6d2da May 29, 2023 20:02

pablogsal mentioned this pull request May 29, 2023

W391: spurious warnings with python 3.12 beta PyCQA/pycodestyle#1142

Closed

pablogsal force-pushed the tokenizer_iter branch 3 times, most recently from 0a661c4 to 713379f May 29, 2023 20:29

pablogsal marked this pull request as ready for review May 29, 2023 20:30

pablogsal requested a review from lysnikolaou as a code owner May 29, 2023 20:30

pablogsal force-pushed the tokenizer_iter branch from 713379f to 35328a4 May 29, 2023 20:30

bedevere-bot added the awaiting core review label May 29, 2023

pablogsal commented May 29, 2023

pablogsal force-pushed the tokenizer_iter branch from 35328a4 to 1b8ff7e May 29, 2023 20:32

pythongh-105069: Add a readline-like callable to the tokenizer to consume input iteratively (7caac01)

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>

pablogsal force-pushed the tokenizer_iter branch from 1b8ff7e to 7caac01 May 29, 2023 21:21

pablogsal added 3 commits May 30, 2023 00:17

fixup! pythongh-105069: Add a readline-like callable to the tokenizer to consume input iteratively (8903d0d)

fixup! fixup! pythongh-105069: Add a readline-like callable to the tokenizer to consume input iteratively (d370087)

fixup! fixup! fixup! pythongh-105069: Add a readline-like callable to the tokenizer to consume input iteratively (2d6f0a6)

pablogsal added the skip news label May 30, 2023

pablogsal commented May 30, 2023

lysnikolaou reviewed May 30, 2023

Parser/tokenizer.c (resolved)

Lib/tokenize.py (outdated, resolved)

pablogsal added 2 commits May 30, 2023 17:12

fixup! fixup! fixup! fixup! pythongh-105069: Add a readline-like callable to the tokenizer to consume input iteratively (0935371)

fixup! fixup! fixup! fixup! fixup! pythongh-105069: Add a readline-like callable to the tokenizer to consume input iteratively (9990b7e)

fixup! fixup! fixup! fixup! fixup! fixup! pythongh-105069: Add a readline-like callable to the tokenizer to consume input iteratively (0598127)

mgmacias95 approved these changes May 30, 2023

pablogsal merged commit 9216e69 into python:main May 30, 2023

pablogsal deleted the tokenizer_iter branch May 30, 2023 21:43

bedevere-bot removed the awaiting core review label May 30, 2023

pablogsal added the "awaiting core review" and "needs backport to 3.12" (bug and security fixes) labels May 30, 2023

miss-islington assigned pablogsal May 30, 2023

pablogsal added and removed the "needs backport to 3.12" (bug and security fixes) label May 30, 2023

bedevere-bot removed the needs backport to 3.12 bug and security fixes label May 30, 2023

erlend-aasland mentioned this pull request Jun 6, 2023

3.12 backport gh 105236 #105358

Closed
