gh-105069: Add a readline-like callable to the tokenizer to consume input iteratively #105070
pablogsal commented May 29, 2023 (edited by bedevere-bot)
- Issue: Consider not consuming all the buffer in one go in the tokenizer module #105069
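The motivation in the linked issue (not consuming the whole input in one go) can be illustrated with a readline-like callable that counts how often it is called. This is only a sketch: `counting_readline` is a hypothetical helper, not part of the PR, and the exact number of calls depends on the Python version in use, so the example prints the count instead of promising a value.

```python
import tokenize


def counting_readline(lines):
    """Hypothetical helper: a readline-like callable that counts its calls."""
    it = iter(lines)
    calls = {"n": 0}

    def readline():
        calls["n"] += 1
        # Return one line per call, then '' to signal end of input.
        return next(it, "")

    return readline, calls


lines = [f"v{i} = {i}\n" for i in range(100)]
readline, calls = counting_readline(lines)

# Pull a single token and see how much input was requested to produce it.
token_iter = tokenize.generate_tokens(readline)
first = next(token_iter)
print(first.string, "after", calls["n"], "readline call(s)")
```

If the tokenizer consumes input iteratively, the count should stay well below the number of lines when only the first token has been requested.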
0a661c4 to 713379f
@@ -2668,43 +2704,44 @@ def test_unicode(self):

    def test_invalid_syntax(self):
        def get_tokens(string):
            return list(_generate_tokens_from_c_tokenizer(string))

        self.assertRaises(SyntaxError, get_tokens, "(1+2]")
This was bothering me 😅
def generate_tokens(readline):
    """Tokenize a source reading Python code as unicode strings.

    This has the same API as tokenize(), except that it expects the *readline*
    callable to return str objects instead of bytes.
    """
    def _gen():
Now that we are taking callables all of these can go :)
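For readers unfamiliar with the two public entry points named in the docstring above, here is a minimal sketch of how each takes a readline callable. It uses only `io.StringIO`, `io.BytesIO` and the documented `tokenize` functions; the variable names are illustrative.

```python
import io
import tokenize

source = "total = 1 + 2\n"

# generate_tokens() expects readline to return str objects.
str_tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# tokenize() expects readline to return bytes and emits an ENCODING token first.
byte_tokens = list(tokenize.tokenize(io.BytesIO(source.encode("utf-8")).readline))

# Apart from the leading ENCODING token, both yield the same token strings.
str_strings = [t.string for t in str_tokens]
byte_strings = [t.string for t in byte_tokens[1:]]
print(str_strings == byte_strings)
```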
…sume input iteratively Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
@@ -2203,7 +2203,7 @@ def _signature_strip_non_python_syntax(signature):
        add(string)
        if (string == ','):
            add(' ')
    clean_signature = ''.join(text).strip()
This change is needed because, for some reason, the inspect module relies on lines yielded by the generator being concatenated together when they do not end in \n. That is wrong, because the contract says "should yield one line at a time", so if a line doesn't end in a newline we now always add one.
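Callers that build a readline callable from fragments without trailing newlines can avoid depending on either behaviour by normalising the lines themselves. The following is a sketch under that assumption; `make_readline` is a hypothetical helper and is not the change made to inspect in this PR.

```python
import tokenize


def make_readline(lines):
    """Hypothetical helper: return a readline-like callable over `lines`.

    Each call returns one line that ends in a newline, then '' at end of input.
    """
    it = iter(lines)

    def readline():
        line = next(it, "")
        # Honour the "one line at a time" contract explicitly.
        if line and not line.endswith("\n"):
            line += "\n"
        return line

    return readline


# Fragments as they might come from str.split('\n'): no trailing newlines.
fragments = ["x = (1 +", "2)", "y = x"]
tokens = list(tokenize.generate_tokens(make_readline(fragments)))
names = [t.string for t in tokens if t.type == tokenize.NAME]
print(names)
```

Because every line handed to the tokenizer ends in a newline, the result does not depend on how a given Python version treats unterminated lines.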
Great job! 💯
A couple of comments and it's good to go!
Fixed the problems and added another test. @lysnikolaou ready for another review!
CC: @mgmacias95 wanna make a review?
LGTM
Thanks @pablogsal for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12.
Sorry @pablogsal, I had trouble checking out the
Thanks @pablogsal for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12.
…sume input iteratively (pythonGH-105070) (cherry picked from commit 9216e69) Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
GH-105119 is a backport of this pull request to the 3.12 branch.