Possible regression in Python3.12 tokenizer #105013

Closed
arkamar opened this issue May 27, 2023 · 15 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@arkamar
Contributor

arkamar commented May 27, 2023

Bug report

I have encountered a possible regression while testing pyrsistent with python3.12_beta1, see tobgu/pyrsistent#275. The tests fail with the following error:

/usr/lib/python3.12/site-packages/_pytest/python.py:617: in _importtestmodule
    mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
/usr/lib/python3.12/site-packages/_pytest/pathlib.py:564: in import_path
    importlib.import_module(module_name)
/usr/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1293: in _gcd_import
    ???
<frozen importlib._bootstrap>:1266: in _find_and_load
    ???
<frozen importlib._bootstrap>:1237: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:841: in _load_unlocked
    ???
/usr/lib/python3.12/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
    exec(co, module.__dict__)
tests/hypothesis_vector_test.py:48: in <module>
    PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(
/usr/lib/python3.12/site-packages/hypothesis/strategies/_internal/strategies.py:347: in map
    if is_identity_function(pack):
/usr/lib/python3.12/site-packages/hypothesis/internal/reflection.py:638: in is_identity_function
    return bool(re.fullmatch(r"lambda (\w+): \1", get_pretty_function_description(f)))
/usr/lib/python3.12/site-packages/hypothesis/internal/reflection.py:432: in get_pretty_function_description
    return extract_lambda_source(f)
/usr/lib/python3.12/site-packages/hypothesis/internal/reflection.py:312: in extract_lambda_source
    source = inspect.getsource(f)
/usr/lib/python3.12/inspect.py:1274: in getsource
    lines, lnum = getsourcelines(object)
/usr/lib/python3.12/inspect.py:1266: in getsourcelines
    return getblock(lines[lnum:]), lnum + 1
/usr/lib/python3.12/inspect.py:1241: in getblock
    for _token in tokens:
/usr/lib/python3.12/tokenize.py:451: in _tokenize
    for token in _generate_tokens_from_c_tokenizer(source, extra_tokens=True):
/usr/lib/python3.12/tokenize.py:542: in _generate_tokens_from_c_tokenizer
    for info in c_tokenizer.TokenizerIter(source, extra_tokens=extra_tokens):
E     File "<string>", line 1
E       lambda l: (l, pvector(l)))
E                                ^
E   SyntaxError: unmatched ')'

It is triggered by this code (see https://github.com/tobgu/pyrsistent/blob/cc90f3e2b339653fde0df422aaf3ccdb3fc1225d/tests/hypothesis_vector_test.py#L48-L49):

PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(
    lambda l: (l, pvector(l)))

If I join it into one line, everything works. I also tried backporting the fix for #104866, as it sounds relevant, but it didn't solve the issue.
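The traceback ends in `inspect.getsourcelines`, which calls `getblock(lines[lnum:])`. For a lambda, `lnum` is derived from the code object's `co_firstlineno`, which is the line holding the `lambda` keyword, not the line where the enclosing statement starts. A minimal stdlib-only sketch of that (the `l = (...)` snippet and the `<demo>` filename are made-up stand-ins for the pyrsistent statement) shows that the text handed to the tokenizer then begins mid-expression and carries one extra `)`:

```python
# Made-up stand-in for the pyrsistent statement: the lambda sits on the
# second physical line of a parenthesized expression.
SOURCE = "l = (\n    lambda x: ())\n"

namespace = {}
exec(compile(SOURCE, "<demo>", "exec"), namespace)
func = namespace["l"]

# 1-based line of the `lambda` keyword, not of the `l = (` line.
first_line = func.__code__.co_firstlineno

# Text from that line onward, roughly what getblock() gets to tokenize.
tail = "".join(SOURCE.splitlines(keepends=True)[first_line - 1:])

print(first_line)                        # 2
print(repr(tail))                        # '    lambda x: ())\n'
print(tail.count("("), tail.count(")"))  # 1 2
```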

Your environment

  • Gentoo ~amd64
  • Python 3.12.0b1


@arkamar arkamar added the type-bug An unexpected behavior, bug, or error label May 27, 2023
@arhadthedev arhadthedev added the stdlib Python modules in the Lib dir label May 27, 2023
gentoo-bot pushed a commit to gentoo/gentoo that referenced this issue May 27, 2023
@arhadthedev
Member

arhadthedev commented May 27, 2023

The typo seems to be in pyrsistent itself.

https://github.com/tobgu/pyrsistent/blob/cc90f3e2b339653fde0df422aaf3ccdb3fc1225d/tests/hypothesis_vector_test.py#L49:

    lambda l: (l, pvector(l)))

@arkamar
Contributor Author

arkamar commented May 27, 2023

Sorry, I don't follow. Where is the typo? Why does this work:

PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))

and this doesn't

PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(
    lambda l: (l, pvector(l)))

BTW, the wrapped version works in Python 3.10 and Python 3.11.

@arhadthedev
Member

Correction: the third closing parenthesis matches the opening one on the line above (I found this out while preparing a PR for pyrsistent):

PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(
    lambda l: (l, pvector(l)))
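A quick parenthesis count confirms the correction: the statement as a whole balances, but the wrapped lambda line on its own closes one more parenthesis than it opens, which matches the `unmatched ')'` reported once only that line reached the tokenizer. This is a naive character count (adequate here because the snippet has no strings or comments), not a tokenizer:

```python
STATEMENT = (
    "PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n"
    "    lambda l: (l, pvector(l)))\n"
)


def net_parens(text):
    """Opening minus closing parentheses (naive: ignores strings and comments)."""
    return text.count("(") - text.count(")")


opening_line, lambda_line = STATEMENT.splitlines()

print(net_parens(opening_line))  # 1  -> `.map(` is left open
print(net_parens(lambda_line))   # -1 -> one ')' too many on its own
print(net_parens(STATEMENT))     # 0  -> balanced as a whole
```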

@arhadthedev arhadthedev added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label May 27, 2023
@pablogsal
Member

pablogsal commented May 27, 2023

This is not failing on main for me:

❯ cat lel.py
PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(
    lambda l: (l, pvector(l)))

❯ ./python -m tokenize < lel.py
1,0-1,15:           NAME           'PVectorAndLists'
1,16-1,17:          OP             '='
1,18-1,20:          NAME           'st'
1,20-1,21:          OP             '.'
1,21-1,26:          NAME           'lists'
1,26-1,27:          OP             '('
1,27-1,29:          NAME           'st'
1,29-1,30:          OP             '.'
1,30-1,36:          NAME           'builds'
1,36-1,37:          OP             '('
1,37-1,52:          NAME           'RefCountTracker'
1,52-1,53:          OP             ')'
1,53-1,54:          OP             ')'
1,54-1,55:          OP             '.'
1,55-1,58:          NAME           'map'
1,58-1,59:          OP             '('
1,59-1,60:          NL             '\n'
2,4-2,10:           NAME           'lambda'
2,11-2,12:          NAME           'l'
2,12-2,13:          OP             ':'
2,14-2,15:          OP             '('
2,15-2,16:          NAME           'l'
2,16-2,17:          OP             ','
2,18-2,25:          NAME           'pvector'
2,25-2,26:          OP             '('
2,26-2,27:          NAME           'l'
2,27-2,28:          OP             ')'
2,28-2,29:          OP             ')'
2,29-2,30:          OP             ')'
2,30-2,31:          NEWLINE        '\n'
3,0-3,0:            ENDMARKER      ''

@pablogsal
Member

Actually, both versions work with 3.12 (current):

v1 = """\
PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(
    lambda l: (l, pvector(l)))
"""

v2 = """\
PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))
"""

import tokenize
import pprint
import io

for code in [v1, v2]:
    b = io.StringIO(code)
    pprint.pprint(list(tokenize.generate_tokens(b.readline)))

This shows:

[TokenInfo(type=1 (NAME), string='PVectorAndLists', start=(1, 0), end=(1, 15), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n'),
 TokenInfo(type=55 (OP), string='=', start=(1, 16), end=(1, 17), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n'),
 TokenInfo(type=1 (NAME), string='st', start=(1, 18), end=(1, 20), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n'),
 TokenInfo(type=55 (OP), string='.', start=(1, 20), end=(1, 21), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n'),
 TokenInfo(type=1 (NAME), string='lists', start=(1, 21), end=(1, 26), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n'),
 TokenInfo(type=55 (OP), string='(', start=(1, 26), end=(1, 27), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n'),
 TokenInfo(type=1 (NAME), string='st', start=(1, 27), end=(1, 29), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n'),
 TokenInfo(type=55 (OP), string='.', start=(1, 29), end=(1, 30), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n'),
 TokenInfo(type=1 (NAME), string='builds', start=(1, 30), end=(1, 36), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n'),
 TokenInfo(type=55 (OP), string='(', start=(1, 36), end=(1, 37), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n'),
 TokenInfo(type=1 (NAME), string='RefCountTracker', start=(1, 37), end=(1, 52), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n'),
 TokenInfo(type=55 (OP), string=')', start=(1, 52), end=(1, 53), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n'),
 TokenInfo(type=55 (OP), string=')', start=(1, 53), end=(1, 54), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n'),
 TokenInfo(type=55 (OP), string='.', start=(1, 54), end=(1, 55), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n'),
 TokenInfo(type=1 (NAME), string='map', start=(1, 55), end=(1, 58), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n'),
 TokenInfo(type=55 (OP), string='(', start=(1, 58), end=(1, 59), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n'),
 TokenInfo(type=65 (NL), string='\n', start=(1, 59), end=(1, 60), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n'),
 TokenInfo(type=1 (NAME), string='lambda', start=(2, 4), end=(2, 10), line='    lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=1 (NAME), string='l', start=(2, 11), end=(2, 12), line='    lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string=':', start=(2, 12), end=(2, 13), line='    lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string='(', start=(2, 14), end=(2, 15), line='    lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=1 (NAME), string='l', start=(2, 15), end=(2, 16), line='    lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string=',', start=(2, 16), end=(2, 17), line='    lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=1 (NAME), string='pvector', start=(2, 18), end=(2, 25), line='    lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string='(', start=(2, 25), end=(2, 26), line='    lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=1 (NAME), string='l', start=(2, 26), end=(2, 27), line='    lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string=')', start=(2, 27), end=(2, 28), line='    lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string=')', start=(2, 28), end=(2, 29), line='    lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string=')', start=(2, 29), end=(2, 30), line='    lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(2, 30), end=(2, 31), line='    lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=0 (ENDMARKER), string='', start=(3, 0), end=(3, 0), line='')]
[TokenInfo(type=1 (NAME), string='PVectorAndLists', start=(1, 0), end=(1, 15), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string='=', start=(1, 16), end=(1, 17), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=1 (NAME), string='st', start=(1, 18), end=(1, 20), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string='.', start=(1, 20), end=(1, 21), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=1 (NAME), string='lists', start=(1, 21), end=(1, 26), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string='(', start=(1, 26), end=(1, 27), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=1 (NAME), string='st', start=(1, 27), end=(1, 29), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string='.', start=(1, 29), end=(1, 30), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=1 (NAME), string='builds', start=(1, 30), end=(1, 36), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string='(', start=(1, 36), end=(1, 37), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=1 (NAME), string='RefCountTracker', start=(1, 37), end=(1, 52), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string=')', start=(1, 52), end=(1, 53), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string=')', start=(1, 53), end=(1, 54), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string='.', start=(1, 54), end=(1, 55), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=1 (NAME), string='map', start=(1, 55), end=(1, 58), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string='(', start=(1, 58), end=(1, 59), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=1 (NAME), string='lambda', start=(1, 59), end=(1, 65), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=1 (NAME), string='l', start=(1, 66), end=(1, 67), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string=':', start=(1, 67), end=(1, 68), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string='(', start=(1, 69), end=(1, 70), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=1 (NAME), string='l', start=(1, 70), end=(1, 71), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string=',', start=(1, 71), end=(1, 72), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=1 (NAME), string='pvector', start=(1, 73), end=(1, 80), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string='(', start=(1, 80), end=(1, 81), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=1 (NAME), string='l', start=(1, 81), end=(1, 82), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string=')', start=(1, 82), end=(1, 83), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string=')', start=(1, 83), end=(1, 84), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=55 (OP), string=')', start=(1, 84), end=(1, 85), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(1, 85), end=(1, 86), line='PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n'),
 TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]
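The two listings above can be condensed into a single check: once positions and the `NL` token produced by the line break are dropped, the wrapped and unwrapped statements yield the same token sequence. A minimal sketch using only `tokenize.generate_tokens` (the `v1`/`v2` strings are copied from the snippet above; they are only tokenized, never executed):

```python
import io
import tokenize

v1 = (
    "PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(\n"
    "    lambda l: (l, pvector(l)))\n"
)
v2 = "PVectorAndLists = st.lists(st.builds(RefCountTracker)).map(lambda l: (l, pvector(l)))\n"


def significant_tokens(code):
    """(type name, string) pairs, skipping NL tokens that only mark a line break."""
    tokens = tokenize.generate_tokens(io.StringIO(code).readline)
    return [(tokenize.tok_name[t.type], t.string) for t in tokens if t.type != tokenize.NL]


print(significant_tokens(v1) == significant_tokens(v2))  # True
```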

@pablogsal
Member

Can you provide a reproducer that doesn't use any third-party code and that fails with main or the tip of 3.12?

@arkamar
Contributor Author

arkamar commented May 27, 2023

The pyrsistent tests still fail with the tip of 3.12. However, someone pointed out to me that similar errors occurred in hypothesis tests. I dug into it, and this should be the reproducer:

import inspect

ordered_pair = (
    lambda right: [].map(
        lambda length: ()))

print(inspect.getsource(ordered_pair))

The issue seems to be related to reflection; with Python 3.12 it prints the following error:

Traceback (most recent call last):
  File "/root/test.py", line 7, in <module>
    print(inspect.getsource(ordered_pair))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/inspect.py", line 1274, in getsource
    lines, lnum = getsourcelines(object)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/inspect.py", line 1266, in getsourcelines
    return getblock(lines[lnum:]), lnum + 1
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/inspect.py", line 1241, in getblock
    for _token in tokens:
  File "/usr/lib/python3.12/tokenize.py", line 451, in _tokenize
    for token in _generate_tokens_from_c_tokenizer(source, extra_tokens=True):
  File "/usr/lib/python3.12/tokenize.py", line 542, in _generate_tokens_from_c_tokenizer
    for info in c_tokenizer.TokenizerIter(source, extra_tokens=extra_tokens):
  File "<string>", line 2
    lambda length: ()))
                      ^
SyntaxError: unmatched ')'

but Python 3.11 prints this:

    lambda right: [].map(
        lambda length: ()))

@arkamar
Contributor Author

arkamar commented May 27, 2023

Or this one, with just one lambda:

import inspect

l = (
    lambda x: ())

print(inspect.getsource(l))

@arkamar
Contributor Author

arkamar commented May 27, 2023

It works correctly if I join it into one line:

l = (lambda x: ())
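To check a given interpreter without pytest or hypothesis, both variants can be written to a real file and passed to `inspect.getsource`. A file is needed because `inspect` re-reads source through `linecache`; code compiled from a plain string with no backing file has no source to find. This is a sketch assuming a writable temporary directory; the helper name `source_of_l` is made up. On 3.12.0b1 the wrapped call raised the `SyntaxError` shown earlier, while 3.11 returns the lambda's own line:

```python
import inspect
import os
import tempfile


def source_of_l(code):
    """Hypothetical helper: save `code` to a file, run it, return getsource(l)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "repro.py")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(code)
        namespace = {"__name__": "repro"}
        # Compile against the real path so inspect/linecache can re-read it.
        exec(compile(code, path, "exec"), namespace)
        # getsource runs before the temporary directory is removed.
        return inspect.getsource(namespace["l"])


one_line = source_of_l("l = (lambda x: ())\n")
wrapped = source_of_l("l = (\n    lambda x: ())\n")  # raised SyntaxError on 3.12.0b1

print(repr(one_line))
print(repr(wrapped))
```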

@pablogsal
Copy link
Member

The regression here is not in tokenize but in the inspect module. tokenize produces the same tokens for this source as it does in 3.11 (the diff below prints nothing):

$ cat lel2.py
import inspect

l = (
    lambda x: ())

print(inspect.getsource(l))

$ diff <(./python -m tokenize < lel2.py ) <(/home/pablogsal/.pyenv/shims/python3.11 -m tokenize < lel2.py)
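Why inspect, and not tokenize, is where this surfaces: `getblock` tokenizes the lines starting at the lambda and, for a lambda, stops at the first `NEWLINE` token. Those lines can contain a surplus `)`. The 3.11 pure-Python tokenizer simply let its paren level go negative and still emitted `NEWLINE`, whereas in 3.12.0b1 the C tokenizer raised `unmatched ')'` before `inspect` saw one. The function below is a simplified, hypothetical sketch of that logic (`lambda_block` is a made-up name); it is neither CPython's `BlockFinder` nor necessarily how #105021 fixes it. It ends the block at the first `NEWLINE` and, if the tokenizer rejects the stray parenthesis instead, ends it at the line of the last token received:

```python
import io
import tokenize


def lambda_block(lines):
    """Return the leading lines that form one lambda's logical line.

    Hypothetical sketch: stop at the first NEWLINE token; if the tokenizer
    raises on a stray ')' instead, stop at the last token's line.
    """
    tokens = tokenize.generate_tokens(io.StringIO("".join(lines)).readline)
    last_row = 0
    try:
        for tok in tokens:
            if tok.type == tokenize.NEWLINE:
                return lines[:tok.start[0]]
            last_row = tok.start[0]
    except (SyntaxError, tokenize.TokenError):
        pass  # e.g. "unmatched ')'" as seen with 3.12.0b1
    return lines[:last_row]


print(lambda_block(["    lambda x: ())\n", "print(l)\n"]))
print(lambda_block(["    lambda right: [].map(\n",
                    "        lambda length: ()))\n",
                    "print(ordered_pair)\n"]))
```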

@pablogsal
Member

pablogsal commented May 27, 2023

@arkamar, can you check whether #105021 fixes pyrsistent?

pablogsal added a commit to pablogsal/cpython that referenced this issue May 27, 2023
@pablogsal
Member

It seems I can run the test suite correctly with the fix installed:

❯ python -m pytest tests -v -s
=================== test session starts ===================
platform linux -- Python 3.13.0a0, pytest-7.3.1, pluggy-1.0.0 -- /home/pablogsal/github/python/main/venv/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/pablogsal/github/python/main/pyrsistent/.hypothesis/examples')
rootdir: /home/pablogsal/github/python/main/pyrsistent
configfile: pytest.ini
plugins: hypothesis-6.75.6
collecting ...
...
=================== 532 passed, 102 skipped, 15 warnings in 79.66s (0:01:19) ===================

pablogsal added a commit to pablogsal/cpython that referenced this issue May 27, 2023
…iline lambdas

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
pablogsal added a commit to pablogsal/cpython that referenced this issue May 27, 2023
pablogsal added a commit to pablogsal/cpython that referenced this issue May 27, 2023
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 27, 2023
…ambdas (pythonGH-105021)

(cherry picked from commit 3a5be87)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
@pablogsal
Member

pablogsal commented May 27, 2023

Fixed in #105021

@mgorny
Contributor

mgorny commented May 28, 2023

I can confirm that this patch alone is sufficient to resolve the collection error in pyrsistent. Python segfaults while running the tests afterwards, but FWICS that is a different bug, already fixed in the 3.12 branch.

@arkamar
Contributor Author

arkamar commented May 28, 2023

@mgorny thanks for testing.

pablogsal added a commit that referenced this issue May 28, 2023
…lambdas (GH-105021) (#105032)

gh-105013: Fix inspect.getsource with parenthesized multiline lambdas (GH-105021)
(cherry picked from commit 3a5be87)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
5 participants