Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 701 – Syntactic formalization of f-strings #102856

Closed
4 tasks done
pablogsal opened this issue Mar 20, 2023 · 55 comments
Closed
4 tasks done

PEP 701 – Syntactic formalization of f-strings #102856

pablogsal opened this issue Mar 20, 2023 · 55 comments

Comments

@pablogsal
Copy link
Member Author

@pablogsal
Copy link
Member Author

See this for the latest report on errors from @isidentical

@pablogsal
Copy link
Member Author

Draft PR for the C tokenizer up: #102855

@pablogsal
Copy link
Member Author

pablogsal commented Mar 20, 2023

Things for the cleanup of #102855:

  • Cleaning up the grammar and the action helpers (the names are still ridiculous and there are multiple rules commented out).
  • Remove the old parsing code and check that we didn't break anything 😅
  • Clean/refactor the tokenizer struct (better names, factor stuff into its own structure as needed).
  • Consider factoring out tok_get_fstring_mode because is a monster.

@pablogsal
Copy link
Member Author

pablogsal commented Mar 20, 2023

Ok with #102855 we have the following failing tests:

  • test_ast
  • test_cmd_line_script
  • test_eof
  • test_exceptions
  • test_fstring
  • test_tokenize
  • test_type_comments
  • test_unparse

Most of these are updating error messages, line numbers and other stuff but some may have actual bugs so we should check them. Please, mention which ones are you working on so we don't clash with one another.

@mgmacias95
Copy link
Contributor

Working on test_tokenize

@Eclips4
Copy link
Member

Eclips4 commented Mar 21, 2023

Hello, Pablo!
Can I get work on test_ast?
Recently I sent some PR's about this file (for example, #102797). So, I have some experience in that =)

@ramvikrams
Copy link
Contributor

I can work with test_type_comments and test_unparse.

@pablogsal
Copy link
Member Author

@Eclips4 @ramvikrams wonderful! Just make PRs against my fork!

Report here or ping any of us if you find something that could be a bug (don't just fix the tests blindly because there may be bugs lurking).

@pablogsal
Copy link
Member Author

@lysnikolaou can you work on cleaning up the grammar + the actions?

@isidentical can you work on cleaning up some of the tokenizer layers? (This is quite a lot so we can probably work together here).

@Eclips4
Copy link
Member

Eclips4 commented Mar 21, 2023

@pablogsal
About test_ast.py
Seems thats like there only a one test will be failed, and how I undestand, that's a bug:

with self.assertRaises(SyntaxError):
ast.parse('f"{x=}"', feature_version=(3, 7))

I think, there's two solutions:

  1. Remove this test, because support of python3.7 will be ended soon.
  2. Now errors raised by tokenizer.c instead of string_parser.c, so as I understand, we should change python_gram, is it right? ( We need access to feature_version, which in tokenizer inaccessible )

@pablogsal
Copy link
Member Author

2. Now errors raised by tokenizer.c instead of string_parser.c, so as I understand, we should change python_gram, is it right? ( We need access to feature_version, which in tokenizer inaccessible )

Probably we can do this but on the other hand I would prefer to not overcomplicate this so I think (1) is better

@lysnikolaou
Copy link
Member

@lysnikolaou can you work on cleaning up the grammar + the actions?

Will do!

@Eclips4
Copy link
Member

Eclips4 commented Mar 21, 2023

Also, I can take a look at test_cmd_line_script. Seems easy.

@pablogsal
Copy link
Member Author

Also, I can take a look at test_cmd_line_script. Seems easy.

All yours!

@CharlieZhao95
Copy link
Contributor

I found that no one has claimed test_eof yet, so I made some work. :)
Failed test case: test_eof.test_eof_with_line_continuation

I looked at its commit history. This test case is a regression test for crash, so it seems like a good choice to keep the case and update the error message directly.

def test_eof_with_line_continuation(self):
expect = "unexpected EOF while parsing (<string>, line 1)"

Update unexpected EOF while parsing (<string>, line 1) to (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \xXX escape (<string>, line 1)

@ramvikrams
Copy link
Contributor

@pablogsal in test_type_comments we have not used any f-strings.

@pablogsal
Copy link
Member Author

pablogsal commented Mar 23, 2023

@pablogsal in test_type_comments we have not used any f-strings.

The one failing there is this problem:

======================================================================
FAIL: test_fstring (test.test_type_comments.TypeCommentTests.test_fstring)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_type_comments.py", line 275, in test_fstring
    for tree in self.parse_all(fstring, minver=6):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_type_comments.py", line 239, in parse_all
    with self.assertRaisesRegex(SyntaxError, expected_regex,
AssertionError: SyntaxError not raised : feature_version=(3, 4)

----------------------------------------------------------------------

which I think is a feature version problem.

@Eclips4
Copy link
Member

Eclips4 commented Mar 23, 2023

@pablogsal in test_type_comments we have not used any f-strings.

The one failing there is this problem:

======================================================================
FAIL: test_fstring (test.test_type_comments.TypeCommentTests.test_fstring)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_type_comments.py", line 275, in test_fstring
    for tree in self.parse_all(fstring, minver=6):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_type_comments.py", line 239, in parse_all
    with self.assertRaisesRegex(SyntaxError, expected_regex,
AssertionError: SyntaxError not raised : feature_version=(3, 4)

----------------------------------------------------------------------

which I think is a feature version problem.

lowest = 4 # Lowest minor version supported

We can just change this to 6, and this test will be pass.
I don't research this problem, but this solution looks like the simplest.
However... supporting syntax of python3.4 && 3.5 looks cinda strange.

@pablogsal
Copy link
Member Author

We can just change this to 6, and this test will be pass.
I don't research this problem, but this solution looks like the simplest.
However... supporting syntax of python3.4 && 3.5 looks cinda strange.

I don't think that will work because we are not doing version checking anymore. See previous comments. The fix is probably to no pass feature_version.

@sunmy2019
Copy link
Member

Looks like no one is analyzing test_exceptions. I will look into it these two days.

4 platforms seem to have met the same problem here.

@pablogsal
Copy link
Member Author

@isidentical @lysnikolaou i have pushed some rules for error messages, please take a look and complete them with more if you have some spare cycles. With these the failures in test_fstring have decreased notably

@isidentical
Copy link
Member

I can confirm that the total number of failures has been decrased from 88 to 63. I'll try to see what are the most high impact ones and submit a PR to clear them.

@isidentical
Copy link
Member

If anyone intends to work on any of the remaining tasks in test_fstring, please double check with this PR (pablogsal#52) since it brings down the total failures to 30 with some explanations/required decisions for the rest.

@sunmy2019
Copy link
Member

After looking into the failure in test_exceptions.

check(b'Python = "\xcf\xb3\xf2\xee\xed" +', 1, 18)

Old parser and the new parser raises the same exception (UnicodeDecodeError), but with different col_offset. This is because it was raised by the wrong token.

I would consider it a bug in the old parser. Just as this comment mentions,

/* This is needed, in order for the SyntaxError to point to the token t,
since _PyPegen_raise_error uses p->tokens[p->fill - 1] for the
error location, if p->known_err_token is not set. */
p->known_err_token = t;
if (octal) {
RAISE_SYNTAX_ERROR("invalid octal escape sequence '\\%.3s'",
first_invalid_escape);
}
else {
RAISE_SYNTAX_ERROR("invalid escape sequence '\\%c'", c);
}

the error token was not correctly set in the old parser.

Maybe we should open an issue for the old parse? But the possible fixing might be error-prone, since we might need to keep track of every possible code path.

As for the new parser, I think a change in the test case would be fine.

@sunmy2019
Copy link
Member

I am working on an PR to fix test_unparse this weekend.

_PyPegen_concatenate_strings did not implement concatenating empty Constant with FormattedValue, resulting unparse failure.

@sunmy2019
Copy link
Member

sunmy2019 commented Mar 31, 2023

Hi, I got some bad news.

I have been testing against memory leaks with ./python -m test -j $(nproc) -R :
~30% of the tests failed on current head 270b661

For example,

0:01:01 load avg: 13.77 [157/433/42] test_unittest failed (reference leak)
beginning 9 repetitions
test_unittest leaked [89, 89, 89, 89] references, sum=356
test_unittest leaked [90, 89, 89, 89] memory blocks, sum=357
.......
0:01:16 load avg: 15.00 [185/433/49] test_inspect failed (reference leak)
beginning 9 repetitions
test_inspect leaked [429, 429, 429, 429] references, sum=1716
test_inspect leaked [318, 318, 318, 317] memory blocks, sum=1271

These references are most likely to leak during the compilation (such as using import, using compile/exec/eval, or using ast.parse)

We might need to look into that.


Update: Memory Leakage fixed by commit pablogsal@d8b12e2

The root cause is that someone forgot to use _PyArena_AddPyObject in 3 places.

This is very tricky because _PyArena_AddPyObject was scattered in many subroutines. Sometimes you should add _PyArena_AddPyObject, but sometimes you should not (add will cause a negative ref count).

Just like an old saying, managing memory by hand is so much pain, and also error-prone. This check can be done, by analyzing the PyObject*s registered to the arena by the time the AST was created, but that is a totally different story.

@pablogsal
Copy link
Member Author

Only 12 test left in test_fstring and we are ready to go!

pablogsal added a commit that referenced this issue May 24, 2023
…04824)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 24, 2023
…cs (pythonGH-104824)

(cherry picked from commit c45701e)

Co-authored-by: Marta Gómez Macías <mgmacias@google.com>
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
pablogsal added a commit that referenced this issue May 24, 2023
…ocs (GH-104824) (#104847)

gh-102856: Add changes related to PEP 701 in 3.12 What's New docs (GH-104824)
(cherry picked from commit c45701e)

Co-authored-by: Marta Gómez Macías <mgmacias@google.com>
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
@pablogsal
Copy link
Member Author

pablogsal commented May 24, 2023

Closing this as we already have a What's New entry, C tokenizer, and Python tokenizer. Let's tackle any small remaining items in separate issues from now on.

Probably we need to alter https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals. @lysnikolaou can you take a go at that?

miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 24, 2023
(cherry picked from commit 3e97c00)

Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
@lysnikolaou
Copy link
Member

Probably we need to alter https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals. @lysnikolaou can you take a go at that?

Sure!

hugovk added a commit that referenced this issue May 24, 2023
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
lysnikolaou added a commit to lysnikolaou/cpython that referenced this issue May 24, 2023
lysnikolaou added a commit to lysnikolaou/cpython that referenced this issue May 24, 2023
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 24, 2023
…r PEP701 (pythonGH-104861)

(cherry picked from commit 8e5b3b9)

Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
lysnikolaou added a commit that referenced this issue May 24, 2023
…er PEP701 (GH-104861) (#104865)

(cherry picked from commit 8e5b3b9)

Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
@flying-sheep
Copy link
Contributor

Ah great! I didn't know someone would continue PEP 536. Happy that it's happening!

Shouldn't PEP 536 be mentioned in this one?

@pablogsal
Copy link
Member Author

Ah great! I didn't know someone would continue PEP 536. Happy that it's happening!

Shouldn't PEP 536 be mentioned in this one?

It's mentioned in the PEP:

https://peps.python.org/pep-0701/

Btw as heads up: We don't monitor normally closed issues so is very likely that people won't answer to comments when the issue is closed :)

@flying-sheep
Copy link
Contributor

Thanks! Seems like I missed the mention then. Perfect!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants