Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure in test_traceback: Different column pointer upon parsing error #65

Closed
lysnikolaou opened this issue Apr 9, 2020 · 10 comments
Closed
Labels
3.9.0b1 Needs to be fixed/implemented until the 3.9.0b1 release

Comments

@lysnikolaou
Copy link
Member

lysnikolaou commented Apr 9, 2020

Current Parser:

╰─ ./python
Python 3.9.0a5+ (heads/pegen:99a8e2fa08, Apr  9 2020, 16:19:04) 
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 5 | 4 |
  File "<stdin>", line 1
    x = 5 | 4 |
              ^
SyntaxError: invalid syntax

Pegen:

╰─ ./python -p new
Python 3.9.0a5+ (heads/pegen:99a8e2fa08, Apr  9 2020, 16:19:04) 
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 5 | 4 |
  File "<stdin>", line 1
    x = 5 | 4 |
               ^
SyntaxError: invalid syntax

Difference by one. Would it be okay to just skip this test if pegen is enabled?

@pablogsal
Copy link

Difference by one. Would it be okay to just skip this test if pegen is enabled?

Why is not pointing to the "|"? It has consumed more whitespace? Maybe we can use the ~ operator somewhere so it does not try something more....

@pablogsal
Copy link

Interestingly, with two | it does it correctly:

>>> x = 5 | 4 |             |
  File "<stdin>", line 1
    x = 5 | 4 |             |
                            ^
SyntaxError: invalid syntax

@pablogsal
Copy link

pablogsal commented Apr 9, 2020

Yeah, the problem is that it consumes whitespace:

>>> x = 5 | 4 |                                        # Lots of whitespace here
  File "<stdin>", line 1
    x = 5 | 4 |
                                                       ^
SyntaxError: invalid syntax

Maybe we can remove all trailing whitespace when reporting errors: move the error indicator to the first not-whitespace character.

@gvanrossum
Copy link

Maybe we can remove all trailing whitespace when reporting errors: move the error indicator to the first not-whitespace character.

But won't that be fooled by a comment?

Arguably the old parser is wrong. Using zero, one, two spaces at the end of the line I get:

>>> 1|
  File "<stdin>", line 1
    1|
     ^
SyntaxError: invalid syntax
>>> 1| 
  File "<stdin>", line 1
    1| 
      ^
SyntaxError: invalid syntax
>>> 1|  
  File "<stdin>", line 1
    1|  
       ^
SyntaxError: invalid syntax
>>> 

It looks like an off-by-one error, trying to point to the end of the line. Also:

>>> 1|#
  File "<stdin>", line 1
    1|#
      ^
SyntaxError: invalid syntax
>>> 1| #
  File "<stdin>", line 1
    1| #
       ^
SyntaxError: invalid syntax
>>> 1|  #
  File "<stdin>", line 1
    1|  #
        ^
SyntaxError: invalid syntax
>>> 

The new parser seems to be more rational:

>>> 1|
  File "<stdin>", line 1
    1|
      ^
SyntaxError: invalid syntax
>>> 1| 
  File "<stdin>", line 1
    1| 
       ^
SyntaxError: invalid syntax
>>> 1|  
  File "<stdin>", line 1
    1|  
        ^
SyntaxError: invalid syntax
>>> 1|#
  File "<stdin>", line 1
    1|#
      ^
SyntaxError: invalid syntax
>>> 1| #
  File "<stdin>", line 1
    1| #
       ^
SyntaxError: invalid syntax
>>> 1|  #
  File "<stdin>", line 1
    1|  #
        ^
SyntaxError: invalid syntax
>>> 

The basic problem here is that the syntax error wants to point at the NEWLINE token -- the | operator itself is not in error, there's just something missing after it. (There's nothing special about | either.)

The new parser points at the whitespace, which seems correct -- though maybe it could point to the first space or # instead of at the end. The old parser tries this but has an off-by-one error (perhaps due to some ambiguity we have had for a long time about whether column numbers are 0-based or 1-based).

The quickest fix would be to fix the old parser to add 1.

@lysnikolaou lysnikolaou added the 3.9.0b1 Needs to be fixed/implemented until the 3.9.0b1 release label Apr 28, 2020
@lysnikolaou
Copy link
Member Author

For some weird reason that I don't really understand, the following fixes the issue by moving the caret one position to the right.

diff --git a/Python/pythonrun.c b/Python/pythonrun.c
index 1b79a33c81..f101cfb968 100644
--- a/Python/pythonrun.c
+++ b/Python/pythonrun.c
@@ -1620,17 +1620,8 @@ err_input(perrdetail *err)
         errtext = Py_None;
         Py_INCREF(Py_None);
     } else {
-        errtext = PyUnicode_DecodeUTF8(err->text, err->offset,
+        errtext = PyUnicode_DecodeUTF8(err->text, err->offset - 1,
                                        "replace");
-        if (errtext != NULL) {
-            Py_ssize_t len = strlen(err->text);
-            offset = (int)PyUnicode_GET_LENGTH(errtext);
-            if (len != err->offset) {
-                Py_DECREF(errtext);
-                errtext = PyUnicode_DecodeUTF8(err->text, len,
-                                               "replace");
-            }
-        }
     }
     v = Py_BuildValue("(OiiN)", err->filename,
                       err->lineno, offset, errtext);

Note that instead of adding one, we need to subtract one, which I find weird. Also, this creates some problems of its own. Since pegen is better here, I'd propose we modify the test to pass with the new parser, skip it when using the old parser and close this issue.

@gvanrossum
Copy link

gvanrossum commented May 7, 2020 via email

@lysnikolaou
Copy link
Member Author

Sorry, I didn't make myself clear. I only posted this diff to show what's needed to move the caret when using the old parser and to say that that is not really an acceptable solution.

Since it is not an acceptable solution, I'm proposing to leave the old parser as is, change the failing test to succeed under the new parser and fail under the old one and then skip it when using the old parser.

@gvanrossum
Copy link

gvanrossum commented May 7, 2020 via email

@lysnikolaou
Copy link
Member Author

I opened bpo-40546, since the traceback module would need to change as well to handle this correctly.

@lysnikolaou
Copy link
Member Author

Fixed in python#20072.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.9.0b1 Needs to be fixed/implemented until the 3.9.0b1 release
Projects
None yet
Development

No branches or pull requests

3 participants