Failure in test_traceback: Different column pointer upon parsing error #65

lysnikolaou · 2020-04-09T13:33:25Z

Current Parser:

╰─ ./python
Python 3.9.0a5+ (heads/pegen:99a8e2fa08, Apr  9 2020, 16:19:04) 
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 5 | 4 |
  File "<stdin>", line 1
    x = 5 | 4 |
              ^
SyntaxError: invalid syntax

Pegen:

╰─ ./python -p new
Python 3.9.0a5+ (heads/pegen:99a8e2fa08, Apr  9 2020, 16:19:04) 
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 5 | 4 |
  File "<stdin>", line 1
    x = 5 | 4 |
               ^
SyntaxError: invalid syntax

Difference by one. Would it be okay to just skip this test if pegen is enabled?

The text was updated successfully, but these errors were encountered:

pablogsal · 2020-04-09T14:54:20Z

Difference by one. Would it be okay to just skip this test if pegen is enabled?

Why is not pointing to the "|"? It has consumed more whitespace? Maybe we can use the ~ operator somewhere so it does not try something more....

pablogsal · 2020-04-09T14:56:16Z

Interestingly, with two | it does it correctly:

>>> x = 5 | 4 |             |
  File "<stdin>", line 1
    x = 5 | 4 |             |
                            ^
SyntaxError: invalid syntax

pablogsal · 2020-04-09T14:56:53Z

Yeah, the problem is that it consumes whitespace:

>>> x = 5 | 4 |                                        # Lots of whitespace here
  File "<stdin>", line 1
    x = 5 | 4 |
                                                       ^
SyntaxError: invalid syntax

Maybe we can remove all trailing whitespace when reporting errors: move the error indicator to the first not-whitespace character.

gvanrossum · 2020-04-09T18:15:39Z

Maybe we can remove all trailing whitespace when reporting errors: move the error indicator to the first not-whitespace character.

But won't that be fooled by a comment?

Arguably the old parser is wrong. Using zero, one, two spaces at the end of the line I get:

>>> 1|
  File "<stdin>", line 1
    1|
     ^
SyntaxError: invalid syntax
>>> 1| 
  File "<stdin>", line 1
    1| 
      ^
SyntaxError: invalid syntax
>>> 1|  
  File "<stdin>", line 1
    1|  
       ^
SyntaxError: invalid syntax
>>>

It looks like an off-by-one error, trying to point to the end of the line. Also:

>>> 1|#
  File "<stdin>", line 1
    1|#
      ^
SyntaxError: invalid syntax
>>> 1| #
  File "<stdin>", line 1
    1| #
       ^
SyntaxError: invalid syntax
>>> 1|  #
  File "<stdin>", line 1
    1|  #
        ^
SyntaxError: invalid syntax
>>>

The new parser seems to be more rational:

>>> 1|
  File "<stdin>", line 1
    1|
      ^
SyntaxError: invalid syntax
>>> 1| 
  File "<stdin>", line 1
    1| 
       ^
SyntaxError: invalid syntax
>>> 1|  
  File "<stdin>", line 1
    1|  
        ^
SyntaxError: invalid syntax
>>> 1|#
  File "<stdin>", line 1
    1|#
      ^
SyntaxError: invalid syntax
>>> 1| #
  File "<stdin>", line 1
    1| #
       ^
SyntaxError: invalid syntax
>>> 1|  #
  File "<stdin>", line 1
    1|  #
        ^
SyntaxError: invalid syntax
>>>

The basic problem here is that the syntax error wants to point at the NEWLINE token -- the | operator itself is not in error, there's just something missing after it. (There's nothing special about | either.)

The new parser points at the whitespace, which seems correct -- though maybe it could point to the first space or # instead of at the end. The old parser tries this but has an off-by-one error (perhaps due to some ambiguity we have had for a long time about whether column numbers are 0-based or 1-based).

The quickest fix would be to fix the old parser to add 1.

lysnikolaou · 2020-05-07T13:49:38Z

For some weird reason that I don't really understand, the following fixes the issue by moving the caret one position to the right.

diff --git a/Python/pythonrun.c b/Python/pythonrun.c
index 1b79a33c81..f101cfb968 100644
--- a/Python/pythonrun.c
+++ b/Python/pythonrun.c
@@ -1620,17 +1620,8 @@ err_input(perrdetail *err)
         errtext = Py_None;
         Py_INCREF(Py_None);
     } else {
-        errtext = PyUnicode_DecodeUTF8(err->text, err->offset,
+        errtext = PyUnicode_DecodeUTF8(err->text, err->offset - 1,
                                        "replace");
-        if (errtext != NULL) {
-            Py_ssize_t len = strlen(err->text);
-            offset = (int)PyUnicode_GET_LENGTH(errtext);
-            if (len != err->offset) {
-                Py_DECREF(errtext);
-                errtext = PyUnicode_DecodeUTF8(err->text, len,
-                                               "replace");
-            }
-        }
     }
     v = Py_BuildValue("(OiiN)", err->filename,
                       err->lineno, offset, errtext);

Note that instead of adding one, we need to subtract one, which I find weird. Also, this creates some problems of its own. Since pegen is better here, I'd propose we modify the test to pass with the new parser, skip it when using the old parser and close this issue.

gvanrossum · 2020-05-07T14:46:01Z

Careful, looks like you're deleting some code there that deals with non-ASCII source...

lysnikolaou · 2020-05-07T15:17:33Z

Sorry, I didn't make myself clear. I only posted this diff to show what's needed to move the caret when using the old parser and to say that that is not really an acceptable solution.

Since it is not an acceptable solution, I'm proposing to leave the old parser as is, change the failing test to succeed under the new parser and fail under the old one and then skip it when using the old parser.

gvanrossum · 2020-05-07T15:29:56Z

SGTM

lysnikolaou · 2020-05-07T16:06:51Z

I opened bpo-40546, since the traceback module would need to change as well to handle this correctly.

lysnikolaou · 2020-05-24T19:01:26Z

Fixed in python#20072.

lysnikolaou added the 3.9.0b1 Needs to be fixed/implemented until the 3.9.0b1 release label Apr 28, 2020

pablogsal mentioned this issue May 1, 2020

Wrong column for invalid string prefix #121

Closed

lysnikolaou closed this as completed May 24, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failure in test_traceback: Different column pointer upon parsing error #65

Failure in test_traceback: Different column pointer upon parsing error #65

lysnikolaou commented Apr 9, 2020 •

edited

Loading

pablogsal commented Apr 9, 2020

pablogsal commented Apr 9, 2020

pablogsal commented Apr 9, 2020 •

edited

Loading

gvanrossum commented Apr 9, 2020

lysnikolaou commented May 7, 2020

gvanrossum commented May 7, 2020 via email •

edited

Loading

lysnikolaou commented May 7, 2020

gvanrossum commented May 7, 2020 via email •

edited

Loading

lysnikolaou commented May 7, 2020

lysnikolaou commented May 24, 2020

Failure in test_traceback: Different column pointer upon parsing error #65

Failure in test_traceback: Different column pointer upon parsing error #65

Comments

lysnikolaou commented Apr 9, 2020 • edited Loading

pablogsal commented Apr 9, 2020

pablogsal commented Apr 9, 2020

pablogsal commented Apr 9, 2020 • edited Loading

gvanrossum commented Apr 9, 2020

lysnikolaou commented May 7, 2020

gvanrossum commented May 7, 2020 via email • edited Loading

lysnikolaou commented May 7, 2020

gvanrossum commented May 7, 2020 via email • edited Loading

lysnikolaou commented May 7, 2020

lysnikolaou commented May 24, 2020

lysnikolaou commented Apr 9, 2020 •

edited

Loading

pablogsal commented Apr 9, 2020 •

edited

Loading

gvanrossum commented May 7, 2020 via email •

edited

Loading

gvanrossum commented May 7, 2020 via email •

edited

Loading