-
-
Notifications
You must be signed in to change notification settings - Fork 30.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
3.11 branch: column >= -1
assertion failure with pytest rewriting
#94694
Comments
Is possible that this is just that the assertion here fails because the node was not created by us, but this could corrupt the line table info (I need to check, maybe we are handling values of -1 correctly). |
I will take a look this weekend |
I'm not getting that assertion, I'm getting something that looks like a GC crash:
|
@iritkatriel, I'm getting that too. But if you look at a core dump, you can see it's hitting that assertion. Maybe pytest is capturing the output during collection or something. With a bit of debugging, I found that the assert is hit when Pytest is rewriting the following four Python-level assert statements:
Where I'm at now with this: removing |
Also, make sure you're setting |
Those locations have negative start columns, surely that would mess things up. -1 is "no value", but -2 and -6 are not expected. |
Yep. I'm just wondering how we get there. Pytest doesn't set any Theory: They're inserting method calls, but keeping the same location info as the old assert. This messes up |
Yep, my hunch was correct. The following patch fixes this issue on the 3.11 branch: diff --git a/Python/compile.c b/Python/compile.c
index a71e7d31ee..f14315748d 100644
--- a/Python/compile.c
+++ b/Python/compile.c
@@ -4790,6 +4790,9 @@ update_location_to_match_attr(struct compiler *c, expr_ty meth)
// Make start location match attribute
c->u->u_lineno = meth->end_lineno;
c->u->u_col_offset = meth->end_col_offset - (int)PyUnicode_GetLength(meth->v.Attribute.attr)-1;
+ if (c->u->u_col_offset < -1) {
+ c->u->u_col_offset = -1;
+ }
}
} |
So I'm willing to call this one "our fault", even though Pytest is doing some pretty weird stuff here. I'll prepare a PR. |
I don't know if we want to leave u_end_col_offset != -1 when u_col_offset is -1. Also, we can test for the scenario you're conjecturing - I think we would see |
or |
So with this patch pytest works:
|
But I don't think that's correct - probably in the case that someone messed up the AST we do want to unset the location as you did. |
That breaks PEP 626 though. The whole point of |
This function was only intended for line numbers. When extended position info was added, it basically became a mostly-but-not-always-working hack for those too. |
That function was added about 3 weeks ago. before that just line number was set, but that caused problems when a multiline function call had start_col > end_col. So this function tries to set the columns to something sensible. Yeah, I know my patch is not the solution. The point of my patch is to confirm that the issue is due to a mismatch between the columns and the attr name in the AST.
|
As for the solution - I would say that if someone dynamically rewrote the function name then there is no source location (the name is not in the source code at all). So we can set both col_offset and end_col_offset to -1.
|
Do you mean (a.
b()) …breaks this invariant. The issue is that that we assume the dot is on the same line as the name, which is why we adjust by one. That’s not always true. I agree that just setting them both to |
I think the 1 is not for the dot, but because end_col is the actual index of the last column (so end-start is 1 less than the length). |
I don't think >>> print(ast.dump(ast.parse("a.b").body[0], include_attributes=True, indent=4))
Expr(
value=Attribute(
value=Name(
id='a',
ctx=Load(),
lineno=1,
col_offset=0,
end_lineno=1,
end_col_offset=1),
attr='b',
ctx=Load(),
lineno=1,
col_offset=0,
end_lineno=1,
end_col_offset=3),
lineno=1,
col_offset=0,
end_lineno=1,
end_col_offset=3) At any rate, I'm not sure what
( a
.b()) |
The current code subtracts 1 to include the dot in the span: Traceback (most recent call last):
File "/home/brandtbucher/cpython/test_columns.py", line 3, in f
.foo())
^^^^
AttributeError: 'NoneType' object has no attribute 'foo' But this looks strange when the dot is on another line: Traceback (most recent call last):
File "/home/brandtbucher/cpython/test_columns.py", line 6, in <module>
foo()
^^^^
AttributeError: 'NoneType' object has no attribute 'foo' Or elsewhere on the same line: Traceback (most recent call last):
File "/home/brandtbucher/cpython/test_columns.py", line 10, in <module>
. foo()
^^^^
AttributeError: 'NoneType' object has no attribute 'foo' I propose we don't try to include the dot. It's more trouble than it's worth. |
Oh, the end line number is inclusive. Yeah I agree about the dot.
Sorry I meant to look at the columns of Unfortunately that spans the dot so yeah they can be on different lines. But I don't know if it's a deliberate choice to include the dot in the attr. If it didn't include the dot we could use the column fields instead of For 3.11 something along the lines of your PR is probably all we can do. |
FWIW, with my pytest maintainer hat on (though I've never really looked at the assertion rewriting code in detail): If you have suggestions on how pytest could keep the same functionality but make it a little bit less weird, I'd be all ears! (Maybe one day someone will get around to writing a PEP to get some kind of language support for the kind of thing pytest does, which would make those shenanigans unnecessary - I think @RonnyPfannschmidt did look at that some while ago, but I'm not sure if anything ever got out of it.) |
Removing release blocker status, but leaving the issue open as we may want to make attributes |
For this particular issue, if you can make sure that you replace “assert״ by a name of the same length, you will avoid the code path where we wipe the column info. |
This is a CPython bug IMO, pytest (and other tools like coconut, hy, etc) should be able to produce an AST with whatever locations it likes (as long as they are valid). The problem is that the compiler is "correcting" the location information from the parser. We should perform that transformation in the parser. |
I think what you are proposing is what what we are discussing in the PR that @brandtbucher made: adding enough location information to properly identify the attributes. Otherwise I don't think I would be getting what you are proposing: this function only makes sense in the compiler because it's trying to inline a multi line call. The parser currently populates correctly as much as it can. |
If the compiler needs to adjust the locations, then they aren't "correct". What I am suggesting is that the parser sets the locations to whatever we deem to be the "correct" location. That way, the compiler won't mess up the locations assigned by pytest. |
Well, the produced AST is completely correct. There is nothing that the parser is doing incorrectly here as far as I know. If I understand correctly, the compiler is trying to transform the AST and therefore needs to do some processing to the original nodes to figure out the correct final offsets. But as you can imagine, this is the compiler responsibility as the original AST produced by the parser is fully consistent. We can change it if you want or we can add or modify information, but there is nothing Intrinsically in the parser that can be considered a bug. |
More context so this is clear: the parser produces a correct AST for a multi-line method call that involves attribute access. Then the compiler does a bunch of checks to rule out imported names, calls with many arguments and then it emits an inlined |
since pytest currently expands subexpression results into variables, even method load |
@pablogsal If the compiler has to change the locations, and the output from the compiler is correct, then the input must be "incorrect". Why is adjusting the locations the compiler's responsibility? This has nothing to do with optimisation. It is about providing useful location information for calls. |
Correct means that the locations correspond to the code that generates the nodes. In any case I don't think we should discuss the metaphysics of correctness here :)
Again, the compiler does this because the compiler tries to transform the AST on the fly and infer the new locations so the generated debug info makes sense.
Because is the compiler the one doing the transformations to the original AST. |
@markshannon The issue (as we all agreed last week) is that the parser currently doesn't create a Name node for the attribute. This is why the compiler ends up trying to guess the location of the attribute from the other location info in the AST. The fix that @pablogsal suggested for this is to add this Name node in the parser. |
Exactly, that's what I said in #94694 (comment). The sentence that started this conversation is that if @markshannon was not referring to the fact that the strings do not have location in the current AST, then I was not understanding what he was proposing. |
Let me try and explain some more. Take the following statement: obj
.a()
.b()
.c() It is better for the purposes of tracing, debugging and error messages to treat the calls to @iritkatriel (
a + b
)() What should be the location of the call? Currently the PEG parser produces locations that span the full range of locations within the tree. It isn't just calls, either. It is more useful to have the location of a |
Given
The parser gives the calls lines (2,3), (2,4) and (2,5). This is because as with other expressions like With this example:
The call spans (2,3).
I strongly disagree, the parser should end producing the final AST. The whole design is so no post-processing is needed. Adding post-processing makes everything much more complicated as it makes the final output of the parser not the final product. And we already know how complicated to maintain that is because that was |
The parser produces nodes that spans the subtree. If you want the location of individual pieces in that node then you should visit the tree and get them, not mutating the tree based on the idiosyncratic nature of the receiver. The compiler is not the only consumer of this tree and what is useful for it may be super inconvenient for other consumers. And that's without considering how difficult and unmaintainable is going to be introducing manual location preferences into the machinery. |
@pablogsal That the compiler is not the only consumer of the AST is even more reason to produce locations that are useful. This is not about what is "convenient" for the compiler, but what are sensible locations for various AST nodes. Changing the locations post-parsing is a matter of walking the tree. Why would that be so difficult and unmaintainable? |
That what you define as "useful" here may not be for other consumers. AST analysis tools have been relying for years on the fact that the AST produced by the parser spans the entire subtree in every node. That was the design and that keeps being the design. In this particular case, the compiler is interested on assigning the location of a method call to the last attribute access and the parentheses, instead of the whole expression on the left hand side. But that is the preference of the compiler and may not be the same for other tools that want to do something different. You cannot just define "usefulness" by what you want the compiler to do. And here we already have a status quo that you would be breaking if we change how these nodes are defined. A call in the parser is defined as
The parser doesn't know what is in
I am arguing that that's not what the parser should do. The compiler should see the
The correct thing to do here in my opinion is what we have discussed before and agreed with @iritkatriel and @brandtbucher. Make the parser to produce location for the NAME in the attribute access so the compiler can just pick it up and therefore it has nothing to fix. You can produce the bytecode you want for the cases you mentioned directly and no manual code is needed. |
I agree that converting the right hand side of attributes to a I've opened #94758 to discuss the broader issue of what to do about locations. |
Crash report
When running
pytest
in e.g. the pytest repository on the current Python 3.11 branch, it aborts, perhaps similarly to #93387 or #92597 somehow...I've been unable to get a minimal reproducer so far, but this will reproduce it:
git clone https://github.com/pytest-dev/pytest
cd pytest
~/proj/cpython/python -m venv .venv
(with the current3.11
branch).venv/bin/pip install -e ".[testing]"
.venv/bin/pytest
I was able to bisect this to df091e1:
cc @iritkatriel @markshannon @pablogsal
Note this (thankfully) only seems to affect
--with-pydebug
builds.Error messages
This assertion seems to fail:
cpython/Python/compile.c
Line 7584 in df091e1
Unfortunately I haven't been able to get the
column
value, as it was apparently optimized out...Python stack:
C stack:
Your environment
The text was updated successfully, but these errors were encountered: