Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 669: for loop fires LINE multiple times for body, but only one for for #104239

Closed
nedbat opened this issue May 6, 2023 · 13 comments · Fixed by #104387
Closed

PEP 669: for loop fires LINE multiple times for body, but only one for for #104239

nedbat opened this issue May 6, 2023 · 13 comments · Fixed by #104387
Assignees
Labels
type-bug An unexpected behavior, bug, or error

Comments

@nedbat
Copy link
Member

nedbat commented May 6, 2023

I'm experimenting with the new monitoring interface in PEP 669. The LINE event happens multiple times for the line in the body of a for loop (as I expect), but the for statement itself only gets an event on entering, not each time around the loop.

This code:

import sys

def loop3():
    for i in range(3):
        print(i)
    print("done")

the_code = loop3.__code__

def line_callback(code, line_number):
    assert code == the_code
    print(f"LINE: {line_number}")

my_id = sys.monitoring.COVERAGE_ID
sys.monitoring.use_tool_id(my_id, "repro")
sys.monitoring.register_callback(my_id, sys.monitoring.events.LINE, line_callback)
sys.monitoring.set_local_events(my_id, the_code, sys.monitoring.events.LINE)

loop3()

produces:

LINE: 4
LINE: 5
0
LINE: 5
1
LINE: 5
2
LINE: 6
done

I would expect Line 4 to be monitored after each "LINE: 5", including after the last one.

This is with commit 263abd3 of CPython.

@nedbat nedbat added the type-bug An unexpected behavior, bug, or error label May 6, 2023
@terryjreedy
Copy link
Member

terryjreedy commented May 6, 2023

Perhaps because iteration jumps into the middle of the bytecode for the header line, and line events only trigger on the first bytecode of the line. To me, "LINE: An instruction is about to be executed that has a different line number from the preceding instruction." can reasonably be interpreted the way you expected. But I am not sure what 'instruction' means in this context. "bytecode" is specific to CPython and the feature is not meant to be so limited. We need a draft code addition in order to ask such questions.

Anyway, I am glad to see a specific example of monitoring in use.

@nedbat
Copy link
Member Author

nedbat commented May 6, 2023

The disassembly looks like this:

Disassembly of <code object loop3 at 0x1011f7cc0, file "loop3.py", line 3>:
  3           0 RESUME                   0

  4           2 LOAD_GLOBAL              1 (NULL + range)
             12 LOAD_CONST               1 (3)
             14 CALL                     1
             22 GET_ITER
        >>   24 FOR_ITER                13 (to 54)
             28 STORE_FAST               0 (i)

  5          30 LOAD_GLOBAL              3 (NULL + print)
             40 LOAD_FAST                0 (i)
             42 CALL                     1
             50 POP_TOP
             52 JUMP_BACKWARD           15 (to 24)

  4     >>   54 END_FOR

  6          56 LOAD_GLOBAL              3 (NULL + print)
             66 LOAD_CONST               2 ('done')
             68 CALL                     1
             76 POP_TOP
             78 RETURN_CONST             0 (None)

Line 5 ends with a JUMP_BACKWARD to 24, which is in line 4, so shouldn't it call the line callback again?

@gaogaotiantian
Copy link
Member

Line 5 ends with a JUMP_BACKWARD to 24, which is in line 4, so shouldn't it call the line callback again?

I believe only the first instruction on each line is instrumented as INSTRUMENTED_LINE, which would trigger a LINE event. All the others will be INSTRUMENTED_INSTRUCTION so no LINE event will be triggered.

You can use dis.dis(loop3, adaptive=True) to confirm (of course after instrumenting the code).

I would guess @markshannon expected the tools to handle this with BRANCH event. We can probably produce a LINE event in instrumented branches? Not sure if that's the way to go, might have other side effects(besides the obvious performance hit).

@nedbat
Copy link
Member Author

nedbat commented May 7, 2023

I believe only the first instruction on each line is instrumented as INSTRUMENTED_LINE, which would trigger a LINE event. All the others will be INSTRUMENTED_INSTRUCTION so no LINE event will be triggered.

I don't understand: why do I get three events for line 5?

@terryjreedy
Copy link
Member

By the hypothesis of me and Tian, because the first bytecode of line 5, the load global of 'print', (and as Tian added, INSTRUMENTED_LINE) is executed 3 times.

@nedbat
Copy link
Member Author

nedbat commented May 7, 2023

By the hypothesis of me and Tian, because the first bytecode of line 5, the load global of 'print', (and as Tian added, INSTRUMENTED_LINE) is executed 3 times.

If that is correct, and the behavior I'm seeing is expected, then the PEP needs to be updated:

LINE: An instruction is about to be executed that has a different line number from the preceding instruction.

@markshannon
Copy link
Member

LINE: An instruction is about to be executed that has a different line number from the preceding instruction.

"Preceding" here means the instruction immediately before it in the code object, not the instruction previously executed.

The LOAD_GLOBAL bytecode on line 5 emits a "line" event and the loop iterates 3 times; so three "line 5" events.

Because the jump at the end of line 5 jumps into the middle of line 4, there is only one "line 4" event.

@nedbat
Copy link
Member Author

nedbat commented May 9, 2023

I thought that debuggers would use the LINE event to step by lines, but maybe I misunderstood. If they did, stepping wouldn't behave properly, because I'd expect a debugger to visit line 4 a number of times.

@markshannon
Copy link
Member

Maybe we could trigger LINE events for any instruction that has any predecessor with a different line number.
It would complicate the instrumentation code, as we would need to mark jump targets.

We also need to maintain the behavior of sys.settrace, which limits what we can do a bit, but that code needs to handle extra line events anyway.

@fabioz any thoughts on this?

@terryjreedy
Copy link
Member

IDLE's bdb-based debugger on 3.12.0a7 stops 4 times on line 4 (3 + 1 for Stopping) and 3 times for line 5. I presume stepping in pdb, also based on bdb, does the same. A new debugger should continue stopping on the header line when stepping. This is even more important for while loops, where one might want to see local values to debug the loop condition.

@gaogaotiantian
Copy link
Member

I agree that we should stop at the for and while line. We only need to change the branch target instruction to INSTRUCTION_LINE and handle the extra LINE event right?

@fabioz
Copy link
Contributor

fabioz commented May 10, 2023

Maybe we could trigger LINE events for any instruction that has any predecessor with a different line number. It would complicate the instrumentation code, as we would need to mark jump targets.

We also need to maintain the behavior of sys.settrace, which limits what we can do a bit, but that code needs to handle extra line events anyway.

@fabioz any thoughts on this?

I think that whenever the line changes (regardless of being marked as the entry point of the line or not) the line event should be triggered (which I guess is mostly what happens with the current sys.settrace, where I guess the logic worked reasonably).

The best place for that logic is probably inside the new monitoring mechanism, otherwise all clients will probably end up having to do some workarounds which would probably be more costly than what can be done inside of the new monitoring code (I'm not even sure what'd be the workaround for clients at this point, maybe doing additional bookeping using both the LINE as well as the JUMP and BRANCH and then deciding whether it changed -- but it seems more costly doing it from the outside... doing it with INSTRUCTION with that logic would probably be even more costly).

@markshannon
Copy link
Member

My original thinking was that this would be too slow.
But, as you point out, it is the overall cost (VM plus tool) that really matters, as long as the cost is zero when turned off.

So here's a plan (for LINE events, nothing else changes)

  • Instrument all instructions where a possible predecessor has a different line number, which includes all exception handlers
  • Check dynamically for a new line in the INSTRUMENTED_LINE instruction
  • Simplify legacy tracing support, as it only needs to handle backwards jumps within lines, everything else is handled by the INSTRUMENTED_LINE instruction

This should also fix #103471

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants