Bytecode positions seem way too broad #93691
Comments
I can perhaps see why the argument could be made that we should have the location info for certain constructs span their entire block or jump range, but to me this just feels like the shape of the AST and the design of the compiler are leaking into the bytecode more than is really helpful in practice.
As a quick-and-dirty experiment: for the given example function, setting
I don't think that many of these things were conscious decisions. Originally we added and enabled the infrastructure so the debug information could be propagated, and we spent some time doing small optimisations, but we are still missing a full pass over the compiler to fix things like this. Additionally, there are many instructions that don't really benefit from having position information because they are either artificial or don't map well to source code. I think this is a very good find. With what seems like a small, if tedious, amount of work we could substantially reduce the size for some functions, especially in block setup stuff. Thanks for opening the issue and for the insights, @brandtbucher; this is very interesting indeed.
I don't think the shape of the AST is "leaking" into the bytecode. The AST defines the locations. We should make the location used explicit when generating code, not use the implicit location stored in the compiler.
…ng it on the global compiler state (GH-98001)
…tors (pythonGH-120330) (cherry picked from commit 97b69db) Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
…ructions (pythonGH-120125) (cherry picked from commit eca3f77) Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
(Note that `dis` currently has a bug in displaying accurate location info in the presence of `CACHE`s. The correct information can be observed by working with `co_positions` directly or using the code from that PR.)

While developing `specialist`, I realized that there are lots of common code patterns that produce bytecode with unexpectedly large source ranges. In addition to being unhelpful for both friendly tracebacks (the original motivation) and things like bytecode introspection, I suspect these huge ranges may also be bloating the size of our internal position tables as well.

Consider the following function:
Things that should probably span one line at most:
- A `GET_ITER`/`FOR_ITER` pair spans all of lines 4 through 10.
- Another `GET_ITER`/`FOR_ITER` pair spans all of lines 5 through 10.
- A `POP_JUMP_FORWARD_IF_FALSE` spans all of lines 6 through 9.
- Another `POP_JUMP_FORWARD_IF_FALSE` spans all of lines 8 through 9.
- The instructions for `with` cleanup each span all of lines 3 through 10.

Things that should probably be artificial:
- A `JUMP_FORWARD` spans all of line 7.
- A `JUMP_BACKWARD` spans all of line 10.
- Another `JUMP_BACKWARD` spans all of lines 5 through 10.

Things I don't get:
- A `NOP` spans all of lines 4 through 10.

As a result, over half of the generated bytecode for this function claims to span line 9, for instance. Also not shown here: the instructions for building functions and classes have similarly huge spans.
I think this can be tightened up in the compiler by:
- Using `SET_LOC` on child nodes.
- Using `UNSET_LOC` before unconditional jumps.