Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

__static_attributes__ is not deterministic #124442

Closed
kp2pml30 opened this issue Sep 24, 2024 · 5 comments
Closed

__static_attributes__ is not deterministic #124442

kp2pml30 opened this issue Sep 24, 2024 · 5 comments
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@kp2pml30
Copy link
Contributor

kp2pml30 commented Sep 24, 2024

Bug report

Bug description:

Tested on commit: e9b00cc

Order of strings in __static_attributes__ is not deterministic

Running the following a couple of times produces different hashes:

/opt/cpython/cross-build/build/python ../../Programs/_freeze_module.py zipimport ../../Lib/zipimport.py Python/frozen_modules/zipimport.h
sha256sum ./Python/frozen_modules/zipimport.h

I patched this generator to output dis as follows:

diff --git a/Programs/_freeze_module.py b/Programs/_freeze_module.py
index ba638ee..7e837e9 100644
--- a/Programs/_freeze_module.py
+++ b/Programs/_freeze_module.py
@@ -24,6 +24,9 @@ def compile_and_marshal(name: str, text: bytes) -> bytes:
     filename = f"<frozen {name}>"
     # exec == Py_file_input
     code = compile(text, filename, "exec", optimize=0, dont_inherit=True)
+    import dis
+    with open("/tmp/a", "at") as f:
+        dis.dis(code, file=f)
     return marshal.dumps(code)

Then disassembly differs in one place (after removing at 0x[a-f0-9]*):

diff --git a/tmp/a b/tmp/b
index 372a97c..4010046 100644
--- a/tmp/a
+++ b/tmp/b
@@ -291,7 +291,7 @@ Disassembly of <code object zipimporter , file "<frozen zipimport>", line 50>:
 289           LOAD_CONST              15 (<code object __repr__ , file "<frozen zipimport>", line 289>)
               MAKE_FUNCTION
               STORE_NAME              16 (__repr__)
-              LOAD_CONST              16 (('archive', 'prefix'))
+              LOAD_CONST              16 (('prefix', 'archive'))
               STORE_NAME              17 (__static_attributes__)
               RETURN_CONST             4 (None)

diff in .h files is quite small, so I assume only sorting is affected:

-    0,0,0,41,2,114,46,0,0,0,114,44,0,0,0,169,
+    0,0,0,41,2,114,44,0,0,0,114,46,0,0,0,169,

CPython versions tested on:

3.13

Operating systems tested on:

No response

Linked PRs

@kp2pml30 kp2pml30 added the type-bug An unexpected behavior, bug, or error label Sep 24, 2024
@Eclips4 Eclips4 added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Sep 24, 2024
@JelleZijlstra
Copy link
Member

This is because we store u_static_attributes as a set inside the compiler, and set iteration order is nondeterministic. While we don't guarantee any order for this attribute, I can see how nondeterministic ordering is problematic if you want reproducible builds.

We could fix this either by using some structure other than a set (e.g., a dict with dummy values), or by sorting the attributes when we convert them from a set into a tuple. cc @iritkatriel

@kp2pml30
Copy link
Contributor Author

kp2pml30 commented Sep 24, 2024

This patch seems to fix it

diff --git a/Python/compile.c b/Python/compile.c
index 7752a68..dfaa5a4 100644
--- a/Python/compile.c
+++ b/Python/compile.c
@@ -2544,7 +2544,14 @@ compiler_class_body(struct compiler *c, stmt_ty s, int firstlineno)
         return ERROR;
     }
     assert(c->u->u_static_attributes);
-    PyObject *static_attributes = PySequence_Tuple(c->u->u_static_attributes);
+    PyObject *static_attributes_unsorted = PySequence_List(c->u->u_static_attributes);
+    if (static_attributes_unsorted == NULL) {
+        compiler_exit_scope(c);
+        return ERROR;
+    }
+    PyList_Sort(static_attributes_unsorted);
+    PyObject *static_attributes = PySequence_Tuple(static_attributes_unsorted);
+    Py_CLEAR(static_attributes_unsorted);
     if (static_attributes == NULL) {
         compiler_exit_scope(c);
         return ERROR;

If it's ok I can open a pr to not waste core team time

@gpshead
Copy link
Member

gpshead commented Sep 24, 2024

Go ahead and create a PR. Sorting at compile time prior to tupling will work. Another option would be to use a dict instead of a set which preserves insertion order. But I think it is much wiser for us to have the tuple always be in sorted order so that changes in the order of otherwise logically equivalent code do not result in different ordering in the tuple.

@gpshead gpshead added the 3.13 bugs and security fixes label Sep 24, 2024
@gpshead
Copy link
Member

gpshead commented Sep 24, 2024

@Yhg1s fyi (probably not a release blocker - distributors who need the canonical output here should wind up backporting the fix that we'd have for 3.13.1)

@terryjreedy terryjreedy changed the title __static_attributes__ is not deteministic __static_attributes__ is not deteministic Sep 24, 2024
@ncoghlan ncoghlan changed the title __static_attributes__ is not deteministic __static_attributes__ is not deterministic Sep 25, 2024
kp2pml30 added a commit to kp2pml30/cpython that referenced this issue Sep 25, 2024
…y sorting

Signed-off-by: kp2pml30 <kp2pml30@gmail.com>
kp2pml30 added a commit to kp2pml30/cpython that referenced this issue Sep 25, 2024
…y sorting

Signed-off-by: kp2pml30 <kp2pml30@gmail.com>
kp2pml30 added a commit to kp2pml30/cpython that referenced this issue Sep 25, 2024
…y sorting

Signed-off-by: kp2pml30 <kp2pml30@gmail.com>
kp2pml30 added a commit to kp2pml30/cpython that referenced this issue Sep 25, 2024
…y sorting

Signed-off-by: kp2pml30 <kp2pml30@gmail.com>
JelleZijlstra added a commit that referenced this issue Sep 28, 2024
…4492)

Signed-off-by: kp2pml30 <kp2pml30@gmail.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
JelleZijlstra added a commit to JelleZijlstra/cpython that referenced this issue Sep 28, 2024
… sorting (pythonGH-124492)

(cherry picked from commit 04c837d)

Co-authored-by: Kira <kp2pml30@gmail.com>
Signed-off-by: kp2pml30 <kp2pml30@gmail.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Yhg1s pushed a commit that referenced this issue Oct 1, 2024
…ng (GH-124492) (#124738)

* [3.13] gh-124442: make `__static_attributes__` deterministic by sorting (GH-124492)
(cherry picked from commit 04c837d)

Co-authored-by: Kira <kp2pml30@gmail.com>
Signed-off-by: kp2pml30 <kp2pml30@gmail.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
@picnixz
Copy link
Member

picnixz commented Oct 6, 2024

Can this be closed? I don't think we need more deterministic processing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants