
gh-99108: Release the GIL around hashlib built-in computation #104675

Merged: 7 commits, May 23, 2023


gpshead (Member) commented May 20, 2023

This matches the GIL releasing behavior of our existing _hashopenssl
module, extending it to the HACL* built-ins.

gpshead added 2 commits May 19, 2023 19:46
gpshead self-assigned this May 20, 2023
gpshead changed the title from "gh-99108: Release the GIL around hashlib built-in hash updates." to "gh-99108: Release the GIL around hashlib built-in computation" May 20, 2023
gpshead (Member, Author) commented May 20, 2023

@msprotz - does this make sense to you?

msprotz (Contributor) commented May 20, 2023

Do you have a reference for how the Python GC works? I'm reading https://wiki.python.org/moin/GlobalInterpreterLock, https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock, and https://docs.python.org/3/c-api/memory.html -- is there any other good resource?

Basically, I'd like to understand whether:

  • a compaction can be triggered while the C code no longer holds the GIL (I assume not, but it'd be good to confirm), and
  • the C code needs to increase the refcount of, e.g., the input data (I also assume not: since the call stack owns the object somewhere, its refcount must remain at least 1, but it would also be good to confirm).

I'd like to read up a little bit before giving you a thumbs-up. Thanks!

gpshead (Member, Author) commented May 20, 2023

CPython is purely reference counted, with no compaction or moving of objects in memory. The GC exists solely to deal with reference cycles. It never rearranges memory and never frees the memory behind any object with a non-zero reference count. We've got a writeup on that at https://devguide.python.org/internals/garbage-collector/. It has been this way since Python 2.0, when the cyclic GC was introduced (before that, reference cycles were memory leaks).

No memory returned from Python C APIs will ever be moved or freed so long as it belongs to referenced objects. The PyBytes objects the hash functions receive always have positive refcounts by definition, so the pointer returned by PyBytes_AsStringAndSize() can safely be passed synchronously to other C code with the GIL released, because our thread holds a reference to that immutable object.
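From Python, the effect of this reasoning can be observed by hashing one large, immutable bytes object from several threads at once, each thread using its own hash object. This is a minimal sketch using only the public hashlib and threading APIs; the 1 MiB payload and thread count are arbitrary illustrative choices, and whether sha256 is served by the HACL* built-in or by OpenSSL depends on how the interpreter was built.

```python
import hashlib
import threading

# Illustrative payload, large enough that the C code releases the GIL.
PAYLOAD = b"\x5a" * (1 << 20)  # 1 MiB, immutable, shared by all threads
THREADS = 4

def hash_in_thread(results, index):
    # Each thread uses its own hash object. PAYLOAD keeps a positive refcount
    # for the whole call, so the C code may read its buffer without the GIL.
    results[index] = hashlib.sha256(PAYLOAD).hexdigest()

def parallel_digests():
    results = [None] * THREADS
    workers = [threading.Thread(target=hash_in_thread, args=(results, i))
               for i in range(THREADS)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results

digests = parallel_digests()
# Every thread reads the same unmodified buffer, so all digests agree.
print(all(d == hashlib.sha256(PAYLOAD).hexdigest() for d in digests))
```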

This PR applies the same logic that _hashopenssl has used for a very long time to release the GIL. The per-hash-object lock is added to prevent code from calling into the C hash state mutation APIs on a given instance from multiple threads at once. (It would clearly be buggy code design if anyone ever did that; our goal is just to avoid undefined behavior from C API misuse should anyone ever try.)
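What the per-object lock is meant to guarantee can be checked from Python by having several threads call update() on one shared hash object. This is a hedged sketch, not a test from the PR: sharing one hash object across threads is questionable design, and the point is only that doing so does not corrupt the hash state. Every thread feeds an identical chunk, a deliberate trick so the expected digest does not depend on thread scheduling; the chunk size and counts are illustrative.

```python
import hashlib
import threading

CHUNK = b"\xa5" * 4096   # illustrative chunk, above the GIL-release threshold
THREADS = 4
UPDATES_PER_THREAD = 50

def feed(shared):
    for _ in range(UPDATES_PER_THREAD):
        # Each update() on the shared object runs under that object's lock,
        # so two calls never interleave inside a hash state mutation.
        shared.update(CHUNK)

shared = hashlib.sha256()
workers = [threading.Thread(target=feed, args=(shared,)) for _ in range(THREADS)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# All chunks are identical, so the thread order does not matter: the result
# must equal hashing every chunk back to back in a single thread.
expected = hashlib.sha256(CHUNK * THREADS * UPDATES_PER_THREAD).hexdigest()
print(shared.hexdigest() == expected)
```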

msprotz (Contributor) left a comment

OK, that makes a lot more sense, and now I understand the purpose of the lock. Thanks for the pointers!

Overall I think it would be helpful to document the intent behind this locking behavior, notably:

  • what you said about not holding the GIL (and so stalling other threads) while there's lots of data to hash
  • the defensive lock that isn't strictly required but helps guard against rogue C API clients
  • the lazy initialization and the fact that the module can't assume obj->lock is non-NULL (may be uninitialized or, as I understand it, lock creation may have failed).
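The lazy-initialization point in the last bullet can be modeled in pure Python. This is a hypothetical sketch, not the C code from the PR: LazyLockHash and GIL_MINSIZE are illustrative names, threading.Lock stands in for the C-level thread lock, and because Python code cannot release the GIL itself, comments mark where the C version would. GIL_MINSIZE is assumed to mirror the 2048-byte threshold the hashlib documentation describes (the GIL is released for data larger than 2047 bytes).

```python
import hashlib
import threading

# Assumption: mirrors hashlib's documented GIL-release threshold (2048 bytes).
GIL_MINSIZE = 2048

class LazyLockHash:
    """Pure-Python model of the lazy per-object lock pattern."""

    # Guards lazy creation. In C the NULL check runs with the GIL held, so no
    # extra guard is needed; in Python a thread switch can happen between the
    # check and the assignment.
    _init_guard = threading.Lock()

    def __init__(self, name="sha256"):
        self._h = hashlib.new(name)
        self.lock = None  # like obj->lock: code must never assume it exists

    def update(self, data):
        size = memoryview(data).nbytes
        if self.lock is None and size >= GIL_MINSIZE:
            with LazyLockHash._init_guard:
                if self.lock is None:
                    # In C, lock allocation can fail and return NULL; the
                    # code then keeps hashing with the GIL held.
                    self.lock = threading.Lock()
        if self.lock is not None:
            # C equivalent: release the GIL, take obj->lock, update,
            # release obj->lock, then reacquire the GIL.
            with self.lock:
                self._h.update(data)
        else:
            # Small input and no lock yet: the GIL alone serializes access.
            self._h.update(data)

    def hexdigest(self):
        # Once a lock exists, every operation on the state takes it.
        if self.lock is not None:
            with self.lock:
                return self._h.hexdigest()
        return self._h.hexdigest()

h = LazyLockHash()
h.update(b"abc")            # below the threshold: no lock is created
print(h.lock is None)       # True
h.update(b"x" * 4096)       # at or above the threshold: lock created lazily
print(h.lock is not None)   # True
```

Creating the lock only on the first large input keeps small, single-threaded hashing free of lock overhead, which is why every code path has to cope with the lock being absent.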

Other than that, as far as I can tell, this looks fine. Thanks!

Modules/md5module.c: 3 review threads (resolved)
gpshead (Member, Author) commented May 22, 2023

Agreed. I'll write some common code comments about these patterns and apply them everywhere relevant.

gpshead marked this pull request as ready for review May 22, 2023 23:03
gpshead requested a review from tiran as a code owner May 22, 2023 23:03
gpshead added the "needs backport to 3.12" label (bug and security fixes) May 22, 2023
gpshead enabled auto-merge (squash) May 22, 2023 23:05
gpshead merged commit 2e5d8a9 into python:main May 23, 2023
miss-islington (Contributor) commented:
Thanks @gpshead for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 23, 2023
…ythonGH-104675)

This matches the GIL releasing behavior of our existing `_hashopenssl`
module, extending it to the HACL* built-ins.

Includes adding comments to better describe the purpose of the ENTER/LEAVE macros
and explain the lock strategy in both existing and new code.
(cherry picked from commit 2e5d8a9)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
bedevere-bot commented:
GH-104776 is a backport of this pull request to the 3.12 branch.

bedevere-bot removed the "needs backport to 3.12" label (bug and security fixes) May 23, 2023
gpshead deleted the hashlib_hacl_dropGIL branch May 23, 2023 00:22
gpshead added a commit that referenced this pull request May 23, 2023
…H-104675) (#104776)

gh-99108: Release the GIL around hashlib built-in computation (GH-104675)

This matches the GIL releasing behavior of our existing `_hashopenssl`
module, extending it to the HACL* built-ins.

Includes adding comments to better describe the purpose of the ENTER/LEAVE macros
and explain the lock strategy in both existing and new code.
(cherry picked from commit 2e5d8a9)

Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>