gh-91351: Support re-entrancy in importlib/_bootstrap.py #94342

Closed

exarkun commented Jun 27, 2022

This is a PR against the 3.9 branch (I will forward-port changes after addressing other review feedback).

Note that most of the diff for this PR is generated changes from re-freezing _bootstrap.py, and most of what remains is new comments explaining how the implementation works, to make it easier to understand.

See #91351 for details about the problem.

Re-entrancy is always tricky and given the requirements of _bootstrap.py (to operate with re-entrancy and multi-threading and to do so without exposing any of the details to application code doing an import) I think this goes double.

This PR does a few things to achieve better safety in the face of re-entrancy:

  • Switch some data structures to ones that support atomic operations so that they stay consistent in case of asynchronous re-entrancy (e.g. from the garbage collector or a signal handler).
  • Add more RLock-like behavior to prevent deadlocks in the re-entrant case.
  • Update the deadlock detection algorithm to support the fact that one thread might be "blocked on" acquiring the module lock for more than one module at a time.
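
The third point can be illustrated with a small, self-contained model. This is a sketch and not the code from this PR: `FakeLock`, `blocking_on`, and `has_deadlock` are hypothetical names, and the model assumes that each thread id maps to a list of locks it is waiting on and that each lock has at most one owner. It also assumes the caller has already handled the re-entrant case where the waiting thread itself owns the lock.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class FakeLock:
    """Hypothetical stand-in for a module lock: a name and an owning thread id."""
    name: str
    owner: Optional[int] = None


def has_deadlock(target_tid, lock, blocking_on, seen=None):
    """Return True if waiting on `lock` would make `target_tid` wait on itself.

    `blocking_on` maps a thread id to the list of locks that thread is
    currently waiting for. A list (rather than a single lock) is what lets
    the search branch when a thread is blocked on several locks at once.
    """
    if seen is None:
        seen = set()
    owner = lock.owner
    if owner is None:
        return False  # nobody holds the lock, so there is nothing to wait for
    if owner == target_tid:
        return True   # the chain of waits leads back to the waiting thread
    if owner in seen:
        return False  # this thread's waits are already being explored
    seen.add(owner)
    # Recurse into every lock the owner is itself waiting on.
    return any(
        has_deadlock(target_tid, other, blocking_on, seen)
        for other in blocking_on.get(owner, [])
    )


# Thread 1 owns "a". Thread 2 owns "b" and, because of re-entrancy,
# is waiting on both "c" (owned by thread 3) and "a".
a = FakeLock("a", owner=1)
b = FakeLock("b", owner=2)
c = FakeLock("c", owner=3)
blocking_on = {2: [c, a]}

print(has_deadlock(1, b, blocking_on))   # thread 1 wants "b"
print(has_deadlock(1, b, {2: [c]}))      # same, but only the wait on "c" is recorded
```

In this model the first call reports a deadlock and the second does not, which is the motivation for recording every lock a thread is blocked on rather than only the most recent one.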

I'm not quite sure I believe this new version of the code is 100% correct with respect to re-entrancy, but it does fix the mishandling of two specific cases:

  • A re-entrant import is performed between the time _blocking_on is populated and cleaned up inside _ModuleLock.acquire. Previously this failed with a KeyError (as described in the linked issue).
  • A re-entrant import is performed while the module lock is held inside _ModuleLock.acquire. Previously this failed by deadlocking.
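
The first case can be modelled without the import system at all. The following is a simplified sketch, not the real `_ModuleLock.acquire`: it assumes, per the description above, that the wait is recorded in a dict keyed by thread id and removed afterwards. `acquire_single`, `acquire_list`, and the `reenter` callback are hypothetical names used only for illustration; `reenter` stands in for a re-entrant import triggered by the garbage collector or a signal handler.

```python
import threading


def acquire_single(blocking_on, name, reenter=None):
    """Model with one slot per thread: a nested call clobbers the outer entry."""
    tid = threading.get_ident()
    blocking_on[tid] = name
    try:
        if reenter is not None:
            reenter()  # simulated re-entrant import on the same thread
    finally:
        del blocking_on[tid]  # raises KeyError if a nested call already deleted it


def acquire_list(blocking_on, name, reenter=None):
    """Model with a list per thread: nested calls add and remove their own entry."""
    tid = threading.get_ident()
    waiting = blocking_on.setdefault(tid, [])
    waiting.append(name)
    try:
        if reenter is not None:
            reenter()
    finally:
        waiting.remove(name)


def run(acquire):
    """Run one nested acquisition and report whether it raised KeyError."""
    table = {}
    try:
        acquire(table, "outer", reenter=lambda: acquire(table, "inner"))
    except KeyError:
        return "KeyError"
    return "ok"


print(run(acquire_single))
print(run(acquire_list))
```

With the single-slot model the nested call deletes the shared entry and the outer cleanup fails; with a per-thread list each level removes only what it added.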

This PR also does not include any new unit tests. I have a small stand-alone program which can reproduce both of these but only with the assistance of some additional instrumentation inside _bootstrap.py to make sure the re-entrancy happens at the interesting times. If adding this kind of instrumentation is acceptable then it may be possible to turn this program into some unit tests.

It may also be possible to simplify _BlockingOnManager by switching _ModuleLock.lock to an RLock. That solution didn't originally occur to me so I developed this - but if others think that is a better approach I think it's a fairly simple change.
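
The difference that the RLock suggestion relies on can be seen in isolation with the standard `threading` module. This sketch shows only the general behavior of the two lock types and says nothing about how `_ModuleLock` itself would need to change; `can_reacquire` is a hypothetical helper.

```python
import threading


def can_reacquire(lock):
    """Return True if the thread that already holds `lock` can take it again."""
    with lock:
        # Non-blocking, so a plain Lock reports failure instead of hanging.
        got_it = lock.acquire(blocking=False)
        if got_it:
            lock.release()
        return got_it


print(can_reacquire(threading.Lock()))   # False: a second acquire would block forever
print(can_reacquire(threading.RLock()))  # True: the owning thread may re-enter
```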

For reference, here is a stand-alone reproducer. This one isn't quite deterministic but by running the codepath over and over it seems to be fairly reliable in reproducing one of the problem codepaths on my system. For a completely deterministic reproducer, I think _bootstrap.py instrumentation is required.

import sys, socket, gc

class Cycle:
    pass

def a_cycle():
    c = Cycle()
    c.cycle = c
    c.s = socket.socket()

def main():
    while True:
        # import a module that socket.__del__ is going to import to exercise
        # re-entrant _ModuleLock.lock handling
        a_cycle()
        import linecache

        del sys.modules["linecache"]

main()
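
The reproducer uses a reference cycle so that the socket is finalized by the cyclic garbage collector rather than at the moment the last name goes away; in CPython the collector can run at many allocation points, including in the middle of an import. The sketch below shows only that timing property. `Noisy`, `make_cycle`, and `demo` are hypothetical names, `Noisy` stands in for the socket, and `gc.collect()` is called explicitly so the result is deterministic on CPython.

```python
import gc

events = []


class Noisy:
    """Hypothetical stand-in for an object whose finalizer has side effects."""

    def __del__(self):
        events.append("finalized")


def make_cycle():
    n = Noisy()
    n.cycle = n  # self-reference: reference counting alone cannot free it


def demo():
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector out of the way for the demo
    try:
        events.clear()
        make_cycle()
        before = list(events)  # the finalizer has not run yet
        gc.collect()           # the finalizer runs here, inside the collector
        after = list(events)
    finally:
        if was_enabled:
            gc.enable()
    return before, after


print(demo())
```

In the reproducer the collector is left enabled, so the equivalent of `gc.collect()` happens at a point the program does not choose, which is why it is not fully deterministic.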

This might make it a little bit easier for a new reader/maintainer to
understand what this code is doing.

Also, link to a couple application-level bug reports about probable
misbehaviors related to the deadlock-detection code.

This more clearly separates the logic for that management from the application
code that runs in this context.

It will also make subsequent changes to improve that logic more clear.

Switch to a recursive implementation so it can easily follow the branching
path through the "blocking on" graph that is now possible thanks to
re-entrancy.

cpython-cla-bot bot commented Jun 27, 2022

All commit authors signed the Contributor License Agreement.

@AA-Turner
Member

Hi @exarkun -- 3.9 is in security fix only mode, and I don't think this qualifies -- in addition the Python workflow is to use backports rather than forward ports.

I'm closing this PR, please can we discuss further on the issue?

A

AA-Turner closed this Jun 27, 2022

exarkun commented Jun 27, 2022

> I'm closing this PR, please can we discuss further on the issue?

Okay.
