test.libregrtest: Race condition in runtest_mp leads to hangs (never exits) #90363

Closed
colesbury opened this issue Dec 30, 2021 · 5 comments
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@colesbury
Contributor

BPO 46205
Nosy @vstinner, @colesbury, @corona10, @miss-islington
PRs
  • bpo-46205: exit if no workers are alive in runtest_mp #30470
  • [3.10] bpo-46205: exit if no workers are alive in runtest_mp (GH-30470) #30523
  • [3.9] bpo-46205: exit if no workers are alive in runtest_mp (GH-30470) #30524
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2022-01-11.03:32:45.800>
    created_at = <Date 2021-12-30.17:12:53.170>
    labels = ['type-bug', 'tests', '3.9', '3.10', '3.11']
    title = 'test.libregrtest: Race condition in runtest_mp leads to hangs (never exits)'
    updated_at = <Date 2022-01-11.03:32:45.800>
    user = 'https://github.com/colesbury'

    bugs.python.org fields:

    activity = <Date 2022-01-11.03:32:45.800>
    actor = 'corona10'
    assignee = 'none'
    closed = True
    closed_date = <Date 2022-01-11.03:32:45.800>
    closer = 'corona10'
    components = ['Tests']
    creation = <Date 2021-12-30.17:12:53.170>
    creator = 'colesbury'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 46205
    keywords = ['patch']
    message_count = 5.0
    messages = ['409374', '409740', '410271', '410272', '410273']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'colesbury', 'corona10', 'miss-islington']
    pr_nums = ['30470', '30523', '30524']
    priority = 'normal'
    resolution = None
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue46205'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @colesbury
    Contributor Author

    runtest_mp.py has a race condition between checking worker.is_alive() and processing the queue that can lead to indefinite hangs.

    The hang happens when all the results from the self.output queue have been processed but at least one of the workers hasn't finished exiting.

    def _get_result(self) -> QueueOutput | None:
        if not any(worker.is_alive() for worker in self.workers):
            # all worker threads are done: consume pending results
            try:
                return self.output.get(timeout=0)
            except queue.Empty:
                return None
        use_faulthandler = (self.ns.timeout is not None)
        timeout = PROGRESS_UPDATE
        while True:
            if use_faulthandler:
                faulthandler.dump_traceback_later(MAIN_PROCESS_TIMEOUT,
                                                  exit=True)
            # wait for a thread
            try:
                return self.output.get(timeout=timeout)
            except queue.Empty:
                pass
            # display progress
            running = get_running(self.workers)
            if running and not self.ns.pgo:
                self.log('running: %s' % ', '.join(running))

    The main thread tries to get a result from the output queue, but the queue is empty and stays empty. Although the queue.get() call eventually times out (after 30 seconds), the main thread does not re-check whether all the workers have exited (!); instead, it retries queue.get() in the "while True" loop.

    # display progress
    running = get_running(self.workers)
    if running and not self.ns.pgo:
        self.log('running: %s' % ', '.join(running))
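
One possible way to avoid the hang is to re-check worker liveness on every iteration of the loop instead of only once on entry. The following is a minimal, self-contained sketch of that pattern, not the actual patch: `get_result`, `slow_exit_worker`, `run_demo`, and `POLL_TIMEOUT` are hypothetical names, plain `threading.Thread` objects stand in for the regrtest worker threads, and the faulthandler and progress-logging steps are left out.

```python
import queue
import threading
import time

# Hypothetical stand-in for PROGRESS_UPDATE (30 seconds in the issue),
# shortened so the sketch finishes quickly.
POLL_TIMEOUT = 0.05


def get_result(workers, output, poll_timeout=POLL_TIMEOUT):
    """Return the next result, or None once every worker has exited
    and the queue has been drained."""
    # Re-check liveness on each iteration, so a queue.get() timeout
    # leads back to the is_alive() check rather than to another
    # unconditional wait.
    while any(worker.is_alive() for worker in workers):
        try:
            return output.get(timeout=poll_timeout)
        except queue.Empty:
            pass  # timed out: loop around and re-check the workers

    # All workers are done: consume a pending result without blocking.
    try:
        return output.get(timeout=0)
    except queue.Empty:
        return None


def slow_exit_worker(output, item, exit_delay):
    """Publish one result, then take a while to finish exiting."""
    output.put(item)
    time.sleep(exit_delay)  # simulates the window in which the race occurs


def run_demo():
    output = queue.Queue()
    workers = [
        threading.Thread(target=slow_exit_worker, args=(output, i, 0.2))
        for i in range(3)
    ]
    for worker in workers:
        worker.start()

    results = []
    while True:
        result = get_result(workers, output)
        if result is None:
            break
        results.append(result)

    for worker in workers:
        worker.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the queue is already empty while the workers are still sleeping, which is the situation described above. Because the loop condition is evaluated again after each timeout, `get_result` returns `None` shortly after the last worker exits instead of waiting on the empty queue forever.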

    To reproduce, apply the patch below, which introduces a small delay to trigger the hang more reliably.

    curl "https://gist.githubusercontent.com/colesbury/fe3769f43dfb724c86ecbb182b1f6749/raw/e29a4eaeebb8d5252cdd66f3f8a70f7bc5fa14e7/runtest_mp.diff" | patch -p1
    ./python -m test test_regrtest -m test_module_from_test_autotest -v

    @colesbury colesbury added 3.8 (EOL) end of life 3.10 only security fixes 3.11 only security fixes 3.9 only security fixes type-bug An unexpected behavior, bug, or error labels Dec 30, 2021
    @vstinner
    Member

    vstinner commented Jan 5, 2022

    Do you want to work on a fix?

    @vstinner vstinner added the tests Tests in the Lib/test dir label Jan 5, 2022
    @vstinner vstinner changed the title Race condition in runtest_mp leads to hangs (never exits) test.libregrtest: Race condition in runtest_mp leads to hangs (never exits) Jan 5, 2022
    @corona10
    Member

    New changeset e13cdca by Sam Gross in branch 'main':
    bpo-46205: exit if no workers are alive in runtest_mp (GH-30470)
    e13cdca

    @miss-islington
    Contributor

    New changeset e0ec08d by Miss Islington (bot) in branch '3.10':
    bpo-46205: exit if no workers are alive in runtest_mp (GH-30470)
    e0ec08d

    @miss-islington
    Contributor

    New changeset 690ed88 by Miss Islington (bot) in branch '3.9':
    bpo-46205: exit if no workers are alive in runtest_mp (GH-30470)
    690ed88
