Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: preemption in startTemplateThread may cause infinite hang [1.14 backport] #38933

Closed
gopherbot opened this issue May 7, 2020 · 2 comments
Labels
CherryPickApproved Used during the release process for point releases FrozenDueToAge
Milestone

Comments

@gopherbot
Copy link
Contributor

@prattmic requested issue #38931 to be considered for backport to the next 1.14 minor release.

@gopherbot backport please 1.13 1.14

@gopherbot
Copy link
Contributor Author

Change https://golang.org/cl/234885 mentions this issue: [release-branch.go1.14] runtime: disable preemption in startTemplateThread

@andybons andybons added the CherryPickApproved Used during the release process for point releases label May 27, 2020
@gopherbot gopherbot removed the CherryPickCandidate Used during the release process for point releases label May 27, 2020
@gopherbot
Copy link
Contributor Author

Closed by merging 5112209 to release-branch.go1.14.

gopherbot pushed a commit that referenced this issue May 27, 2020
…hread

When a locked M wants to start a new M, it hands off to the template
thread to actually call clone and start the thread. The template thread
is lazily created the first time a thread is locked (or if cgo is in
use).

stoplockedm will release the P (_Pidle), then call handoffp to give the
P to another M. In the case of a pending STW, one of two things can
happen:

1. handoffp starts an M, which does acquirep followed by schedule, which
will finally enter _Pgcstop.

2. handoffp immediately enters _Pgcstop. This only occurs if the P has
no local work, GC work, and no spinning M is required.

If handoffp starts an M, and must create a new M to do so, then newm
will simply queue the M on newmHandoff for the template thread to do the
clone.

When a stop-the-world is required, stopTheWorldWithSema will start the
stop and then wait for all Ps to enter _Pgcstop. If the template thread
is not fully created because startTemplateThread gets stopped, then
another stoplockedm may queue an M that will never get created, and the
handoff P will never leave _Pidle. Thus stopTheWorldWithSema will wait
forever.

A sequence to trigger this hang when STW occurs can be visualized with
two threads:

  T1                                 T2
-------------------------------   -----------------------------

LockOSThread                      LockOSThread
  haveTemplateThread == 0
  startTemplateThread
    haveTemplateThread = 1
    newm                            haveTemplateThread == 1
      preempt -> schedule           g.m.lockedExt++
        gcstopm -> _Pgcstop         g.m.lockedg = ...
        park                        g.lockedm = ...
                                    return

                                 ... (any code)
                                   preempt -> schedule
                                     stoplockedm
                                       releasep -> _Pidle
                                       handoffp
                                         startm (first 3 handoffp cases)
                                          newm
                                            g.m.lockedExt != 0
                                            Add to newmHandoff, return
                                       park

Note that the P in T2 is stuck sitting in _Pidle. Since the template
thread isn't running, the new M will not be started complete the
transition to _Pgcstop.

To resolve this, we disable preemption around the assignment of
haveTemplateThread and the creation of the template thread in order to
guarantee that if handTemplateThread is set then the template thread
will eventually exist, in the presence of stops.

For #38931
Fixes #38933

Change-Id: I50535fbbe2f328f47b18e24d9030136719274191
Reviewed-on: https://go-review.googlesource.com/c/go/+/232978
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
(cherry picked from commit 11b3730)
Reviewed-on: https://go-review.googlesource.com/c/go/+/234885
Reviewed-by: Cherry Zhang <cherryyz@google.com>
@golang golang locked and limited conversation to collaborators May 27, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CherryPickApproved Used during the release process for point releases FrozenDueToAge
Projects
None yet
Development

No branches or pull requests

2 participants