Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shutdowner: Support calling from fx.Invoke #1076

Merged
merged 7 commits into from
May 8, 2023

Conversation

abhinav
Copy link
Collaborator

@abhinav abhinav commented Apr 29, 2023

Stacked on top of:

However, since I can't push branches directly to this repository,
this PR shows commits from all PRs.


App.Start nils out the "last" signal recorded by signalReceivers,
which it otherwise broadcasts to waiters if it was already received.
This is unnecessary especially because there's a discrepancy in behavior
of using App.Start vs App.Run when shutting down from fx.Invoke.

Given a program that calls Shutdown from fx.Invoke,
when we do:

app := fx.New(...)

The shutdowner has already sent the signal, and signalReceivers has
already recorded it.
At that point, whether we call App.Start or App.Run changes behavior:

  • If we call App.Run, that calls App.Done (or App.Wait after App.Run: Respect Shutdowner ExitCode #1075),
    which gives it back a channel that already has the signal filled in.
    It then calls App.Start, waits on the channel--which returns
    immediately--and then calls App.Stop.
  • If we call App.Start and App.Wait, on the other hand,
    Start will clear the signal recorded in signalReceivers,
    and then App.Wait will build a channel that will block indefinitely
    because Shutdowner.Shutdown will not be called again.

So even though App.Run() and App.Start()+App.Wait() are meant to be
equivalent, this causes a serious discrepancy in behavior.
It makes sense to resolve this by supporting Shutdown from Invoke.

Refs #1074

@codecov
Copy link

codecov bot commented Apr 29, 2023

Codecov Report

Merging #1076 (2c05edc) into master (ec9b096) will not change coverage.
The diff coverage is 100.00%.

❗ Current head 2c05edc differs from pull request most recent head 98f0f8e. Consider uploading reports for the commit 98f0f8e to get more accurate results

@@           Coverage Diff           @@
##           master    #1076   +/-   ##
=======================================
  Coverage   98.52%   98.52%           
=======================================
  Files          39       39           
  Lines        2984     2984           
=======================================
  Hits         2940     2940           
  Misses         37       37           
  Partials        7        7           
Impacted Files Coverage Δ
shutdown.go 100.00% <ø> (ø)
app.go 97.22% <100.00%> (ø)
signal.go 100.00% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@abhinav
Copy link
Collaborator Author

abhinav commented May 2, 2023

Coverage issue will be resolved by #1078

@JacobOaks
Copy link
Contributor

Spent a little bit looking at this and I think it makes sense.

Have we considered switching the order of execution of Start vs. Wait in the case of calling Run to more closely match the Start -> Wait -> Stop flow? For example, run can take a channel-generating function rather than a channel itself, and we can pass Wait:

func (app *App) Run() {
    if code := app.run(app.Wait); code != 0 {
        // ...
    }
}

func (app *App) run(doneFn func() <-chan ShutdownSignal) (exitCode int) {
// ...
if err := app.Start(); err != nil { /* ... */ }
sig := <-doneFn()
// ...
}

This would cause shutdown from invokes to no longer work when using Run making it a breaking change by itself, but in combination with this PR it shouldn't be breaking, and it might be a good idea to prevent future changes/features from introducing additional disparities between app.Run and app.Start -> app.Wait -> app.Stop. Just a thought.

@abhinav
Copy link
Collaborator Author

abhinav commented May 3, 2023

That's a good idea! I'll make the switch.

@@ -109,7 +109,6 @@ func (recv *signalReceivers) Start(ctx context.Context) {
return
}

recv.last = nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding this correctly, since this is never cleared, if a shutdown signal is sent to the app, all successive runs of the app will always immediately terminate too. Not sure if that's an issue/how big of one it is, but I do know some folks rely on running an app multiple times. Maybe there's somewhere else we can clear it (in Stop)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good call. Made an update: signalReceivers is now re-usable with your suggested change.
Added a test that starts the app and shuts it down N times with different exit codes.

@abhinav
Copy link
Collaborator Author

abhinav commented May 6, 2023

Actually, that's a problem. Looks like receivers.Stop is called from both, App.Stop and Shutdowner.Shutdown. I'll see if that can be fixed.

@abhinav
Copy link
Collaborator Author

abhinav commented May 6, 2023

Okay, I fixed it but then the ShutdownTimeout option is meaningless. I'll dig a little.

@abhinav
Copy link
Collaborator Author

abhinav commented May 6, 2023

Okay, so @JacobOaks @sywhang, Shutdown has a ShutdownTimeout option that doesn't make sense.
Shutdown can't be a blocking operation.
signalReceivers has the operations: Start, Stop, Broadcast, Wait (and its old version, Done).
Start starts a background goroutine to relay information, and Stop stops it.
It makes no sense for Shutdowner.Shutdown to call Stop if Shutdown is just to unblock a <-Wait() and a corresponding Stop.

Let me see if I can wrangle this apart a bit.

@abhinav
Copy link
Collaborator Author

abhinav commented May 6, 2023

A good signal that the ShutdownTimeout option makes little sense is this in the documentation:

 // [..] As the Shutdown method will block while waiting for a signal receiver relay
 // goroutine to stop.

That's an internal implementation detail.

TestDataRace never calls App.Stop, which can cause a resource leak.
This currently passes because of the implementation detail that
Shutdowner.Shutdown happens to call signalReceivers.Stop
instead of just Shutdown, which may not always be true.
Shutdowner.Shutdown is not a blocking operation.
It does not need to block and wait for the relay goroutine to stop;
signalReceiver.Stop will do that.

To clarify this further, signalReceivers has the following operations:

- Start: starts a relay goroutine
- Stop: stops the relay goroutine and waits for it
- Wait: create and return a `chan ShutdownSignal`
- Done: create and return a `chan os.Signal`
- Broadcast: Send the message to all waiting channels

The only reason that the relay goroutine exists is
to map `os.Signal`s received by `os/signal.Notify`
into an `fx.ShutdownSignal`.

Shutdowner.Shutdown should not call signalReceivers.Stop
because the relay goroutine should keep running
so that we can re-use Shutdowner.Shutdown.

This change exposed a leak in TestDataRace (fixed in uber-go#1081).
Adds a test for calling Shutdown from an fx.Invoke.
This is partially tested in uber-go#1075 already,
but as reported in uber-go#1074, it doesn't work if Start is used.
This adds a test for that case as well.
App.Start nils out the "last" signal recorded by signalReceivers,
which it otherwise broadcasts to waiters if it was already received.
This is unnecessary especially because there's a discrepancy in behavior
of using App.Start vs App.Run when shutting down from fx.Invoke.

Given a program that calls Shutdown from fx.Invoke,
when we do:

    app := fx.New(...)

The shutdowner has already sent the signal, and signalReceivers has
already recorded it.
At that point, whether we call App.Start or App.Run changes behavior:

- If we call App.Run, that calls App.Done (or App.Wait after uber-go#1075),
  which gives it back a channel that already has the signal filled in.
  It then calls App.Start, waits on the channel--which returns
  immediately--and then calls App.Stop.
- If we call App.Start and App.Wait, on the other hand,
  Start will clear the signal recorded in signalReceivers,
  and then App.Wait will build a channel that will block indefinitely
  because Shutdowner.Shutdown will not be called again.

So even though App.Run() and App.Start()+App.Wait() are meant to be
equivalent, this causes a serious discrepancy in behavior.
It makes sense to resolve this by supporting Shutdown from Invoke.

Refs uber-go#1074
Changes Run to match how we tell users to use Start and Wait:

    Start()
    <-Wait()
    Stop()

Versus the prior version which was:

    Wait()
    Start()
    Stop()

This caused a difference in behavior between Start and Run.
@abhinav
Copy link
Collaborator Author

abhinav commented May 6, 2023

Updated. Check PRs stacked below this for more context.

Copy link
Contributor

@JacobOaks JacobOaks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@sywhang sywhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@sywhang sywhang merged commit 7174fde into uber-go:master May 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants