Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shutdown gracefully without os.Exit #7480

Merged
merged 3 commits into from
Oct 14, 2020
Merged

Conversation

yihuang
Copy link
Collaborator

@yihuang yihuang commented Oct 8, 2020

Description

Change the behaviour of signal handling to gracefully shutdown rather than os.Exit.
The purpose is to give defer function or wrapper function a chance to run after it quit.
Particularly in our case, we want to check test coverage of our externally running integration tests, we take this approach. basically we wrap the main function in a test function, and hope the main function returns normally when signaled with SIGINT/SIGTERM, so the test binary will output the coverage report.


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (see CONTRIBUTING.md)
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • Code follows the module structure standards.
  • Wrote unit and integration tests
  • Updated relevant documentation (docs/) or specification (x/<module>/spec/)
  • Added relevant godoc comments.
  • Added a relevant changelog entry to the Unreleased section in CHANGELOG.md
  • Re-reviewed Files changed in the Github PR explorer
  • Review Codecov Report in the comment section below once CI passes

@yihuang yihuang force-pushed the shutdown-gracefully branch 2 times, most recently from 7d3d6bf to 32f528c Compare October 8, 2020 04:38
@codecov
Copy link

codecov bot commented Oct 8, 2020

Codecov Report

Merging #7480 into master will decrease coverage by 0.18%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #7480      +/-   ##
==========================================
- Coverage   55.01%   54.82%   -0.19%     
==========================================
  Files         448      597     +149     
  Lines       30321    37996    +7675     
==========================================
+ Hits        16681    20833    +4152     
- Misses      11994    15049    +3055     
- Partials     1646     2114     +468     

Copy link
Contributor

@alessio alessio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. utACK.

Copy link
Contributor

@alexanderbez alexanderbez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow why this approach is better. Can you provide concrete examples? Having a single signal trap that encapsulates all cleanup logic is preferable to me.

@yihuang
Copy link
Collaborator Author

yihuang commented Oct 8, 2020

I don't quite follow why this approach is better. Can you provide concrete examples? Having a single signal trap that encapsulates all cleanup logic is preferable to me.

The method we use to collect test coverage when doing external testing is described here. The example code is here.

Basically what we do is wrapping the main function in a test function:

func TestBincoverRunMain(t *testing.T) {
	bincover.RunTest(main)
}

And built it with go test -coverpkg=github.com/cosmos/cosmos-sdk/...,./... -c -o ./chain-maind-inst, and run it with chain-maind-inst -test.coverprofile=coverage.out start --home=..., so when the main function returns normally, the binary will save the test coverage report in coverage.out.
so we start the devnet with it, run external integration tests(in python) against it, when the testing is done, we send signal SIGTERM to quit the chain process, and hope it output the coverage report before quit, that's why we don't want it to just os.Exit.

@alexanderbez
Copy link
Contributor

Is there a way in the TrapSignal to ensure you capture the coverage report?

@yihuang
Copy link
Collaborator Author

yihuang commented Oct 8, 2020

Is there a way in the TrapSignal to ensure you capture the coverage report?

The report generation is handled by go internal after the test functions returned, what I observe currently is it's not generated with os.Exit.

@alexanderbez
Copy link
Contributor

I see. What if we added a time/sleep before os.Exit?

@yihuang
Copy link
Collaborator Author

yihuang commented Oct 9, 2020

I see. What if we added a time/sleep before os.Exit?

I think it won't work unless the main function returns normally, because the wrapping code don't have a chance to run.

@alessio
Copy link
Contributor

alessio commented Oct 9, 2020

I see. What if we added a time/sleep before os.Exit?

I think it won't work unless the main function returns normally, because the wrapping code don't have a chance to run.

And it'd be very, very ugly...

server/start.go Outdated
Comment on lines 293 to 312
if tmNode.IsRunning() {
_ = tmNode.Stop()
}

if apiSrv != nil {
_ = apiSrv.Close()
}
if cpuProfileCleanup != nil {
cpuProfileCleanup()
}

if grpcSrv != nil {
grpcSrv.Stop()
}
if apiSrv != nil {
_ = apiSrv.Close()
}

ctx.Logger.Info("exiting...")
})
if grpcSrv != nil {
grpcSrv.Stop()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok can we at least group these in a single cleanup function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made it into a defer expression now.

server/start.go Outdated
if tmNode.IsRunning() {
_ = tmNode.Stop()
}
WaitForQuitSignals()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not using the return value here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the _ = now.

@yihuang yihuang force-pushed the shutdown-gracefully branch 2 times, most recently from af475f3 to 0834b90 Compare October 10, 2020 03:04
@yihuang yihuang force-pushed the shutdown-gracefully branch 2 times, most recently from df74d17 to 696d612 Compare October 10, 2020 03:06
Copy link
Contributor

@alexanderbez alexanderbez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. But @alessio, don't we want to exit with the status code resulting from WaitForQuitSignals?

@yihuang
Copy link
Collaborator Author

yihuang commented Oct 11, 2020

LGTM. But @alessio, don't we want to exit with the status code resulting from WaitForQuitSignals?

we might need a ErrorCode{int} for that, I can do it if we need that.

@alessio
Copy link
Contributor

alessio commented Oct 12, 2020

we might need a ErrorCode{int} for that, I can do it if we need that.

That'd be great

@yihuang yihuang force-pushed the shutdown-gracefully branch 3 times, most recently from 78e74cf to 3619164 Compare October 12, 2020 14:43
sigs := make(chan os.Signal, 1)
signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
sig := <-sigs
return ErrorCode{Code: int(sig.(syscall.Signal)) + 128}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why adding 128?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that is the standard way you handle -signal process termination in UNIX (n+128)

server/util.go Outdated Show resolved Hide resolved
Copy link
Contributor

@alessio alessio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are failing - looks great otherwise

"github.com/cosmos/cosmos-sdk/simapp/simd/cmd"
)

func main() {
rootCmd, _ := cmd.NewRootCmd()
if err := cmd.Execute(rootCmd); err != nil {
os.Exit(1)
switch e := err.(type) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we log an error as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exit code + logging information already provided by the underlying goroutines should do it all.

Copy link
Collaborator Author

@yihuang yihuang Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current error output when received SIGTERM is like this:

Error: 143
Usage:
  chain-maind start [flags]

Flags:
  ...

maybe add an error string into the ErrorCode?

server/util.go Outdated Show resolved Hide resolved
Copy link
Contributor

@alessio alessio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

@alessio alessio added the A:automerge Automatically merge PR once all prerequisites pass. label Oct 14, 2020
@mergify mergify bot merged commit f260ca4 into cosmos:master Oct 14, 2020
clevinson pushed a commit that referenced this pull request Oct 29, 2020
* shutdown gracefully without os.Exit

* Update server/util.go

Co-authored-by: Alessio Treglia <quadrispro@ubuntu.com>
Co-authored-by: Alessio Treglia <alessio@tendermint.com>
larry0x pushed a commit to larry0x/cosmos-sdk that referenced this pull request May 22, 2023
* shutdown gracefully without os.Exit

* Update server/util.go

Co-authored-by: Alessio Treglia <quadrispro@ubuntu.com>
Co-authored-by: Alessio Treglia <alessio@tendermint.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A:automerge Automatically merge PR once all prerequisites pass.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants