Design and implement graceful shutdown for Zebra #1678

Closed
3 tasks
Tracked by #2310
teor2345 opened this issue Feb 3, 2021 · 3 comments
Labels
A-rust Area: Updates to Rust code C-design Category: Software design work C-enhancement Category: This is an improvement

Comments

@teor2345
Contributor

teor2345 commented Feb 3, 2021

Motivation

In #1637, we check is_shutting_down in the CheckpointVerifier and network service.

But we'd like to implement graceful shutdown, where Zebra's endless loops wait on a shutdown condition or future.

We might also be able to implement this automatically using shutdown_timeout:
https://docs.rs/tokio/1.6.1/tokio/runtime/struct.Runtime.html#method.shutdown_timeout

Here's tokio's advice on graceful shutdown:
https://tokio.rs/tokio/topics/shutdown

Ideas

Alternative designs:

  • endless loops wait on a boolean condition, like is_shutting_down
  • endless loops select on a shutdown future, like zebra-network's cancel handles
  • each spawned task waits on select between a shutdown future and the actual task closure
    • replace tokio::spawn with zebra::spawn
    • could use a global watch channel with 3 states: Running, ShuttingDown, ExitNow
    • code that needs to exit can use the watch sender to set ShuttingDown
  • do we need a new zebra-shutdown crate? (current users: zebra-state, zebrad)
  • should library crates return a task error / service readiness error instead, so the app can shut down?
    • check if any other services can error
  • do we also need to use an at_exit handler? Should it just set ExitNow?
  • how do we handle panics?
    • currently we abort, but we could propagate them to the calling task, or set ExitNow
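The three-state idea above can be sketched without any async runtime. This is only an illustration under stated assumptions: the `Shutdown` handle, the `worker` function, and the use of a shared `AtomicU8` are hypothetical stand-ins, not Zebra code. A tokio `watch` channel, as suggested in the list, would let tasks await a state change instead of polling as this sketch does.

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// The three states proposed in the list above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum ShutdownState {
    Running = 0,
    ShuttingDown = 1,
    ExitNow = 2,
}

/// Hypothetical shared handle (an assumption for this sketch, not a Zebra type).
/// It stands in for the sender and receivers of a global watch channel.
#[derive(Clone)]
struct Shutdown(Arc<AtomicU8>);

impl Shutdown {
    fn new() -> Self {
        Shutdown(Arc::new(AtomicU8::new(ShutdownState::Running as u8)))
    }

    /// Read the current state.
    fn state(&self) -> ShutdownState {
        match self.0.load(Ordering::SeqCst) {
            0 => ShutdownState::Running,
            1 => ShutdownState::ShuttingDown,
            _ => ShutdownState::ExitNow,
        }
    }

    /// Request a state. The state only moves forward
    /// (Running -> ShuttingDown -> ExitNow), so a late ShuttingDown
    /// request cannot undo an earlier ExitNow.
    fn set(&self, new: ShutdownState) {
        self.0.fetch_max(new as u8, Ordering::SeqCst);
    }
}

/// An "endless" loop that exits once shutdown has been requested.
/// Returns the number of iterations it completed.
fn worker(shutdown: Shutdown) -> u64 {
    let mut iterations = 0;
    while shutdown.state() == ShutdownState::Running {
        iterations += 1;
        // Placeholder for real work.
        thread::sleep(Duration::from_millis(1));
    }
    iterations
}

fn main() {
    let shutdown = Shutdown::new();
    let handle = thread::spawn({
        let shutdown = shutdown.clone();
        move || worker(shutdown)
    });

    // Any code that needs to exit can request a shutdown.
    thread::sleep(Duration::from_millis(20));
    shutdown.set(ShutdownState::ShuttingDown);

    // Joining also surfaces a worker panic to the caller as an `Err`.
    let iterations = handle.join().expect("worker should not panic");
    println!("worker exited after {iterations} iterations");
}
```

Polling keeps the sketch dependency-free, but it only notices a shutdown between iterations. Selecting on a shutdown future, as in the second and third bullets, would also interrupt a task that is blocked waiting on I/O.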

Related Issues

Specific issues that this fix solves:

@conradoplg
Collaborator

Here's one panic that can happen when Ctrl+C'ing Zebra:

Error

ClientRequest oneshot sender must not be dropped before send: Canceled

Metadata

version: 1.0.0-alpha.13+4.g86ff25c
Zcash network: Mainnet
state version: 5
branch: finalized-state-in-chain-verification
git commit: 86ff25c
commit timestamp: 2021-07-16T06:26:12+00:00
target triple: x86_64-unknown-linux-gnu
build profile: debug
location: zebra-network/src/peer/client.rs:260:26

SpanTrace

SpanTrace:
   0: zebrad::components::sync::obtain_tips
             at zebrad/src/components/sync.rs:367
   1: zebrad::components::sync::sync
             at zebrad/src/components/sync.rs:262
   2: zebrad::application::
           with zebrad="86ff25c" net="Main"
             at zebrad/src/application.rs:365

@mpguerra added and removed the Epic label (Zenhub label denoting a theme of work under which related issues will be grouped) on Nov 24, 2021
@teor2345
Contributor Author

We're gradually doing this as errors come up.

@teor2345
Contributor Author

teor2345 commented Jun 2, 2022

This seems to work and has worked for months.

mpguerra added a commit that referenced this issue May 19, 2023
mergify bot pushed a commit that referenced this issue May 23, 2023
* ZIPs were updated to remove ambiguity, this was tracked in #1267.

* #2105 was fixed by #3039 and #2379 was closed by #3069

* #2230 was a duplicate of #2231 which was closed by #2511

* #3235 was obsoleted by #2156 which was fixed by #3505

* #1850 was fixed by #2944, #1851 was fixed by #2961 and #2902 was fixed by #2969

* We migrated to Rust 2021 edition in Jan 2022 with #3332

* #1631 was closed as not needed

* #338 was fixed by #3040 and #1162 was fixed by #3067

* #2079 was fixed by #2445

* #4794 was fixed by #6122

* #1678 stopped being an issue

* #3151 was fixed by #3934

* #3204 was closed as not needed

* #1213 was fixed by #4586

* #1774 was closed as not needed

* #4633 was closed as not needed

* Clarify behaviour of difficulty spacing

Co-authored-by: teor <teor@riseup.net>

* Update comment to reflect implemented behaviour

Co-authored-by: teor <teor@riseup.net>

* Update comment to reflect implemented behaviour when retrying block downloads

Co-authored-by: teor <teor@riseup.net>

* Update `TODO` to remove closed issue and clarify when we might want to fix

Co-authored-by: teor <teor@riseup.net>

* Update `TODO` to remove closed issue and clarify what we might want to change in future

Co-authored-by: teor <teor@riseup.net>

* Clarify benefits of how we do block verification

Co-authored-by: teor <teor@riseup.net>

* Fix rustfmt errors

---------

Co-authored-by: teor <teor@riseup.net>