
fix(repo): Added panic hooks and reworked graceful shutdown #278

Merged: 1 commit merged from feat/eugene/elastic into main on Oct 24, 2024

Conversation

0xterminator (Contributor)

No description provided.

lostman (Contributor) left a comment

I'm not sure if this is what we want. Flushing async streams is tricky and we'd be better off assuming the publisher may die in the middle of publishing a block.

I talked to @0xterminator about it yesterday, and I think that the best and simplest thing we can do is this:

  1. Ensure we publish the entire block N before we start publishing N+1.
  2. Re-publish the last block on startup.

Currently, we look up the last published block L and we start at L+1. I have a PR that changes that to restart at L, precisely because the publisher may die before publishing inputs, receipts, and so on.
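
As an illustration of the restart-at-L approach described above, here is a minimal sketch (not code from this PR); last_published_height and publish_block are hypothetical stand-ins for the publisher's real lookup and publish calls:

async fn last_published_height() -> u64 {
    // e.g. look up the last block height that was fully published
    42
}

async fn publish_block(height: u64) {
    // publish the whole block (transactions, inputs, receipts, ...);
    // this must be idempotent so re-publishing block L on restart is safe
    println!("publishing block {height}");
}

#[tokio::main]
async fn main() {
    // Restart at L (re-publish the last block) instead of L + 1, because
    // the process may have died in the middle of publishing block L.
    let mut height = last_published_height().await;
    loop {
        publish_block(height).await;
        height += 1;
        if height > 45 {
            break; // bounded for this example
        }
    }
}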

@@ -301,20 +301,6 @@ impl<S: Streamable> Stream<S> {
}
}

pub async fn flush_await(

Contributor:

👍

Contributor Author:

thx

@@ -189,6 +177,26 @@ impl Publisher {
&self.streams
}

fn set_panic_hook(&mut self) {

Contributor:

This overrides the default panic hook, so if the publisher panics, we won't get the panic message.

Contributor Author:

We will; the first thing I am doing is printing the stack above!
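
As an aside on the default-hook concern above, here is a minimal sketch (not the PR's actual code) of preserving the default panic output by chaining the previously installed hook via std::panic::take_hook:

use std::panic;

fn set_panic_hook() {
    let default_hook = panic::take_hook();
    panic::set_hook(Box::new(move |panic_info| {
        // Run the default hook first so the panic message and backtrace are not lost.
        default_hook(panic_info);
        // Then do any additional synchronous logging or cleanup.
        eprintln!("custom panic handling after the default hook");
    }));
}

fn main() {
    set_panic_hook();
    panic!("demo panic");
}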

let nats_client = nats_client.clone();
let fuel_service = Arc::clone(&fuel_service);
handle.spawn(async move {
Publisher::flush_await_all_streams(&nats_client).await;

Contributor:

Flushing streams after a panic is unreliable; the async runtime may be in a compromised state.

Contributor Author:

It's usual practice to at least attempt some cleanup. Usually, if we die, we might just end up in a really bad state anyway. Also, note that both methods, flush_await_all_streams and stop_fuel, return no errors, i.e. they cannot fail whatsoever.

Contributor:

Yes, you can do sync cleanup, but you can't grab the Tokio runtime. Here's an example:

use tokio::runtime::Handle;
use std::panic;

fn set_panic_hook() {
    panic::set_hook(Box::new(|panic_info| {
        println!("Panic occurred: {}", panic_info);
        // can panic
        let handle = Handle::current();
        // can panic
        handle.spawn(async {
            println!("Performing async cleanup in panic hook...");
            // can panic
            tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;
            println!("Async cleanup completed.");
        });
    }));
}

#[tokio::main]
async fn main() {
    set_panic_hook();
    panic!("This is a test panic!");
}

Output:

Panic occurred: panicked at src/main.rs:27:5:
This is a test panic!
Performing async cleanup in panic hook...
<EOF>

Run it several times and you'll see that you may not even make it to

Performing async cleanup in panic hook...

Member:

@lostman that happens because the owner process, main, dies, right? At that point, we're leveraging the auto-destructor from Tokio, so graceful shutdown of processes still happens and we simply need our operations to be recoverable from panics.

Jurshsmith (Member), Oct 24, 2024:

To make it more fine-grained, we could use CancellationToken & TaskTracker and similar APIs (like tokio-graceful-shutdown) but since our operations are supposed to be idempotent, I think this solution is quite ergonomic. wdyt?
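
For reference, a minimal sketch (not code from this repository) of the CancellationToken/TaskTracker pattern mentioned above, using tokio-util; the loop body is a hypothetical placeholder for an idempotent publishing step:

use std::time::Duration;
use tokio_util::sync::CancellationToken;
use tokio_util::task::TaskTracker;

#[tokio::main]
async fn main() {
    let token = CancellationToken::new();
    let tracker = TaskTracker::new();

    let worker_token = token.clone();
    tracker.spawn(async move {
        loop {
            tokio::select! {
                // Exit the loop once shutdown has been requested.
                _ = worker_token.cancelled() => break,
                // Hypothetical unit of work; in the publisher this would be
                // an idempotent "publish the next block" step.
                _ = tokio::time::sleep(Duration::from_millis(100)) => {
                    println!("published a (pretend) block");
                }
            }
        }
    });

    // In a real service this would be triggered by Ctrl-C or a fatal error;
    // here we simply request shutdown after a short delay.
    tokio::time::sleep(Duration::from_millis(350)).await;
    token.cancel();
    tracker.close();
    tracker.wait().await;
    println!("all tasks finished, shutting down");
}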

Jurshsmith (Member) left a comment

LGTM 👍🏾

pedronauck merged commit 1817350 into main on Oct 24, 2024; 15 checks passed.
pedronauck deleted the feat/eugene/elastic branch on October 24, 2024 at 18:10.
lostman added a commit that referenced this pull request on Oct 24, 2024.