
Return back stale out-of-pool scheduler by timeout #1690

Merged: 5 commits merged into anza-xyz:master from the stale-taken-scheduler-returning branch on Jun 13, 2024

Conversation

@ryoqun (Member) commented Jun 11, 2024

Problem

From #1211:

Specifically, there are the following termination conditions, which it must handle respectively:

...
5. There could be many active (= taken-out-of-the-pool) unified schedulers under very forky network conditions for an extended duration without rooting. In that case, many native OS threads will be created just to sit idle collectively, because each unified scheduler instance separately creates and manages its own set of threads (for perf reasons). So, idling schedulers should be returned to the pool. Then, those pooled-back schedulers will eventually be retired according to (3) after the forky situation is resolved.

Summary of Changes

This PR addresses (5).

The implementation is a bit involved.

In a nutshell, BankWithScheduler now registers a callback (TimeoutListener) with solScCleaner so that its own stalemate can be detected from the outside. To that end, solScCleaner unconditionally invokes the callback after a fixed duration (>> 1 slot = 400ms). The callback then returns the scheduler to the pool if the bank isn't frozen yet. So there's a new state of BankWithScheduler in which it holds a stale scheduler; that's modeled with the new tri-state SchedulerStatus enum, which replaces an Option.
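
To illustrate the idea, here is a minimal, self-contained Rust sketch of this out-of-band callback flow. The `spawn_cleaner` helper and the boxed-closure `TimeoutListener` alias are hypothetical stand-ins; the real solScCleaner thread and TimeoutListener type are more involved than this.

```rust
use std::{thread, time::Duration};

// Hypothetical stand-in for the real TimeoutListener type.
type TimeoutListener = Box<dyn FnOnce() + Send>;

// A background cleaner thread that unconditionally fires each registered
// listener after a fixed duration, mirroring the description above.
fn spawn_cleaner(listeners: Vec<TimeoutListener>, timeout: Duration) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        thread::sleep(timeout); // fixed duration, much longer than 1 slot (400ms)
        for listener in listeners {
            listener(); // invoked unconditionally; the callback decides whether to act
        }
    })
}

fn main() {
    let bank_is_frozen = false; // assumption for the example
    let listener: TimeoutListener = Box::new(move || {
        if !bank_is_frozen {
            // In the real code: transition the scheduler to the stale state and
            // return it (and its idle OS threads) to the scheduler pool.
            println!("returning stale scheduler to the pool");
        }
    });
    spawn_cleaner(vec![listener], Duration::from_millis(1200))
        .join()
        .unwrap();
}
```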

This whole setup is needed because returning stale schedulers must be done transparently from the viewpoint of transaction scheduling. Accordingly, the scheduling code path is adjusted so that when it detects the stale state, it revives it by re-taking a scheduler from the pool (called resuming in the code) with the last-saved ResultWithTimings from SchedulerStatus::Stale, as sketched below.
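
Here is a hedged sketch of the tri-state status and the resuming path, with placeholder Scheduler/Pool types and a simplified ResultWithTimings tuple standing in for the actual definitions in runtime/src/installed_scheduler_pool.rs:

```rust
// Placeholder types standing in for the real handles.
struct Scheduler;
struct Pool;
type ResultWithTimings = (Result<(), String>, u64);

impl Pool {
    fn take_scheduler(&self) -> Scheduler {
        Scheduler
    }
}

enum SchedulerStatus {
    // A scheduler is installed and usable for transaction scheduling.
    Active(Scheduler),
    // The timeout listener returned the scheduler to the pool; the partial
    // ResultWithTimings is saved so it can be carried over on resume.
    Stale(Pool, ResultWithTimings),
    // Disabled, or consumed by wait_for_termination().
    Unavailable,
}

impl SchedulerStatus {
    // Called on the scheduling path: if the status is Stale, transparently
    // re-take a scheduler from the pool (resuming) and keep the saved
    // ResultWithTimings across the active => stale => active transition.
    fn maybe_resume(&mut self) {
        if matches!(self, SchedulerStatus::Stale(..)) {
            let SchedulerStatus::Stale(pool, result_with_timings) =
                std::mem::replace(self, SchedulerStatus::Unavailable)
            else {
                unreachable!()
            };
            let scheduler = pool.take_scheduler();
            // In the real code, result_with_timings is re-attached to the new
            // scheduler session; here we just keep it around for illustration.
            let _carried_over: ResultWithTimings = result_with_timings;
            *self = SchedulerStatus::Active(scheduler);
        }
    }
}

fn main() {
    let mut status = SchedulerStatus::Stale(Pool, (Ok(()), 246));
    status.maybe_resume();
    assert!(matches!(status, SchedulerStatus::Active(_)));
}
```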

Thanks to the out-of-band nature of this mechanism, there's no additional overhead on the latency-sensitive code path, and it works even if no new transactions are arriving at all.

Lastly, note that encapsulating this inside solana-unified-scheduler would necessitate introducing some kind of synchronization mechanism. This is also why I used the term session in that crate: to avoid associating slots and sessions 1-to-1. The relation is now 1-to-N (most of the time it's still practically 1-to-1, though).

@ryoqun force-pushed the stale-taken-scheduler-returning branch from 8e84265 to 23f8c65 on June 11, 2024 15:16
@ryoqun ryoqun requested a review from apfitzge June 11, 2024 15:25
@ryoqun (Member Author) commented Jun 11, 2024

I still need to add tests and write some comments, but I think this is ready for review implementation-wise...

runtime/src/installed_scheduler_pool.rs (outdated, resolved)

self.with_active_scheduler(f)
}
SchedulerStatus::Unavailable => panic!(),

apfitzge:

Again, this should have an error message here.

Unavailable seems to be used for more than just transitions (at first glance), so I am still trying to convince myself this cannot happen.

ryoqun (Member Author):

done: 06b7bff

runtime/src/installed_scheduler_pool.rs (outdated, resolved)
Comment on lines 535 to 540
scheduler.maybe_transition_from_active_to_stale(|scheduler| {
let (result_with_timings, uninstalled_scheduler) =
scheduler.wait_for_termination(false);
uninstalled_scheduler.return_to_pool();
(scheduler_pool, result_with_timings)
})

apfitzge:

I want to make sure I understand this correctly.
The closure here is called before we transition to Stale. So we wait for termination while the scheduler is still in the active state.

I guess I'm not sure what happens here; it seems this is intended to kill an extremely long-running scheduler. But the description of wait_for_termination says:

    /// This function blocks the current thread while waiting for the scheduler to complete all of
    /// the executions for the scheduled transactions and to return the finalized
    /// `ResultWithTimings`. This function still blocks for short period of time even in the case
    /// of aborted schedulers to gracefully shutdown the scheduler (like thread joining).

So if it's extremely long-running, aren't we still blocked here? I assume I am missing something.

ryoqun (Member Author):

> The closure here is called before we transition to Stale. So we wait for termination while the scheduler is still in active state.

Yes, this is true.

> it seems this intended to kill an extremely long running scheduler.

Well, this isn't true. The intention here is to gracefully terminate an extremely long-idling scheduler. This is to reclaim the idling native OS threads by returning them to the scheduler pool.

> So if it's extremely long-running, aren't we still blocked here? I assume I am missing something.

This is correct. The situation has been the same for both the blockstore processor and the unified scheduler from the very beginning.

@ryoqun (Member Author) commented Jun 12, 2024:

Ah, also note that even if the scheduler is constantly running with new transactions arriving at the time the TimeoutListener triggers, this wait_for_termination() will eventually return. That's because we're taking the write lock, which stops newer transactions from being entered into the scheduler. So it's guaranteed that there's only a bounded amount of work left before wait_for_termination() completes at this point, assuming there's no infinite-loop bug in the SVM. (See the sketch below.)
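
A minimal sketch of that reasoning, using illustrative names (SchedulerSlot, schedule_transaction, timeout_and_return_to_pool) rather than the actual BankWithScheduler API: the scheduling path only takes the read lock, so once the timeout path holds the write lock, no new transactions can enter and the remaining work is bounded.

```rust
use std::sync::RwLock;

// Illustrative stand-in for the bank's scheduler slot guarded by an RwLock.
struct SchedulerSlot {
    inner: RwLock<()>,
}

impl SchedulerSlot {
    // Latency-sensitive path: takes only the (shared) read lock per transaction.
    fn schedule_transaction(&self, _tx: &str) {
        let _shared = self.inner.read().unwrap();
        // ... hand the transaction off to the scheduler's own threads ...
    }

    // Timeout path: the exclusive write lock blocks schedule_transaction(), so
    // only already-scheduled transactions remain to drain; wait_for_termination()
    // therefore has a bounded amount of work left.
    fn timeout_and_return_to_pool(&self) {
        let _exclusive = self.inner.write().unwrap();
        // wait_for_termination(false) + return_to_pool() would run here.
    }
}

fn main() {
    let slot = SchedulerSlot { inner: RwLock::new(()) };
    slot.schedule_transaction("tx1");
    slot.timeout_and_return_to_pool();
}
```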

ryoqun (Member Author):

Also, I added some explanation from this conversation: 06b7bff

@ryoqun ryoqun requested a review from apfitzge June 12, 2024 06:52
Comment on lines 548 to 550
// Re-register a new timeout listener only after acquiring the read lock;
// Otherwise, the listener would again put scheduler into Stale before the read
// lock, causing panic below.
ryoqun (Member Author):

Oops, I forgot to mention that this is an extremely rare race condition...


let (result, timings) = bank.wait_for_completed_scheduler().unwrap();
assert_matches!(result, Ok(()));
assert_eq!(timings.metrics[ExecuteTimingType::CheckUs], 246);
ryoqun (Member Author):

This asserts that the ResultWithTimings is correctly carried over across active => stale => active transitions.

@apfitzge left a comment:

Small nit on comments, and maybe a clone we can remove; otherwise happy with the PR. Feel free to merge and do a small follow-up PR for those items if we agree.

@@ -290,8 +290,15 @@ impl WaitReason {
#[allow(clippy::large_enum_variant)]
#[derive(Debug)]
pub enum SchedulerStatus {
/// Unified scheduler is disabled or installed scheduler is consumed by wait_for_termination().
/// Note that transition to Unavailable from {Active, Stale} is one-way (i.e. one-time).

apfitzge:

This doesn't seem strictly true, because it seems to also be used as an intermediate state when transitioning between active <-> stale?

ryoqun (Member Author):

Wow, nice catch for spotting this. I omitted it here for simplicity. I will do a follow-up for completeness.

ryoqun (Member Author):

This will be addressed by #1797.

@@ -519,15 +526,15 @@ impl BankWithSchedulerInner {
);
Err(SchedulerAborted)
}
SchedulerStatus::Stale(_pool, _result_with_timings) => {
SchedulerStatus::Stale(pool, _result_with_timings) => {
let pool = pool.clone();

apfitzge:

Why do we need to clone the pool?

ryoqun (Member Author):

Because this pool is borrowed and we need to re-lock the scheduler with write access:

error[E0505]: cannot move out of `scheduler` because it is borrowed
   --> runtime/src/installed_scheduler_pool.rs:532:22
    |
517 |         let scheduler = self.scheduler.read().unwrap();
    |             --------- binding `scheduler` declared here
518 |         match &*scheduler {
    |                 --------- borrow of `scheduler` occurs here
...
532 |                 drop(scheduler);
    |                      ^^^^^^^^^ move out of `scheduler` occurs here
...
552 |                 pool.register_timeout_listener(self.do_create_timeout_listener());
    |                 ---- borrow later used here

For more information about this error, try `rustc --explain E0505`.
error: could not compile `solana-runtime` (lib) due to 1 previous error
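
For illustration, here is a minimal, self-contained reproduction of why the clone is needed, with simplified stand-ins (an Option instead of SchedulerStatus, a bare Pool type, a hypothetical maybe_resume function) rather than the actual BankWithSchedulerInner fields: cloning the Arc ends the borrow of the read guard, so drop(scheduler) no longer moves a borrowed value and the pool handle can still be used afterwards.

```rust
use std::sync::{Arc, RwLock};

struct Pool;

impl Pool {
    fn register_timeout_listener(&self) {}
}

struct Inner {
    // Simplified: the real field holds a SchedulerStatus, not an Option.
    scheduler: RwLock<Option<Arc<Pool>>>,
}

fn maybe_resume(inner: &Inner) {
    let scheduler = inner.scheduler.read().unwrap();
    if let Some(pool) = &*scheduler {
        // Without this clone, `pool` would keep borrowing `scheduler`, and the
        // `drop(scheduler)` below would move a borrowed value (rustc E0505).
        let pool = Arc::clone(pool);
        drop(scheduler); // release the read lock so it can be re-taken for write
        let _write = inner.scheduler.write().unwrap();
        pool.register_timeout_listener();
    }
}

fn main() {
    let inner = Inner {
        scheduler: RwLock::new(Some(Arc::new(Pool))),
    };
    maybe_resume(&inner);
}
```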

@ryoqun ryoqun merged commit 3342215 into anza-xyz:master Jun 13, 2024
40 checks passed
samkim-crypto pushed a commit to samkim-crypto/agave that referenced this pull request Jul 31, 2024
* Return back stale out-of-pool scheduler by timeout

* Add tests

* Document and proper panic messages

* Avoid recursion

* Tweak comments