Return back stale out-of-pool scheduler by timeout #1690
Conversation
Force-pushed from 8e84265 to 23f8c65
I still need to add tests and write some comments, but I think this is ready for review implementation-wise...
        self.with_active_scheduler(f)
    }
    SchedulerStatus::Unavailable => panic!(),
again, should have an error message here. `Unavailable` seems to be used for more than just transitions (at first glance), so I am still trying to convince myself this cannot happen.
done: 06b7bff
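For illustration, a hedged sketch of the kind of message being asked for; the exact wording that landed is in 06b7bff, and this arm is only a guess at it:

SchedulerStatus::Unavailable => unreachable!(
    "the scheduler must be Active or Stale at this point; Unavailable \
     means it was already consumed by wait_for_termination()"
),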
scheduler.maybe_transition_from_active_to_stale(|scheduler| {
    let (result_with_timings, uninstalled_scheduler) =
        scheduler.wait_for_termination(false);
    uninstalled_scheduler.return_to_pool();
    (scheduler_pool, result_with_timings)
})
want to make sure I understand this correctly.
The closure here is called before we transition to Stale. So we wait for termination while the scheduler is still in active state.
I guess I'm not sure what happens here; it seems this is intended to kill an extremely long-running scheduler. But the description of `wait_for_termination` says
/// This function blocks the current thread while waiting for the scheduler to complete all of
/// the executions for the scheduled transactions and to return the finalized
/// `ResultWithTimings`. This function still blocks for short period of time even in the case
/// of aborted schedulers to gracefully shutdown the scheduler (like thread joining).
So if it's extremely long-running, aren't we still blocked here? I assume I am missing something.
The closure here is called before we transition to Stale. So we wait for termination while the scheduler is still in active state.
yes. this is true.
it seems this intended to kill an extremely long running scheduler.
well, this isn't true. The intention here is to gracefully terminate an extremely long-idling scheduler, so that the idling OS-native threads can be reclaimed and returned to the scheduler pool.
So if it's extremely long-running, aren't we still blocked here? I assume I am missing something.
this is correct. The situation is the same for both the blockstore processor and the unified scheduler, and has been from the very beginning.
ah, also note that even if the scheduler is constantly running with new transactions arriving at the time the TimeoutListener triggers, this `wait_for_termination()` will eventually return. That's because we're taking the write lock, which stops newer transactions from entering the scheduler. So it's guaranteed that there's only a bounded amount of work left before `wait_for_termination()` completes at this point, assuming there's no infinite-loop bug in the SVM.
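A minimal self-contained sketch of the locking pattern described in this thread, using simplified stand-ins for the real types (the enum payloads, field names, and method names here are illustrative, not the actual agave API):

use std::sync::RwLock;

// Simplified stand-ins for the real types.
enum SchedulerStatus {
    Active,
    Stale,
}

struct BankWithSchedulerInner {
    scheduler: RwLock<SchedulerStatus>,
}

impl BankWithSchedulerInner {
    // Transaction scheduling only takes the read lock, so it stays cheap
    // and many transactions can be submitted concurrently.
    fn schedule(&self) {
        let _guard = self.scheduler.read().unwrap();
        // ... hand the transaction to the active scheduler ...
    }

    // The timeout path takes the write lock. Once acquired, no new
    // schedule() call can begin, so only already-submitted transactions
    // remain: wait_for_termination() then has a bounded amount of work
    // and is guaranteed to return.
    fn on_timeout(&self) {
        let mut status = self.scheduler.write().unwrap();
        // ... scheduler.wait_for_termination(false); return it to the pool ...
        *status = SchedulerStatus::Stale;
    }
}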
also, added some explanation from this convo: 06b7bff
// Re-register a new timeout listener only after acquiring the read lock;
// otherwise, the listener would again put the scheduler into Stale before the
// read lock, causing the panic below.
oops, forgot to mention that this is an extremely rare race condition...
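A minimal sketch of the ordering that avoids this race; `reactivate` and `register_listener` are illustrative names, not the real API:

use std::sync::RwLock;

enum Status {
    Active,
    Stale,
}

fn reactivate(status_lock: &RwLock<Status>, register_listener: impl Fn()) {
    // Taking the read lock FIRST closes the race window: a freshly
    // registered listener needs the write lock to flip Active back to
    // Stale, so it cannot fire while this read lock is held.
    let status = status_lock.read().unwrap();
    register_listener();

    // Had the listener been registered BEFORE the read lock, it could
    // already have moved the status back to Stale, and this check (a
    // stand-in for the "panic below") would trip.
    assert!(matches!(*status, Status::Active));
}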
let (result, timings) = bank.wait_for_completed_scheduler().unwrap();
assert_matches!(result, Ok(()));
assert_eq!(timings.metrics[ExecuteTimingType::CheckUs], 246);
this is asserting that `ResultWithTimings` is correctly carried over across active => stale => active transitions.
Small nit on comments, and maybe a clone we can remove; otherwise happy with the PR. Feel free to merge and do a small follow-up PR for those items if we agree.
@@ -290,8 +290,15 @@ impl WaitReason {
 #[allow(clippy::large_enum_variant)]
 #[derive(Debug)]
 pub enum SchedulerStatus {
+    /// Unified scheduler is disabled or installed scheduler is consumed by wait_for_termination().
+    /// Note that transition to Unavailable from {Active, Stale} is one-way (i.e. one-time).
this doesn't seem strictly true, because it seems to also be used as an intermediate state when transitioning between active <-> stale?
wow, nice spotting. I omitted that for simplicity here. I will do a follow-up for completeness.
this will be addressed by: #1797
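For context, the full tri-state as I read it from the hunks quoted in this thread; the variant payloads are reconstructed from the diff and comments above, so treat them as approximate rather than authoritative:

#[allow(clippy::large_enum_variant)]
#[derive(Debug)]
pub enum SchedulerStatus {
    /// Unified scheduler is disabled or installed scheduler is consumed by
    /// wait_for_termination(). Also (as noted in this thread) briefly used
    /// as an intermediate state while transitioning between Active and Stale.
    Unavailable,
    /// A scheduler is installed and ready for transaction scheduling.
    Active(InstalledSchedulerBox),
    /// The scheduler was returned to the pool by the TimeoutListener; the
    /// pool handle and the last ResultWithTimings are retained so that
    /// scheduling can transparently resume later.
    Stale(InstalledSchedulerPoolArc, ResultWithTimings),
}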
@@ -519,15 +526,15 @@ impl BankWithSchedulerInner {
         );
         Err(SchedulerAborted)
     }
-    SchedulerStatus::Stale(_pool, _result_with_timings) => {
+    SchedulerStatus::Stale(pool, _result_with_timings) => {
+        let pool = pool.clone();
why do we need to clone the pool?
because this `pool` is borrowed, and we need to re-lock `scheduler` with write access:
error[E0505]: cannot move out of `scheduler` because it is borrowed
   --> runtime/src/installed_scheduler_pool.rs:532:22
    |
517 |     let scheduler = self.scheduler.read().unwrap();
    |         --------- binding `scheduler` declared here
518 |     match &*scheduler {
    |            --------- borrow of `scheduler` occurs here
...
532 |     drop(scheduler);
    |          ^^^^^^^^^ move out of `scheduler` occurs here
...
552 |     pool.register_timeout_listener(self.do_create_timeout_listener());
    |     ---- borrow later used here

For more information about this error, try `rustc --explain E0505`.
error: could not compile `solana-runtime` (lib) due to 1 previous error
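A standalone reproduction of the same E0505 pattern and the clone that resolves it; all names here are illustrative:

use std::sync::{Arc, RwLock};

struct Pool;

enum Status {
    Stale(Arc<Pool>),
}

struct Inner {
    status: RwLock<Status>,
}

fn relock_for_write(inner: &Inner) {
    let status = inner.status.read().unwrap();
    let pool = match &*status {
        // Cloning the Arc (a cheap refcount bump) ends the dependence on
        // `status`, so the read guard can be dropped before re-locking.
        Status::Stale(pool) => pool.clone(),
    };
    drop(status); // OK now: `pool` no longer borrows from `status`
    let _write = inner.status.write().unwrap();
    // ... use `pool` here, e.g. to register a new timeout listener ...
    let _ = pool;
}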
* Return back stale out-of-pool scheduler by timeout
* Add tests
* Document and proper panic messages
* Avoid recursion
* Tweak comments
Problem
From #1211:
Summary of Changes
This PR addresses (5).
The impl is a bit involved.
In a gist, `BankWithScheduler` now registers a callback (`TimeoutListener`) to `solScCleaner` to detect its own staleness from outside. To that end, `solScCleaner` unconditionally invokes the callback after a fixed duration (>> 1 slot = 400ms). The callback then returns the scheduler back to the pool if the bank isn't yet frozen. So there's a new state of `BankWithScheduler` in which it has a stale scheduler; that's modeled with the new tri-state `SchedulerStatus` enum, replacing `Option`.

This whole setup is needed because returning the stale scheduler must happen transparently from the viewpoint of transaction scheduling. Accordingly, the scheduling code path is adjusted to detect the stale state and re-take a scheduler from the pool (called resuming in the code), seeded with the last-saved `ResultWithTimings` from `SchedulerStatus::Stale`.

Thanks to the out-of-band nature of this mechanism, there's no additional overhead in the latency-sensitive code path, and it works even if no newly-arriving transactions show up at all.
Lastly, note that encapsulating this inside `solana-unified-scheduler` would necessitate introducing some kind of synchronization mechanism. This is also why I used the term session in the crate: sessions deliberately don't map 1-to-1 to slots; the mapping is now 1-to-N (most of the time it's practically still 1-to-1, though).
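To tie the description together, a rough, runnable miniature of the active => stale => active lifecycle; every name and payload here is a simplified stand-in, not the real agave code (which keeps the installed scheduler, pool handle, and ResultWithTimings in the variants):

#[derive(Debug)]
enum SchedulerStatus {
    Unavailable,    // also the brief intermediate state during transitions
    Active(u32),    // stand-in for the installed scheduler
    Stale(String),  // stand-in for (pool, ResultWithTimings)
}

impl SchedulerStatus {
    // TimeoutListener path (solScCleaner): terminate the idle scheduler,
    // return it to the pool, and keep the last result for later.
    fn transition_to_stale(&mut self) {
        match std::mem::replace(self, SchedulerStatus::Unavailable) {
            SchedulerStatus::Active(scheduler) => {
                // ... wait_for_termination(); scheduler goes back to the pool ...
                *self = SchedulerStatus::Stale(format!("result of scheduler {scheduler}"));
            }
            other => *self = other, // not Active: put it back unchanged
        }
    }

    // Scheduling path: transparently re-take a scheduler from the pool,
    // seeded with the saved result ("resuming" in the code).
    fn resume(&mut self) {
        match std::mem::replace(self, SchedulerStatus::Unavailable) {
            SchedulerStatus::Stale(saved) => {
                println!("resuming with saved {saved}");
                *self = SchedulerStatus::Active(2); // a fresh scheduler from the pool
            }
            other => *self = other,
        }
    }
}

fn main() {
    let mut status = SchedulerStatus::Active(1);
    status.transition_to_stale(); // fired by solScCleaner after the timeout
    status.resume();              // fired lazily by the next arriving transaction
    println!("{status:?}");
}

The std::mem::replace trick also shows why `Unavailable` doubles as a brief intermediate state during transitions, which is what the review comment above was pointing at.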