Fix BankingStage packet starvation (1.9) #25236
from @carllin:
this should solve 2. note that this doesn't solve 1, but it should definitely make starvation way better
core/src/banking_stage.rs
Outdated
let (_, slot_metrics_checker_check_slot_boundary_time) = Measure::this(
    |_| {
        let current_poh_bank = {
            // TODO (B): constant spinning here may have an impact on poh, discuss in PR
something worth discussing
if you have one packet in buffered_packet_batches that exceeds the cost model, it'll probably spin super fast, grabbing this lock a ton, so it might be a good idea to rate limit it
yeah this is a good point, we can wrap it in a check that only checks this once every SLOT_BOUNDARY_CHECK_MS: usize = 10
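A minimal sketch of the rate limiting suggested above, under stated assumptions: `RateLimiter` and `should_run` are hypothetical names, not part of `banking_stage.rs`, and the constant is typed `u64` here (the comment proposes `usize`) only so it can be passed to `Duration::from_millis`. The idea is that the slot-boundary check, and the PoH lock it takes, runs at most once per interval even if the loop itself spins much faster.

```rust
use std::time::{Duration, Instant};

// Value taken from the review comment; the u64 type is an assumption.
const SLOT_BOUNDARY_CHECK_MS: u64 = 10;

/// Hypothetical helper: permits an action at most once per `interval`.
struct RateLimiter {
    interval: Duration,
    last_run: Option<Instant>,
}

impl RateLimiter {
    fn new(interval: Duration) -> Self {
        Self { interval, last_run: None }
    }

    /// Returns true (and records `now`) if no run was permitted yet, or if
    /// at least `interval` has elapsed since the last permitted run.
    fn should_run(&mut self, now: Instant) -> bool {
        match self.last_run {
            Some(last) if now.duration_since(last) < self.interval => false,
            _ => {
                self.last_run = Some(now);
                true
            }
        }
    }
}

fn main() {
    let mut limiter = RateLimiter::new(Duration::from_millis(SLOT_BOUNDARY_CHECK_MS));
    let start = Instant::now();
    let mut checks = 0;
    // Simulate 100 loop iterations spaced 1 ms apart (simulated timestamps).
    for i in 0..100u64 {
        let now = start + Duration::from_millis(i);
        if limiter.should_run(now) {
            // This is where the slot-boundary check would take the PoH lock.
            checks += 1;
        }
    }
    println!("slot boundary checks performed: {checks}");
}
```

Passing `now` in explicitly keeps the helper easy to test; in the real loop it would presumably be `Instant::now()`.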
Does it even make sense to mark transactions that aren't processed due to cost limits as retryable in
ideally they should probably be moved somewhere else (maybe a separate buffer?) and, on a new slot, inserted back into banking stage at the front, perhaps spread across packet batches rather than one batch to try to get better parallelism. that's probably an issue for a separate PR. if 1.10 is coming out soon, i have a feeling this will be "good enough" for 1.9
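A rough sketch of the "separate buffer" idea floated above, not something this PR implements. Everything here is hypothetical: `DeferredPackets`, `defer`, and `reinsert_on_new_slot` are illustrative names, `Packet` is a stand-in integer, and the buffer is modeled as a plain `VecDeque` of batches rather than the real banking stage type. Packets that hit the cost limit are parked, then on a new slot they are spread round-robin over several batches and pushed to the front of the buffer.

```rust
use std::collections::VecDeque;

// Stand-in types for illustration only.
type Packet = u64;
type Batch = Vec<Packet>;

/// Hypothetical holding area for packets deferred due to cost limits.
struct DeferredPackets {
    held: Vec<Packet>,
}

impl DeferredPackets {
    fn new() -> Self {
        Self { held: Vec::new() }
    }

    /// Park a packet that could not be processed in the current slot.
    fn defer(&mut self, packet: Packet) {
        self.held.push(packet);
    }

    /// On a new slot, spread the held packets round-robin over up to
    /// `num_batches` batches and put those batches at the front of `buffered`.
    fn reinsert_on_new_slot(&mut self, buffered: &mut VecDeque<Batch>, num_batches: usize) {
        if self.held.is_empty() || num_batches == 0 {
            return;
        }
        let n = num_batches.min(self.held.len());
        let mut batches: Vec<Batch> = vec![Vec::new(); n];
        for (i, packet) in self.held.drain(..).enumerate() {
            batches[i % n].push(packet);
        }
        // Push in reverse so that batches[0] ends up first in the queue.
        for batch in batches.into_iter().rev() {
            buffered.push_front(batch);
        }
    }
}

fn main() {
    let mut deferred = DeferredPackets::new();
    for packet in 1..=5 {
        deferred.defer(packet);
    }
    let mut buffered: VecDeque<Batch> = VecDeque::from(vec![vec![9]]);
    deferred.reinsert_on_new_slot(&mut buffered, 2);
    println!("{buffered:?}");
}
```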
core/src/banking_stage.rs
Outdated
Self::process_buffered_packets(
    &my_pubkey,
    &socket,
    poh_recorder,
    cluster_info,
    &mut buffered_packet_batches,
    &forward_option,
    transaction_status_sender.clone(),
    &gossip_vote_sender,
    &banking_stage_stats,
    &recorder,
    data_budget,
    &qos_service,
    &mut slot_metrics_tracker,
)
This also grabs the poh lock inside, so i think it's worth wrapping this in an if !buffered_packet_batches.is_empty() check
ah yeah, that's probably something we should keep an eye on too. fixed this comment
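The emptiness guard discussed in this thread could look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: `PohRecorder` is a stand-in struct that only counts lock acquisitions, `process_buffered_packets` is reduced to two parameters, and `process_loop_iteration` is a hypothetical wrapper. The point is only that an empty buffer never reaches the function that takes the lock.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Stand-in for the PoH recorder; only counts how often it was locked.
struct PohRecorder {
    lock_count: u32,
}

/// Simplified stand-in: takes the PoH lock, then drains the buffer and
/// returns the number of packets it consumed.
fn process_buffered_packets(
    poh_recorder: &Mutex<PohRecorder>,
    buffered_packet_batches: &mut VecDeque<Vec<u8>>,
) -> usize {
    let mut poh = poh_recorder.lock().unwrap();
    poh.lock_count += 1;
    let mut processed = 0;
    while let Some(batch) = buffered_packet_batches.pop_front() {
        processed += batch.len();
    }
    processed
}

/// Hypothetical loop body: skip the call, and therefore the lock,
/// when there is nothing buffered.
fn process_loop_iteration(
    poh_recorder: &Mutex<PohRecorder>,
    buffered_packet_batches: &mut VecDeque<Vec<u8>>,
) -> usize {
    if !buffered_packet_batches.is_empty() {
        process_buffered_packets(poh_recorder, buffered_packet_batches)
    } else {
        0
    }
}

fn main() {
    let poh_recorder = Mutex::new(PohRecorder { lock_count: 0 });
    let mut buffered: VecDeque<Vec<u8>> = VecDeque::new();
    // Spinning on an empty buffer never touches the lock.
    for _ in 0..5 {
        process_loop_iteration(&poh_recorder, &mut buffered);
    }
    buffered.push_back(vec![1, 2, 3]);
    let processed = process_loop_iteration(&poh_recorder, &mut buffered);
    let locks = poh_recorder.lock().unwrap().lock_count;
    println!("processed {processed} packets with {locks} lock acquisition(s)");
}
```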
@carllin can you take a look when you get a chance?
Problem
Summary of Changes
NOTE: