Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use EMA to compute QUIC streamer load for staked connections #34586

Merged
merged 14 commits into from
Jan 11, 2024

Conversation

pgarg66
Copy link
Contributor

@pgarg66 pgarg66 commented Dec 24, 2023

Problem

The limit for staked connection streams doesn't use current load. If the load is less, more streams can be allowed to the connections.

Summary of Changes

  • Add support for computing load using EMA
  • Update limit calculation logic to use the load
  • Add/update unit tests.

Fixes #

Copy link

codecov bot commented Dec 24, 2023

Codecov Report

Attention: 14 lines in your changes are missing coverage. Please review.

Comparison is base (5cb30cf) 81.8% compared to head (8883a7a) 81.8%.
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master   #34586     +/-   ##
=========================================
- Coverage    81.8%    81.8%   -0.1%     
=========================================
  Files         823      823             
  Lines      222725   222910    +185     
=========================================
+ Hits       182332   182447    +115     
- Misses      40393    40463     +70     

@pgarg66 pgarg66 marked this pull request as ready for review December 24, 2023 02:03
streamer/src/nonblocking/quic.rs Outdated Show resolved Hide resolved
streamer/src/nonblocking/quic.rs Outdated Show resolved Hide resolved
lijunwangs
lijunwangs previously approved these changes Jan 2, 2024
Copy link
Contributor

@lijunwangs lijunwangs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

streamer/src/nonblocking/quic.rs Outdated Show resolved Hide resolved
streamer/src/nonblocking/quic.rs Outdated Show resolved Hide resolved
streamer/src/nonblocking/quic.rs Outdated Show resolved Hide resolved
let mut last_throttling_instant = tokio::time::Instant::now();
let mut streams_in_current_interval = 0;
let staked_stream = matches!(peer_type, ConnectionPeerType::Staked) && params.stake > 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to check both of these things this deep in the call stack? seems like it should've been resolved much higher

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we initially did the QUIC implementation, I remember that peers with stake = 0 were still marked as ConnectionPeerType::Staked.
The inverse is not true though. I can change it to this. Would that be prefered?

let staked_stream = params.stake > 0;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's leave it as is for now and fix ConnectionPeerType::Staked where params.stake == 0 in a followup. either the enum is misnamed or we're patching up a bug all over the place

@pgarg66 pgarg66 requested a review from t-nelson January 4, 2024 03:08
@pgarg66
Copy link
Contributor Author

pgarg66 commented Jan 5, 2024

@lijunwangs, @t-nelson, do you have any other feedback?

@pgarg66 pgarg66 requested a review from t-nelson January 5, 2024 22:21
@t-nelson
Copy link
Contributor

t-nelson commented Jan 8, 2024

can we switch all of the as casts to explicit From for upcasts and TryFrom for downcasts? as silently truncates downcasts, which is not the bug we want to be chasing. we can log/metric downcast failures this way

@pgarg66
Copy link
Contributor Author

pgarg66 commented Jan 9, 2024

can we switch all of the as casts to explicit From for upcasts and TryFrom for downcasts? as silently truncates downcasts, which is not the bug we want to be chasing. we can log/metric downcast failures this way

Done.

@t-nelson
Copy link
Contributor

t-nelson commented Jan 9, 2024

lgtm! how are the canary boxes looking?

@pgarg66
Copy link
Contributor Author

pgarg66 commented Jan 9, 2024

lgtm! how are the canary boxes looking?

The nodes are up, but I am not seeing much streams getting opened. Likely because the clients are not sending transactions directly to these machine. On tpu-forwards port I do see some streams, but very sporadic. So the EMA tends to be closer to 0 all the time.

I'll try to run a private cluster.

@pgarg66
Copy link
Contributor Author

pgarg66 commented Jan 9, 2024

lgtm! how are the canary boxes looking?

The nodes are up, but I am not seeing much streams getting opened. Likely because the clients are not sending transactions directly to these machine. On tpu-forwards port I do see some streams, but very sporadic. So the EMA tends to be closer to 0 all the time.

I'll try to run a private cluster.

The testing on private cluster showed that the number of streams in an EMA interval was always tending to be 1 or 0. This is because the streams are short lived, as each transaction opens a stream, sends the transaction and closes the stream. It makes more sense to accumulate number of streams that were opened in the recent EMA interval, and 0 it out once the EMA has been updated.

Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! let's move the ema stuff out to its own module in a followup. quic.rs is getting a little bloated

@pgarg66 pgarg66 merged commit 904700c into solana-labs:master Jan 11, 2024
35 checks passed
@pgarg66 pgarg66 deleted the load-ema branch January 11, 2024 18:05
lijunwangs pushed a commit to lijunwangs/solana that referenced this pull request Apr 18, 2024
…labs#34586)

* Use EMA to compute QUIC streamer load for staked connections

* change min load to 25% of max load

* reduce max PPS from 500K to 250K

* update ema_function to account for missing intervals

* replace f64 math with u128

* track streams across all connections for a peer

* u128  -> u64

* replace ' as ' type conversion to from and try_from

* add counter for u64 overflow

* reset recent stream load on ema interval

* do not use same counter for unstaked connections from a peer IP
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants