
[Congestion model] Implement SoftBackpressure strategy. #10894

Closed
wants to merge 1 commit

Conversation

robin-near
Contributor

@robin-near robin-near commented Mar 28, 2024

at this point I don't know what I'm doing anymore.

It has no queues. It currently achieves this:

WORKLOAD                 STRATEGY                                 BURNT GAS    TRANSACTIONS FINISHED MEDIAN TX DELAY   MAX QUEUE LEN  MAX QUEUE SIZE  MAX QUEUE PGAS
Balanced                 Soft backpressure                        4049 PGas                    38200             308             417        247.4 KB              70
Increasing Size          Soft backpressure                        4024 PGas                    37971             204             657        256.0 MB              45
Extreme Increasing Size  Soft backpressure                        3876 PGas                    36055             218            1450          1.4 GB              36
Shard War                Soft backpressure                        3892 PGas                    31353             256           12896        603.6 MB              81
All To One               Soft backpressure                        1046 PGas                     9894             380             272        251.6 KB              70
Indirect All To One      Soft backpressure                        1157 PGas                     9782             327            1122        912.4 KB             246
Linear Imbalance         Soft backpressure                        3905 PGas                   318152              84            1731        672.8 KB              63
Big Linear Imbalance     Soft backpressure                        3870 PGas                   533304             214            2931          4.2 GB              10

@robin-near robin-near requested a review from a team as a code owner March 28, 2024 00:39

codecov bot commented Mar 28, 2024

Codecov Report

Attention: Patch coverage is 13.07190%, with 133 lines in your changes missing coverage. Please review.

Project coverage is 71.47%. Comparing base (e8635c6) to head (b06b188).
Report is 1 commit behind head on master.

Files Patch % Lines
...congestion-model/src/strategy/soft_backpressure.rs 13.51% 127 Missing and 1 partial ⚠️
...ools/congestion-model/src/model/chunk_execution.rs 0.00% 3 Missing ⚠️
tools/congestion-model/src/main.rs 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10894      +/-   ##
==========================================
- Coverage   71.50%   71.47%   -0.03%     
==========================================
  Files         759      760       +1     
  Lines      151480   151738     +258     
  Branches   151480   151738     +258     
==========================================
+ Hits       108310   108456     +146     
- Misses      38675    38792     +117     
+ Partials     4495     4490       -5     
Flag Coverage Δ
backward-compatibility 0.24% <ø> (-0.01%) ⬇️
db-migration 0.24% <ø> (-0.01%) ⬇️
genesis-check 1.43% <ø> (-0.01%) ⬇️
integration-tests 37.07% <0.00%> (-0.05%) ⬇️
linux 69.95% <13.07%> (-0.03%) ⬇️
linux-nightly 70.96% <13.07%> (-0.03%) ⬇️
macos 54.40% <13.07%> (-0.04%) ⬇️
pytests 1.66% <ø> (-0.01%) ⬇️
sanity-checks 1.44% <ø> (-0.01%) ⬇️
unittests 67.11% <13.07%> (-0.05%) ⬇️
upgradability 0.29% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


Contributor

@jakmeier jakmeier left a comment


Thank you for putting this idea into code and providing the summary table! @robin-near

I can't say I understand in detail how the tx limits are computed exactly, let alone the motivations behind them. 🙈 A sentence or two in plain English describing the main ideas would go a long way here. 😉

But it looks like you measure how much outstanding attached gas we have globally in the system, you attribute it to each shard that accepted the work in the first place, and then you somehow limit how much each shard is allowed to take in this round based on how much they contributed to the work already in the system. Is this a fair high-level overview?

If this is right, this seems like a very sound way to address the most fundamental problem, namely the problem of limiting total inflow to total outflow. Also, the approach to only look at which shard a transaction was originally accepted in, rather than which shard sent the receipt, is completely novel compared to strategies we evaluated so far.
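To make this high-level overview concrete, the mechanism as described could be sketched roughly as follows. Everything here is illustrative: the function, names, and constants are hypothetical and not taken from the PR's actual code.

```rust
use std::collections::HashMap;

type ShardId = usize;
type GGas = u64;

// Hypothetical constants, not the PR's actual parameters.
const MAX_TX_GAS_PER_CHUNK: GGas = 500;
const GLOBAL_GAS_TARGET: GGas = 1_000;

/// Sketch of the idea described above: measure globally outstanding attached
/// gas, attribute it to the shard that originally accepted the work, and
/// shrink each shard's transaction allowance in proportion to its share of
/// the overload.
fn tx_allowance(outstanding_by_source: &HashMap<ShardId, GGas>, shard: ShardId) -> GGas {
    let total: GGas = outstanding_by_source.values().sum();
    if total <= GLOBAL_GAS_TARGET {
        // System is not congested: every shard may accept a full load.
        return MAX_TX_GAS_PER_CHUNK;
    }
    // Charge this shard for its proportional share of the gas above target.
    let own = *outstanding_by_source.get(&shard).unwrap_or(&0);
    let over = total - GLOBAL_GAS_TARGET;
    let own_share_of_overload = over.saturating_mul(own) / total;
    MAX_TX_GAS_PER_CHUNK.saturating_sub(own_share_of_overload)
}

fn main() {
    let mut outstanding: HashMap<ShardId, GGas> = HashMap::new();
    outstanding.insert(0, 1_500);
    outstanding.insert(1, 500);
    // Shard 0 caused most of the backlog, so it gets throttled harder.
    println!("shard 0 allowance: {}", tx_allowance(&outstanding, 0));
    println!("shard 1 allowance: {}", tx_allowance(&outstanding, 1));
}
```

The key property of this shape is that total admitted work shrinks whenever the global backlog exceeds the target, so inflow is bounded by outflow over time, which matches the "limit total inflow to total outflow" framing above.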

The main concern I have is that it's not a very localized algorithm: we have to share a lot of information between shards. I was hoping we could find a more scalable algorithm. But it's probably not a big issue.

Also, I wonder how we could argue for any sort of (soft or hard) limit on a single queue in the system. Our main goal for the project was to limit how much memory a validator needs for receipts, but I failed to find such an argument for this strategy. As far as I can tell, the algorithm does not even attempt to limit the length of a single delayed receipts queue, right? In theory, all globally outstanding receipts could be waiting in the same delayed receipts queue, right? I could be completely off, of course. So please help me understand this better!

Overall, I would love to merge this soon, so I can also play around with it and see how it performs. Documentation can follow later.

let mut dropped_txs = Vec::new();
while ctx.gas_burnt() < TX_GAS_LIMIT {
    if let Some(tx) = ctx.incoming_transactions().pop_front() {
        let sender = ctx.tx_sender(tx);
Contributor

this is always the shard's own id, hence equivalent to self.shard_id.unwrap()


#[derive(Default, Clone, Debug)]
struct ShardState {
    orig_rec_txn_input_gas: HashMap<(ShardId, ShardId), GGas>,
Contributor

Observation: It looks like this requires n^2 memory in the chunk header, or n^3 in the block. This is probably fine for small numbers of shards, but it puts a limit on how scalable this would be.

Contributor Author

I'm also not very fond of the (shard x shard) approach: beyond a few shards this explodes quickly, and limiting a high-dimensional input is unlikely to be effective. For example, if we limit (56, 80) because 80 is overloaded and we saw a lot of traffic coming from (56, 80), next block we may see (61, 80) instead. But if we limit all of (x, 80), then because we make an independent decision on x (the source shard), we'd have to apply a very low limit and end up not being able to take in any transactions at all. I'm not sure what to do.
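The second horn of this dilemma can be made concrete with a toy calculation (the budget figure and function below are purely illustrative): if each source shard must decide independently how much to send toward an overloaded shard, the only safe per-source limit is the destination's budget split equally across all possible sources, which collapses toward zero as the shard count grows.

```rust
type GGas = u64;

/// Toy illustration: with independent per-source decisions, each of the
/// `num_shards` potential senders can only safely be granted an equal
/// slice of the overloaded destination's budget.
fn safe_per_source_limit(dest_budget: GGas, num_shards: GGas) -> GGas {
    dest_budget / num_shards
}

fn main() {
    // A hypothetical 1000 GGas destination budget, shrinking per source
    // as the network scales.
    for n in [4u64, 16, 64, 256] {
        println!("{n:>3} shards -> {:>3} GGas per source", safe_per_source_limit(1_000, n));
    }
}
```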

Comment on lines +139 to +140
pub fn tx_sender(&self, id: TransactionId) -> ShardId {
    self.transactions[id].sender_shard
Contributor

Note: The transaction sender isn't directly available on a receipt today. Only indirectly: it happens to be the same as the signer account id, but we almost broke that when implementing meta transactions.
I guess we can add it to the protocol specification that we always need this information available. But I just want it to be clear that adding this method to the model and using it in a strategy has this consequence: it is a deviation from today's protocol specification.

Contributor Author

Oh, I see, yeah... this data is on a different shard. That does seem like a non-trivial problem.

    .map(|receipt| receipt.transaction_id())
    .collect::<Vec<_>>();
for tx in orig_tx_ids {
    let tx_sender = ctx.tx_sender(tx);
Contributor

Oh, interesting, you are reading the gas from the original transaction sender shard, not based on which shard forwarded this receipt to us. That's a new idea. But it makes a lot of sense: we probably want to immediately apply backpressure to the source, not the intermediaries. I think it could also be useful in other strategies.

Contributor Author

Yeah, I was originally thinking of "which shard forwarded this receipt to us" but gravitated towards going all the way to the source, as that makes more sense in this strategy, where the only thing we can do is limit the source. I agree that it can be useful in other strategies as well. One caveat is that it's not very easy to define the exact behavior we want here. What I've implemented here is an approximation; the more accurate technique would be to start from each txn and look at the aggregate computation it induces in each shard, but that requires a lot more bookkeeping, and since txns take different amounts of time, it's not clear how to define a recent window to use for estimation.
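The approximation discussed here, charging each outstanding receipt's attached gas to the shard that originally accepted its transaction rather than to the forwarding shard, could look roughly like this. All types and field names below are illustrative stand-ins, not the PR's actual model code.

```rust
use std::collections::HashMap;

type ShardId = usize;
type TransactionId = usize;
type GGas = u64;

/// Illustrative receipt: carries the id of the transaction it descends from.
struct Receipt {
    transaction_id: TransactionId,
    attached_gas: GGas,
}

/// Sum outstanding attached gas per *source* shard: each receipt is charged
/// to the shard that originally accepted its transaction, no matter how many
/// intermediate shards forwarded it along the way.
fn outstanding_gas_by_source(
    receipts: &[Receipt],
    tx_sender_shard: &HashMap<TransactionId, ShardId>,
) -> HashMap<ShardId, GGas> {
    let mut by_source: HashMap<ShardId, GGas> = HashMap::new();
    for r in receipts {
        let source = tx_sender_shard[&r.transaction_id];
        *by_source.entry(source).or_insert(0) += r.attached_gas;
    }
    by_source
}

fn main() {
    // Two receipts descend from tx 1 (accepted on shard 0), one from tx 2
    // (accepted on shard 3).
    let receipts = vec![
        Receipt { transaction_id: 1, attached_gas: 100 },
        Receipt { transaction_id: 2, attached_gas: 50 },
        Receipt { transaction_id: 1, attached_gas: 25 },
    ];
    let senders: HashMap<TransactionId, ShardId> = HashMap::from([(1, 0), (2, 3)]);
    println!("{:?}", outstanding_gas_by_source(&receipts, &senders));
}
```

As the comment above notes, this assumes the original sender shard is recoverable from every receipt, which is exactly the protocol-level requirement discussed in the previous thread.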


let incoming_receipt_gas: GGas =
    ctx.incoming_receipts().iter().map(|receipt| receipt.attached_gas).sum();
let prev_gas_burnt = ctx.gas_burnt();
Contributor

Suggested change
let prev_gas_burnt = ctx.gas_burnt();
let tx_gas_burnt = ctx.gas_burnt();

This is the gas burnt for transactions in this chunk, right? Not something from the previous chunk.

Contributor Author

Ah yes, I meant tx_gas_burnt.

@robin-near robin-near closed this May 1, 2024