
[Congestion model] Implement SoftBackpressure strategy. #10894

Closed
wants to merge 1 commit

Conversation

robin-near
Contributor

@robin-near robin-near commented Mar 28, 2024

at this point I don't know what I'm doing anymore.

It has no queues. It currently achieves this:

WORKLOAD                 STRATEGY                                 BURNT GAS    TRANSACTIONS FINISHED MEDIAN TX DELAY   MAX QUEUE LEN  MAX QUEUE SIZE  MAX QUEUE PGAS
Balanced                 Soft backpressure                        4049 PGas                    38200             308             417        247.4 KB              70
Increasing Size          Soft backpressure                        4024 PGas                    37971             204             657        256.0 MB              45
Extreme Increasing Size  Soft backpressure                        3876 PGas                    36055             218            1450          1.4 GB              36
Shard War                Soft backpressure                        3892 PGas                    31353             256           12896        603.6 MB              81
All To One               Soft backpressure                        1046 PGas                     9894             380             272        251.6 KB              70
Indirect All To One      Soft backpressure                        1157 PGas                     9782             327            1122        912.4 KB             246
Linear Imbalance         Soft backpressure                        3905 PGas                   318152              84            1731        672.8 KB              63
Big Linear Imbalance     Soft backpressure                        3870 PGas                   533304             214            2931          4.2 GB              10

@robin-near robin-near requested a review from a team as a code owner March 28, 2024 00:39

codecov bot commented Mar 28, 2024

Codecov Report

Attention: Patch coverage is 13.07190%, with 133 lines in your changes missing coverage. Please review.

Project coverage is 71.47%. Comparing base (e8635c6) to head (b06b188).
Report is 1 commit behind head on master.

Files Patch % Lines
...congestion-model/src/strategy/soft_backpressure.rs 13.51% 127 Missing and 1 partial ⚠️
...ools/congestion-model/src/model/chunk_execution.rs 0.00% 3 Missing ⚠️
tools/congestion-model/src/main.rs 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10894      +/-   ##
==========================================
- Coverage   71.50%   71.47%   -0.03%     
==========================================
  Files         759      760       +1     
  Lines      151480   151738     +258     
  Branches   151480   151738     +258     
==========================================
+ Hits       108310   108456     +146     
- Misses      38675    38792     +117     
+ Partials     4495     4490       -5     
Flag Coverage Δ
backward-compatibility 0.24% <ø> (-0.01%) ⬇️
db-migration 0.24% <ø> (-0.01%) ⬇️
genesis-check 1.43% <ø> (-0.01%) ⬇️
integration-tests 37.07% <0.00%> (-0.05%) ⬇️
linux 69.95% <13.07%> (-0.03%) ⬇️
linux-nightly 70.96% <13.07%> (-0.03%) ⬇️
macos 54.40% <13.07%> (-0.04%) ⬇️
pytests 1.66% <ø> (-0.01%) ⬇️
sanity-checks 1.44% <ø> (-0.01%) ⬇️
unittests 67.11% <13.07%> (-0.05%) ⬇️
upgradability 0.29% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


Contributor

@jakmeier jakmeier left a comment


Thank you for putting this idea into code and providing the summary table! @robin-near

I can't say I understand in detail how the tx limits are computed exactly, let alone the motivations behind them. 🙈 A sentence or two in plain English describing the main ideas would go a long way here. 😉

But it looks like you measure how much outstanding attached gas we have globally in the system, you attribute it to each shard that accepted the work in the first place, and then you somehow limit how much each shard is allowed to take in this round based on how much they contributed to the work already in the system. Is this a fair high-level overview?

If this is right, this seems like a very sound way to address the most fundamental problem, namely the problem of limiting total inflow to total outflow. Also, the approach to only look at which shard a transaction was originally accepted in, rather than which shard sent the receipt, is completely novel compared to strategies we evaluated so far.
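To make this high-level overview concrete, the mechanism as described could be sketched roughly as follows. Everything here is illustrative: the function, names, and constants are hypothetical and not taken from the PR's actual code.

```rust
use std::collections::HashMap;

type ShardId = usize;
type GGas = u64;

// Hypothetical constants, not the PR's actual parameters.
const MAX_TX_GAS_PER_CHUNK: GGas = 500;
const GLOBAL_GAS_TARGET: GGas = 1_000;

/// Sketch of the idea described above: measure globally outstanding attached
/// gas, attribute it to the shard that originally accepted the work, and
/// shrink each shard's transaction allowance in proportion to its share of
/// the overload.
fn tx_allowance(outstanding_by_source: &HashMap<ShardId, GGas>, shard: ShardId) -> GGas {
    let total: GGas = outstanding_by_source.values().sum();
    if total <= GLOBAL_GAS_TARGET {
        // System is not congested: every shard may accept a full load.
        return MAX_TX_GAS_PER_CHUNK;
    }
    // Charge this shard for its proportional share of the gas above target.
    let own = *outstanding_by_source.get(&shard).unwrap_or(&0);
    let over = total - GLOBAL_GAS_TARGET;
    let own_share_of_overload = over.saturating_mul(own) / total;
    MAX_TX_GAS_PER_CHUNK.saturating_sub(own_share_of_overload)
}

fn main() {
    let mut outstanding: HashMap<ShardId, GGas> = HashMap::new();
    outstanding.insert(0, 1_500);
    outstanding.insert(1, 500);
    // Shard 0 caused most of the backlog, so it gets throttled harder.
    println!("shard 0 allowance: {}", tx_allowance(&outstanding, 0));
    println!("shard 1 allowance: {}", tx_allowance(&outstanding, 1));
}
```

The key property of this shape is that total admitted work shrinks whenever the global backlog exceeds the target, so inflow is bounded by outflow over time, which matches the "limit total inflow to total outflow" framing above.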

The main concern I have is that it's not a very localized algorithm: we have to share a lot of information between shards. I was hoping we could find a more scalable algorithm. But it's probably not a big issue.

Also, I wonder how we could argue for any sort of (soft or hard) limit on a single queue in the system. Our main goal for the project was to limit how much memory a validator needs for receipts, but I failed to find such an argument for this strategy. As far as I can tell, the algorithm does not even attempt to limit the length of a single delayed receipts queue, right? In theory, all globally outstanding receipts could be waiting in the same delayed receipts queue, right? I could be completely off, of course. So please help me understand this better!

Overall, I would love to merge this soon, so I can also play around with it and see how it performs. Documentation can follow later.

let mut dropped_txs = Vec::new();
while ctx.gas_burnt() < TX_GAS_LIMIT {
    if let Some(tx) = ctx.incoming_transactions().pop_front() {
        let sender = ctx.tx_sender(tx);
Contributor

this is always the shard's own id, hence equivalent to self.shard_id.unwrap()


#[derive(Default, Clone, Debug)]
struct ShardState {
    orig_rec_txn_input_gas: HashMap<(ShardId, ShardId), GGas>,
Contributor

Observation: It looks like this requires n^2 memory in the chunk header, or n^3 in the block. This is probably fine for small numbers of shards, but it puts a limit on how scalable this would be.

Contributor Author

I'm also not very fond of the (shard x shard) approach: beyond a few shards this explodes quickly, and limiting a high-dimensional input is unlikely to be effective. For example, if we limit (56, 80) because 80 is overloaded and we saw a lot of traffic coming from (56, 80), next block we may see (61, 80) instead. But if we limit all of (x, 80), then because we make an independent decision on x (the source shard), we'd have to apply a very low limit and end up not being able to take in any transactions at all. I'm not sure what to do.
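The second horn of this dilemma can be made concrete with a toy calculation (the budget figure and function below are purely illustrative): if each source shard must decide independently how much to send toward an overloaded shard, the only safe per-source limit is the destination's budget split equally across all possible sources, which collapses toward zero as the shard count grows.

```rust
type GGas = u64;

/// Toy illustration: with independent per-source decisions, each of the
/// `num_shards` potential senders can only safely be granted an equal
/// slice of the overloaded destination's budget.
fn safe_per_source_limit(dest_budget: GGas, num_shards: GGas) -> GGas {
    dest_budget / num_shards
}

fn main() {
    // A hypothetical 1000 GGas destination budget, shrinking per source
    // as the network scales.
    for n in [4u64, 16, 64, 256] {
        println!("{n:>3} shards -> {:>3} GGas per source", safe_per_source_limit(1_000, n));
    }
}
```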

Comment on lines +139 to +140
pub fn tx_sender(&self, id: TransactionId) -> ShardId {
    self.transactions[id].sender_shard
Contributor

Note: The transaction sender isn't directly available on a receipt today. Only indirectly: it happens to be the same as the signer account id, but we almost broke that when implementing meta transactions.
I guess we can add it to the protocol specification that we always need this information available. But I just want it to be clear that adding this method to the model and using it in a strategy has this consequence: it is a deviation from today's protocol specification.

Contributor Author

Oh, I see, yeah... this data is on a different shard. That does seem like a non-trivial problem.

    .map(|receipt| receipt.transaction_id())
    .collect::<Vec<_>>();
for tx in orig_tx_ids {
    let tx_sender = ctx.tx_sender(tx);
Contributor

Oh, interesting, you are reading the gas from the original transaction sender shard, not based on which shard forwarded this receipt to us. That's a new idea. But it makes a lot of sense: we probably want to immediately apply backpressure to the source, not the intermediaries. I think it could also be useful in other strategies.

Contributor Author

Yeah, I was originally thinking of "which shard forwarded this receipt to us" but gravitated towards going all the way to the source, as that makes more sense in this strategy, where the only thing we can do is limit the source. I agree that it can be useful in other strategies as well. One caveat is that it's not very easy to define the exact behavior we want here. What I've implemented here is an approximation; the more accurate technique would be to start from each txn and look at the aggregate computation it induces in each shard, but that requires a lot more bookkeeping, and since txns take different amounts of time, it's not clear how to define a recent window to use for estimation.
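The approximation discussed here, charging each outstanding receipt's attached gas to the shard that originally accepted its transaction rather than to the forwarding shard, could look roughly like this. All types and field names below are illustrative stand-ins, not the PR's actual model code.

```rust
use std::collections::HashMap;

type ShardId = usize;
type TransactionId = usize;
type GGas = u64;

/// Illustrative receipt: carries the id of the transaction it descends from.
struct Receipt {
    transaction_id: TransactionId,
    attached_gas: GGas,
}

/// Sum outstanding attached gas per *source* shard: each receipt is charged
/// to the shard that originally accepted its transaction, no matter how many
/// intermediate shards forwarded it along the way.
fn outstanding_gas_by_source(
    receipts: &[Receipt],
    tx_sender_shard: &HashMap<TransactionId, ShardId>,
) -> HashMap<ShardId, GGas> {
    let mut by_source: HashMap<ShardId, GGas> = HashMap::new();
    for r in receipts {
        let source = tx_sender_shard[&r.transaction_id];
        *by_source.entry(source).or_insert(0) += r.attached_gas;
    }
    by_source
}

fn main() {
    // Two receipts descend from tx 1 (accepted on shard 0), one from tx 2
    // (accepted on shard 3).
    let receipts = vec![
        Receipt { transaction_id: 1, attached_gas: 100 },
        Receipt { transaction_id: 2, attached_gas: 50 },
        Receipt { transaction_id: 1, attached_gas: 25 },
    ];
    let senders: HashMap<TransactionId, ShardId> = HashMap::from([(1, 0), (2, 3)]);
    println!("{:?}", outstanding_gas_by_source(&receipts, &senders));
}
```

As the comment above notes, this assumes the original sender shard is recoverable from every receipt, which is exactly the protocol-level requirement discussed in the previous thread.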


let incoming_receipt_gas: GGas =
    ctx.incoming_receipts().iter().map(|receipt| receipt.attached_gas).sum();
let prev_gas_burnt = ctx.gas_burnt();
Contributor

Suggested change
let prev_gas_burnt = ctx.gas_burnt();
let tx_gas_burnt = ctx.gas_burnt();

This is the gas burnt for transactions in this chunk, right? Not something from the previous chunk.

Contributor Author

Ah yes, I meant tx_gas_burnt.

@robin-near robin-near closed this May 1, 2024