[Congestion model] Implement SoftBackpressure strategy. #10894
Conversation
Codecov Report

@@ Coverage Diff @@
## master #10894 +/- ##
==========================================
- Coverage 71.50% 71.47% -0.03%
==========================================
Files 759 760 +1
Lines 151480 151738 +258
Branches 151480 151738 +258
==========================================
+ Hits 108310 108456 +146
- Misses 38675 38792 +117
+ Partials 4495 4490 -5
Thank you for putting this idea into code and providing the summary table! @robin-near
I can't say I understand in detail how the tx limits are computed exactly, let alone the motivations behind it. 🙈 A sentence or two in plain English describing the main ideas would go a long way here. 😉
But it looks like you measure how much outstanding attached gas we have globally in the system, you attribute it to each shard that accepted the work in the first place, and then you somehow limit how much each shard is allowed to take in this round based on how much they contributed to the work already in the system. Is this a fair high-level overview?
If this is right, this seems like a very sound way to address the most fundamental problem, namely the problem of limiting total inflow to total outflow. Also, the approach to only look at which shard a transaction was originally accepted in, rather than which shard sent the receipt, is completely novel compared to strategies we evaluated so far.
The main concern I have is that it's not a very localized algorithm: we have to share a lot of information between shards. I was hoping we could find a more scalable algorithm. But it's probably not a big issue.
Also, I wonder how we could argue to find any sort of (soft or hard) limit for a single queue in the system. Our main goal for the project was to limit how much memory a validator needs for receipts. But I failed to find such an argument for this strategy. As far as I can tell, the algorithm does not even attempt to limit the length of a single delayed receipts queue, right? In theory, all globally outstanding receipts could be waiting in the same delayed receipts queue, right? I could be completely off, of course. So please help me understand this better!
Overall, I would love to merge this soon, so I can also play around with it and see how it performs. Documentation can follow later.
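If my high-level overview above is right, a toy sketch of the idea might look like the following. All names and the linear scaling rule here are my own invention for illustration, not the PR's actual code:

```rust
use std::collections::HashMap;

type ShardId = usize;
/// Gigagas, as in the congestion model.
type GGas = u64;

/// Hypothetical sketch of soft backpressure: scale down a shard's new
/// transaction intake in proportion to how much outstanding attached gas
/// is already attributed to it as the original accepting shard.
fn tx_gas_limit(
    outstanding_gas_by_origin: &HashMap<ShardId, GGas>,
    shard: ShardId,
    max_outstanding: GGas, // per-shard outstanding-gas budget (assumed)
    base_tx_limit: GGas,   // tx gas limit when the shard is idle (assumed)
) -> GGas {
    let outstanding = *outstanding_gas_by_origin.get(&shard).unwrap_or(&0);
    if outstanding >= max_outstanding {
        // Budget exhausted: accept no new transactions this round.
        0
    } else {
        // "Soft" backpressure: shrink the limit linearly as the shard's
        // share of globally outstanding gas grows.
        let remaining = max_outstanding - outstanding;
        (base_tx_limit as u128 * remaining as u128 / max_outstanding as u128) as GGas
    }
}

fn main() {
    let mut outstanding = HashMap::new();
    outstanding.insert(1usize, 500u64);   // half-loaded shard
    outstanding.insert(2usize, 1_000u64); // saturated shard

    assert_eq!(tx_gas_limit(&outstanding, 0, 1_000, 300), 300); // idle: full limit
    assert_eq!(tx_gas_limit(&outstanding, 1, 1_000, 300), 150); // half limit
    assert_eq!(tx_gas_limit(&outstanding, 2, 1_000, 300), 0);   // no intake
    println!("ok");
}
```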
let mut dropped_txs = Vec::new();
while ctx.gas_burnt() < TX_GAS_LIMIT {
    if let Some(tx) = ctx.incoming_transactions().pop_front() {
        let sender = ctx.tx_sender(tx);
This is always our own shard, hence equivalent to self.shard_id.unwrap().
#[derive(Default, Clone, Debug)]
struct ShardState {
    orig_rec_txn_input_gas: HashMap<(ShardId, ShardId), GGas>,
Observation: It looks like this requires n^2 memory in the chunk header, or n^3 in the block. This is probably fine for small numbers of shards, but it puts a limit on how scalable it can be.
I'm also not very fond of the (shard x shard) approach, as beyond a few shards this explodes quickly, and limiting a high dimensional input is unlikely to be effective. For example if we limit (56, 80) because 80 is overloaded and we saw a lot of traffic coming from (56, 80), next block we may see a (61, 80). But if we limit all of (x, 80) then because we make an independent decision on x (source shard), we'd have to apply a very low limit and end up not being able to take in any transactions at all. I'm not sure what to do.
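To make the scaling concern concrete, here is a toy illustration (not the PR's code) of how the pair-keyed map grows, and of collapsing its source dimension to limit all of (x, 80) at once:

```rust
use std::collections::HashMap;

type ShardId = usize;
type GGas = u64;

/// Collapse the (source, destination) map over the source dimension,
/// e.g. to apply a single limit to every (x, dest) pair at once.
fn aggregate_by_destination(
    gas: &HashMap<(ShardId, ShardId), GGas>,
) -> HashMap<ShardId, GGas> {
    let mut per_dest = HashMap::new();
    for ((_, dest), g) in gas {
        *per_dest.entry(*dest).or_insert(0) += g;
    }
    per_dest
}

fn main() {
    let n: usize = 6; // illustrative shard count
    // The pair-keyed map grows to n^2 entries per chunk header; each of
    // the n chunks in a block carries one, hence n^3 per block.
    let mut gas: HashMap<(ShardId, ShardId), GGas> = HashMap::new();
    for s in 0..n {
        for d in 0..n {
            gas.insert((s, d), 1);
        }
    }
    assert_eq!(gas.len(), n * n);

    let per_dest = aggregate_by_destination(&gas);
    assert_eq!(per_dest.len(), n);
    assert_eq!(per_dest[&0], n as GGas);
    println!("ok");
}
```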
pub fn tx_sender(&self, id: TransactionId) -> ShardId {
    self.transactions[id].sender_shard
Note: The transaction sender isn't directly available on a receipt today. Only indirectly: it happens to be the same as the signer account id. But we almost broke that when implementing meta transactions.
I guess we can add to the protocol specification that we always need this information available. But I just want to make it clear that adding this method to the model and using it in a strategy is a deviation from today's protocol specification.
Oh, I see, yeah... this data is on a different shard. That does seem like a non-trivial problem.
    .map(|receipt| receipt.transaction_id())
    .collect::<Vec<_>>();
for tx in orig_tx_ids {
    let tx_sender = ctx.tx_sender(tx);
Oh, interesting, you are reading the gas from the original transaction sender shard, not based on which shard forwarded this receipt to us. That's a new idea. But it makes a lot of sense: we probably want to apply backpressure immediately at the source, not at the intermediaries. I think it could also be useful in other strategies.
Yeah, I originally was thinking of "which shard forwarded this receipt to us" but gravitated towards going all the way to the source, as that makes more sense in this strategy, where the only thing we can do is limit the source. I agree that it can be useful in other strategies as well. One caveat is that it's not easy to define the exact behavior we want here. What I've implemented is an approximation; the more accurate technique would be to start from each txn and look at the aggregate computation it induces in each shard. But that requires a lot more bookkeeping, and since txns take different lengths of time, it's not clear how to define a recent window to use for estimation.
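A minimal sketch of that approximation as I understand it: attribute each incoming receipt's attached gas to the transaction's origin shard rather than to the shard that forwarded the receipt. The struct and field names below are hypothetical, not the model's actual types:

```rust
use std::collections::HashMap;

type ShardId = usize;
type GGas = u64;

/// Hypothetical receipt shape, only the fields this sketch needs.
#[allow(dead_code)]
struct Receipt {
    origin_shard: ShardId,     // shard that accepted the original transaction
    forwarding_shard: ShardId, // shard that sent us this receipt (ignored here)
    attached_gas: GGas,
}

/// Attribute incoming gas to each receipt's origin shard, so that
/// backpressure lands on the source rather than the intermediaries.
fn gas_by_origin(receipts: &[Receipt]) -> HashMap<ShardId, GGas> {
    let mut by_origin = HashMap::new();
    for r in receipts {
        *by_origin.entry(r.origin_shard).or_insert(0) += r.attached_gas;
    }
    by_origin
}

fn main() {
    let receipts = vec![
        Receipt { origin_shard: 0, forwarding_shard: 2, attached_gas: 10 },
        Receipt { origin_shard: 0, forwarding_shard: 3, attached_gas: 5 },
        Receipt { origin_shard: 1, forwarding_shard: 0, attached_gas: 7 },
    ];
    let by_origin = gas_by_origin(&receipts);
    // Both shard-2 and shard-3 forwards trace back to origin shard 0.
    assert_eq!(by_origin[&0], 15);
    assert_eq!(by_origin[&1], 7);
    println!("ok");
}
```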
let incoming_receipt_gas: GGas =
    ctx.incoming_receipts().iter().map(|receipt| receipt.attached_gas).sum();
let prev_gas_burnt = ctx.gas_burnt();
Suggested change:
- let prev_gas_burnt = ctx.gas_burnt();
+ let tx_gas_burnt = ctx.gas_burnt();
This is the gas burnt for transactions in this chunk, right? Not something from the previous chunk.
Ah yes, I meant tx_gas_burnt.
at this point I don't know what I'm doing anymore.
It has no queues. It currently achieves this: