feat(stateless_validation): Limit size of outgoing receipts to keep size of source_receipt_proofs under control #11492

jancionear · 2024-06-05T15:38:19Z

This is a basic fix for: #11295

The problem is that the size of source_receipt_proofs could be really large in some scenarios. If all 6 shards send a 4MB outgoing receipt to shard 0, then source_receipt_proofs for shard 0 will be of size 6 * 4MB = 24MB.
That's way too big, the network probably won't be able to distribute that in time. And as we add more shards to the network, the effect will get worse and worse.

This fix deals with the problem by allowing only one chosen shard to send large receipts to the other shard. All other shards are only allowed to send ~100kB of receipts. So instead of 6 shards sending 4MB, we end up with 5 shards sending 100kB and one shard sending 4MB, which adds up to 4.5MB, much more manageable.

The mapping of "who is able to send a lot of outgoing receipts to whom" changes with every block height:

In this example at block height 2:

shard 0 can send:
- 100kB of receipts to shard 0
- 100kB of receipts to shard 1
- 4.5MB of receipts to shard 2
shard 1 can send:
- 4.5MB of receipts to shard 0
- 100kB of receipts to shard 1
- 100kB of receipts to shard 2
shard 2 can send:
- 100kB of receipts to shard 0
- 4.5MB of receipts to shard 1
- 100kB of receipts to shard 2

At every height a receiving shard will receive large receipts from only one shard, so the size of source_receipt_proofs stays under control.
The mapping changes, so every large receipt will eventually be sent out when the mapping allows to send it to the destination.

The new limits are:

outgoing_receipts_usual_size_limit: 102_400 (100kiB)
outgoing_receipts_big_size_limit: 4_718_592 (4.5MiB - a bit larger than the 4MiB receipt limit to make sure that 4MiB receipts can get through)

Flaws

This is a basic solution which has some flaws. It limits the witness size, but it affects throughput in certain scenarios.
It can serve as a starting point for further improvements, something that we can get in before the code freeze.

Flaw 1: big receipts can block small receipts
Shard tries to send outgoing receipts in the order in which they were created. When a large receipt is at the front of the queue, the shard won't send anything until it can send out this large receipt. This means that the shard might not send out anything for a few blocks.
This could be fixed by having a separate queue for large outgoing receipts.
Flaw 2: missing chunks
When a chunk is missing, the next chunk receives receipts from two block heights. This means that it could receive two 4MB receipts. This could be fixed by disallowing sending large receipts to shard that just had missing chunks

TODO

The implementation is pretty much ready, I should probably write some tests, but for now I have to do other stuff.
Posting the PR as is for now.

Add two new runtime parameters which describe the limits on size of outgoing receipts. /// The standard size limit for outgoing receipts aimed at a single shard. /// This limit is pretty small to keep the size of source_receipt_proofs under control. pub outgoing_receipts_usual_size_limit: usize, /// Large size limit for outgoing receipts to a shard, used when it's safe /// to send a lot of receipts without making the state witness too large. pub outgoing_receipts_big_size_limit: usize,

There's going to be a size limit and a gas limit, so let's rename the function that calculates the gas limit accordingly to avoid confusion.

…oofs under control Main implementation, described in the PR

This test tried to send a large (~170kB) receipt and expected the receipt to be processed immediately. The receipt wasn't processed immediately because it's aboce the usual outgoing size limit. The easiest way to fix this is to increase the limit for this particular test.

This test kept failing on the check: ``` assert!(congestion_info.receipt_bytes() > next_congestion_info.receipt_bytes()); ``` It seems flaky, sometimes it passes and sometimes it doesn't. I think it's related to the recent changes (tx and receipt size limits). Let's change the way in which we test that the chain is doing progress to measuring burned gas on the congested shard, this is more reliable and makes the test pass.

jancionear · 2024-06-05T17:29:20Z

test_in_memory_trie_node_consistency is failing in the CI, but I don't think it is because of this PR, it also fails on master: #11496

codecov · 2024-06-05T17:37:53Z

Codecov Report

Attention: Patch coverage is 86.60714% with 15 lines in your changes missing coverage. Please review.

Project coverage is 71.40%. Comparing base (337a81e) to head (c0ee219).
Report is 2 commits behind head on master.

Files	Patch %	Lines
core/primitives/src/errors.rs	0.00%	5 Missing ⚠️
runtime/runtime/src/verifier.rs	55.55%	4 Missing ⚠️
core/parameters/src/parameter_table.rs	33.33%	0 Missing and 2 partials ⚠️
core/parameters/src/view.rs	50.00%	2 Missing ⚠️
core/primitives/src/types.rs	85.71%	0 Missing and 1 partial ⚠️
runtime/runtime/src/congestion_control.rs	95.23%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #11492      +/-   ##
==========================================
- Coverage   71.42%   71.40%   -0.02%     
==========================================
  Files         788      788              
  Lines      159889   159926      +37     
  Branches   159889   159926      +37     
==========================================
- Hits       114203   114202       -1     
- Misses      40719    40754      +35     
- Partials     4967     4970       +3

Flag	Coverage Δ
backward-compatibility	`0.23% <0.00%> (-0.01%)`	⬇️
db-migration	`0.23% <0.00%> (-0.01%)`	⬇️
genesis-check	`1.36% <0.00%> (-0.01%)`	⬇️
integration-tests	`37.65% <53.57%> (-0.03%)`	⬇️
linux	`68.83% <42.85%> (+0.01%)`	⬆️
linux-nightly	`70.93% <86.60%> (-0.02%)`	⬇️
macos	`52.44% <21.64%> (-0.01%)`	⬇️
pytests	`1.57% <0.00%> (-0.01%)`	⬇️
sanity-checks	`1.37% <0.00%> (-0.01%)`	⬇️
unittests	`66.11% <82.82%> (-0.02%)`	⬇️
upgradability	`0.28% <0.00%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

wacban

LGTM but please see my comment about aligning special and allowed shards.

wacban · 2024-06-06T09:44:26Z

core/parameters/src/parameter.rs

+    OutgoingReceiptsUsualSizeLimit,
+    OutgoingReceiptsBigSizeLimit,


nit: Can you add the comments here as well?

wacban · 2024-06-06T09:48:11Z

runtime/runtime/src/congestion_control.rs

+/// Limits for outgoing receipts to a shard.
+/// Receipts are sent out until the limit is hit, after that they're buffered.
+pub(crate) struct OutgoingLimit {
+    pub gas: Gas,
+    pub size: usize,
+}


wacban · 2024-06-06T09:53:19Z

runtime/runtime/src/congestion_control.rs

+                    let mut gas_limit =
+                        other_congestion_control.outgoing_gas_limit(apply_state.shard_id);
+                    if shard_id == apply_state.shard_id {
+                        // No gas limits on receipts that stay on the same shard. Backpressure
+                        // wouldn't help, the receipt takes the same memory if buffered or
+                        // in the delayed receipts queue.
+                        gas_limit = Gas::MAX;
+                    }
+
+                    let size_limit = if special_shard == Some(shard_id) {
+                        apply_state.config.witness_config.outgoing_receipts_big_size_limit
+                    } else {
+                        apply_state.config.witness_config.outgoing_receipts_usual_size_limit
+                    };


Should we align the allowed shard and special shard to be the same?

It reduces complexity slightly

It would relieve me from thinking whether we can ever send a byteful and gasful receipt unless the stars align.

This would require us to make some change to the allowed shard - currently it's only set under congestion we would need to always set it. This is a deviation from the congestion control NEP but I think it's fine here.

Yeah I think I second the suggestion to make them the same

How exactly should that work?
AFAIU when all shards are fully congested one shard is allowed to process receipts to avoid deadlocks.
But I can't set the special shard to this allowed shard for every sender shard, because then multiple senders would be allowed to send large receipts to this one shard, and source_receipt_proofs for the allowed shards could explode.
I don't fully understand how it works in congestion control, so maybe I'm missing something.

In CC, every shard pucks an allowed shard from the set of other shards. This shard is allowed to send to the congested shard.

AFAIU when all shards are fully congested one shard is allowed to process receipts to avoid deadlocks.

No, allowed shard kicks in a soon as a single shard is fully congested. It guarantees ALL shards always have at least one shard allowed sending to them. Processing receipts is never stopped or even limited in our congestion control design, only the forwarding is limited.

But I can't set the special shard to this allowed shard for every sender shard, because then multiple senders would be allowed to send large receipts to this one shard

No, there is only one allowed shard per receiving shard per block height.

Does that help and clear things up?

Thanks, I see that I misunderstood how that works. Looks like a very similar concept. I'll read into the code and try to merge them.

I think we'll need to set the allowed shard always, not only when the shard is congested. Otherwise you should be pretty much able to use it as is.

One small difference is that special_shard can be equal to own_shard, but allowed_shard is supposed to be different from own_shard. I guess I'll just allow allowed_shard to be the same as own_shard. It might lower the throughput in super-high congestion scenarios, but it's at most 1/num_shards of a difference.

runtime/runtime/src/congestion_control.rs

wacban · 2024-06-06T10:00:45Z

runtime/runtime/src/congestion_control.rs

@@ -462,3 +488,32 @@ fn overflow_storage_err() -> StorageError {
 fn safe_add_gas_to_u128(a: u128, b: Gas) -> Result<u128, IntegerOverflowError> {
    a.checked_add(b as u128).ok_or(IntegerOverflowError {})
 }
+
+/// Each shard receieves receipts from all other shards, but it could happen


typo receives

wacban · 2024-06-06T10:03:15Z

runtime/runtime/src/congestion_control.rs

-        if *forward_limit > gas_to_forward {
+        let size_to_forward = receipt_size(&receipt)?;
+
+        if forward_limit.gas > gas_to_forward && forward_limit.size > size_to_forward {


sanity check - We should either make really sure that the special forward limit is larger than maximum potential receipt size or make it a soft limit in the special case. I think the former is already the case but just making sure in case the receipt size is a soft limit or some other subtlety.

outgoing_receipts_big_size_limit is 4.5MiB, which was chosen to be a bit above 4MiB to allow 4MiB receipts to pass. It's a hard limit, so it has to be a bit bigger.

But now I'm starting to have doubts, do we have a size limit for receipts? I thought that there'd be something analogous to max_transaction_size, but I don't see it 0_o. Maybe we'll have to add max_receipt_size limit as well.

Hehe I was aiming for doubts :)
The issue is that just one receipt over the limit would block the queue forever. Making it a soft limit for the special shard should solve this problem but there may be better alternatives. Are large receipts a more general problem that we need to solve or is it only an issue in cross shard receipts?

I am not entirely sure but I believe a single receipts can be much larger than 4MB today

To quote past Jakob, writing about potentially introducing a receipt size limit:

~5MB for bandwidth per receipts seems reasonable to me. A receipt can produce lots of network bandwidth by making a cross contract call. Today, it's possible to make many calls at once and attach an argument of up to 4MB to each. We could say the total limit per receipt is 5MB, then users can still make a single cross contract call with a 4MB argument, or two calls with 2MB each. But they couldn't do two calls with 4MB each. I don't think that's a problem for anyone.

source: #9378 (comment)

Based on that, a receipt could be around 100MB if it contains 25 function calls with 4 MB payload each. The gas cost for creating all those actions would only be below 250 Tgas, so plenty of room for the initial call and promise overhead costs.

Based on that, a receipt could be around 100MB if it contains 25 function calls with 4 MB payload each. The gas cost for creating all those actions would only be below 250 Tgas, so plenty of room for the initial call and promise overhead costs.

Damn, that's not good. I guess we'll have to add a receipt size limit then. I hope it's safe to do it, rejecting receipts sounds dangerous 0_O

Oh my I sparked more doubts than I was hoping for. If the receipts can be so large then indeed even a soft limit would not be sufficient. @jancionear Do you want to share the news on zulip or at the sync meeting later?

https://near.zulipchat.com/#narrow/stream/295306-contract-runtime/topic/How.20large.20can.20internal.20receipts.20get.3F/near/443077081

wacban · 2024-06-06T10:06:36Z

runtime/runtime/src/congestion_control.rs

+    if num_shards == 0 {
+        return None;
+    }


This seems off. Can you clarify under what circumstances this may happen? Also if we can avoid it this would be great but I can't tell without context.

Yeah I'll remove it, it's a remnant of the previous version where special shard couldn't be the same as the source shard 👍

On second thought, num_shards comes from ApplyState::congestion_info, and I'm not 100% sure that it'll always have nonzero length. It's safer to assume that it can be zero, and just not choose a special shard then.

let num_shards = apply_state.congestion_info.len(); let special_shard: Option<ShardId> = choose_special_shard(apply_state.shard_id, apply_state.block_height, num_shards);

jakmeier

Hm, to be frank, this seems a bit rushed to get in before the code freeze tomorrow. I have several concerns that I would like to have more time to get a better understanding.

Concern 1: Adding a receipt size limit

We don't even have a receipt limit today. Here we implicitly add one and we add even more restrictions on top.

Even just adding a simple receipt size limit is something I thought we would need approval by the working group in a formal NEP, since it restricts what users can do on Near Protocol. It might seem like no problem to us but since it's not a purely technical decisions, I don't think it's ours to make. At least as far as I understand the protocol development process.

Concern 2: Impact on throughput

In the congestion control limits, we put a lot of effort in double-checking we don't throttle throughput unnecessarily, doing simulations with traffic representing important contracts and also extreme cases. Is there an investigation or analysis to look at for this limit?

Like, how many times in the last month was a shard sending more than 100kB of receipts to another shard? This seems like important information to decide if we can tolerate blocking an entire shard on a single large receipt until it is chosen to be special.

Another question that comes to mind: Will aurora be affects by this? (I think their submit calls can have decently large receipt sizes.)

Concern 3: Interaction with congestion control

For congestion control, we fine-tuned 10 parameters that are now in a sort of balance. Changing them individually showed major differences in how it performs.

With this new factor of how buffers are filled, we might need to redo our simulations on congestion control. Buffered receipts were a very rare occasion in our simulations that only happens when the receiving shard has a congestion problem. Thus we fine-tuned for a small outgoing buffer and we apply pretty aggressive backpressure on the signal of a full outgoing buffer. This includes the rejections of ALL new incoming transactions to that shard as soon as we have >= 500 Tgas of receipt in the buffer.

I suspect your change would breach the 500 Tgas buffer almost everytime we have a slightly large receipt. In that case, we would reject all incoming transactions and even clean out all the transaction in the pool, which currently results in an unfortunate timeout on the user side without proper handling. I think we would have to incrase the outgoing buffer limit substantially, and then fine-tune again all other parameters to find a new balance.

Sorry for all the critique, I know this is only trying to solve an important issue. And sorry if these issue have already been discussed elsewhere. I don't want to block you from progress, just letting you know of what I think could go wrong if we merge the PR as it is right now.

jakmeier · 2024-06-06T14:23:13Z

integration-tests/src/node/runtime_node.rs

+    /// Same as `RuntimeNode::new`, but allows to modify the RuntimeConfig.
+    pub fn new_with_modified_config(
+        account_id: &AccountId,
+        modify_config: impl Fn(&mut RuntimeConfig),


Optioanl/nit: FnOnce would make it a bit more general, which could be useful if the closure needs to capture objects like CongestionControlConfig and overwrite it on the field inside.

jakmeier · 2024-06-06T14:27:49Z

runtime/runtime/src/congestion_control.rs

+/// Receipts are sent out until the limit is hit, after that they're buffered.
+pub(crate) struct OutgoingLimit {
+    pub gas: Gas,
+    pub size: usize,


Optional: Could we make it u32 or u64 instead of usize? I don't really like worrying about potential architecture difference. Like, what if we want to put the entire runtime inside a Wasm runtime (it has been discussed before) which has usize as 32 bits instead of 64? Suddenly we might have subtle differences. In my opinion, it's better to avoid usize in structs like this.

jakmeier · 2024-06-06T14:29:30Z

runtime/runtime/src/congestion_control.rs

+            // One of the shards is designated as "special". We are allowed to send more receipts to this one "special" shard.
+            let num_shards = apply_state.congestion_info.len();
+            let special_shard: Option<ShardId> =
+                choose_special_shard(apply_state.shard_id, apply_state.block_height, num_shards);


Optional: Can we find a better name than "special"? :D Something that indicated that the chosen shard has higher bytes throughput.

Maybe "high_flow"? Because there'll be a lot of receipts flowing there.
I don't like thinking too much about names because it's hard to find a really good name and it feels a bit like bikeshedding.
Do you have any suggestions?

Agreed, naming is hard and can quickly derail a discussion into bikeshedding. And yet, in my opinion, bad names are one of the highest contributors to unreadable and unmaintainable code. In this case, it's not quite as bad so I suggested it as an optional change to reconsider.

I like "high_flow" better than "special". I think it can also be a good name if we combine "allowed" from congestion control and this shard into the same. But your opinion is as valid as mine.

And again, renaming this is optional and not something to block on.

I think it can also be a good name if we combine "allowed" from congestion control and this shard into the same.

Yeah, merging with allowed should fix the problem :)

jakmeier · 2024-06-06T14:30:15Z

runtime/runtime/src/congestion_control.rs

+                    let mut gas_limit =
+                        other_congestion_control.outgoing_gas_limit(apply_state.shard_id);
+                    if shard_id == apply_state.shard_id {
+                        // No gas limits on receipts that stay on the same shard. Backpressure
+                        // wouldn't help, the receipt takes the same memory if buffered or
+                        // in the delayed receipts queue.
+                        gas_limit = Gas::MAX;
+                    }
+
+                    let size_limit = if special_shard == Some(shard_id) {
+                        apply_state.config.witness_config.outgoing_receipts_big_size_limit
+                    } else {
+                        apply_state.config.witness_config.outgoing_receipts_usual_size_limit
+                    };


Yeah I think I second the suggestion to make them the same

jancionear · 2024-06-06T15:24:21Z

Hm, to be frank, this seems a bit rushed to get in before the code freeze tomorrow. I have several concerns that I would like to have more time to get a better understanding.

Yeah it isn't as thought out as it should be :/

I wanted to quickly implement some sort of receipt limit before the code freeeze because it feels wrong to release SV to mainnet without limiting the receipts. Having a limit that may impact throughput is better than having a dead network if the witness gets too large. This approach seemed simple and maybe good enough.

But now with all the concerns I don't have as much confidence in this solution, I'll think about it more. Maybe we should postpone the freeze :/ or treat this limit as a bugfix.

Thanks for the feedback!

jancionear · 2024-06-06T17:28:09Z

Well despite the flaws this solution has, the decision from the last meeting is to merge it before the code freeze, to at least have some sort of limit.
Improvements to deal with the flaws can be implemented in follow-up PRs, and added to the SV launch as needed.

I'll make an effort to make this PR mergable by tomorrow.

jancionear · 2024-06-07T14:19:27Z

v2:

Fix nits/typos from review
Use allowed_shard to decide who can send more receipts, get rid of special_shard
- Always set allowed_shard, even when not congested
- allowed_shard can now be the same as own_shard
- Disable validation checks for allowed_shards, @wacban said that there's a bug in the preexisting code, and he's working on a fix
Add a 4MiB receipt size limit. I scanned the last year of blockchain history and all receipts were smaller than 4MB (4_000_000 bytes), so it shouldn't break anything.
Don't add a new protocol version, keep all stateless validation changes in protocol 87

@wacban @jakmeier please take a look

I'll try to test this change somehow.

jancionear · 2024-06-07T14:25:54Z

In the past year (31,536,000 blocks) there were 29,607 transactions or receipts larger than 100 kB, that's ~1 per 1000 blocks. I think the 100kB limit shouldn't cause any problems under normal mainnet traffic.

wacban

LGTM

wacban · 2024-06-07T14:25:37Z

chain/client/src/tests/process_blocks.rs

+// TODO(congestion_control) validate allowed shard
+// #[test]
+// fn test_bad_congestion_info_corrupt_allowed_shard() {
+//    test_bad_congestion_info_impl(BadCongestionInfoMode::CorruptAllowedShard);
+// }


nit: instead of commenting it out it would be better to mark this test ignore

Ohh that's nice, 👍

wacban · 2024-06-07T14:27:02Z

core/parameters/res/runtime_configs/87.yaml

 max_transaction_size: {old: 4_194_304, new: 1_572_864}
+max_receipt_size: {old: 999_999_999_999_999, new: 4_194_304}


Would it make sense to use the same value for transaction and receipt limits? It seems a bit more intuitive. Do you have any data on whether there were any receipts above that limit in the history?

It might be possible to lower the receipt limit further, but it would break some receipts, there are some 3.9MB ones. It would require some more thinking about, for now 4MiB seems like a safe and good enough choice, we can lower it in a follow-up PR.
https://near.zulipchat.com/#narrow/stream/295306-contract-runtime/topic/How.20large.20can.20internal.20receipts.20get.3F/near/443309601

wacban · 2024-06-07T14:29:13Z

core/parameters/src/parameter.rs

+    /// The standard size limit for outgoing receipts aimed at a single shard.
+    /// This limit is pretty small to keep the size of source_receipt_proofs under control.
+    OutgoingReceiptsUsualSizeLimit,


Can you clarify that this is a limit on the total sum of receipts and not on every individual receipt? Same below.

wacban · 2024-06-07T14:31:47Z

core/primitives/src/congestion_info.rs

+    pub fn outgoing_size_limit(
+        &self,
+        sender_shard: ShardId,
+        runtime_config: &RuntimeConfig,


Makes me wonder if we should move the new parameters to the congestion control config.

Ok, moved to `CongestionControlConfig"

wacban · 2024-06-07T14:36:42Z

core/primitives/src/congestion_info.rs

                .get(index as usize)
                .expect("`checked_rem` should have ensured array access is in bound");
        }
-        // checked_rem failed, hence other_shards.len() is 0
+        // checked_rem failed, hence all_shards.len() is 0


The all_shards.len() == 0 condition is messed up but it is the case right now since we get it from the congestion info. I will refactor that in a follow up.

wacban · 2024-06-07T14:39:04Z

core/primitives/src/congestion_info.rs

    ) -> ShardId {
-        if congestion_level < 1.0 {


Note to self - this is fine and shouldn't affect congestion control since in the method to get outgoing gas limit we also check if congestion level is equal to 1.0.

wacban · 2024-06-07T14:40:32Z

core/primitives/src/types.rs

@@ -922,6 +922,15 @@ pub mod chunk_extra {
                Self::V3(v3) => v3.congestion_info.into(),
            }
        }
+
+        // Dirty workaround for broken allowed shard validation
+        pub fn with_zeroed_allowed_shard(&self) -> ChunkExtra {


Can you add a todo to get rid of it? I'll do it later.

wacban · 2024-06-07T14:43:33Z

integration-tests/src/tests/client/features/delegate_action.rs

+    let node = RuntimeNode::new_with_modified_config(&relayer, |runtime_config| {
+        // Increase the outgoing receipts limit to allow the large receipt to be processed immediately.
+        // Without this change the receipt would be processed somewhere in the next few blocks.
+        runtime_config.witness_config.outgoing_receipts_usual_size_limit = 200_000;
+    });


nit: Again it feels like those should be part of the congestion control config and the RuntimeConfigStore::test_congestion_control_disabled() should maybe included that override.

wacban · 2024-06-07T14:45:34Z

runtime/runtime/src/congestion_control.rs

+                    let mut gas_limit =
+                        other_congestion_control.outgoing_gas_limit(apply_state.shard_id);
+                    if shard_id == apply_state.shard_id {
+                        // No gas limits on receipts that stay on the same shard. Backpressure
+                        // wouldn't help, the receipt takes the same memory if buffered or
+                        // in the delayed receipts queue.
+                        gas_limit = Gas::MAX;
+                    }


nit:

let gas_limit = if if shard_id != apply_state.shard_id { other_congestion_control.outgoing_gas_limit(apply_state.shard_id) } else { // No gas limits on receipts that stay on the same shard. Backpressure // wouldn't help, the receipt takes the same memory if buffered or // in the delayed receipts queue. Gas::MAX };

wacban · 2024-06-07T14:50:21Z

runtime/runtime/src/verifier.rs

+    let receipt_size: u64 =
+        borsh::to_vec(receipt).unwrap().len().try_into().expect("Can't convert usize to u64");
+    if receipt_size > limit_config.max_receipt_size {
+        return Err(ReceiptValidationError::ReceiptSizeExceeded {
+            size: receipt_size,
+            limit: limit_config.max_receipt_size,
+        });
+    }
+


Seems fine to me but I'll leave that to @jakmeier to judge if this is the right way to do it in runtime.

jakmeier

Sorry, I wasn't able to review again yesterday. But I reviewed it now and couldn't find anything worth mentioning.

In the past year (31,536,000 blocks) there were 29,607 transactions or receipts larger than 100 kB, that's ~1 per 1000 blocks. I think the 100kB limit shouldn't cause any problems under normal mainnet traffic.

Thanks for collecting this data! Sounds like with the traffic of the last year, we would experience a hick-up around once every 20min on average.
A couple of extra seconds latency for a few dozen transactions seems tolerable. But probably we need something to avoid the transaction pool clearing effect. I will bring it up in the congestion control meeting to see what we can do about it.

* Refactored the Block::shards_congestion_info to block_congestion_info -> BlockCongestionInfo * Changed the HashMap to BTreeMap because hash map adds randomness. * Added back the strict allowed shard check that was removed in #11492 * Removed some temporary hacks from the same PR. This restores the deterministic validation of allowed shard. This is the minimum that we need for the release. It is not in line with the Congestion Control NEP that leaves the chunk producer's freedom to choose the allowed shard as they see fit. I am happy to keep it as it is in this PR and implement the chunk producer's "free will" in subsequent releases if we find it beneficial. Best reviewed commit by commit. The first commit is titled refactor but it actually has an important change - HashMap -> BTreeMap. Sorry about that.

jancionear added the A-stateless-validation Area: stateless validation label Jun 5, 2024

jancionear requested review from wacban and jakmeier June 5, 2024 15:38

jancionear requested a review from a team as a code owner June 5, 2024 15:38

jancionear added 5 commits June 5, 2024 16:07

Rename CongestionControl::outgoing_limit to outgoing_gas_limit

747078d

There's going to be a size limit and a gas limit, so let's rename the function that calculates the gas limit accordingly to avoid confusion.

Limit size of outgoing receipts to keep the size of source_receipt_pr…

6287f4e

…oofs under control Main implementation, described in the PR

jancionear force-pushed the outsized branch from ea35ef9 to 3cbb76a Compare June 5, 2024 16:08

fix json.snap file using cargo insta review

3ce4357

wacban reviewed Jun 6, 2024

View reviewed changes

jakmeier reviewed Jun 6, 2024

View reviewed changes

jancionear added 9 commits June 7, 2024 11:26

Merge branch 'master' into outsized

d688614

Add comments on the Parameter enum

1ed28b1

fix receieves typo

dcb9ccf

Fn -> FnOnce

f749388

Use u64 instead of usize

c17e0f0

Use allowed_shard instead of special_shard

cc6ff44

Disable allowed_shard validation, it's buggy and Wac is working on a fix

c17d2cb

Add receipt size limit

8f8a73b

Move the changes to protocol version 87, don't add a new one

33ab33c

wacban approved these changes Jun 7, 2024

View reviewed changes

Use #[ignore] instead of commenting out test

72e5724

jancionear added 5 commits June 7, 2024 15:08

Add comment saying that the limit is aggregate, not per-receipt

cc019c8

Add one more TODO about allowed_shard validation

1736fc6

Refactor setting gas limit to avoid mutability

409e40e

Update rpc_errors_schema.json to fix CI

230efcf

Move outgoing size limits to CongestionControlConfig

c0ee219

jancionear added this pull request to the merge queue Jun 7, 2024

Merged via the queue into near:master with commit 43f5a90 Jun 7, 2024
29 checks passed

jancionear deleted the outsized branch June 7, 2024 17:26

jakmeier reviewed Jun 8, 2024

View reviewed changes

wacban mentioned this pull request Jun 9, 2024

feat(congestion_control) - validate allowed shard #11526

Merged

jancionear mentioned this pull request Jun 14, 2024

Limit size of source_receipt_proofs inside ChunkStateWitness #11295

Open

jancionear mentioned this pull request Aug 1, 2024

document the new ReceiptValidationError::ReceiptSizeExceeded near/docs#2194

Merged

		OutgoingReceiptsUsualSizeLimit,
		OutgoingReceiptsBigSizeLimit,

		max_transaction_size: {old: 4_194_304, new: 1_572_864}
		max_receipt_size: {old: 999_999_999_999_999, new: 4_194_304}

feat(stateless_validation): Limit size of outgoing receipts to keep size of source_receipt_proofs under control #11492

feat(stateless_validation): Limit size of outgoing receipts to keep size of source_receipt_proofs under control #11492

Conversation

jancionear commented Jun 5, 2024 • edited Loading

Flaws

TODO

jancionear commented Jun 5, 2024

codecov bot commented Jun 5, 2024 • edited Loading

Codecov Report

wacban left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jancionear Jun 7, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jakmeier left a comment

Choose a reason for hiding this comment

Concern 1: Adding a receipt size limit

Concern 2: Impact on throughput

Concern 3: Interaction with congestion control

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jancionear Jun 7, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jancionear commented Jun 6, 2024 • edited Loading

jancionear commented Jun 6, 2024

jancionear commented Jun 7, 2024

jancionear commented Jun 7, 2024

wacban left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jakmeier left a comment

Choose a reason for hiding this comment

jancionear commented Jun 5, 2024 •

edited

Loading

codecov bot commented Jun 5, 2024 •

edited

Loading

jancionear Jun 7, 2024 •

edited

Loading

jancionear Jun 7, 2024 •

edited

Loading

jancionear commented Jun 6, 2024 •

edited

Loading