-
Notifications
You must be signed in to change notification settings - Fork 622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add a metric which counts how many chunks couldn't fit all transactions from the pool #10422
Conversation
When a chunk is produced, Client calls `prepare_transactions()`, which fetches transactions from the pool and adds them to the chunk. `prepare_transactions()` adds transactions until some limit is hit, a limit on gas, time, number of transactions, chunk size, etc. Currently `prepare_transacations()` doesn't report which limit was hit, it just returns a list of processed transactions. Let's add a way to access this information, it's useful to know what's the bottleneck of chunk production. The return type of `prepare_transactions()` is changed to a struct which contains the list of prepared transactions and the limit which was hit. This allows the caller to find out what limited the produced chunk. In the future the information about hitting a limit can be exposed in the metrics.
…dn't fit all transactions Add a metric which counts the number produced chunks where some transactions from the transaciton pool didn't fit into the chunk due to a limit. The metric has a label `limited_by`, which lets us know which limit was hit when producing the chunk. This will help to find out what's the bottleneck of chunk production in scenarios of high chain congestion.
The name of the metric is kinda awkward: I spent some time thinking of a better one, but in the end I settled on this one. The thing I like about it is that reading this metric name is enough to understand exactly what it means. Adjectives like Naming is hard. I'm open for suggestions, here are some AI-generated alternatives (IMO all of them worse):
|
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #10422 +/- ##
==========================================
+ Coverage 71.91% 72.02% +0.11%
==========================================
Files 720 720
Lines 145051 144902 -149
Branches 145051 144902 -149
==========================================
+ Hits 104308 104362 +54
+ Misses 35954 35758 -196
+ Partials 4789 4782 -7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
There is some unrelated file attached to this PR at the end please remove it
cc @staffik or @nagisa Do you happen to know where that wallet_contract.was
file is coming from?
hint for grafana - in case you want to use that dashboard - you can set nice labels by setting the label to custom and formatting only fields that you are interested in e.g. {{node_id}}
@@ -311,6 +311,22 @@ pub struct ApplyChunkShardContext<'a> { | |||
pub is_first_block_with_chunk_of_version: bool, | |||
} | |||
|
|||
#[derive(Debug, Clone, PartialEq, Eq)] | |||
pub struct PreparedTransactions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a small comment please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typically we use tripple / for struct and method comments
/// Contains transactions ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, fixed 👍
} | ||
|
||
#[derive(Debug, Clone, Copy, PartialEq, Eq, strum::AsRefStr)] | ||
pub enum PrepareTransactionsLimit { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
chain/client/src/client.rs
Outdated
let Self { chain, sharded_tx_pool, epoch_manager, runtime_adapter: runtime, .. } = self; | ||
|
||
let shard_id = shard_uid.shard_id as ShardId; | ||
let next_epoch_id = epoch_manager.get_epoch_id_from_prev_block(prev_block_header.hash())?; | ||
let protocol_version = epoch_manager.get_epoch_protocol_version(&next_epoch_id)?; | ||
|
||
let transactions = if let Some(mut iter) = sharded_tx_pool.get_pool_iterator(shard_uid) { | ||
let prepared = if let Some(mut iter) = sharded_tx_pool.get_pool_iterator(shard_uid) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: prepared_transactions to keep it consistent or just transactions to indicate it's going to be the transactins field of PreparedTransactions
nearcore/src/runtime/mod.rs
Outdated
result.limited_by = Some(PrepareTransactionsLimit::ReceiptCount); | ||
break; | ||
} | ||
if let Some(tlimit) = &time_limit { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if let Some(time_limit) = &time_limit
would be more rusty way to do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My instinct is to avoid shadowing variables at all cost, it's easier on the brain when one name corresponds to one object. And I guess I had many bad experiences with bugs caused by shadowing things.
But okay, I'll rename tlimit
totime_limit
if that's more rusty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally speaking I agree with you, in this particular case, one could argue it's still the same thing, the only difference is the type and that is checked by the compiler.
also it takes a lot of willpower than I'm willing to admit to not look for a better name ;) |
oops, previously I ran locust loadtests on this machine ( |
In a previous commit I remove wallet_contract.wasm thinking that it shouldn't be tracked by git, but it turns out that it should. Let's take wallet_contract.wasm from master and add it to the git. The effect will be that there will be no change to wallet_contract.wasm
v2:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
@@ -311,6 +311,22 @@ pub struct ApplyChunkShardContext<'a> { | |||
pub is_first_block_with_chunk_of_version: bool, | |||
} | |||
|
|||
#[derive(Debug, Clone, PartialEq, Eq)] | |||
pub struct PreparedTransactions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typically we use tripple / for struct and method comments
/// Contains transactions ...
I don’t. I have noticed that running the full test suite will generate those files for me as well though. If we don't expect it to be checked in, we should put it in |
@nagisa It is generated by |
In some circumstances, `runtime/near-wallet-contract/build.rs` is triggered and it regenerates the Wallet Contract WASM file for unrelated PRs: #10422 (comment). The `Wallet Contract` will be generated differently (#10269 (comment)), but for now I would like to include this tiny PR to stops auto-generating the WASM file by default. To test that the WASM file is not auto-regenerated, add a space to `runtime/near-wallet-contract/wallet-contract/src/lib.rs` and run `cargo build --features nightly`.
The code coverage check didn't pass because there are no tests for the new metric. |
When a chunk is produced, Client calls
prepare_transactions()
, which fetches transactions from the transaction pool and adds them to the chunk.prepare_transactions()
adds transactions until some limit is hit, a limit on gas, time, number of transactions, chunk size, etc.Currently there's no way to know if some limit was hit and which limit was it. Let's add a way to access this information, it's useful to know what's the bottleneck of chunk production.
This PR adds a new metric which counts how many of the produced chunks couldn't fit all transactions from the transaction pool due to hitting a limit. It has two labels:
shard_id
- chunk's shard andlimited_by
- which of the limits was hit inprepare_transactions()
.The need for this metric was discussed in #10310 (comment). The hope is that it'll allow us to figure out what's the bottleneck of chunk production in scenarios of high chain congestion.
Right now in cases where
produce_chunk
takes a lot of time we can only make theories why that's the case, but there's no observability into what's going inside. This metric could help with that.To test that the metric works I set up a mocknet network with a build from this branch and put a load on it using locust.
I reduced
produce_chunk_add_transactions_time_limit
from 200ms to 2ms to easily trigger the metric. It worked, as can be observed on the grafana dashboard: https://nearinc.grafana.net/d/eada7f01-b2dc-4df8-8a9f-ec4ec411159e/jancio-mocknet-stats?orgId=1&from=1705135200000&to=1705135800000