-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Add a block production benchmark #10104
Add a block production benchmark #10104
Conversation
|
||
// You should have received a copy of the GNU General Public License | ||
// along with this program. If not, see <https://www.gnu.org/licenses/>. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a small doc on what this benchmark does, and what assumptions it makes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.... well, I could certainly write something out, but considering how relatively simple the benchmark is (and the fact that I'm not yet super familiar with all of the internals) I'm not entirely sure what extra information I could add with such a comment. (:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, ty! Looks good now :)
let max_transfer_count = { | ||
let mut transfer_count = 0; | ||
let mut block_builder = client.new_block(Default::default()).unwrap(); | ||
block_builder.push(extrinsic_set_time(1 + 1500)).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
block_builder.push(extrinsic_set_time(1 + 1500)).unwrap(); | |
block_builder.push(extrinsic_set_time(1 + MILLISECS_PER_BLOCK)).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, you're right that I should use a proper constant for this, however this actually can't be MILLISECS_PER_BLOCK
. (:
Since BABE isn't actually running and we're using the BlockBuilder
directly to author the blocks the current timeslot in BABE doesn't actually progress, so the code fails with a Timestamp slot must match 'CurrentSlot'
assertion if we actually set the time to the next block.
So what I'm doing here is incrementing the timestamp only by the MinimumPeriod
(which is 1500, half of the
SLOT_DURATION
/MILLISECS_PER_BLOCK
), so BABE doesn't complain since it's still within the first timeslot.
If we had to actually import more blocks than the first one this would obviously not work, but we don't.
Is this actually a problem here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No should be fine.
b.iter_batched( | ||
|| { | ||
let mut extrinsics = Vec::with_capacity(max_transfer_count + 1); | ||
extrinsics.push(extrinsic_set_time(1 + 1500)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extrinsics.push(extrinsic_set_time(1 + 1500)); | |
extrinsics.push(extrinsic_set_time(1 + MILLISECS_PER_BLOCK)); |
use criterion::{criterion_group, criterion_main, BatchSize, Criterion, Throughput}; | ||
|
||
use node_cli::service::{create_extrinsic, FullClient}; | ||
use node_runtime::{constants::currency::*, BalancesCall}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use node_runtime::{constants::currency::*, BalancesCall}; | |
use node_runtime::{constants::{currency::*, time::MILLISECS_PER_BLOCK}, BalancesCall}; |
Benchmark result after switching to WASM:
|
Results with the execution switched to
|
Hmm, that is much better than I expected :D We may need to use the entire basic authorship machinery, but for the beginning it is probably okay. |
Also, mostly just for fun I've checked - bumping
|
Creating all of those extrinsics takes up *a lot* of time, up to the point where the majority of the time is actually spent *outside* of the code which we want to benchmark here. So let's only do it once.
Nice :P I meant we should switch using This makes it more complicated, however I assume that this will slow it down a little bit. If not, our code seems to be good enough performancewise :P |
Hmm... okay, so how about we add this benchmark mostly as-is, and add another one with that extra machinery included? (I think this benchmark should still be somewhat useful anyway? If nothing else it can be used to easily compare native/interpreted/compiled executions and/or check if new versions of |
Yeah, as already said before, I'm fine with merging as is :) |
I'm more "worried" that there are no optimization opportunities xD |
Okay, one more thing should be done. Please add a second variant that is building blocks with proof generation enabled. (this is just a setting of the block builder) |
Added a variant of the benchmark with proof recording; the difference in performance is minimal:
|
informant_output_format: Default::default(), | ||
wasm_runtime_overrides: None, | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like Configuration should really have a Default impl. Then you can Configuration { impl_name: "BenchmarkImpl".into(), ..Default::default }
.
I feel like we don't care about most of these details here.
Ahh yeah, we always read the same 2 positions. So, this is cached :P But let us merge this and then in a follow up we should try to use unique accounts for sending and receiving, for every tx. |
Sounds good to me; I'll add the unique accounts in a separate PR. |
bot merge |
* Add a block production benchmark * Simplify the block production benchmark * Cleanups; switch execution strategy to WASM * Switch WASM execution to `Compiled` * Reduce the setup cost of the benchmark Creating all of those extrinsics takes up *a lot* of time, up to the point where the majority of the time is actually spent *outside* of the code which we want to benchmark here. So let's only do it once. * Add a variant of the block production benchmark with proof recording
* Add a block production benchmark * Simplify the block production benchmark * Cleanups; switch execution strategy to WASM * Switch WASM execution to `Compiled` * Reduce the setup cost of the benchmark Creating all of those extrinsics takes up *a lot* of time, up to the point where the majority of the time is actually spent *outside* of the code which we want to benchmark here. So let's only do it once. * Add a variant of the block production benchmark with proof recording
This PR adds a block production benchmark.
Here are the results on my machine (Threadripper 3970X):
cc @bkchr Before I go off and try to profile this I'd appreciate if you could take a look and tell me whenever I haven't done anything particularly stupid here.
Fixes #9978