[Tests] Improve benchmark precision #2538

furszy · 2021-08-31T20:36:11Z

Coming mainly from bitcoin#11517, which improves microbenchmarking with multiple features:

* inline performance critical code
* Average runtime is specified and used to calculate iterations.
* Console: show median of multiple runs
* plot: show box plot
* filter benchmarks
* specify scaling factor
* ignore src/test and src/bench in command line check script
* number of iterations instead of time
* Replaced runtime in BENCHMARK macro with number of iterations.
* Added -? to bench_pivx
* Benchmark plotly.js URL, width, height can be customized
* Fixed incorrect precision warning

Plus added CMake support for the benchmarking framework as well.
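As a rough illustration of the model described in the list above (a fixed number of iterations per evaluation, several evaluations, and min/max/median reported per iteration), here is a minimal self-contained sketch. `BenchResult`, `Median` and `RunBench` are hypothetical names made up for this example; they are not the framework's API, and the real code in bench.cpp may differ in its details.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Timing summary for one benchmark; all times are in seconds.
struct BenchResult {
    uint64_t iterations; // iterations per evaluation
    double total;        // wall time summed over all evaluations
    double min;          // fastest per-iteration time among the evaluations
    double max;          // slowest per-iteration time among the evaluations
    double median;       // median per-iteration time among the evaluations
};

// Median of a non-empty sample set; averages the two middle values when the count is even.
inline double Median(std::vector<double> samples)
{
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 == 1 ? samples[mid] : (samples[mid - 1] + samples[mid]) / 2.0;
}

// Run `fn` for `num_iters` iterations, `num_evals` times, and summarize
// the per-iteration time measured in each evaluation.
inline BenchResult RunBench(const std::function<void()>& fn, uint64_t num_iters, std::size_t num_evals)
{
    assert(num_iters > 0 && num_evals > 0);
    std::vector<double> per_iter;
    double total = 0.0;
    for (std::size_t e = 0; e < num_evals; ++e) {
        const auto start = std::chrono::steady_clock::now();
        for (uint64_t i = 0; i < num_iters; ++i) fn();
        const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
        total += elapsed.count();
        per_iter.push_back(elapsed.count() / static_cast<double>(num_iters));
    }
    const auto mm = std::minmax_element(per_iter.begin(), per_iter.end());
    return {num_iters, total, *mm.first, *mm.second, Median(per_iter)};
}
```

With a single evaluation the three statistics collapse to the same value, which is why min, max and median are identical in one-evaluation result tables.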

-- Needed for an upcoming PR that I'm cooking, which contains a new benchmark --

random-zebra · 2021-09-09T12:46:56Z

Great backport.

src/bench/bench_pivx -filter="SHA256$" -printer=plot -evals=20 > benchSHA.html

splendid. Love the features that martinus added upstream ❤️
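For context on the -filter option used in that command, here is a small sketch of selecting benchmarks by regular expression. It assumes the filter must match the whole benchmark name (std::regex_match); that full-match behaviour and the `FilterBenchmarks` helper are assumptions for illustration, not a copy of the framework's code.

```cpp
#include <regex>
#include <string>
#include <vector>

// Return the benchmark names selected by `filter`.
// Assumption: the whole name has to match the pattern (std::regex_match).
inline std::vector<std::string> FilterBenchmarks(const std::vector<std::string>& names, const std::string& filter)
{
    const std::regex re(filter);
    std::vector<std::string> selected;
    for (const std::string& name : names) {
        if (std::regex_match(name, re)) selected.push_back(name);
    }
    return selected;
}
```

Under that assumption, "SHA256$" selects SHA256 but not SHA256_32b, while ".*" selects every benchmark.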

There is an issue with DeserializeAndCheckBlockTest though.
It currently segfaults during CheckBlock (specifically in CheckBlockSignature) due to

ARG_CHECK(secp256k1_ecmult_context_is_built(&ctx->ecmult_ctx));

gdb seems to suggest that secp256k1_ecdsa_verify is being called with a null context:

Program received signal SIGSEGV, Segmentation fault.
secp256k1_ecdsa_verify (ctx=0x0, sig=sig@entry=0x7fffffffce50, 
    msg32=msg32@entry=0x7fffffffcfa0 "\243MU\360(_@!(%\253\064/\r\206\004\063\177;\201Q\341R\350#0:\276\071\200\357v|", pubkey=pubkey@entry=0x7fffffffce10) at src/secp256k1.c:308
308	    ARG_CHECK(secp256k1_ecmult_context_is_built(&ctx->ecmult_ctx));
(gdb) bt
#0  secp256k1_ecdsa_verify (ctx=0x0, sig=sig@entry=0x7fffffffce50, 
    msg32=msg32@entry=0x7fffffffcfa0 "\243MU\360(_@!(%\253\064/\r\206\004\063\177;\201Q\341R\350#0:\276\071\200\357v|", pubkey=pubkey@entry=0x7fffffffce10) at src/secp256k1.c:308
#1  0x0000555555a8421f in CPubKey::Verify (this=this@entry=0x7fffffffcf00, hash=..., 
    vchSig=std::vector of length 70, capacity 70 = {...}) at pubkey.cpp:184
#2  0x0000555555677e13 in CheckBlockSignature (block=...) at blocksignature.cpp:97
#3  0x00005555555f42d3 in CheckBlock (block=..., state=..., fCheckPOW=fCheckPOW@entry=true, 
    fCheckMerkleRoot=fCheckMerkleRoot@entry=true, fCheckSig=fCheckSig@entry=true) at validation.cpp:2803
#4  0x00005555555c85d1 in DeserializeAndCheckBlockTest (state=...) at bench/checkblock.cpp:43
#5  0x00005555555c5ac5 in std::_Function_handler<void (benchmark::State&), void (*)(benchmark::State&)>::_M_invoke(std::_Any_data const&, benchmark::State&) (__functor=..., __args#0=...)
    at /usr/include/c++/7/bits/std_function.h:316
#6  0x00005555555b16ce in std::function<void (benchmark::State&)>::operator()(benchmark::State&) const (
    this=this@entry=0x5555563a7a10, __args#0=...) at /usr/include/c++/7/bits/std_function.h:706
#7  0x00005555555b0153 in benchmark::BenchRunner::RunAll (printer=..., num_evals=5, 
    scaling=<optimized out>, filter=..., is_list_only=false) at bench/bench.cpp:122
#8  0x00005555555a2b73 in main (argc=<optimized out>, argv=<optimized out>) at bench/bench_pivx.cpp:67

Initializing a const ECCVerifyHandle before calling ECC_Start() in main() fixes the segfault, but I'm not sure how it was working before (we might consider porting bitcoin#17275 in any case).

furszy · 2021-09-09T13:57:30Z

Might be the case of a dangling ECCVerifyHandle somewhere, but yeah, in any case, added bitcoin#13722, bitcoin#17275 and the ECCVerifyHandle init in main.
Since we do the elliptic curve start in main for every bench test, we are forced to initialize the handle there as well.
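To make the ordering problem concrete, here is a simplified, self-contained sketch of the pattern: a reference-counted RAII handle owns a global verification context, and verification asserts that the context exists instead of dereferencing a null pointer. `MockContext`, `MockVerifyHandle`, `VerifyContextReady` and `MockVerify` are stand-ins invented for this example; they are not the secp256k1 or CPubKey API, and the real handle may be implemented differently.

```cpp
#include <cassert>

// Stand-in for the library's verification context (hypothetical type).
struct MockContext {
    bool built = true;
};

// Global context pointer; null until a handle exists (compare ctx=0x0 in the backtrace).
static MockContext* g_verify_ctx = nullptr;

// Reference-counted RAII handle: the first instance creates the context,
// the last one to go out of scope destroys it.
class MockVerifyHandle
{
    static int s_refcount;

public:
    MockVerifyHandle()
    {
        if (s_refcount == 0) {
            assert(g_verify_ctx == nullptr);
            g_verify_ctx = new MockContext();
        }
        ++s_refcount;
    }
    ~MockVerifyHandle()
    {
        --s_refcount;
        if (s_refcount == 0) {
            delete g_verify_ctx;
            g_verify_ctx = nullptr;
        }
    }
    MockVerifyHandle(const MockVerifyHandle&) = delete;
    MockVerifyHandle& operator=(const MockVerifyHandle&) = delete;
};
int MockVerifyHandle::s_refcount = 0;

// True while at least one handle is alive, i.e. verification is safe to call.
inline bool VerifyContextReady() { return g_verify_ctx != nullptr; }

// Stand-in for signature verification: fail loudly on a missing handle
// rather than dereferencing a null context.
inline bool MockVerify()
{
    assert(VerifyContextReady() && "a MockVerifyHandle must be alive before verifying");
    return g_verify_ctx->built;
}
```

In this sketch, declaring a `const MockVerifyHandle` at the top of main() keeps the context alive for every benchmark that verifies signatures, which mirrors the fix described above.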

random-zebra · 2021-09-09T15:10:55Z

cmake build fails (when libnatpmp-dev is found) due to a missing link to the NAT-PMP libraries:

../../libSERVER_A.a(mapport.cpp.o): In function `NatpmpInit(natpmp_t*)':
/PIVX/src/mapport.cpp:54: undefined reference to `initnatpmp'
../../libSERVER_A.a(mapport.cpp.o): In function `NatpmpDiscover(natpmp_t*, in_addr&)':
/PIVX/src/mapport.cpp:62: undefined reference to `sendpublicaddressrequest'
/PIVX/src/mapport.cpp:67: undefined reference to `readnatpmpresponseorretry'
../../libSERVER_A.a(mapport.cpp.o): In function `NatpmpMapping(natpmp_t*, in_addr const&, unsigned short, bool&)':
/PIVX/src/mapport.cpp:88: undefined reference to `sendnewportmappingrequest'
/PIVX/src/mapport.cpp:93: undefined reference to `readnatpmpresponseorretry'
../../libSERVER_A.a(mapport.cpp.o): In function `ProcessNatpmp()':
/PIVX/src/mapport.cpp:135: undefined reference to `sendnewportmappingrequest'
/PIVX/src/mapport.cpp:144: undefined reference to `closenatpmp'
collect2: error: ld returned 1 exit status
src/bench/CMakeFiles/bench_pivx.dir/build.make:326: recipe for target 'src/bench/bench_pivx' failed
make[2]: *** [src/bench/bench_pivx] Error 1
CMakeFiles/Makefile2:1302: recipe for target 'src/bench/CMakeFiles/bench_pivx.dir/all' failed
make[1]: *** [src/bench/CMakeFiles/bench_pivx.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

Need to add

if(NAT-PMP_FOUND)
    target_link_libraries(bench_pivx PRIVATE ${NAT-PMP_LIBRARY})
    target_include_directories(bench_pivx PRIVATE ${NAT-PMP_INCLUDE_DIR})
endif()

to CMakeLists.txt

furszy · 2021-09-09T15:37:41Z

done, added NAT-PMP link. Squashed in 70b1485.

random-zebra · 2021-09-15T15:50:29Z

Even though the documentation says

Choose a num_iters_for_one_second that takes roughly 1 second. The goal is that all benchmarks should take approximately the same time, and scaling factor can be used that the total time is appropriate for your system.

with the currently hardcoded num_iters_for_one_second values, the totals are all over the place here, so a single scaling factor can't fix them.
Running a single evaluation for each bench (evals=1), I get (note the total column):

# Benchmark,       iter.,      total,    min,          max,          median
------------       ------      ------    ----          ----          ------
Base58CheckEncode, 320000,     7.51618,  2.34881e-05,  2.34881e-05,  2.34881e-05
Base58Decode,      800000,     5.98559,  7.48199e-06,  7.48199e-06,  7.48199e-06
Base58Encode,      470000,     8.46528,  1.80112e-05,  1.80112e-05,  1.80112e-05
BenchTimeDep.,     100000000,  0.67938,  6.79383e-09,  6.79383e-09,  6.79383e-09
BenchTimeMillis,   6000000,    1.83173,  3.05289e-07,  3.05289e-07,  3.05289e-07
BenchTimeMillisS., 6000000,    1.78481,  2.97469e-07,  2.97469e-07,  2.97469e-07
BenchTimeMock,     300000000,  5.41958,  1.80653e-08,  1.80653e-08,  1.80653e-08
CCQSPrevectorJob,  1400,       11.6094,  0.00829243,   0.00829243,   0.00829243
CHACHA20_1MB,      340,        1.0536,   0.00309883,   0.00309883,   0.00309883
CHACHA20_256BYTES, 250000,     0.19130,  7.6523e-07,   7.6523e-07,   7.6523e-07
CHACHA20_64BYTES,  500000,     0.10807,  2.16149e-07,  2.16149e-07,  2.16149e-07
DCheckBlockT,      160,        0.036216, 0.000226352,  0.000226352,  0.00022635
DBlockT,           130,        0.002566, 1.97411e-05,  1.97411e-05,  1.97411e-05
FastRandom_1bit,   440000000,  2.35796,  5.359e-09,    5.359e-09,    5.359e-09
FastRandom_32bit,  110000000,  3.24959,  2.95417e-08,  2.95417e-08,  2.95417e-08
LockedPool,        530,        10.1038,  0.0190638,    0.0190638,    0.0190638
PrevectorClearNT,  28300,      7.27909,  0.000257212,  0.000257212,  0.000257212
PrevectorClearT,   88600,      3.66173,  4.13288e-05,  4.13288e-05,  4.13288e-05
PrevectorDestNT,   28800,      7.31417,  0.000253964,  0.000253964,  0.000253964
PrevectorDestT,    88900,      3.78905,  4.26215e-05,  4.26215e-05,  4.26215e-05
PrevectorResNT,    28900,      7.43385,  0.000257227,  0.000257227,  0.000257227
PrevectorResT,     90300,      3.68028,  4.07562e-05,  4.07562e-05,  4.07562e-05
RIPEMD160,         440,        1.62406,  0.00369106,   0.00369106,   0.00369106
SHA1,              570,        1.92262,  0.00337301,   0.00337301,   0.00337301
SHA256,            340,        2.31816,  0.00681811,   0.00681811,   0.00681811
SHA256_32b,        4700000,    2.26302,  4.81495e-07,  4.81495e-07,  4.81495e-07
SHA512,            330,        1.45384,  0.00440558,   0.00440558,   0.00440558
Sleep100ms,        10,         1.002,    0.1002,       0.1002,       0.1002
Trig,              12000000,   0.47018,  3.91823e-08,  3.91823e-08,  3.91823e-08

With the following changes:

BENCHMARK(Base58Encode, 58 * 1000);
BENCHMARK(Base58CheckEncode, 43 * 1000);
BENCHMARK(Base58Decode, 136 * 1000);
BENCHMARK(BenchTimeDeprecated, 158000000);
BENCHMARK(BenchTimeMillis, 3460000);
BENCHMARK(BenchTimeMillisSys, 3460000);
BENCHMARK(BenchTimeMock, 59500000);
BENCHMARK(CCheckQueueSpeedPrevectorJob, 122);
BENCHMARK(CHACHA20_64BYTES, 5000000);
BENCHMARK(CHACHA20_256BYTES, 1250000);
BENCHMARK(CHACHA20_1MB, 338);
BENCHMARK(DeserializeBlockTest, 56500);
BENCHMARK(DeserializeAndCheckBlockTest, 4500);
BENCHMARK(FastRandom_1bit, 195 * 1000 * 1000);
BENCHMARK(FastRandom_32bit, 34 * 1000 * 1000);
BENCHMARK(LockedPool, 58);
PREVECTOR_TEST(Clear, 4042, 25300)
PREVECTOR_TEST(Destructor, 4058, 24400)
PREVECTOR_TEST(Resize, 4059, 25200)
BENCHMARK(RIPEMD160, 275);
BENCHMARK(SHA1, 305);
BENCHMARK(SHA256, 156);
BENCHMARK(SHA256_32b, 2300 * 1000);
BENCHMARK(SHA512, 240);

I get total times closer to 1 sec:

# Benchmark,       iter.,      total,    min,          max,          median
------------       ------      ------    ----          ----          ------
Base58CheckEncode, 43000,      1.00515,  2.33755e-05,  2.33755e-05,  2.33755e-05
Base58Decode,      136000,     1.01659,  7.47495e-06,  7.47495e-06,  7.47495e-06
Base58Encode,      58000,      1.02217,  1.76236e-05,  1.76236e-05,  1.76236e-05
BenchTimeDep.,     158000000,  1.04806,  6.63328e-09,  6.63328e-09,  6.63328e-09
BenchTimeMillis,   3460000,    1.02533,  2.96338e-07,  2.96338e-07,  2.96338e-07
BenchTimeMillisS., 3460000,    1.02366,  2.95855e-07,  2.95855e-07,  2.95855e-07
BenchTimeMock,     59500000,   1.00954,  1.69671e-08,  1.69671e-08,  1.69671e-08
CCQSPrevectorJob,  122,        1.03534,  0.00848638,   0.00848638,   0.00848638
CHACHA20_1MB,      338,        1.04125,  0.00308062,   0.00308062,   0.00308062
CHACHA20_256BYTES, 1280000,    0.97272,  7.5994e-07,   7.5994e-07,   7.5994e-07
CHACHA20_64BYTES,  5000000,    1.00826,  2.01652e-07,  2.01652e-07,  2.01652e-07
DCheckBlockT,      4700,       1.03128,  0.000219421,  0.000219421,  0.000219421
DBlockT,           57000,      1.00041,  1.75511e-05,  1.75511e-05,  1.75511e-05
FastRandom_1bit,   195000000,  1.0181,   5.22105e-09,  5.22105e-09,  5.22105e-09
FastRandom_32bit,  34000000,   0.99274,  2.91983e-08,  2.91983e-08,  2.91983e-08
LockedPool,        59,         0.99454,  0.0168566,    0.0168566,    0.0168566
PrevectorClearNT,  4042,       1.01434,  0.00025095,   0.00025095,   0.00025095
PrevectorClearT,   25300,      1.03624,  4.0958e-05,   4.0958e-05,   4.0958e-05
PrevectorDestNT,   4058,       1.02054,  0.00025149,   0.00025149,   0.00025149
PrevectorDestT,    24800,      0.98466,  3.97042e-05,  3.97042e-05,  3.97042e-05
PrevectorResNT,    4059,       1.00031,  0.000246443,  0.000246443,  0.000246443
PrevectorResT,     25200,      1.00477,  3.98719e-05,  3.98719e-05,  3.98719e-05
RIPEMD160,         277,        0.98832,  0.00356797,   0.00356797,   0.00356797
SHA1,              305,        0.99946,  0.00327693,   0.00327693,   0.00327693
SHA256,            156,        1.0001,   0.00641087,   0.00641087,   0.00641087
SHA256_32b,        2280000,    1.04067,  4.56433e-07,  4.56433e-07,  4.56433e-07
SHA512,            240,        1.00319,  0.00417997,   0.00417997,   0.00417997
Sleep100ms,        10,         1.0022,   0.10022,      0.10022,      0.10022
Trig,              26000000,   1.00525,  3.86635e-08,  3.86635e-08,  3.86635e-08

It would be useful to see the results on different systems to compare.
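The recalibration above boils down to dividing the target time by the measured per-iteration time. A minimal sketch of that arithmetic follows; `ItersForTarget` and `PredictedTotal` are hypothetical helpers written for this comment, not part of the bench framework, and the counts proposed above were rounded by hand and taken from separate runs, so they will not match this formula exactly.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Iterations needed so that one evaluation takes about `target_seconds`,
// given the measured per-iteration time (the "median" column), in seconds.
inline uint64_t ItersForTarget(double median_seconds, double target_seconds = 1.0)
{
    assert(median_seconds > 0.0 && target_seconds > 0.0);
    const double iters = std::round(target_seconds / median_seconds);
    return iters < 1.0 ? 1 : static_cast<uint64_t>(iters);
}

// Total time predicted for one evaluation, given an iteration count
// and a per-iteration time measured on the same machine.
inline double PredictedTotal(uint64_t iterations, double median_seconds)
{
    return static_cast<double>(iterations) * median_seconds;
}
```

For example, Base58CheckEncode measured 2.34881e-05 s per iteration in the first table, so `ItersForTarget(2.34881e-05)` gives 42575, close to the 43 * 1000 chosen above, while the original 320000 iterations predict about 7.5 s. Since the per-iteration time is machine dependent, the same formula yields different counts on different hardware.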

furszy · 2021-10-01T21:04:08Z

Running a single evaluation for each bench:

Benchmark	evals	iterations	total	min	max	median
Base58CheckEncode	1	320000	1.3768	4.30251e-06	4.30251e-06	4.30251e-06
Base58Decode	1	800000	0.866909	1.08364e-06	1.08364e-06	1.08364e-06
Base58Encode	1	470000	1.24672	2.6526e-06	2.6526e-06	2.6526e-06
BenchTimeDeprecated	1	100000000	17.3216	1.73216e-07	1.73216e-07	1.73216e-07
BenchTimeMillis	1	6000000	0.952316	1.58719e-07	1.58719e-07	1.58719e-07
BenchTimeMillisSys	1	6000000	0.936247	1.56041e-07	1.56041e-07	1.56041e-07
BenchTimeMock	1	300000000	0.908323	3.02774e-09	3.02774e-09	3.02774e-09
CCheckQueueSpeedPrevectorJob	1	1400	5.76881	0.00412058	0.00412058	0.00412058
CHACHA20_1MB	1	340	0.720688	0.00211967	0.00211967	0.00211967
CHACHA20_256BYTES	1	250000	0.131619	5.26475e-07	5.26475e-07	5.26475e-07
CHACHA20_64BYTES	1	500000	0.0691793	1.38359e-07	1.38359e-07	1.38359e-07
DeserializeAndCheckBlockTest	1	160	0.0180586	0.000112866	0.000112866	0.000112866
DeserializeBlockTest	1	130	0.00158144	1.21649e-05	1.21649e-05	1.21649e-05
FastRandom_1bit	1	440000000	1.03505	2.35238e-09	2.35238e-09	2.35238e-09
FastRandom_32bit	1	110000000	1.17205	1.0655e-08	1.0655e-08	1.0655e-08
LockedPool	1	530	0.77278	0.00145807	0.00145807	0.00145807
PrevectorClearNontrivial	1	28300	0.443614	1.56754e-05	1.56754e-05	1.56754e-05
PrevectorClearTrivial	1	88600	1.34821	1.52169e-05	1.52169e-05	1.52169e-05
PrevectorDestructorNontrivial	1	28800	0.446652	1.55088e-05	1.55088e-05	1.55088e-05
PrevectorDestructorTrivial	1	88900	1.34646	1.51458e-05	1.51458e-05	1.51458e-05
PrevectorResizeNontrivial	1	28900	0.449847	1.55656e-05	1.55656e-05	1.55656e-05
PrevectorResizeTrivial	1	90300	1.36857	1.51558e-05	1.51558e-05	1.51558e-05
RIPEMD160	1	440	1.23305	0.00280239	0.00280239	0.00280239
SHA1	1	570	1.13645	0.00199377	0.00199377	0.00199377
SHA256	1	340	1.67581	0.00492885	0.00492885	0.00492885
SHA256_32b	1	4700000	1.6126	3.43106e-07	3.43106e-07	3.43106e-07
SHA512	1	330	1.03	0.0031212	0.0031212	0.0031212
Sleep100ms	1	10	1.0394	0.10394	0.10394	0.10394
Trig	1	12000000	0.149759	1.24799e-08	1.24799e-08	1.24799e-08

furszy · 2021-10-01T21:22:55Z

After the changes, with the same single evaluation, I get the following results:

Benchmark	evals	iterations	total	min	max	median
Base58CheckEncode	1	43000	0.18438	4.28791e-06	4.28791e-06	4.28791e-06
Base58Decode	1	136000	0.154272	1.13435e-06	1.13435e-06	1.13435e-06
Base58Encode	1	58000	0.159478	2.74963e-06	2.74963e-06	2.74963e-06
BenchTimeDeprecated	1	158000000	26.9816	1.7077e-07	1.7077e-07	1.7077e-07
BenchTimeMillis	1	3460000	0.555892	1.60662e-07	1.60662e-07	1.60662e-07
BenchTimeMillisSys	1	3460000	0.552805	1.5977e-07	1.5977e-07	1.5977e-07
BenchTimeMock	1	59500000	0.183366	3.08179e-09	3.08179e-09	3.08179e-09
CCheckQueueSpeedPrevectorJob	1	122	0.503528	0.00412728	0.00412728	0.00412728
CHACHA20_1MB	1	338	0.715717	0.00211751	0.00211751	0.00211751
CHACHA20_256BYTES	1	1250000	0.650627	5.20502e-07	5.20502e-07	5.20502e-07
CHACHA20_64BYTES	1	5000000	0.700708	1.40142e-07	1.40142e-07	1.40142e-07
DeserializeAndCheckBlockTest	1	4500	0.515684	0.000114596	0.000114596	0.000114596
DeserializeBlockTest	1	56500	0.720881	1.2759e-05	1.2759e-05	1.2759e-05
FastRandom_1bit	1	195000000	0.457536	2.34634e-09	2.34634e-09	2.34634e-09
FastRandom_32bit	1	34000000	0.366599	1.07823e-08	1.07823e-08	1.07823e-08
LockedPool	1	58	0.0868	0.00149655	0.00149655	0.00149655
PrevectorClearNontrivial	1	4042	0.0655623	1.62203e-05	1.62203e-05	1.62203e-05
PrevectorClearTrivial	1	25300	0.385769	1.52478e-05	1.52478e-05	1.52478e-05
PrevectorDestructorNontrivial	1	4058	0.0656059	1.61671e-05	1.61671e-05	1.61671e-05
PrevectorDestructorTrivial	1	24400	0.371695	1.52334e-05	1.52334e-05	1.52334e-05
PrevectorResizeNontrivial	1	4059	0.0653406	1.60977e-05	1.60977e-05	1.60977e-05
PrevectorResizeTrivial	1	25200	0.383007	1.51987e-05	1.51987e-05	1.51987e-05
RIPEMD160	1	275	0.799951	0.00290891	0.00290891	0.00290891
SHA1	1	305	0.656839	0.00215357	0.00215357	0.00215357
SHA256	1	156	0.815839	0.00522974	0.00522974	0.00522974
SHA256_32b	1	2300000	0.824155	3.58328e-07	3.58328e-07	3.58328e-07
SHA512	1	240	0.776275	0.00323448	0.00323448	0.00323448
Sleep100ms	1	10	1.02837	0.102837	0.102837	0.102837
Trig	1	12000000	0.155188	1.29323e-08	1.29323e-08	1.29323e-08

random-zebra · 2021-10-04T12:49:02Z

Yeah, we can leave the numbers as they are. We won't find values that sort-of-work with all possible machines/systems anyway.
This is good to go imo.
But it should be rebased on master first (as there are new bench tests not included here).

furszy · 2021-10-04T15:48:06Z

I rebased it right after sharing my results (3 days ago) and squashed the new benchmarks into 17c4bcf ;).

random-zebra

ACK a5fcdac

furszy self-assigned this Aug 31, 2021

furszy added the Bench Benchmarking framework label Sep 1, 2021

random-zebra added the Upstream label Sep 9, 2021

random-zebra added this to the 5.4.0 milestone Sep 9, 2021

furszy force-pushed the 2021_improve_bench_framework branch from b7c3576 to dbcaf86 Compare September 9, 2021 15:35

Bugfix: The var is LIBUNIVALUE,not LIBBITCOIN_UNIVALUE

b15c06f

furszy force-pushed the 2021_improve_bench_framework branch from dbcaf86 to 79f8271 Compare October 1, 2021 21:37

furszy and others added 5 commits October 1, 2021 19:00

Build: Add cmake support for benchmarking framework

7f8f030

Removed CCheckQueueSpeed benchmark

0519e22

This benchmark's runtime was rather unpredictive on different machines, not really a useful benchmark.

trivial: Replace CPubKey::operator[] with CPubKey::vch where possible

0c97fbe

pubkey: Assert CPubKey's ECCVerifyHandle precondition

a5fcdac

furszy force-pushed the 2021_improve_bench_framework branch from 79f8271 to a5fcdac Compare October 1, 2021 22:01

random-zebra approved these changes Oct 8, 2021

furszy merged commit c3028cc into PIVX-Project:master Oct 13, 2021

furszy deleted the 2021_improve_bench_framework branch November 29, 2022 15:43
