[Tests] Improve benchmark precision #2538

Merged
merged 6 commits into from
Oct 13, 2021


@furszy furszy commented Aug 31, 2021

Backported mainly from bitcoin#11517, which improves the microbenchmarking framework with several features:

* inline performance critical code
* Average runtime is specified and used to calculate iterations.
* Console: show median of multiple runs
* plot: show box plot
* filter benchmarks
* specify scaling factor
* ignore src/test and src/bench in command line check script
* number of iterations instead of time
* Replaced runtime in BENCHMARK macro with number of iterations.
* Added -? to bench_pivx
* Benchmark plotly.js URL, width, height can be customized
* Fixed incorrect precision warning

Also added CMake support for the benchmarking framework.

-- Needed for an upcoming PR I'm preparing that contains a new benchmark --
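
As a rough illustration of the "median of multiple runs" item in the list above, the sketch below reduces the per-iteration times of several evaluations to the min, max and median columns that the console printer reports. `EvalSummary` and `Summarize` are hypothetical names chosen for this example and are not taken from the bench framework.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one benchmark across several evaluations.
// All values are seconds per iteration.
struct EvalSummary {
    double min;
    double max;
    double median;
};

// Takes the samples by value because sorting reorders them.
EvalSummary Summarize(std::vector<double> secs_per_iter)
{
    assert(!secs_per_iter.empty());
    std::sort(secs_per_iter.begin(), secs_per_iter.end());
    const std::size_t n = secs_per_iter.size();
    const std::size_t mid = n / 2;
    // Odd count: middle element. Even count: mean of the two middle elements.
    const double median = (n % 2 == 1)
        ? secs_per_iter[mid]
        : (secs_per_iter[mid - 1] + secs_per_iter[mid]) / 2.0;
    return {secs_per_iter.front(), secs_per_iter.back(), median};
}
```

With a single evaluation (`-evals=1`) the three values coincide, which is why min, max and median are identical in the tables later in this thread.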

@furszy furszy self-assigned this Aug 31, 2021
@furszy furszy added the Bench Benchmarking framework label Sep 1, 2021
@random-zebra random-zebra added this to the 5.4.0 milestone Sep 9, 2021
@random-zebra

Great backport.

src/bench/bench_pivx -filter="SHA256$" -printer=plot -evals=20 > benchSHA.html

[Image: Screenshot from 2021-09-09 13-48-40]

Splendid. Love the features that martinus added upstream ❤️


There is an issue with DeserializeAndCheckBlockTest though.
It currently segfaults during CheckBlock (specifically in CheckBlockSignature) due to

ARG_CHECK(secp256k1_ecmult_context_is_built(&ctx->ecmult_ctx));

gdb seems to suggest that secp256k1_ecdsa_verify is being called with a null context:

Program received signal SIGSEGV, Segmentation fault.
secp256k1_ecdsa_verify (ctx=0x0, sig=sig@entry=0x7fffffffce50, 
    msg32=msg32@entry=0x7fffffffcfa0 "\243MU\360(_@!(%\253\064/\r\206\004\063\177;\201Q\341R\350#0:\276\071\200\357v|", pubkey=pubkey@entry=0x7fffffffce10) at src/secp256k1.c:308
308	    ARG_CHECK(secp256k1_ecmult_context_is_built(&ctx->ecmult_ctx));
(gdb) bt
#0  secp256k1_ecdsa_verify (ctx=0x0, sig=sig@entry=0x7fffffffce50, 
    msg32=msg32@entry=0x7fffffffcfa0 "\243MU\360(_@!(%\253\064/\r\206\004\063\177;\201Q\341R\350#0:\276\071\200\357v|", pubkey=pubkey@entry=0x7fffffffce10) at src/secp256k1.c:308
#1  0x0000555555a8421f in CPubKey::Verify (this=this@entry=0x7fffffffcf00, hash=..., 
    vchSig=std::vector of length 70, capacity 70 = {...}) at pubkey.cpp:184
#2  0x0000555555677e13 in CheckBlockSignature (block=...) at blocksignature.cpp:97
#3  0x00005555555f42d3 in CheckBlock (block=..., state=..., fCheckPOW=fCheckPOW@entry=true, 
    fCheckMerkleRoot=fCheckMerkleRoot@entry=true, fCheckSig=fCheckSig@entry=true) at validation.cpp:2803
#4  0x00005555555c85d1 in DeserializeAndCheckBlockTest (state=...) at bench/checkblock.cpp:43
#5  0x00005555555c5ac5 in std::_Function_handler<void (benchmark::State&), void (*)(benchmark::State&)>::_M_invoke(std::_Any_data const&, benchmark::State&) (__functor=..., __args#0=...)
    at /usr/include/c++/7/bits/std_function.h:316
#6  0x00005555555b16ce in std::function<void (benchmark::State&)>::operator()(benchmark::State&) const (
    this=this@entry=0x5555563a7a10, __args#0=...) at /usr/include/c++/7/bits/std_function.h:706
#7  0x00005555555b0153 in benchmark::BenchRunner::RunAll (printer=..., num_evals=5, 
    scaling=<optimized out>, filter=..., is_list_only=false) at bench/bench.cpp:122
#8  0x00005555555a2b73 in main (argc=<optimized out>, argv=<optimized out>) at bench/bench_pivx.cpp:67

Initializing a const ECCVerifyHandle before calling ECC_Start() in main() fixes the segfault, but I'm not sure how it was working before (we might consider porting bitcoin#17275 in any case).
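
To make the failure mode above easier to follow, here is a self-contained sketch of the reference-counted RAII pattern that a verification handle of this kind typically follows. All names (`MockContext`, `MockVerifyHandle`, `MockVerifyReady`, `g_verify_ctx`) are hypothetical stand-ins for illustration, not the real PIVX or libsecp256k1 API. The point is that the shared context pointer stays null until at least one handle is alive, which matches the `ctx=0x0` seen in the backtrace.

```cpp
#include <cassert>

// Hypothetical stand-in for the library's verification context.
struct MockContext {
    bool ecmult_built;
};

// Shared pointer used by every verification call; null until a handle exists.
static MockContext* g_verify_ctx = nullptr;

// RAII handle: the first live instance creates the context,
// the last one to be destroyed frees it.
class MockVerifyHandle
{
    static int refcount;

public:
    MockVerifyHandle()
    {
        if (refcount == 0) {
            assert(g_verify_ctx == nullptr);
            g_verify_ctx = new MockContext{true};
        }
        ++refcount;
    }
    ~MockVerifyHandle()
    {
        --refcount;
        if (refcount == 0) {
            delete g_verify_ctx;
            g_verify_ctx = nullptr;
        }
    }
    MockVerifyHandle(const MockVerifyHandle&) = delete;
    MockVerifyHandle& operator=(const MockVerifyHandle&) = delete;
};
int MockVerifyHandle::refcount = 0;

// Reports whether verification could run, instead of dereferencing a null context.
bool MockVerifyReady()
{
    return g_verify_ctx != nullptr && g_verify_ctx->ecmult_built;
}
```

In this model, declaring `const MockVerifyHandle handle;` at the top of `main()` keeps the context alive for every benchmark that runs afterwards.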

furszy commented Sep 9, 2021

It might be a dangling ECCVerifyHandle somewhere. In any case, I added bitcoin#13722, bitcoin#17275 and the ECCVerifyHandle init in main.
Since we do the elliptic curve start in main for every bench test, we are forced to initialize the handle there as well.

random-zebra commented Sep 9, 2021

The CMake build fails (when libnatpmp-dev is found) because bench_pivx is not linked against the NAT-PMP libraries:

../../libSERVER_A.a(mapport.cpp.o): In function `NatpmpInit(natpmp_t*)':
/PIVX/src/mapport.cpp:54: undefined reference to `initnatpmp'
../../libSERVER_A.a(mapport.cpp.o): In function `NatpmpDiscover(natpmp_t*, in_addr&)':
/PIVX/src/mapport.cpp:62: undefined reference to `sendpublicaddressrequest'
/PIVX/src/mapport.cpp:67: undefined reference to `readnatpmpresponseorretry'
../../libSERVER_A.a(mapport.cpp.o): In function `NatpmpMapping(natpmp_t*, in_addr const&, unsigned short, bool&)':
/PIVX/src/mapport.cpp:88: undefined reference to `sendnewportmappingrequest'
/PIVX/src/mapport.cpp:93: undefined reference to `readnatpmpresponseorretry'
../../libSERVER_A.a(mapport.cpp.o): In function `ProcessNatpmp()':
/PIVX/src/mapport.cpp:135: undefined reference to `sendnewportmappingrequest'
/PIVX/src/mapport.cpp:144: undefined reference to `closenatpmp'
collect2: error: ld returned 1 exit status
src/bench/CMakeFiles/bench_pivx.dir/build.make:326: recipe for target 'src/bench/bench_pivx' failed
make[2]: *** [src/bench/bench_pivx] Error 1
CMakeFiles/Makefile2:1302: recipe for target 'src/bench/CMakeFiles/bench_pivx.dir/all' failed
make[1]: *** [src/bench/CMakeFiles/bench_pivx.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

Need to add

if(NAT-PMP_FOUND)
    target_link_libraries(bench_pivx PRIVATE ${NAT-PMP_LIBRARY})
    target_include_directories(bench_pivx PRIVATE ${NAT-PMP_INCLUDE_DIR})
endif()

to CMakeLists.txt

@furszy furszy force-pushed the 2021_improve_bench_framework branch from b7c3576 to dbcaf86 Compare September 9, 2021 15:35
furszy commented Sep 9, 2021

Done, added the NAT-PMP link. Squashed in 70b1485.

@random-zebra

Even though the documentation says

Choose a num_iters_for_one_second that takes roughly 1 second. The goal is that all benchmarks should take approximately the same time, and scaling factor can be used that the total time is appropriate for your system.

with the currently hardcoded num_iters_for_one_second values, the total times are all over the place here (so they can't be corrected with a single scaling factor).
Running a single evaluation for each bench (evals=1), I get (note the total column):

# Benchmark,       iter.,      total,    min,          max,          median
------------       ------      ------    ----          ----          ------
Base58CheckEncode, 320000,     7.51618,  2.34881e-05,  2.34881e-05,  2.34881e-05
Base58Decode,      800000,     5.98559,  7.48199e-06,  7.48199e-06,  7.48199e-06
Base58Encode,      470000,     8.46528,  1.80112e-05,  1.80112e-05,  1.80112e-05
BenchTimeDep.,     100000000,  0.67938,  6.79383e-09,  6.79383e-09,  6.79383e-09
BenchTimeMillis,   6000000,    1.83173,  3.05289e-07,  3.05289e-07,  3.05289e-07
BenchTimeMillisS., 6000000,    1.78481,  2.97469e-07,  2.97469e-07,  2.97469e-07
BenchTimeMock,     300000000,  5.41958,  1.80653e-08,  1.80653e-08,  1.80653e-08
CCQSPrevectorJob,  1400,       11.6094,  0.00829243,   0.00829243,   0.00829243
CHACHA20_1MB,      340,        1.0536,   0.00309883,   0.00309883,   0.00309883
CHACHA20_256BYTES, 250000,     0.19130,  7.6523e-07,   7.6523e-07,   7.6523e-07
CHACHA20_64BYTES,  500000,     0.10807,  2.16149e-07,  2.16149e-07,  2.16149e-07
DCheckBlockT,      160,        0.036216, 0.000226352,  0.000226352,  0.00022635
DBlockT,           130,        0.002566, 1.97411e-05,  1.97411e-05,  1.97411e-05
FastRandom_1bit,   440000000,  2.35796,  5.359e-09,    5.359e-09,    5.359e-09
FastRandom_32bit,  110000000,  3.24959,  2.95417e-08,  2.95417e-08,  2.95417e-08
LockedPool,        530,        10.1038,  0.0190638,    0.0190638,    0.0190638
PrevectorClearNT,  28300,      7.27909,  0.000257212,  0.000257212,  0.000257212
PrevectorClearT,   88600,      3.66173,  4.13288e-05,  4.13288e-05,  4.13288e-05
PrevectorDestNT,   28800,      7.31417,  0.000253964,  0.000253964,  0.000253964
PrevectorDestT,    88900,      3.78905,  4.26215e-05,  4.26215e-05,  4.26215e-05
PrevectorResNT,    28900,      7.43385,  0.000257227,  0.000257227,  0.000257227
PrevectorResT,     90300,      3.68028,  4.07562e-05,  4.07562e-05,  4.07562e-05
RIPEMD160,         440,        1.62406,  0.00369106,   0.00369106,   0.00369106
SHA1,              570,        1.92262,  0.00337301,   0.00337301,   0.00337301
SHA256,            340,        2.31816,  0.00681811,   0.00681811,   0.00681811
SHA256_32b,        4700000,    2.26302,  4.81495e-07,  4.81495e-07,  4.81495e-07
SHA512,            330,        1.45384,  0.00440558,   0.00440558,   0.00440558
Sleep100ms,        10,         1.002,    0.1002,       0.1002,       0.1002
Trig,              12000000,   0.47018,  3.91823e-08,  3.91823e-08,  3.91823e-08

With the following changes:

BENCHMARK(Base58Encode, 58 * 1000);
BENCHMARK(Base58CheckEncode, 43 * 1000);
BENCHMARK(Base58Decode, 136 * 1000);
BENCHMARK(BenchTimeDeprecated, 158000000);
BENCHMARK(BenchTimeMillis, 3460000);
BENCHMARK(BenchTimeMillisSys, 3460000);
BENCHMARK(BenchTimeMock, 59500000);
BENCHMARK(CCheckQueueSpeedPrevectorJob, 122);
BENCHMARK(CHACHA20_64BYTES, 5000000);
BENCHMARK(CHACHA20_256BYTES, 1250000);
BENCHMARK(CHACHA20_1MB, 338);
BENCHMARK(DeserializeBlockTest, 56500);
BENCHMARK(DeserializeAndCheckBlockTest, 4500);
BENCHMARK(FastRandom_1bit, 195 * 1000 * 1000);
BENCHMARK(FastRandom_32bit, 34 * 1000 * 1000);
BENCHMARK(LockedPool, 58);
PREVECTOR_TEST(Clear, 4042, 25300)
PREVECTOR_TEST(Destructor, 4058, 24400)
PREVECTOR_TEST(Resize, 4059, 25200)
BENCHMARK(RIPEMD160, 275);
BENCHMARK(SHA1, 305);
BENCHMARK(SHA256, 156);
BENCHMARK(SHA256_32b, 2300 * 1000);
BENCHMARK(SHA512, 240);

I get total times closer to 1 sec:

# Benchmark,       iter.,      total,    min,          max,          median
------------       ------      ------    ----          ----          ------
Base58CheckEncode, 43000,      1.00515,  2.33755e-05,  2.33755e-05,  2.33755e-05
Base58Decode,      136000,     1.01659,  7.47495e-06,  7.47495e-06,  7.47495e-06
Base58Encode,      58000,      1.02217,  1.76236e-05,  1.76236e-05,  1.76236e-05
BenchTimeDep.,     158000000,  1.04806,  6.63328e-09,  6.63328e-09,  6.63328e-09
BenchTimeMillis,   3460000,    1.02533,  2.96338e-07,  2.96338e-07,  2.96338e-07
BenchTimeMillisS., 3460000,    1.02366,  2.95855e-07,  2.95855e-07,  2.95855e-07
BenchTimeMock,     59500000,   1.00954,  1.69671e-08,  1.69671e-08,  1.69671e-08
CCQSPrevectorJob,  122,        1.03534,  0.00848638,   0.00848638,   0.00848638
CHACHA20_1MB,      338,        1.04125,  0.00308062,   0.00308062,   0.00308062
CHACHA20_256BYTES, 1280000,    0.97272,  7.5994e-07,   7.5994e-07,   7.5994e-07
CHACHA20_64BYTES,  5000000,    1.00826,  2.01652e-07,  2.01652e-07,  2.01652e-07
DCheckBlockT,      4700,       1.03128,  0.000219421,  0.000219421,  0.000219421
DBlockT,           57000,      1.00041,  1.75511e-05,  1.75511e-05,  1.75511e-05
FastRandom_1bit,   195000000,  1.0181,   5.22105e-09,  5.22105e-09,  5.22105e-09
FastRandom_32bit,  34000000,   0.99274,  2.91983e-08,  2.91983e-08,  2.91983e-08
LockedPool,        59,         0.99454,  0.0168566,    0.0168566,    0.0168566
PrevectorClearNT,  4042,       1.01434,  0.00025095,   0.00025095,   0.00025095
PrevectorClearT,   25300,      1.03624,  4.0958e-05,   4.0958e-05,   4.0958e-05
PrevectorDestNT,   4058,       1.02054,  0.00025149,   0.00025149,   0.00025149
PrevectorDestT,    24800,      0.98466,  3.97042e-05,  3.97042e-05,  3.97042e-05
PrevectorResNT,    4059,       1.00031,  0.000246443,  0.000246443,  0.000246443
PrevectorResT,     25200,      1.00477,  3.98719e-05,  3.98719e-05,  3.98719e-05
RIPEMD160,         277,        0.98832,  0.00356797,   0.00356797,   0.00356797
SHA1,              305,        0.99946,  0.00327693,   0.00327693,   0.00327693
SHA256,            156,        1.0001,   0.00641087,   0.00641087,   0.00641087
SHA256_32b,        2280000,    1.04067,  4.56433e-07,  4.56433e-07,  4.56433e-07
SHA512,            240,        1.00319,  0.00417997,   0.00417997,   0.00417997
Sleep100ms,        10,         1.0022,   0.10022,      0.10022,      0.10022
Trig,              26000000,   1.00525,  3.86635e-08,  3.86635e-08,  3.86635e-08

It would be useful to see the results on different systems to compare.
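
One plausible way to arrive at iteration counts like the ones proposed above is to divide the one-second target by the measured time per iteration. The helper below is a minimal sketch of that arithmetic; `ItersForOneSecond` is a hypothetical name, and nothing in this thread confirms the values were actually derived this way.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: number of iterations that would take roughly one
// second, given the measured median time (in seconds) of one iteration.
std::uint64_t ItersForOneSecond(double secs_per_iter)
{
    assert(secs_per_iter > 0.0);
    return static_cast<std::uint64_t>(std::llround(1.0 / secs_per_iter));
}
```

For example, the first table reports a median of 2.34881e-05 s for Base58CheckEncode, which gives roughly 42,600 iterations, close to the `43 * 1000` used in the proposed change.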

furszy commented Oct 1, 2021

Running a single evaluation for each bench:

# Benchmark,                   evals, iterations, total,      min,         max,         median
------------                   -----  ----------  -----       ---          ---          ------
Base58CheckEncode,             1,     320000,     1.3768,     4.30251e-06, 4.30251e-06, 4.30251e-06
Base58Decode,                  1,     800000,     0.866909,   1.08364e-06, 1.08364e-06, 1.08364e-06
Base58Encode,                  1,     470000,     1.24672,    2.6526e-06,  2.6526e-06,  2.6526e-06
BenchTimeDeprecated,           1,     100000000,  17.3216,    1.73216e-07, 1.73216e-07, 1.73216e-07
BenchTimeMillis,               1,     6000000,    0.952316,   1.58719e-07, 1.58719e-07, 1.58719e-07
BenchTimeMillisSys,            1,     6000000,    0.936247,   1.56041e-07, 1.56041e-07, 1.56041e-07
BenchTimeMock,                 1,     300000000,  0.908323,   3.02774e-09, 3.02774e-09, 3.02774e-09
CCheckQueueSpeedPrevectorJob,  1,     1400,       5.76881,    0.00412058,  0.00412058,  0.00412058
CHACHA20_1MB,                  1,     340,        0.720688,   0.00211967,  0.00211967,  0.00211967
CHACHA20_256BYTES,             1,     250000,     0.131619,   5.26475e-07, 5.26475e-07, 5.26475e-07
CHACHA20_64BYTES,              1,     500000,     0.0691793,  1.38359e-07, 1.38359e-07, 1.38359e-07
DeserializeAndCheckBlockTest,  1,     160,        0.0180586,  0.000112866, 0.000112866, 0.000112866
DeserializeBlockTest,          1,     130,        0.00158144, 1.21649e-05, 1.21649e-05, 1.21649e-05
FastRandom_1bit,               1,     440000000,  1.03505,    2.35238e-09, 2.35238e-09, 2.35238e-09
FastRandom_32bit,              1,     110000000,  1.17205,    1.0655e-08,  1.0655e-08,  1.0655e-08
LockedPool,                    1,     530,        0.77278,    0.00145807,  0.00145807,  0.00145807
PrevectorClearNontrivial,      1,     28300,      0.443614,   1.56754e-05, 1.56754e-05, 1.56754e-05
PrevectorClearTrivial,         1,     88600,      1.34821,    1.52169e-05, 1.52169e-05, 1.52169e-05
PrevectorDestructorNontrivial, 1,     28800,      0.446652,   1.55088e-05, 1.55088e-05, 1.55088e-05
PrevectorDestructorTrivial,    1,     88900,      1.34646,    1.51458e-05, 1.51458e-05, 1.51458e-05
PrevectorResizeNontrivial,     1,     28900,      0.449847,   1.55656e-05, 1.55656e-05, 1.55656e-05
PrevectorResizeTrivial,        1,     90300,      1.36857,    1.51558e-05, 1.51558e-05, 1.51558e-05
RIPEMD160,                     1,     440,        1.23305,    0.00280239,  0.00280239,  0.00280239
SHA1,                          1,     570,        1.13645,    0.00199377,  0.00199377,  0.00199377
SHA256,                        1,     340,        1.67581,    0.00492885,  0.00492885,  0.00492885
SHA256_32b,                    1,     4700000,    1.6126,     3.43106e-07, 3.43106e-07, 3.43106e-07
SHA512,                        1,     330,        1.03,       0.0031212,   0.0031212,   0.0031212
Sleep100ms,                    1,     10,         1.0394,     0.10394,     0.10394,     0.10394
Trig,                          1,     12000000,   0.149759,   1.24799e-08, 1.24799e-08, 1.24799e-08

furszy commented Oct 1, 2021

After the changes, with the same single evaluation per bench, I get the following results:

# Benchmark,                   evals, iterations, total,      min,         max,         median
------------                   -----  ----------  -----       ---          ---          ------
Base58CheckEncode,             1,     43000,      0.18438,    4.28791e-06, 4.28791e-06, 4.28791e-06
Base58Decode,                  1,     136000,     0.154272,   1.13435e-06, 1.13435e-06, 1.13435e-06
Base58Encode,                  1,     58000,      0.159478,   2.74963e-06, 2.74963e-06, 2.74963e-06
BenchTimeDeprecated,           1,     158000000,  26.9816,    1.7077e-07,  1.7077e-07,  1.7077e-07
BenchTimeMillis,               1,     3460000,    0.555892,   1.60662e-07, 1.60662e-07, 1.60662e-07
BenchTimeMillisSys,            1,     3460000,    0.552805,   1.5977e-07,  1.5977e-07,  1.5977e-07
BenchTimeMock,                 1,     59500000,   0.183366,   3.08179e-09, 3.08179e-09, 3.08179e-09
CCheckQueueSpeedPrevectorJob,  1,     122,        0.503528,   0.00412728,  0.00412728,  0.00412728
CHACHA20_1MB,                  1,     338,        0.715717,   0.00211751,  0.00211751,  0.00211751
CHACHA20_256BYTES,             1,     1250000,    0.650627,   5.20502e-07, 5.20502e-07, 5.20502e-07
CHACHA20_64BYTES,              1,     5000000,    0.700708,   1.40142e-07, 1.40142e-07, 1.40142e-07
DeserializeAndCheckBlockTest,  1,     4500,       0.515684,   0.000114596, 0.000114596, 0.000114596
DeserializeBlockTest,          1,     56500,      0.720881,   1.2759e-05,  1.2759e-05,  1.2759e-05
FastRandom_1bit,               1,     195000000,  0.457536,   2.34634e-09, 2.34634e-09, 2.34634e-09
FastRandom_32bit,              1,     34000000,   0.366599,   1.07823e-08, 1.07823e-08, 1.07823e-08
LockedPool,                    1,     58,         0.0868,     0.00149655,  0.00149655,  0.00149655
PrevectorClearNontrivial,      1,     4042,       0.0655623,  1.62203e-05, 1.62203e-05, 1.62203e-05
PrevectorClearTrivial,         1,     25300,      0.385769,   1.52478e-05, 1.52478e-05, 1.52478e-05
PrevectorDestructorNontrivial, 1,     4058,       0.0656059,  1.61671e-05, 1.61671e-05, 1.61671e-05
PrevectorDestructorTrivial,    1,     24400,      0.371695,   1.52334e-05, 1.52334e-05, 1.52334e-05
PrevectorResizeNontrivial,     1,     4059,       0.0653406,  1.60977e-05, 1.60977e-05, 1.60977e-05
PrevectorResizeTrivial,        1,     25200,      0.383007,   1.51987e-05, 1.51987e-05, 1.51987e-05
RIPEMD160,                     1,     275,        0.799951,   0.00290891,  0.00290891,  0.00290891
SHA1,                          1,     305,        0.656839,   0.00215357,  0.00215357,  0.00215357
SHA256,                        1,     156,        0.815839,   0.00522974,  0.00522974,  0.00522974
SHA256_32b,                    1,     2300000,    0.824155,   3.58328e-07, 3.58328e-07, 3.58328e-07
SHA512,                        1,     240,        0.776275,   0.00323448,  0.00323448,  0.00323448
Sleep100ms,                    1,     10,         1.02837,    0.102837,    0.102837,    0.102837
Trig,                          1,     12000000,   0.155188,   1.29323e-08, 1.29323e-08, 1.29323e-08

@furszy furszy force-pushed the 2021_improve_bench_framework branch from dbcaf86 to 79f8271 Compare October 1, 2021 21:37
furszy and others added 5 commits October 1, 2021 19:00
* inline performance critical code
* Average runtime is specified and used to calculate iterations.
* Console: show median of multiple runs
* plot: show box plot
* filter benchmarks
* specify scaling factor
* ignore src/test and src/bench in command line check script
* number of iterations instead of time
* Replaced runtime in BENCHMARK macro with number of iterations.
* Added -? to bench_bitcoin
* Benchmark plotly.js URL, width, height can be customized
* Fixed incorrect precision warning
This benchmark's runtime was rather unpredictable on different machines, not really a useful benchmark.
@furszy furszy force-pushed the 2021_improve_bench_framework branch from 79f8271 to a5fcdac Compare October 1, 2021 22:01
@random-zebra

Yeah, we can leave the numbers as they are. We won't find values that sort-of-work with all possible machines/systems anyway.
This is good to go imo.
But it should be rebased on master first (as there are new bench tests not included here).
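
Since no single set of hardcoded values fits every machine, the scaling factor mentioned in the documentation quote is the per-system knob. The sketch below shows the arithmetic under the assumption that the factor simply multiplies each benchmark's iteration count; `ScalingForTarget` and `ScaledIterations` are hypothetical helpers, not functions from the bench framework.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: factor that would bring a measured total time
// (seconds) to the desired target time on the current machine.
double ScalingForTarget(double measured_total_secs, double target_secs = 1.0)
{
    assert(measured_total_secs > 0.0);
    return target_secs / measured_total_secs;
}

// Assumption: the scaling factor multiplies the hardcoded iteration count,
// and the result is truncated to a whole number of iterations.
std::uint64_t ScaledIterations(std::uint64_t num_iters_for_one_second, double scaling)
{
    assert(scaling > 0.0);
    return static_cast<std::uint64_t>(static_cast<double>(num_iters_for_one_second) * scaling);
}
```

For instance, a benchmark that finishes its 43000 iterations in 0.18438 s, as Base58CheckEncode does in the second set of results above, would need a factor of about 5.4 to run for one second on that machine.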

furszy commented Oct 4, 2021

I rebased it right after sharing my results (3 days ago) and squashed the new benchmarks into 17c4bcf ;).

@random-zebra random-zebra left a comment

ACK a5fcdac

@furszy furszy merged commit c3028cc into PIVX-Project:master Oct 13, 2021
@furszy furszy deleted the 2021_improve_bench_framework branch November 29, 2022 15:43