nonce iteration optimization #1827

Merged
merged 2 commits into from Sep 10, 2020
Conversation

@ghost ghost commented Sep 9, 2020

Replaced std::mutex with std::atomic<uint64_t>::fetch_add.
Nonce iteration is now correct for the following cases:
  uint32_t, with_nicehash:
    [0 - 0xFFFFFF]
  uint32_t, without_nicehash:
    [0 - 0xFFFFFFFF]
  uint64_t, without_nicehash:
    [0 - 0x7FFFFFFFFFFFFFFF]
Hashrate difference in practice for rx/0 on a 4-core Intel CPU:
  >>> '{:.2f}%'.format((1907.8 - 1904.9) / 1904.9 * 100)
  '0.15%'
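
A minimal sketch of the idea in this description, not the code from this PR: threads reserve nonces from a shared std::atomic<uint64_t> with fetch_add instead of taking a std::mutex, and a reservation is rejected once it would pass the upper bound for the case in question. The names nonce_limit, reserve_nonces and each_nonce_reserved_once are hypothetical and do not come from xmrig.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Upper bounds copied from the three cases listed above.
constexpr uint64_t kMaxNonce32Nicehash = 0xFFFFFFULL;
constexpr uint64_t kMaxNonce32         = 0xFFFFFFFFULL;
constexpr uint64_t kMaxNonce64         = 0x7FFFFFFFFFFFFFFFULL;

// Hypothetical helper: pick the bound for a nonce size (in bytes) and nicehash mode.
// The 8-byte nicehash combination is not listed above, so it is not modelled here.
constexpr uint64_t nonce_limit(std::size_t nonce_size, bool nicehash) {
    return nonce_size == 8 ? kMaxNonce64
                           : (nicehash ? kMaxNonce32Nicehash : kMaxNonce32);
}

// Reserve `count` consecutive values from a shared counter without a lock.
// Returns true (with the first reserved value in `first`) only when the whole
// batch fits inside [0, limit]. The counter keeps growing after exhaustion;
// this sketch assumes it never gets close to wrapping a 64-bit value.
bool reserve_nonces(std::atomic<uint64_t>& counter, uint64_t count,
                    uint64_t limit, uint64_t& first) {
    assert(count > 0);
    first = counter.fetch_add(count, std::memory_order_relaxed);
    return first <= limit && count - 1 <= limit - first;
}

// Demo: several threads drain [0, limit] one nonce at a time.
// Returns true when every value was handed out exactly once (no duplicates, no gaps).
bool each_nonce_reserved_once(unsigned threads, uint64_t limit) {
    std::atomic<uint64_t> counter{0};
    std::vector<std::atomic<int>> seen(limit + 1);
    for (auto& s : seen) s.store(0);

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            uint64_t first = 0;
            while (reserve_nonces(counter, 1, limit, first)) {
                seen[first].fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();

    for (auto& s : seen) {
        if (s.load() != 1) return false;
    }
    return true;
}
```

Calling each_nonce_reserved_once(4, 9999) from a main function exercises four threads over [0 - 9999]; on some Linux toolchains the build may need -pthread.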

https://github.com/cohcho/xmrig/commits/nonce_iteration

shell code for detailed performance comparison
commits=(nonce_iteration~2 nonce_iteration)
for commit in "${commits[@]}"; do
        git checkout --detach -f $commit
        echo "\nCOMMIT:$commit:$(git rev-parse $commit)"
        b=release;
        B=build/$(git rev-parse HEAD)/$b;
        rm -fr $B
        cmake --log-level=ERROR -S . -B $B -GNinja -DCMAKE_CXX_FLAGS="-march=native"  -DCMAKE_C_FLAGS="-march=native"  -DCMAKE_BUILD_TYPE=$b -DWITH_TLS=0 -DWITH_CUDA=0 -DWITH_OPENCL=0 -DBUILD_STATIC=0 -DWITH_HTTP=0 -DWITH_HWLOC=0 -DWITH_MSR=0 -DWITH_DEBUG_LOG=1 | grep -v 'written to'
        ninja -C $B
        echo "\nTEST:nonce_iteration_perf"
        $B/nonce_iteration_perf
        echo "\nTEST:nonce_iteration"
        $B/nonce_iteration
        echo "\nTEST:xmrig"
        (( $B/xmrig-notls --user-agent '' --url=stratum+tcp://127.0.0.1:19999  --no-color -u '' -p '' --randomx-wrmsr=-1 -t 4 --cpu-no-yield --verbose | grep -v -e CPU -e LIBS -e ABOUT -e MEMORY) &)
        sleep 130
        kill -9 $(pidof xmrig-notls)
done;
performance before:
HEAD is now at 8b93c2ff nonce iteration perf test

COMMIT:nonce_iteration~2:8b93c2ff79049ffa93ebc5e636873160d63010a8
-- Configuring done
-- Generating done
ninja: Entering directory `build/8b93c2ff79049ffa93ebc5e636873160d63010a8/release'
[120/164] Building CXX object CMakeFiles/xmrig-notls.dir/src/crypto/rx/RxConfig.cpp.o
../../../src/crypto/rx/RxConfig.cpp: In member function ‘bool xmrig::RxConfig::read(const Value&)’:
../../../src/crypto/rx/RxConfig.cpp:124:19: warning: comparison of integer expressions of different signedness: ‘const int’ and ‘xmrig::RxConfig::ScratchpadPrefetchMode’ [-Wsign-compare]
  124 |         if ((mode >= ScratchpadPrefetchOff) && (mode < ScratchpadPrefetchMax)) {
      |              ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
../../../src/crypto/rx/RxConfig.cpp:124:54: warning: comparison of integer expressions of different signedness: ‘const int’ and ‘xmrig::RxConfig::ScratchpadPrefetchMode’ [-Wsign-compare]
  124 |         if ((mode >= ScratchpadPrefetchOff) && (mode < ScratchpadPrefetchMax)) {
      |                                                 ~~~~~^~~~~~~~~~~~~~~~~~~~~~~
[164/164] Linking CXX executable xmrig-notls

TEST:nonce_iteration_perf
nicehash[0] nonceSize[8] threads[4] rounds[0000000000000001] round_size[1] max_counter[0000000000cc9d29] dt[3.991]s delay_per_nonce(WorkerJob<1>::nextRound)[297.651303319]ns
nicehash[0] nonceSize[8] threads[4] rounds[0000000000000010] round_size[1] max_counter[00000000059c5a73] dt[3.997]s delay_per_nonce(WorkerJob<1>::nextRound)[42.458350698]ns
nicehash[0] nonceSize[8] threads[4] rounds[0000000000000100] round_size[1] max_counter[0000000019d0b171] dt[3.980]s delay_per_nonce(WorkerJob<1>::nextRound)[9.189370760]ns
nicehash[0] nonceSize[8] threads[4] rounds[0000000000001000] round_size[1] max_counter[000000001b3125f6] dt[4.000]s delay_per_nonce(WorkerJob<1>::nextRound)[8.767139550]ns
nicehash[0] nonceSize[8] threads[4] rounds[0000000000010000] round_size[1] max_counter[000000001b48735c] dt[4.000]s delay_per_nonce(WorkerJob<1>::nextRound)[8.738758498]ns

nicehash[0] nonceSize[4] threads[4] rounds[0000000000000001] round_size[1] max_counter[0000000000cd8cbb] dt[3.992]s delay_per_nonce(WorkerJob<1>::nextRound)[296.361032928]ns
nicehash[0] nonceSize[4] threads[4] rounds[0000000000000010] round_size[1] max_counter[0000000005a5fc9d] dt[4.008]s delay_per_nonce(WorkerJob<1>::nextRound)[42.292185469]ns
nicehash[0] nonceSize[4] threads[4] rounds[0000000000000100] round_size[1] max_counter[000000001be4d9a8] dt[3.998]s delay_per_nonce(WorkerJob<1>::nextRound)[8.542867481]ns
nicehash[0] nonceSize[4] threads[4] rounds[0000000000001000] round_size[1] max_counter[000000001d5df45b] dt[4.001]s delay_per_nonce(WorkerJob<1>::nextRound)[8.119979222]ns
nicehash[0] nonceSize[4] threads[4] rounds[0000000000010000] round_size[1] max_counter[000000001d778e98] dt[4.000]s delay_per_nonce(WorkerJob<1>::nextRound)[8.091608852]ns

nicehash[1] nonceSize[4] threads[4] rounds[0000000000000001] round_size[1] max_counter[0000000000ca7876] dt[4.012]s delay_per_nonce(WorkerJob<1>::nextRound)[302.333904158]ns
nicehash[1] nonceSize[4] threads[4] rounds[0000000000000010] round_size[1] max_counter[0000000005b8aa11] dt[3.988]s delay_per_nonce(WorkerJob<1>::nextRound)[41.547965183]ns
nicehash[1] nonceSize[4] threads[4] rounds[0000000000000100] round_size[1] max_counter[000000002416bed8] dt[4.001]s delay_per_nonce(WorkerJob<1>::nextRound)[6.608345180]ns
nicehash[1] nonceSize[4] threads[4] rounds[0000000000001000] round_size[1] max_counter[000000002695be7e] dt[4.000]s delay_per_nonce(WorkerJob<1>::nextRound)[6.178944741]ns
nicehash[1] nonceSize[4] threads[4] rounds[0000000000010000] round_size[1] max_counter[0000000026c24528] dt[4.000]s delay_per_nonce(WorkerJob<1>::nextRound)[6.151469647]ns


TEST:nonce_iteration
start_nonce[0000000000000000] reserve_count[0000000018c7fac3]
nonce[ff00000000000000] counter[00000000ff000000]
error: too many nonces

TEST:xmrig
 * HUGE PAGES   supported
 * 1GB PAGES    disabled
                L2:1.0 MB L3:6.0 MB 4C/4T
 * DONATE       1%
 * ASSEMBLY     auto:intel
 * POOL #1      stratum+tcp://127.0.0.1:19999 algo auto
[2020-09-09 11:38:30.397] POOLS --------------------------------------------------------------------
[2020-09-09 11:38:30.397] url:       stratum+tcp://127.0.0.1:19999
[2020-09-09 11:38:30.397] host:      127.0.0.1
[2020-09-09 11:38:30.397] port:      19999
[2020-09-09 11:38:30.397] user:
[2020-09-09 11:38:30.397] pass:
[2020-09-09 11:38:30.397] rig-id     (null)
[2020-09-09 11:38:30.397] algo:      invalid
[2020-09-09 11:38:30.397] nicehash:  0
[2020-09-09 11:38:30.397] keepAlive: 0
[2020-09-09 11:38:30.397] --------------------------------------------------------------------------
 * COMMANDS     'h' hashrate, 'p' pause, 'r' resume, 's' results, 'c' connection
[2020-09-09 11:38:30.398] [stratum+tcp://127.0.0.1:19999] state: "unconnected" -> "host-lookup"
[2020-09-09 11:38:30.398] [stratum+tcp://127.0.0.1:19999] state: "host-lookup" -> "connecting"
[2020-09-09 11:38:30.398] [stratum+tcp://127.0.0.1:19999] state: "connecting" -> "connected"
[2020-09-09 11:38:30.398] [stratum+tcp://127.0.0.1:19999] send (399 bytes): "{"id":1,"jsonrpc":"2.0","method":"login","params":{"login":null,"pass":null,"agent":"","algo":["cn/0","cn/1","cn/2","cn/r","cn/fast","cn/half","cn/xao","cn/rto","cn/rwz","cn/zls","cn/double","cn-lite/0","cn-lite/1","cn-heavy/0","cn-heavy/tube","cn-heavy/xhv","cn-pico","cn-pico/tlo","cn/ccx","rx/0","rx/wow","rx/loki","rx/arq","rx/sfx","rx/keva","argon2/chukwa","argon2/wrkz","astrobwt","kawpow"]}}"
[2020-09-09 11:38:30.398] [stratum+tcp://127.0.0.1:19999] received (439 bytes): "{"jsonrpc":"2.0","method":"job","params":{"id":"da28496a171a4d55991e9c35e0a6ad74","blob":"0c0cabbbabfa05b878bf8797292d8172c6bb812766ffa6bfb700191d3eefc90a1d0d23b4f08adf00000000e26e144243ecde5120ee78783a30808951151b12dfac1cced9de92b9775a12bd0e","algo":"rx/0","job_id":"9867a029251c49fe8b9d40c7002ae7c5","target":"01000000","height":2175340,"seed_hash":"6c97c86339b35052fa7f6812dc4ca72580929569af6e683d5470b5b2790922b0","next_seed_hash":""}}"
[2020-09-09 11:38:30.398]  net      new job from 127.0.0.1:19999 diff 4294M algo rx/0 height 2175340
[2020-09-09 11:38:30.398]  cpu      use argon2 implementation AVX2
[2020-09-09 11:38:30.398]  randomx  init dataset algo rx/0 (4 threads) seed 6c97c86339b35052...
[2020-09-09 11:38:30.517]  randomx  allocated 2336 MB (2080+256) huge pages 100% 1168/1168 +JIT (120 ms)
[2020-09-09 11:38:35.503]  randomx  dataset ready (4985 ms)
[2020-09-09 11:38:35.503]  cpu      use profile  *  (4 threads) scratchpad 2048 KB
[2020-09-09 11:38:35.564]  cpu      READY threads 4/4 (4) huge pages 100% 4/4 memory 8192 KB (61 ms)
[2020-09-09 11:39:35.899]  miner    speed 10s/60s/15m 1903.4 475.9 n/a H/s max 1905.4 H/s
[2020-09-09 11:40:36.296]  miner    speed 10s/60s/15m 1905.6 1904.9 n/a H/s max 1905.9 H/s
performance after:
Previous HEAD position was 8b93c2ff nonce iteration perf test
HEAD is now at 03e4fb1f fix nonce_iteration tests

COMMIT:nonce_iteration:03e4fb1fc24bb2191c8e9e4fce908efd262f8d91
-- Configuring done
-- Generating done
ninja: Entering directory `build/03e4fb1fc24bb2191c8e9e4fce908efd262f8d91/release'
[117/164] Building CXX object CMakeFiles/xmrig-notls.dir/src/crypto/rx/RxConfig.cpp.o
../../../src/crypto/rx/RxConfig.cpp: In member function ‘bool xmrig::RxConfig::read(const Value&)’:
../../../src/crypto/rx/RxConfig.cpp:124:19: warning: comparison of integer expressions of different signedness: ‘const int’ and ‘xmrig::RxConfig::ScratchpadPrefetchMode’ [-Wsign-compare]
  124 |         if ((mode >= ScratchpadPrefetchOff) && (mode < ScratchpadPrefetchMax)) {
      |              ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
../../../src/crypto/rx/RxConfig.cpp:124:54: warning: comparison of integer expressions of different signedness: ‘const int’ and ‘xmrig::RxConfig::ScratchpadPrefetchMode’ [-Wsign-compare]
  124 |         if ((mode >= ScratchpadPrefetchOff) && (mode < ScratchpadPrefetchMax)) {
      |                                                 ~~~~~^~~~~~~~~~~~~~~~~~~~~~~
[164/164] Linking CXX executable xmrig-notls

TEST:nonce_iteration_perf
nicehash[0] nonceSize[8] threads[4] rounds[0000000000000001] round_size[1] max_counter[000000000268436e] dt[4.000]s delay_per_nonce(WorkerJob<1>::nextRound)[99.046298703]ns
nicehash[0] nonceSize[8] threads[4] rounds[0000000000000010] round_size[1] max_counter[000000002109eb98] dt[4.005]s delay_per_nonce(WorkerJob<1>::nextRound)[7.225509608]ns
nicehash[0] nonceSize[8] threads[4] rounds[0000000000000100] round_size[1] max_counter[0000000032152e6c] dt[3.978]s delay_per_nonce(WorkerJob<1>::nextRound)[4.734182419]ns
nicehash[0] nonceSize[8] threads[4] rounds[0000000000001000] round_size[1] max_counter[0000000034768cc4] dt[3.998]s delay_per_nonce(WorkerJob<1>::nextRound)[4.542791616]ns
nicehash[0] nonceSize[8] threads[4] rounds[0000000000010000] round_size[1] max_counter[00000000349ec827] dt[4.000]s delay_per_nonce(WorkerJob<1>::nextRound)[4.531211349]ns

nicehash[0] nonceSize[4] threads[4] rounds[0000000000000001] round_size[1] max_counter[00000000026cd95a] dt[3.994]s delay_per_nonce(WorkerJob<1>::nextRound)[98.151069203]ns
nicehash[0] nonceSize[4] threads[4] rounds[0000000000000010] round_size[1] max_counter[0000000021253690] dt[4.001]s delay_per_nonce(WorkerJob<1>::nextRound)[7.194285982]ns
nicehash[0] nonceSize[4] threads[4] rounds[0000000000000100] round_size[1] max_counter[00000000326bcec9] dt[4.000]s delay_per_nonce(WorkerJob<1>::nextRound)[4.728174531]ns
nicehash[0] nonceSize[4] threads[4] rounds[0000000000001000] round_size[1] max_counter[00000000347e55c2] dt[4.000]s delay_per_nonce(WorkerJob<1>::nextRound)[4.541810412]ns
nicehash[0] nonceSize[4] threads[4] rounds[0000000000010000] round_size[1] max_counter[00000000349beea8] dt[3.999]s delay_per_nonce(WorkerJob<1>::nextRound)[4.531002901]ns

nicehash[1] nonceSize[4] threads[4] rounds[0000000000000001] round_size[1] max_counter[0000000001ff43b8] dt[3.998]s delay_per_nonce(WorkerJob<1>::nextRound)[119.311305670]ns
nicehash[1] nonceSize[4] threads[4] rounds[0000000000000010] round_size[1] max_counter[000000002140de85] dt[3.999]s delay_per_nonce(WorkerJob<1>::nextRound)[7.168400542]ns
nicehash[1] nonceSize[4] threads[4] rounds[0000000000000100] round_size[1] max_counter[00000000326b4b24] dt[4.000]s delay_per_nonce(WorkerJob<1>::nextRound)[4.728578289]ns
nicehash[1] nonceSize[4] threads[4] rounds[0000000000001000] round_size[1] max_counter[00000000346dc888] dt[3.996]s delay_per_nonce(WorkerJob<1>::nextRound)[4.542977947]ns
nicehash[1] nonceSize[4] threads[4] rounds[0000000000010000] round_size[1] max_counter[00000000349b7620] dt[4.001]s delay_per_nonce(WorkerJob<1>::nextRound)[4.532919048]ns


TEST:nonce_iteration
start_nonce[0000000000000000] reserve_count[000000005e2a24a0]
nonce[7f0000005e2a249f] counter[000000007f000000]

TEST:xmrig
 * HUGE PAGES   supported
 * 1GB PAGES    disabled
                L2:1.0 MB L3:6.0 MB 4C/4T
 * DONATE       1%
 * ASSEMBLY     auto:intel
 * POOL #1      stratum+tcp://127.0.0.1:19999 algo auto
[2020-09-09 11:46:14.574] POOLS --------------------------------------------------------------------
[2020-09-09 11:46:14.574] url:       stratum+tcp://127.0.0.1:19999
[2020-09-09 11:46:14.574] host:      127.0.0.1
[2020-09-09 11:46:14.574] port:      19999
[2020-09-09 11:46:14.574] user:
[2020-09-09 11:46:14.574] pass:
[2020-09-09 11:46:14.574] rig-id     (null)
[2020-09-09 11:46:14.574] algo:      invalid
[2020-09-09 11:46:14.574] nicehash:  0
[2020-09-09 11:46:14.574] keepAlive: 0
[2020-09-09 11:46:14.574] --------------------------------------------------------------------------
 * COMMANDS     'h' hashrate, 'p' pause, 'r' resume, 's' results, 'c' connection
[2020-09-09 11:46:14.574] [stratum+tcp://127.0.0.1:19999] state: "unconnected" -> "host-lookup"
[2020-09-09 11:46:14.574] [stratum+tcp://127.0.0.1:19999] state: "host-lookup" -> "connecting"
[2020-09-09 11:46:14.574] [stratum+tcp://127.0.0.1:19999] state: "connecting" -> "connected"
[2020-09-09 11:46:14.574] [stratum+tcp://127.0.0.1:19999] send (399 bytes): "{"id":1,"jsonrpc":"2.0","method":"login","params":{"login":null,"pass":null,"agent":"","algo":["cn/0","cn/1","cn/2","cn/r","cn/fast","cn/half","cn/xao","cn/rto","cn/rwz","cn/zls","cn/double","cn-lite/0","cn-lite/1","cn-heavy/0","cn-heavy/tube","cn-heavy/xhv","cn-pico","cn-pico/tlo","cn/ccx","rx/0","rx/wow","rx/loki","rx/arq","rx/sfx","rx/keva","argon2/chukwa","argon2/wrkz","astrobwt","kawpow"]}}"
[2020-09-09 11:46:14.574] [stratum+tcp://127.0.0.1:19999] received (439 bytes): "{"jsonrpc":"2.0","method":"job","params":{"id":"da28496a171a4d55991e9c35e0a6ad74","blob":"0c0cabbbabfa05b878bf8797292d8172c6bb812766ffa6bfb700191d3eefc90a1d0d23b4f08adf00000000e26e144243ecde5120ee78783a30808951151b12dfac1cced9de92b9775a12bd0e","algo":"rx/0","job_id":"9867a029251c49fe8b9d40c7002ae7c5","target":"01000000","height":2175340,"seed_hash":"6c97c86339b35052fa7f6812dc4ca72580929569af6e683d5470b5b2790922b0","next_seed_hash":""}}"
[2020-09-09 11:46:14.574]  net      new job from 127.0.0.1:19999 diff 4294M algo rx/0 height 2175340
[2020-09-09 11:46:14.574]  cpu      use argon2 implementation AVX2
[2020-09-09 11:46:14.574]  randomx  init dataset algo rx/0 (4 threads) seed 6c97c86339b35052...
[2020-09-09 11:46:14.695]  randomx  allocated 2336 MB (2080+256) huge pages 100% 1168/1168 +JIT (121 ms)
[2020-09-09 11:46:19.685]  randomx  dataset ready (4989 ms)
[2020-09-09 11:46:19.685]  cpu      use profile  *  (4 threads) scratchpad 2048 KB
[2020-09-09 11:46:19.746]  cpu      READY threads 4/4 (4) huge pages 100% 4/4 memory 8192 KB (61 ms)
[2020-09-09 11:47:20.079]  miner    speed 10s/60s/15m 1907.1 476.7 n/a H/s max 1907.8 H/s
[2020-09-09 11:48:20.476]  miner    speed 10s/60s/15m 1908.7 1907.8 n/a H/s max 1908.9 H/s
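
The figures in the logs above can be cross-checked with a little arithmetic. The sketch below assumes that delay_per_nonce is roughly dt divided by max_counter; that relation is inferred from the logged values (they agree to within the rounding of dt), not taken from the test source. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Approximate delay per nonce in nanoseconds, from elapsed seconds and the
// final counter value (assumed relation: dt / max_counter).
double delay_per_nonce_ns(double dt_seconds, uint64_t max_counter) {
    assert(max_counter > 0);
    return dt_seconds * 1e9 / static_cast<double>(max_counter);
}

// Relative change from `before` to `after`, in percent.
double percent_change(double before, double after) {
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

With the logged values, delay_per_nonce_ns(3.991, 0xcc9d29) comes out close to the 297.65 ns reported for the first run before the change, percent_change(8.738758498, 4.531211349) is about -48% for the 64-bit case at rounds[0000000000010000], and percent_change(1904.9, 1907.8) reproduces the 0.15% hashrate difference quoted in the description.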

efficient and correct nonce iteration without duplicates
@xmrig xmrig added this to the v6 milestone Sep 9, 2020
@SChernykh
Contributor

KawPow is broken. It uses a 64-bit nonce, and the pool sets the starting nonce value with the 1, 2 or 3 highest bytes set to specific values. The current code sets all high bytes to 0.
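
A minimal sketch of the behaviour asked for here, assuming the pool fixes the top reserved_bytes bytes of the 64-bit starting nonce and the miner may only vary the remaining low bytes. The helper compose_nonce and its parameters are hypothetical; this is not how xmrig implements it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep the top `reserved_bytes` bytes of the pool's
// starting nonce and fill the remaining low bytes from the local counter.
uint64_t compose_nonce(uint64_t pool_start, uint64_t counter, unsigned reserved_bytes) {
    assert(reserved_bytes < 8);  // at least one byte must stay free for the counter
    // Mask covering the low bytes the miner is allowed to change.
    const uint64_t low_mask = ~uint64_t{0} >> (8 * reserved_bytes);
    return (pool_start & ~low_mask) | (counter & low_mask);
}
```

For example, compose_nonce(0xAB00000000000000, 0x1234, 1) keeps the pool's top byte 0xAB and yields 0xAB00000000001234, rather than zeroing the high bytes.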

@SChernykh
Contributor

KawPow works now. I'll run performance tests again on my side.

Contributor

@SChernykh SChernykh left a comment

Works with RandomX and KawPow now. Performance is kind of the same, maybe slightly better than original code.

@xmrig xmrig merged commit adf833b into xmrig:dev Sep 10, 2020
@xmrig
Owner

xmrig commented Sep 10, 2020

Thank you.
