Combined hash and fill AES loop #166

SChernykh · 2019-12-01T10:44:44Z

Adds more parallelism to the AES loop so that modern CPUs can take advantage of it. Also, scratchpad data moves between the L1 and L3 caches only once, which saves time and energy per hash.
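As a rough illustration of the idea (not the actual RandomX AES code), the sketch below fuses a "hash the old contents" pass and a "fill with new contents" pass into a single loop. `toyMix`, `hashThenFill` and `hashAndFill` are hypothetical names, and the multiply-xor step is only a stand-in for an AES round. The fused loop touches each element once and carries two independent dependency chains (`h` and `g`), which is the kind of parallelism the description refers to.

```cpp
#include <cstdint>
#include <vector>

// Toy mixing step standing in for an AES round. Purely illustrative.
inline uint64_t toyMix(uint64_t state, uint64_t x) {
    return (state ^ x) * 0x9E3779B97F4A7C15ull;
}

// Two passes: hash the old contents, then walk the buffer again to refill it.
inline uint64_t hashThenFill(std::vector<uint64_t>& pad, uint64_t seed) {
    uint64_t h = 0;
    for (uint64_t v : pad) {
        h = toyMix(h, v);          // pass 1: read every element
    }
    uint64_t g = seed;
    for (uint64_t& v : pad) {
        g = toyMix(g, 1);          // pass 2: advance the generator
        v = g;                     //         and overwrite the element
    }
    return h;
}

// One pass: each element is hashed and overwritten while it is still in cache.
// The h and g chains do not depend on each other, so they can overlap.
inline uint64_t hashAndFill(std::vector<uint64_t>& pad, uint64_t seed) {
    uint64_t h = 0;
    uint64_t g = seed;
    for (uint64_t& v : pad) {
        h = toyMix(h, v);          // consume the old value
        g = toyMix(g, 1);          // produce the new value
        v = g;
    }
    return h;
}
```

Both functions return the same hash and leave the buffer in the same state; only the memory access pattern differs.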

src/randomx.h

hyc · 2019-12-01T12:09:41Z

src/randomx.cpp

@@ -363,12 +363,12 @@ extern "C" {
 		machine->getFinalResult(output, RANDOMX_HASH_SIZE);
 	}

-	void randomx_calculate_hash_first(randomx_vm* machine, uint64_t (&tempHash)[8], const void* input, size_t inputSize) {
+	void randomx_calculate_hash_first(randomx_vm* machine, uint64_t *tempHash, const void* input, size_t inputSize) {


For typechecking purposes you could still typedef this param to uint64_t foo[8] and use the type in the parameter lists.

Yeah, uint64_t tempHash[8] could be more readable in the API and it decays into a pointer anyways.

And that would have prevented that sizeof() bug too.
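The exact sizeof() bug is not shown in this thread, so the following is only a sketch of the general pitfall, assuming it was a `sizeof` applied to the parameter itself. The reference-to-array form from the removed line is C++-only, which is why it cannot stay in a C API. `temp_hash_t` and the helper names are hypothetical, not part of RandomX.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical typedef for the 8-word temporary hash buffer.
typedef uint64_t temp_hash_t[8];

// With a pointer parameter, sizeof reports the size of the pointer,
// not the size of the 64-byte buffer it points to.
inline std::size_t sizeViaPointer(uint64_t* tempHash) {
    return sizeof(tempHash);
}

// With a C++ reference-to-array parameter the array type is preserved,
// so sizeof reports the full 64 bytes.
inline std::size_t sizeViaReference(uint64_t (&tempHash)[8]) {
    return sizeof(tempHash);
}

// C-compatible alternative: keep the pointer parameter, but take the size
// from the typedef instead of from the parameter.
inline void clearTempHash(uint64_t* tempHash) {
    assert(tempHash != nullptr);
    std::memset(tempHash, 0, sizeof(temp_hash_t));
}
```

Note that a parameter declared as `uint64_t tempHash[8]` behaves like the pointer version inside the function; the typedef helps only when `sizeof` is applied to the type name.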

Using the "first" and "next" hash functions in randomx.cpp, my miner (which is based on this repo, not xmrig) receives "Low difficulty share" from the pool.

You have to be careful using these new functions. Check how benchmark.cpp handles them and where it updates the nonce.

My miner works fine with this workflow:

    uint64_t tempHash[8];
    while (nonce < noncesCount) {
        nonce = atomicNonce.fetch_add(1);
        store32(noncePtr, nonce);
        randomx_calculate_hash_first(vm, tempHash, blockTemplate, sizeof(blockTemplate));
        randomx_calculate_hash_next(vm, tempHash, blockTemplate, sizeof(blockTemplate), &hash);
        result.xorWith(hash);
    }

But when the randomx_calculate_hash_first call is moved out of the while loop, the miner gets the wrong result:

    uint64_t tempHash[8];
    store32(noncePtr, nonce);
    randomx_calculate_hash_first(vm, tempHash, blockTemplate, sizeof(blockTemplate));
    while (nonce < noncesCount) {
        nonce = atomicNonce.fetch_add(1);
        store32(noncePtr, nonce);
        randomx_calculate_hash_next(vm, tempHash, blockTemplate, sizeof(blockTemplate), &hash);
        result.xorWith(hash);
    }
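One plausible explanation, offered as an assumption rather than a diagnosis: in a pipelined first/next scheme, each "next" call returns the hash of the previously submitted input while starting work on the new one. A benchmark that only XORs all hashes together does not care which nonce a hash belongs to, but a miner does. The toy model below is not the RandomX API; `ToyPipeline`, `toyHash` and `minePipelined` are hypothetical names used only to show the off-by-one bookkeeping.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Toy stand-in for a VM with a two-stage hash pipeline. Not RandomX.
struct ToyPipeline {
    uint32_t pending = 0;   // input whose hash is currently "in flight"
    bool started = false;

    static uint64_t toyHash(uint32_t x) { return x * 2654435761ull + 1; }

    // Begin hashing the first input; produces no output yet.
    void first(uint32_t input) {
        pending = input;
        started = true;
    }

    // Return the hash of the PREVIOUS input and begin hashing nextInput.
    uint64_t next(uint32_t nextInput) {
        assert(started);
        uint64_t out = toyHash(pending);
        pending = nextInput;
        return out;
    }
};

// Returns (nonce, hash) pairs, crediting each hash to the nonce that produced it.
inline std::vector<std::pair<uint32_t, uint64_t>>
minePipelined(uint32_t firstNonce, uint32_t count) {
    std::vector<std::pair<uint32_t, uint64_t>> results;
    ToyPipeline vm;
    uint32_t prev = firstNonce;
    vm.first(prev);
    for (uint32_t i = 1; i <= count; ++i) {
        uint32_t cur = firstNonce + i;
        uint64_t h = vm.next(cur);      // h belongs to prev, not to cur
        results.emplace_back(prev, h);  // so pair it with prev
        prev = cur;
    }
    return results;
}
```

If the loop instead paired `h` with `cur`, every submitted share would carry the wrong nonce, which a pool would likely reject.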

Combined hash and fill AES loop

b6d2797

tevador reviewed Dec 1, 2019

Removed C++ code from C API

a76ac01

hyc reviewed Dec 1, 2019

tevador reviewed Dec 1, 2019

Fixed incorrect sizeof

5f05388

tevador approved these changes Dec 1, 2019

tevador merged commit 219c02e into tevador:master Dec 1, 2019

tevador mentioned this pull request Jul 4, 2020

Update RandomX to v1.1.8 monero-project/monero#6698

Merged
