Fail to build on PPC64 #3826

Closed
hegjon opened this issue May 17, 2018 · 103 comments

@hegjon
Contributor

hegjon commented May 17, 2018

Seems like this is the cause:

cc1: error: unrecognized command line option '-maes'
cc1: error: unrecognized command line option '-march=native'

Full log: https://kojipkgs.fedoraproject.org//work/tasks/3387/27023387/build.log

@nioroso-x3
Contributor

Are you on big endian or little endian?
Some months ago I managed to modify the CMakeLists.txt enough to get it to compile, but on big endian ppc64 there are just too many endianness problems.
The daemon is unable to connect to anything and the cli wallet creates invalid keys.

@hegjon
Contributor Author

hegjon commented May 18, 2018

Big Endian.

CPU info:

Architecture:        ppc64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Big Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           4
NUMA node(s):        1
Model:               2.1 (pvr 004b 0201)
Model name:          POWER8 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           64K
L1i cache:           32K
NUMA node0 CPU(s):   0-3

@hegjon
Contributor Author

hegjon commented May 18, 2018

Some months ago I managed to modify the CMakeLists.txt enough to get it to compile, but on big endian ppc64 there are just too many endianness problems.

Would it be possible to whitelist the platforms that support -maes instead of keeping a list of the known platforms that do not support the flag?

@hyc
Collaborator

hyc commented May 18, 2018

Yeah I think there's far too much dependency on little-endian math scattered throughout the code. Feel free to track all occurrences down and submit patches to fix them.

@nioroso-x3
Contributor

I created a pull request for the modified CMakeLists.
It compiles fine on Gentoo ppc64, and almost compiles on Mac OS X Leopard.
Any pointers on where little endian math is most used? Most of the crypto source files seem to handle endianness fine.

@hegjon
Contributor Author

hegjon commented May 21, 2018

I created a pull request for the modified CMakeLists.

Thanks.

Why is it needed to set the -mcpu flags?

@nioroso-x3
Contributor

nioroso-x3 commented May 22, 2018

Why is it needed to set the -mcpu flags?

-mcpu=native for some reason fails with G5 or ppc970 gcc, so I just set it manually to the lowest common denominator: a G4 for 32-bit, a G5 for 64-bit, and POWER8 for 64-bit little endian.

@jtgrassie
Contributor

@nioroso-x3

Any pointers on where little endian math is most used? Most of the crypto source files seem to handle endianness fine.

See:
https://github.com/monero-project/monero/blob/master/src/crypto/crypto-ops.c

A lot of the math in there is LE dependent.
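
A minimal sketch of what "LE dependent" can mean in practice. It is not taken from crypto-ops.c, and the function names `load32_host` and `load32_le` are hypothetical. The first function gives different results on little- and big-endian hosts; the second gives the same result everywhere:

```c
#include <stdint.h>
#include <string.h>

/* Endian-dependent: copies the bytes into an integer in host order, so the
   numeric result differs between little- and big-endian machines. */
uint32_t load32_host(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Endian-independent: builds the value from individual bytes, treating
   p[0] as the least significant byte on every host. */
uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Code that was only ever run on x86 tends to use the first form without noticing, because there the two are equivalent.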

@jtgrassie
Contributor

Also, the crypto implementations in the src/crypto directory are the x86 implementations. For example, chacha has a different implementation for PowerPC: https://cr.yp.to/chacha.html.

@moneromooo-monero
Collaborator

Is it any better nowadays? That particular thing got fixed a while ago.

@nioroso-x3
Contributor

Is it any better nowadays? That particular thing got fixed a while ago.

Monero is endian agnostic now? I haven't booted my Power Mac in a while, so I haven't tested anything.

@moneromooo-monero
Collaborator

Well, we won't know for sure until someone reports another problem :)

@nioroso-x3
Contributor

Just compiled latest git, monerod fails all handshakes and monero-wallet-cli creates invalid keys and is unable to open valid wallets.

@moneromooo-monero
Collaborator

Can you start with unit_tests please?

@moneromooo-monero
Collaborator

And crypto tests. "make release-test" should run them all.

@nioroso-x3
Contributor

This is the output:
Running tests...
Test project /home/jribeiro/Development/monero/build/Linux/master/release
Start 1: hash-target
1/15 Test #1: hash-target ...................... Passed 0.51 sec
Start 2: core_tests
2/15 Test #2: core_tests .......................***Exception: Other7354.57 sec
Start 3: cncrypto
3/15 Test #3: cncrypto .........................***Failed 52.76 sec
Start 4: unit_tests
4/15 Test #4: unit_tests .......................***Exception: Other335.84 sec
Start 5: difficulty
5/15 Test #5: difficulty ....................... Passed 0.23 sec
Start 6: hash-fast
6/15 Test #6: hash-fast ........................ Passed 0.08 sec
Start 7: hash-slow
7/15 Test #7: hash-slow ........................***Failed 1.36 sec
Start 8: hash-slow-1
8/15 Test #8: hash-slow-1 ......................***Failed 1.61 sec
Start 9: hash-slow-2
9/15 Test #9: hash-slow-2 ......................***Failed 4.72 sec
Start 10: hash-tree
10/15 Test #10: hash-tree ........................ Passed 0.19 sec
Start 11: hash-extra-blake
11/15 Test #11: hash-extra-blake ................. Passed 0.04 sec
Start 12: hash-extra-groestl
12/15 Test #12: hash-extra-groestl ...............***Failed 0.75 sec
Start 13: hash-extra-jh
13/15 Test #13: hash-extra-jh .................... Passed 0.03 sec
Start 14: hash-extra-skein
14/15 Test #14: hash-extra-skein ................. Passed 0.04 sec
Start 15: hash-variant2-int-sqrt
15/15 Test #15: hash-variant2-int-sqrt ...........***Exception: Other166.52 sec

47% tests passed, 8 tests failed out of 15

Total Test time (real) = 7919.46 sec

The following tests FAILED:
2 - core_tests (OTHER_FAULT)
3 - cncrypto (Failed)
4 - unit_tests (OTHER_FAULT)
7 - hash-slow (Failed)
8 - hash-slow-1 (Failed)
9 - hash-slow-2 (Failed)
12 - hash-extra-groestl (Failed)
15 - hash-variant2-int-sqrt (OTHER_FAULT)
Errors while running CTest

The exceptions are because I killed the tests that were taking too long.
Attached is the core_tests log.

core_tests.log.gz

@moneromooo-monero
Collaborator

Can you please run and attach the output of these two commands:

./build/Linux/master/release/tests/crypto/cncrypto-tests tests/crypto/tests.txt
./build/Linux/master/release/tests/unit_tests/unit_tests

@nioroso-x3
Contributor

Ok, done.
unit_tests.log.gz
cncrypt-tests.log.gz

@moneromooo-monero
Collaborator

Are you running with commit hash aa1d321?

@nioroso-x3
Contributor

The short HEAD is 5c85da5; I cloned the repo yesterday.

@moneromooo-monero
Collaborator

The main problem seems to be Keccak being wrong. Which is a shame since the last change to Keccak was to allegedly make it work on big endian architectures :)
That is used for the PRNG, which in turn makes most of the crypto tests fail.

@moneromooo-monero
Collaborator

As for the IP errors, this should fix it:

diff --git a/contrib/epee/include/net/local_ip.h b/contrib/epee/include/net/local_ip.h
index 52c5855b9..90e6a07b0 100644
--- a/contrib/epee/include/net/local_ip.h
+++ b/contrib/epee/include/net/local_ip.h
@@ -27,6 +27,15 @@
 
 #pragma once
 
+namespace
+{
+  static inline uint32_t leip(uint32_t x)
+  {
+    x = ((x & 0x00ff00ff) << 8) | ((x & 0xff00ff00) >> 8);
+    return (x << 16) | (x >> 16);
+  }
+}
+
 namespace epee
 {
   namespace net_utils
@@ -34,6 +43,7 @@ namespace epee
     inline
     bool is_ip_local(uint32_t ip)
     {
+      ip = leip(ip);
       /*
       local ip area
       10.0.0.0 - 10.255.255.255 
@@ -57,6 +67,7 @@ namespace epee
     inline
     bool is_ip_loopback(uint32_t ip)
     {
+      ip = leip(ip);
       if( (ip | 0xffffff00) == 0xffffff7f)
         return true;
       //MAKE_IP

@moneromooo-monero
Collaborator

moneromooo-monero commented Oct 20, 2018

This might fix the keccak part (edited, needs more parentheses):

diff --git a/src/crypto/keccak.c b/src/crypto/keccak.c
index b5946036e..ee20adb2d 100644
--- a/src/crypto/keccak.c
+++ b/src/crypto/keccak.c
@@ -145,7 +145,7 @@ void keccak1600(const uint8_t *in, size_t inlen, uint8_t *md)
 #define IS_ALIGNED_64(p) (0 == (7 & ((const char*)(p) - (const char*)0)))
 #define KECCAK_PROCESS_BLOCK(st, block) { \
     for (int i_ = 0; i_ < KECCAK_WORDS; i_++){ \
-        ((st))[i_] ^= ((block))[i_]; \
+        ((st))[i_] ^= swap64le(((block))[i_]); \
     }; \
     keccakf(st, KECCAK_ROUNDS); }
 

@moneromooo-monero
Collaborator

The IP stuff is wrong: the IPs should be in network byte order. If they're not, then something else is probably wrong.
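
If the address really is held in network byte order, one way to avoid depending on the host's byte order is to look at the octets as they sit in memory rather than at the integer value. A minimal sketch under that assumption; `is_loopback_sketch` is a hypothetical name, not a function in epee:

```c
#include <stdint.h>
#include <string.h>

/* Assumes ip_network_order holds the four octets in network byte order,
   i.e. the first octet of the dotted quad is at the lowest address. */
int is_loopback_sketch(uint32_t ip_network_order)
{
    uint8_t octet[4];
    memcpy(octet, &ip_network_order, sizeof octet);
    return octet[0] == 127;   /* 127.0.0.0/8 */
}
```

Because the check reads bytes, not the integer, it behaves the same on little- and big-endian hosts and needs no swap helper.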

@moneromooo-monero
Collaborator

Or maybe not. That will need thinking.

@nioroso-x3
Contributor

The patches didn't change much.

Test project /home/jribeiro/Development/monero/build/Linux/master/release
Start 1: hash-target
1/15 Test #1: hash-target ...................... Passed 0.42 sec
Start 2: core_tests
2/15 Test #2: core_tests .......................***Exception: Other5903.01 sec
Start 3: cncrypto
3/15 Test #3: cncrypto .........................***Failed 67.77 sec
Start 4: unit_tests
4/15 Test #4: unit_tests .......................***Failed 849.29 sec
Start 5: difficulty
5/15 Test #5: difficulty ....................... Passed 0.24 sec
Start 6: hash-fast
6/15 Test #6: hash-fast ........................ Passed 0.08 sec
Start 7: hash-slow
7/15 Test #7: hash-slow ........................***Failed 1.37 sec
Start 8: hash-slow-1
8/15 Test #8: hash-slow-1 ......................***Failed 1.79 sec
Start 9: hash-slow-2
9/15 Test #9: hash-slow-2 ......................***Failed 4.75 sec
Start 10: hash-tree
10/15 Test #10: hash-tree ........................ Passed 0.03 sec
Start 11: hash-extra-blake
11/15 Test #11: hash-extra-blake ................. Passed 0.04 sec
Start 12: hash-extra-groestl
12/15 Test #12: hash-extra-groestl ...............***Failed 0.68 sec
Start 13: hash-extra-jh
13/15 Test #13: hash-extra-jh .................... Passed 0.03 sec
Start 14: hash-extra-skein
14/15 Test #14: hash-extra-skein ................. Passed 0.04 sec
Start 15: hash-variant2-int-sqrt
15/15 Test #15: hash-variant2-int-sqrt ........... Passed 1178.90 sec

cncrypto-tests.log.gz
unit_tests.log.gz

@moneromooo-monero
Collaborator

Well, you're going to have to debug it I'm afraid, or wait for someone else with big endian hardware to have a look. Since it was recently "fixed" for big endian, it should be mostly there already.

@nioroso-x3
Contributor

I have zero knowledge of how monero works. What should I look into first?
Get the functions of keccak.c to have the same output as on little endian machines?

@moneromooo-monero
Collaborator

Since the (primary) culprit seems to be Keccak, you don't need to know how monero works, just how to build it. The Keccak output for a given input should indeed be identical on big and little endian archs. I suspect once Keccak's fixed, a lot of stuff will start working. We can then see what's next in line.
Thanks
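
For anyone picking this up: getting identical Keccak output on both byte orders largely comes down to how input bytes are mapped onto the 64-bit state lanes. A minimal sketch of the absorb step only, assuming the lanes are defined as little-endian; `load64_le_sketch` and `absorb_block_sketch` are hypothetical names, this is not the code in src/crypto/keccak.c, and the permutation is omitted:

```c
#include <stddef.h>
#include <stdint.h>

/* Read 8 bytes as a little-endian 64-bit lane, independent of host order. */
uint64_t load64_le_sketch(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* XOR `lanes` 64-bit words of input into the state. Only the absorb step
   is shown; the caller would run the permutation afterwards. */
void absorb_block_sketch(uint64_t *state, const uint8_t *block, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        state[i] ^= load64_le_sketch(block + 8 * i);
}
```

On a little-endian host this does the same thing as XORing the block in as an array of `uint64_t`; on a big-endian host it is equivalent to byte-swapping each word first.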

@jtgrassie
Contributor

Here is the failing core_tests log (Ubuntu 16 PowerPC BE 32bit).
LastTest.log.tar.gz

@nioroso-x3
Contributor

core_tests also gets stuck on Fedora 25 ppc64.
That also uses gcc 6.4. I'll test the newest gcc just in case.

@nioroso-x3
Contributor

core_tests passes completely when using LLVM 3.9 on Fedora 25 and LLVM (clang) 7.0 on Gentoo; looks like gcc is buggy for ppc64 lol

The first log is for Gentoo in release, the second for Fedora in debug. There appears to be a double free error at the end, but everything passes for core_tests.

core_tests_llvm7_release.log.gz

core_tests_llvm39_debug.log.gz

make_f25_llvm39.log.gz

@moneromooo-monero
Collaborator

And now... does it sync the blockchain? :)

@nioroso-x3
Contributor

Nope, it's not syncing.
bitmonero_gentoo.log.gz

@moneromooo-monero
Collaborator

#4866

@nioroso-x3
Contributor

New bitmonero log after 4866.
What does that patch fix?

bitmonero.tar.gz

@moneromooo-monero
Collaborator

It fixes values read/written from/to the network differently on little endian and big endian archs.
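
A minimal sketch of that class of fix, assuming the wire format is little-endian (an assumption for illustration, not something stated in this thread). The helper names are hypothetical and this is not the code from the PR. The idea is to convert explicitly at the boundary instead of copying the in-memory representation of an integer:

```c
#include <stdint.h>

/* Write v to the wire buffer least significant byte first, regardless of
   how the host stores integers. */
void store64_le_sketch(uint8_t *out, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Read the value back from the wire buffer using the same byte layout. */
uint64_t read64_le_wire_sketch(const uint8_t *in)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)in[i] << (8 * i);
    return v;
}
```

A value written this way by a little-endian peer and read this way by a big-endian peer comes out the same, which is not true of a raw `memcpy` of the integer.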

@moneromooo-monero
Collaborator

And I see at least another one that needs fixing.

@moneromooo-monero
Collaborator

I updated 4866.

@nioroso-x3
Contributor

New log
bitmonero.tar.gz

@moneromooo-monero
Collaborator

I found more places that need endian fixing. I'll post when I've fixed all I see.

@moneromooo-monero
Collaborator

4866 updated again.

@nioroso-x3
Contributor

New log. unit_tests is also getting stuck after the mnemonics test; core_tests passes.

bitmonero.log.gz
unit_tests.log.gz

@moneromooo-monero
Collaborator

We can receive packets :)
Looks like the payload is also endian dependent though. Not fun.

@nioroso-x3
Contributor

Will this bug be fixed? I'm willing to provide ssh access to a machine for testing.

@moneromooo-monero
Collaborator

I can debug as a background task if I have access to such a machine.

@nioroso-x3
Contributor

Post an SSH public key and I can give you access to my G5 running Gentoo. It has clang-8 and gcc-8.2.
You'll have access at monerodevs@nerv-la.ddns.net:223

@moneromooo-monero
Collaborator

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDDEGd0x3Tkn/Ht1gKZlQY2T0oEpPEenGGPqzPMHMvHJ8S/PLbkVAFfNLDuBdshnm3r/4eMYspBO8/Pa55ICrURwhLk/aQ5vuNwvoReSib5omItheNM5ALWZpVfNTBZct1raryBIaDOUn9SvfLhZzhKojRSrFF4P5Nitn4aMjcGiKklIdFluQ0cIOmA4yY2DY8x6NPECVtPsJrwc89CMlPtlXNd8TgAWy8PvEQb7H9T6XaW4Mn1fGwT52+70q/Eyo4iNrGuLx74obvtAd3nCugTJykE1dXIiQQ3FtmtPqZCOQfaAVteKWvUPWYs4yc+b7LCqf06YvFhw+FfkS04F0gV user@host

@nioroso-x3
Contributor

You should have access now.

@nioroso-x3
Contributor

PPC64le (little endian) is failing some tests:
Test project /monero/build/Linux/master/debug
Start 1: hash-target
1/19 Test #1: hash-target ...................... Passed 2.23 sec
Start 2: core_tests
2/19 Test #2: core_tests .......................***Failed 12970.89 sec
Start 3: cncrypto
3/19 Test #3: cncrypto ......................... Passed 19.66 sec
Start 4: cnv4-jit
4/19 Test #4: cnv4-jit ......................... Passed 1210.97 sec
Start 5: unit_tests
5/19 Test #5: unit_tests .......................***Failed 896.62 sec
Start 6: difficulty
6/19 Test #6: difficulty ....................... Passed 0.09 sec
Start 7: wide_difficulty
7/19 Test #7: wide_difficulty ..................***Failed 0.03 sec
Start 8: block_weight
8/19 Test #8: block_weight ..................... Passed 111.12 sec
Start 9: hash-fast
9/19 Test #9: hash-fast ........................ Passed 0.06 sec
Start 10: hash-slow
10/19 Test #10: hash-slow ........................ Passed 0.62 sec
Start 11: hash-slow-1
11/19 Test #11: hash-slow-1 ...................... Passed 0.69 sec
Start 12: hash-slow-2
12/19 Test #12: hash-slow-2 ...................... Passed 1.71 sec
Start 13: hash-slow-4
13/19 Test #13: hash-slow-4 ...................... Passed 5.99 sec
Start 14: hash-tree
14/19 Test #14: hash-tree ........................ Passed 0.02 sec
Start 15: hash-extra-blake
15/19 Test #15: hash-extra-blake ................. Passed 0.04 sec
Start 16: hash-extra-groestl
16/19 Test #16: hash-extra-groestl ............... Passed 0.05 sec
Start 17: hash-extra-jh
17/19 Test #17: hash-extra-jh .................... Passed 0.03 sec
Start 18: hash-extra-skein
18/19 Test #18: hash-extra-skein ................. Passed 0.02 sec
Start 19: hash-variant2-int-sqrt
19/19 Test #19: hash-variant2-int-sqrt ........... Passed 473.87 sec

I couldn't find the .log for the wide_difficulty test. What is the filename?
core_and_unit_tests.zip

@nioroso-x3
Contributor

hash-slow-2 and hash-slow-4 are failing on big endian ppc64:
Test project /home/jribeiro/Development/monero-ori/build/Linux/master/debug
Start 1: hash-target
1/19 Test #1: hash-target ...................... Passed 2.34 sec
Start 2: core_tests
2/19 Test #2: core_tests .......................***Failed 686.95 sec
Start 3: cncrypto
3/19 Test #3: cncrypto ......................... Passed 41.94 sec
Start 4: cnv4-jit
4/19 Test #4: cnv4-jit ......................... Passed 2062.62 sec
Start 5: unit_tests
5/19 Test #5: unit_tests .......................***Failed 609.90 sec
Start 6: difficulty
6/19 Test #6: difficulty ....................... Passed 0.25 sec
Start 7: wide_difficulty
7/19 Test #7: wide_difficulty .................. Passed 38.04 sec
Start 8: block_weight
8/19 Test #8: block_weight ..................... Passed 184.81 sec
Start 9: hash-fast
9/19 Test #9: hash-fast ........................ Passed 0.23 sec
Start 10: hash-slow
10/19 Test #10: hash-slow ........................ Passed 1.37 sec
Start 11: hash-slow-1
11/19 Test #11: hash-slow-1 ...................... Passed 1.80 sec
Start 12: hash-slow-2
12/19 Test #12: hash-slow-2 ......................***Failed 6.17 sec
Start 13: hash-slow-4
13/19 Test #13: hash-slow-4 ......................***Failed 10.52 sec
Start 14: hash-tree
14/19 Test #14: hash-tree ........................ Passed 0.20 sec
Start 15: hash-extra-blake
15/19 Test #15: hash-extra-blake ................. Passed 0.04 sec
Start 16: hash-extra-groestl
16/19 Test #16: hash-extra-groestl ............... Passed 0.05 sec
Start 17: hash-extra-jh
17/19 Test #17: hash-extra-jh .................... Passed 0.04 sec
Start 18: hash-extra-skein
18/19 Test #18: hash-extra-skein ................. Passed 0.04 sec
Start 19: hash-variant2-int-sqrt
19/19 Test #19: hash-variant2-int-sqrt ........... Passed 1222.28 sec

core_and_unit_tests_be.zip

@moneromooo-monero
Collaborator

It should all be in LastTest.log

@moneromooo-monero
Collaborator

#5544

@moneromooo-monero
Collaborator

Thanks much for the G5 access. The patch above fixes most issues. There's still a failure in serialization unit tests, which I think is due to using boost code that's not endianness nice (not 100% sure). I think all the rest is fixed (but it takes massive amounts of time to build/test on that G5 so I've not run a full test run).

@moneromooo-monero
Collaborator

The serialization test failure is now also fixed, same PR.

@nioroso-x3
Contributor

The daemon and wallet are fully syncing and working with your PR, tested on my G5 and on a newer POWER8 in BE mode!
I built two Fedora 27 VMs, one big endian and one little endian, on a newer POWER8 server I got my hands on. You may use them for testing and building as you wish; they should be way faster than the G5.

monerodevs@nerv-la.ddns.net:1234 <- BE Fedora 27, 6 threads 8gb ram
monerodevs@nerv-la.ddns.net:4321 <- LE Fedora 27, 6 threads 8gb ram

@moneromooo-monero
Collaborator

Thanks, I'll try to go build/test from time to time and fix any problems.

@selsta
Collaborator

selsta commented Apr 8, 2022

Seems resolved.

@selsta selsta closed this as completed Apr 8, 2022