Bootstrap failure on ARMv7 #10602
you must match the |
@vtjnash That's the case AFAICT, as the values of these two flags passed to gcc when building Julia are exactly the same as those used to build LLVM (that's the point of a distribution, after all). But I'm not sure how code generation from Julia works: what flags are used in that case? Those used to build LLVM? Some other default? Can that be the problem? Another surprising fact is that changing the value of |
I get yet another error on Fedora rawhide (gcc 5 with
https://kojipkgs.fedoraproject.org//work/tasks/7279/9357279/build.log @ViralBShah Any suggestions? EDIT: |
I suspect it is a julia bug, and we should run it through valgrind or something to figure things out. It is quite common for the build to die in the system image phase in different places on different ARM architectures. Cc: @ihnorton |
I wonder if Jameson's patches for ppc will help, especially the bit relating to the cpuid stuff. |
i didn't patch any cpuid stuff, just deleted some dead code to make it easier to build. fwiw, a little known fact is that the |
Yes, I did mean disabling it. The contents suggest hard coded values for armv7, which probably don't work out well on other arm processors. I will try using SYS.ji from elsewhere. |
i don't think those values were actually read anywhere that mattered. you might also try enabling MEMDEBUG (in options.h) and see if that helps |
I ran the bootstrap inside Valgrind, with
https://kojipkgs.fedoraproject.org//work/tasks/4521/9414521/build.log |
I am working on setting up a machine at scaleway.com. That should hopefully allow people to get in and fix things quickly. |
Cool. Though the Fedora build system is not that bad, I can easily make some local changes and send them for build. (Not sure about the speed of the build machine compared with Scaleway's.) |
Crap, the build timed out after 24h. I tried using gdb instead of Valgrind, but a bug prevents it from working on Fedora arm at the moment. So if you have a Fedora/RHEL image on Scaleway, I'd be happy to try there. |
I'll repeat the offer I made on julia-users: if someone wants a shell account on a Raspberry Pi 2 to test builds, I'm happy to provide one (or more). Just let me know. I have been unsuccessful in getting Julia to build (see #10235) and would welcome someone with some expertise here. |
My chromebook environment got wiped away due to an error on my part during reboot. It will be a while for me to set it up again. This may not be sooner than next week. |
I am guessing this one is also the ARM architecture not being detected correctly as in #10917 |
@ViralBShah Shouldn't passing |
I would have thought so, but could you try Can you post the |
"could you try JULIA_CPU_ARCH=arm1176jzf-s? That is the one from Raspberry Pi 1, that ought to be conservative enough." Not sure. That is an ARM11 chip, which is pre-ARMv7 (yes, I know it's confusing). According to the Wikipedia RPi article, only some of the operating systems that work on the original also work on the RPi 2. [Still, I think everything besides the ARM core, including the GPU, is the same.] ARM cores are not always fully upward compatible. E.g. the 26-bit addressing mode was dropped at some point. [And ARMv8-A, for example, isn't compatible with ARMv7 in kernel space, but is in user space.] I'm not sure I can help more; I'm not even sure what "I get failures during the bootstrap step" means. I don't have a Pi or any other ARM chip, except for the ARMv7 phone I'm using and an ARMv6 phone I'm not using (which I could root or whatever), and my original Acorn Archimedes (kind of broken). I think I had some programmer's reference/ARM manuals at some point. |
@nalimilan Can you try once again? |
Please reopen if reproducible. |
@ViralBShah Did you mean to close this? |
Yes - thank you. |
@ViralBShah Sorry for not replying earlier. Now I have more time to investigate this. The error has moved with 0.4.0. With LLVM 3.7, the build now stops in inference.jl:
https://kojipkgs.fedoraproject.org//work/tasks/1325/11491325/build.log I get the same error with With LLVM 3.3, it fails even earlier: https://kojipkgs.fedoraproject.org//work/tasks/1327/11491327/build.log
If you can give me access to an ARM machine with a Fedora image I can try to debug this further (though I'm not really the best person to do that -- I can give anybody simple instructions to create a similar build only from a Fedora image and the standard Julia sources). |
I've just found a direct SSH access to the same kind of machine. Turns out building Julia with
primes.jl:118 is this:
const PRIMES = primes(2^16)
Please give me any instructions you can think of to debug this, both with and without |
I've found a shorter way of reproducing the bug:
list = Int[]
sizehint!(list, floor(Int, 2^16 / log(2^16)))
but calling If I change the call to |
Interesting. If I add But that second failure doesn't happen if I do not add
|
I think I nailed one of the culprits: when building with
|
Is there any solution to check memory accesses when building random.jl without running the full bootstrap phase under Valgrind? I feel like it's going to take the whole week... (and all that for possibly a broken trace!) |
Also, if it builds, the random tests fail. |
I wonder if malloc does not return 8-aligned pointers by default on arm, and if that could possibly be the issue here. Cc @ihnorton |
The
There are some tips for debugging bootstrap errors, here: http://docs.julialang.org/en/release-0.4/devdocs/debuggingtips/#debugging-during-julia-s-build-process-bootstrap |
@ihnorton @ViralBShah Thanks. So I've made some progress. When building using the system LLVM, the "undefined
There's only one call to When building with
Under what circumstances wouldn't Julia use the hard float ABI, despite all the system being configured to do that? Can it come from a CPU detection issue in LLVM? I've tried passing |
Unfortunately, I get the same ld error about VFP when building LLVM and Julia using |
@ihnorton Most of those memory alignment assertions are only enabled when Regarding the ld error about VFP, I've checked by calling |
The Valgrind run stopped for an unknown reason (probably OOM) after days, before reaching
|
I've just tried with latest master, and I still get the same error with
With
The gdb backtrace when breaking at
Here's what Valgrind (with
|
Have you checked if this is a stackoverflow? |
Adding |
No, I mean |
Ah. No, |
Yeah, assuming this is the same system as #13752 (comment), I'll not be too surprised if the ABI issue could mess up bootstrap... |
In addition to that indicated above with primes.jl and random.jl (seg fault), the following happens. Both random.jl and dSFMT.jl are left out of sysimg.jl to proceed. Various other files are modified to remove any references to Random provided functions including
further on...
caused by REPL documentation
and finally, the full message
So
|
Is this still an issue? |
Yes... |
Probably should open a new issue (since so much has changed) if still a problem on the same machine. |
When trying to build a nightly RPM on Fedora 22 armv7hl, I get failures during the bootstrap step. Depending on the value of JULIA_CPU_TARGET I use, the error is different:
native (JULIA_CPU_TARGET not passed): https://kojipkgs.fedoraproject.org//work/tasks/26/9290026/build.log
cortex-a8: https://kojipkgs.fedoraproject.org//work/tasks/1460/9291460/build.log
I'm not familiar with ARM at all, so I'm not sure which one is the more reasonable (maybe neither). Note that I'm using USE_SYSTEM_LLVM=1, which is LLVM 3.5. A difficulty is that gcc, which is used in the build, uses -march=armv7-a on Fedora armv7hl, but LLVM does not accept this as a target. This is why I tried JULIA_CPU_TARGET=cortex-a8. Fedora's LLVM is built using these settings: http://pkgs.fedoraproject.org/cgit/llvm.git/tree/llvm.spec?h=f22&id=5aea06bdf020fd2fc750286d397e51e01a94a765#n404
I see that ARM.inc advises --with-cpu=cortex-a9 --with-fpu=neon instead. Do you think that can be an issue?
BTW, note JCFLAGS includes -fsigned-char.
.The text was updated successfully, but these errors were encountered: