Precompilation error on the Nvidia Jetson platform #101
If you're willing to help, I think we can get things to work. First of all, it looks like

```julia
const REGISTER_SIZE = $register_size             # bytes per register
const REGISTER_COUNT = $register_count           # how many floating point registers?
const REGISTER_CAPACITY = $register_capacity     # not used, but it was size * count
const FP256 = $(cpufeature(CpuId.FP256))         # not used, was supposed to distinguish Zen 1
const CACHELINE_SIZE = $(cachelinesize())
const CACHE_SIZE = $cache_size                   # NTuple{3,Int} (L1, L2, L3); I should generalize the code that uses it to handle different numbers of cache levels
const NUM_CORES = $num_cores                     # number of physical cores
const FMA3 = $(cpufeature(CpuId.FMA3))           # does it have fused multiply-add? Used specifically to check for the `vfmadd231` instruction, so an asm call can be used for that particular variant
const AVX2 = $(cpufeature(CpuId.AVX2))           # does it have SIMD integer support?
const AVX512F = $(cpufeature(CpuId.AVX512F))     # does it have AVX512?
const AVX512ER = $(cpufeature(CpuId.AVX512ER))   # does it have hardware exp2, and accurate hardware inverse and inverse square root?
const AVX512PF = $(cpufeature(CpuId.AVX512PF))   # AVX512 prefetch extensions
const AVX512VL = $(cpufeature(CpuId.AVX512VL))   # do AVX512 instructions work with shorter registers?
const AVX512BW = $(cpufeature(CpuId.AVX512BW))   # AVX512 with 8- and 16-bit integer support?
const AVX512DQ = $(cpufeature(CpuId.AVX512DQ))   # AVX512 with 32- and 64-bit integer support?
const AVX512CD = $(cpufeature(CpuId.AVX512CD))   # conflict detection, including SIMD count-leading-zeroes
```

Do you happen to know a lot of low-level details about ARM? Or how to query them? Some of these constants could be renamed to generalize them across instruction sets; others could be split into general and specific versions (e.g., …). From there, we'd have to make sure SIMDPirates and SLEEFPirates work as intended.
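As an illustration of one possible direction (a hedged sketch, not VectorizationBase's actual code), the `cpuid`-based queries could be gated on the host architecture, with hard-coded fallbacks elsewhere. The NEON numbers below are assumptions for a typical AArch64 core, not values queried from the hardware:

```julia
# Hypothetical fallback definitions for non-x86 hosts (e.g. the Jetson's
# aarch64 CPU), where CpuId.jl's `cpuid`-based queries cannot run.
if Sys.ARCH === :aarch64
    register_size  = 16          # 128-bit NEON vector registers (assumption)
    register_count = 32          # v0-v31 on AArch64
else
    register_size  = 32          # placeholder; on x86 this would come from CpuId.jl
    register_count = 16
end
cacheline_size = 64              # common on both architectures, but not guaranteed
num_cores      = Sys.CPU_THREADS # approximation; counts logical threads, not physical cores

println((register_size, register_count, cacheline_size, num_cores))
```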
Hi, thanks for your prompt response. Unfortunately I have no practical experience with ARM chips or low-level instruction set programming. However, I'm willing to help if needed. As you mentioned, it seems the problem lies within CpuId.jl:

```julia
julia> using CpuId
[ Info: Precompiling CpuId [adafc99b-e345-5852-983c-f28acb93d879]
error: couldn't allocate output register for constraint '{ax}'
ERROR: Failed to precompile CpuId [adafc99b-e345-5852-983c-f28acb93d879] to /home/coz/.julia/compiled/v1.4/CpuId/vMZBF_LPpax.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1272
 [3] _require(::Base.PkgId) at ./loading.jl:1029
 [4] require(::Base.PkgId) at ./loading.jl:927
 [5] require(::Module, ::Symbol) at ./loading.jl:922
```

Maybe I should redirect this to their issue board?
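For context (my own hedged reading of the error, not from the thread): the `'{ax}'` constraint refers to the x86 `eax` register used by the `cpuid` instruction, which does not exist on ARM, so LLVM cannot compile the inline assembly at precompile time. A guard like the following hypothetical helper (not CpuId.jl's actual API) illustrates how a package could avoid emitting that code on ARM:

```julia
# Hypothetical guard: report whether the x86 `cpuid` instruction is usable
# on this host, so x86-only inline assembly is never compiled on ARM.
cpuid_available() = Sys.ARCH === :x86_64 || Sys.ARCH === :i686

function describe_cpu()
    if cpuid_available()
        return "x86 host: cpuid-based feature queries would work here"
    else
        # e.g. :aarch64 on the Jetson AGX Xavier
        return "non-x86 host ($(Sys.ARCH)): need OS-level queries instead"
    end
end

println(describe_cpu())
```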
You could, but it's likely that ARM is out of scope for CpuId.jl, in which case VectorizationBase would need an alternative means of getting information about the host computer.
Could you see if this works?
v0.8.1 seems to work fine. Thank you!

```julia
(@v1.4) pkg> status
Status `~/.julia/environments/v1.4/Project.toml`
...
[bdcacae8] LoopVectorization v0.6.30
...

(@v1.4) pkg> update
...
[bdcacae8] ↑ LoopVectorization v0.6.30 ⇒ v0.8.1
...

julia> using LoopVectorization
[ Info: Precompiling LoopVectorization [bdcacae8-1622-11e9-2a5c-532679323890]
# pass 👍
```
@chriselrod @ozmaden Is this issue resolved?
I think so, but I'd like to improve ARM support, especially as more SVE CPUs start appearing (recently, Neoverse and A64FX).
Haven't had any problems since the last comment in May, so I think it is resolved.
Hi, firstly, thanks for the work on the package.
I'm trying to get some code using LoopVectorization.jl running on the NVIDIA Jetson AGX Xavier.
Unfortunately, it fails at the precompilation stage in the REPL. Below is the complete stacktrace, triggered merely by loading the package.
I'm using Julia 1.4.0 and LoopVectorization v0.6.30. The Jetson has a 64-bit ARM CPU.
Any help would be appreciated.
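As a quick sanity check (a generic snippet, not part of the original report), Julia can report what it thinks the host CPU is; on a 64-bit ARM machine like the Jetson, `Sys.ARCH` should be `:aarch64`:

```julia
# Print Julia's view of the host CPU; :aarch64 would be expected on a
# 64-bit ARM machine such as the Jetson AGX Xavier.
println("architecture: ", Sys.ARCH)
println("CPU name:     ", Sys.CPU_NAME)     # LLVM's name for the target CPU
println("threads:      ", Sys.CPU_THREADS)  # logical threads, not physical cores
```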