Precompilation error on the Nvidia Jetson platform #101
If you're willing to help, I think we can get things to work. First of all, it looks like

```julia
const REGISTER_SIZE = $register_size             # bytes per register
const REGISTER_COUNT = $register_count           # how many floating point registers?
const REGISTER_CAPACITY = $register_capacity     # not used, but it was size * count
const FP256 = $(cpufeature(CpuId.FP256))         # not used, was supposed to distinguish Zen 1
const CACHELINE_SIZE = $(cachelinesize())
const CACHE_SIZE = $cache_size                   # NTuple{3,Int} (L1, L2, L3); I should generalize the code that uses it to handle different numbers of cache levels
const NUM_CORES = $num_cores                     # number of physical cores
const FMA3 = $(cpufeature(CpuId.FMA3))           # does it have fused multiply-add? Used specifically to check for the `vfmadd231` instruction, so an asm call can be used for that particular variant
const AVX2 = $(cpufeature(CpuId.AVX2))           # does it have SIMD integer support?
const AVX512F = $(cpufeature(CpuId.AVX512F))     # does it have AVX512?
const AVX512ER = $(cpufeature(CpuId.AVX512ER))   # does it have hardware exp2, and accurate hardware inverse and inverse square root?
const AVX512PF = $(cpufeature(CpuId.AVX512PF))   # AVX512 prefetch extensions
const AVX512VL = $(cpufeature(CpuId.AVX512VL))   # do AVX512 instructions work with shorter registers?
const AVX512BW = $(cpufeature(CpuId.AVX512BW))   # AVX512 with 8- and 16-bit integer support?
const AVX512DQ = $(cpufeature(CpuId.AVX512DQ))   # AVX512 with 32- and 64-bit integer support?
const AVX512CD = $(cpufeature(CpuId.AVX512CD))   # conflict detection, including SIMD count-leading-zeroes
```

Do you happen to know a lot of low-level details about ARM? Or how to query them? Some of these constants could be renamed to generalize them across instruction sets; others could be split into general and specific versions (e.g., …). From there, we'd have to make sure SIMDPirates and SLEEFPirates work as intended.
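As an illustration of one possible direction (a hedged sketch, not VectorizationBase's actual code), the `cpuid`-based queries could be gated on the host architecture, with hard-coded fallbacks elsewhere. The NEON numbers below are assumptions for a typical AArch64 core, not values queried from the hardware:

```julia
# Hypothetical fallback definitions for non-x86 hosts (e.g. the Jetson's
# aarch64 CPU), where CpuId.jl's `cpuid`-based queries cannot run.
if Sys.ARCH === :aarch64
    register_size  = 16          # 128-bit NEON vector registers (assumption)
    register_count = 32          # v0-v31 on AArch64
else
    register_size  = 32          # placeholder; on x86 this would come from CpuId.jl
    register_count = 16
end
cacheline_size = 64              # common on both architectures, but not guaranteed
num_cores      = Sys.CPU_THREADS # approximation; counts logical threads, not physical cores

println((register_size, register_count, cacheline_size, num_cores))
```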
Hi, thanks for your prompt response. Unfortunately I have no practical experience with ARM chips or low-level instruction set programming. However, I'm willing to help if needed. As you mentioned, it seems the problem lies within CpuId.jl:

```julia
julia> using CpuId
[ Info: Precompiling CpuId [adafc99b-e345-5852-983c-f28acb93d879]
error: couldn't allocate output register for constraint '{ax}'
ERROR: Failed to precompile CpuId [adafc99b-e345-5852-983c-f28acb93d879] to /home/coz/.julia/compiled/v1.4/CpuId/vMZBF_LPpax.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1272
 [3] _require(::Base.PkgId) at ./loading.jl:1029
 [4] require(::Base.PkgId) at ./loading.jl:927
 [5] require(::Module, ::Symbol) at ./loading.jl:922
```

Maybe I should redirect this to their issue board?
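For context (my own hedged reading of the error, not from the thread): the `'{ax}'` constraint refers to the x86 `eax` register used by the `cpuid` instruction, which does not exist on ARM, so LLVM cannot compile the inline assembly at precompile time. A guard like the following hypothetical helper (not CpuId.jl's actual API) illustrates how a package could avoid emitting that code on ARM:

```julia
# Hypothetical guard: report whether the x86 `cpuid` instruction is usable
# on this host, so x86-only inline assembly is never compiled on ARM.
cpuid_available() = Sys.ARCH === :x86_64 || Sys.ARCH === :i686

function describe_cpu()
    if cpuid_available()
        return "x86 host: cpuid-based feature queries would work here"
    else
        # e.g. :aarch64 on the Jetson AGX Xavier
        return "non-x86 host ($(Sys.ARCH)): need OS-level queries instead"
    end
end

println(describe_cpu())
```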
You could, but it's likely that ARM is out of scope for CpuId.jl, in which case VectorizationBase would need an alternative means of getting information about the host computer.
Could you see if this works?
v0.8.1 seems to work fine. Thank you!

```julia
(@v1.4) pkg> status
Status `~/.julia/environments/v1.4/Project.toml`
...
[bdcacae8] LoopVectorization v0.6.30
...

(@v1.4) pkg> update
...
[bdcacae8] ↑ LoopVectorization v0.6.30 ⇒ v0.8.1
...

julia> using LoopVectorization
[ Info: Precompiling LoopVectorization [bdcacae8-1622-11e9-2a5c-532679323890]
# pass 👍
```
@chriselrod @ozmaden Is this issue resolved?
I think so, but I'd like to improve ARM support, especially as more SVE CPUs start appearing (recently, Neoverse and A64FX).
Haven't had any problems since the last comment in May, so I think it is resolved.
Hi, firstly, thanks for the work on the package.
I'm trying to get some code using LoopVectorization.jl running on the NVIDIA Jetson AGX Xavier.
Unfortunately, it fails at the precompilation stage in the REPL. Below is the complete stacktrace, triggered merely by loading the package.
I'm using Julia 1.4.0 and LoopVectorization v0.6.30. The Jetson has a 64-bit ARM CPU.
Any help would be appreciated.
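As a quick sanity check (a generic snippet, not part of the original report), Julia can report what it thinks the host CPU is; on a 64-bit ARM machine like the Jetson, `Sys.ARCH` should be `:aarch64`:

```julia
# Print Julia's view of the host CPU; :aarch64 would be expected on a
# 64-bit ARM machine such as the Jetson AGX Xavier.
println("architecture: ", Sys.ARCH)
println("CPU name:     ", Sys.CPU_NAME)     # LLVM's name for the target CPU
println("threads:      ", Sys.CPU_THREADS)  # logical threads, not physical cores
```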