Add support for ILP64 Accelerate #113
The new Accelerate released in macOS v13.3 provides two new interfaces: an upgraded LAPACK for the LP64 interface, and an all-new ILP64 interface (that uses the same upgraded LAPACK). These symbols are available from Accelerate with the suffixes `$NEWLAPACK` and `$NEWLAPACK$ILP64`, respectively.
Unfortunately, this is not a "true" suffix, as Apple has decided to drop the trailing underscore from the typical F77 names, meaning that a symbol such as `dgemm_` gets mangled to `dgemm$NEWLAPACK`, whereas a CBLAS symbol such as `cblas_zdotc_sub` gets mangled to `cblas_zdotc_sub$NEWLAPACK`. This means that we need to selectively erase the trailing underscore from some symbols when applying this Accelerate suffix.
To do this, we add a new feature, enabled by default only on Apple builds, called `SYMBOL_TRIMMING`, which allows a `suffix_hint` to contain the ASCII "substitution character" `0x1a` as its first character to mean "remove a trailing underscore when applying this suffix".
To make dealing with suffix hints easier for command-line users, these suffix hints are available for use in `LBT_BACKING_LIBS` by appending the hint to a library name after an exclamation point, e.g. `libname!suffix`.
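The renaming rule described above can be sketched in a few lines of Julia. This is only an illustration of the rule, not LBT's actual C implementation: the helper names `apply_suffix_hint` and `split_backing_lib` are hypothetical, and splitting an entry at the first `!` is an assumption about how `libname!suffix` is parsed.

```julia
# Sketch only: mimics the suffix rule from the PR description.
# `apply_suffix_hint` and `split_backing_lib` are hypothetical helper names.

const TRIM_MARKER = '\x1a'  # ASCII "substitution character"

# Append `hint` to `symbol`. If the hint starts with 0x1a, drop one trailing
# underscore from the symbol first and append the rest of the hint.
function apply_suffix_hint(symbol::AbstractString, hint::AbstractString)
    if startswith(hint, TRIM_MARKER)
        base = endswith(symbol, '_') ? chop(symbol) : symbol
        return string(base, hint[nextind(hint, 1):end])
    end
    return string(symbol, hint)
end

# Split a `libname!suffix` entry into (libname, suffix hint).
# Assumption: the entry is split at the first '!'; no '!' means no hint.
function split_backing_lib(entry::AbstractString)
    parts = split(entry, '!'; limit=2)
    hint = length(parts) == 2 ? String(parts[2]) : ""
    return String(parts[1]), hint
end

hint = "\x1a\$NEWLAPACK"
println(apply_suffix_hint("dgemm_", hint))           # dgemm$NEWLAPACK
println(apply_suffix_hint("cblas_zdotc_sub", hint))  # cblas_zdotc_sub$NEWLAPACK
```

The F77 name loses its underscore while the CBLAS name is left intact, which matches the two manglings quoted in the description.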
To test this branch out on your local Julia, run the following script:

```bash
JULIA_PREFIX=/path/to/julia

# Back up normal LBT
for f in "${JULIA_PREFIX}"/lib/julia/libblastrampoline*; do
    cp -v "${f}" "${f}.backup"
done

# Build new LBT
cd src
make

# Install new LBT into Julia's lib/julia directory
cp build/libblastrampoline.5.dylib "${JULIA_PREFIX}/lib/julia/libblastrampoline.5.dylib"
cp build/libblastrampoline.5.dylib "${JULIA_PREFIX}/lib/julia/libblastrampoline.dylib"
```

Then, when you launch Julia, load Accelerate via:

```julia
using LinearAlgebra

# Load LP64 interface first
BLAS.lbt_forward("/System/Library/Frameworks/Accelerate.framework/Accelerate"; suffix_hint="\x1a\$NEWLAPACK", verbose=true, clear=true)

# Load ILP64 interface next
BLAS.lbt_forward("/System/Library/Frameworks/Accelerate.framework/Accelerate"; suffix_hint="\x1a\$NEWLAPACK\$ILP64", verbose=true)

# Ensure that an ILP64 interface was actually loaded:
config = BLAS.get_config()
if !any(lib.interface == :ilp64 for lib in config.loaded_libs)
    @error("No ILP64 interfaces found; are you sure you're on macOS 13.3 or higher?!")
end

# Show LBT config:
display(BLAS.get_config())
```
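After forwarding, a quick smoke test can confirm that BLAS calls still give sensible answers. The sketch below is not part of the PR: `naive_matmul` is a hypothetical reference helper written with plain loops, and the matrix sizes are arbitrary. It runs against whatever backend LBT is currently forwarding to, so it works with or without Accelerate loaded.

```julia
using LinearAlgebra

# Show which interfaces LBT currently forwards to.
for lib in BLAS.get_config().loaded_libs
    println(lib.interface, "  ", lib.libname)
end

# Reference product computed with plain Julia loops (no BLAS involved).
# `naive_matmul` is a hypothetical helper used only for this check.
function naive_matmul(A, B)
    C = zeros(promote_type(eltype(A), eltype(B)), size(A, 1), size(B, 2))
    for j in axes(B, 2), k in axes(A, 2), i in axes(A, 1)
        C[i, j] += A[i, k] * B[k, j]
    end
    return C
end

A = rand(64, 48)
B = rand(48, 32)

# For Float64 matrices, `A * B` goes to `dgemm` through LBT.
@assert A * B ≈ naive_matmul(A, B)
println("dgemm smoke test passed")
```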
Verified that this works for me, and for
Can someone try running the LinearAlgebra tests with Accelerate loaded?
Results of the LinearAlgebra tests. Looks pretty good. The
In the QR tests, it is a difference on the order of eps.
The pivoted cholesky failure is reproduced as follows:
Any ideas? @amontoison @dkarrasch
@staticfloat I believe it is fine to merge this PR.
I highly suspect a bug in the new BLAS / LAPACK routines of Apple Accelerate.

```julia-repl
julia> cpapd = cholesky(apdh, RowMaximum())
CholeskyPivoted{Float32, Matrix{Float32}, Vector{Int64}}
U factor with rank 10:
10×10 UpperTriangular{Float32, Matrix{Float32}}:
 2.26012  -0.406325   0.432496   -0.68568   0.233053  -0.250507   -0.196566   -0.448386  0.0055555    0.234503
       ⋅    1.80666  -0.240994   -0.19439   0.305877   0.509079   -0.384212    -1.02195   -1.08145    0.504138
       ⋅          ⋅    1.69702  -0.555147  -0.052626   0.401659   -0.125056   -0.329646   -0.78331    -0.24981
       ⋅          ⋅          ⋅    1.60077   0.510761   0.384224     1.07492  -0.0627441   -1.01282   -0.207023
       ⋅          ⋅          ⋅          ⋅     1.4141   -0.21588    0.467681    0.617479  -0.430583   -0.206794
       ⋅          ⋅          ⋅          ⋅          ⋅    1.35825  -0.0856505    0.182823     0.1893  -0.0248058
       ⋅          ⋅          ⋅          ⋅          ⋅          ⋅      0.7197    0.116784  -0.014745    0.166019
       ⋅          ⋅          ⋅          ⋅          ⋅          ⋅           ⋅    0.361517  0.0519991   0.0500296
       ⋅          ⋅          ⋅          ⋅          ⋅          ⋅           ⋅           ⋅   0.289647    0.155059
       ⋅          ⋅          ⋅          ⋅          ⋅          ⋅           ⋅           ⋅          ⋅     0.17166
permutation:
10-element Vector{Int64}:
  4
  1
  2
  7
 10
  6
  5
  9
  8
  3
```
@amontoison That is what I observed too with OpenBLAS, and I was just wondering if we are doing something non-standard or making assumptions in the way we call those routines right now. Great idea to check with MKL, because that suggests that it really is a bug.
The one thing I want to do before merging this PR is to provide sane fallback behavior for the set/get threads API when the underlying library has no concept of threads. My plan is to make set threads do nothing, and get threads always return
Does someone here know what LAPACK call is the culprit for the wrong pivoted cholesky results? Is it
Yes, it seems that it's pstrf.
Do the Intel Macs also have the updated ILP64 BLAS and LAPACK?
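One way to tell whether a given backend returns a wrong pivoted Cholesky is to check the factorization identity directly. The sketch below is not the reproduction from this thread (the test matrix, seed, and tolerance are all arbitrary assumptions); it only relies on the fact that a correct result must satisfy `U'U ≈ A[p, p]`. It needs Julia 1.8 or newer for `RowMaximum()`.

```julia
using LinearAlgebra, Random

Random.seed!(1234)                 # arbitrary seed, for repeatability only
n = 10
X = randn(Float32, n, n)
A = Hermitian(X' * X + I)          # positive definite by construction

# For BLAS float types this is backed by LAPACK's pivoted Cholesky (pstrf).
F = cholesky(A, RowMaximum())

# A correct factorization satisfies U'U == A[p, p] up to rounding.
residual = norm(F.U' * F.U - A[F.p, F.p]) / norm(A)
println("rank = ", rank(F), ", relative residual = ", residual)

# The threshold is a loose, assumed tolerance for Float32.
if residual > 100 * eps(Float32)
    @warn "pivoted Cholesky residual is suspiciously large" residual
end
```

Running the same script with OpenBLAS, MKL, and Accelerate loaded gives a backend-by-backend comparison without depending on the LinearAlgebra test suite.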
Yes. @staticfloat's instructions worked fine on my 2019 Intel Mac.
I'm reporting the
This isolates a known LAPACK failure on the new Accelerate; use it to track the issue as Apple hopefully fixes it.
Because we now know that Accelerate has a problem with `dpstrf()`, let's not default to the new LAPACK version quite yet.
Force-pushed from 8022434 to daa47a3.
I suspect that the old LAPACK release that Apple used to ship (LAPACK 3.2.1 IIRC) didn't even have I assume the plan for now should be for LBT to support all the new stuff, but Julia to default to OpenBLAS, with users opting into using the new capabilities through AppleAccelerate.jl (like we do MKL.jl).
The old LAPACK release did have
Yes. I think
Will we have the release 5.6.0 of LBT with Julia 1.9?
We should open the PR on Julia master and mark it for backporting. Even if not 1.9, I think we can have it in 1.9.1.
Apple got back to me; they have identified the issue and are trying to get a fix into a future macOS version.
Thanks @ViralBShah! I'm working with the HSL team to release a JuliaHSL package with a precompiled
Wouldn't the licensing issue be the same if inside HSL.jl vs BB? Perhaps the right thing is for them to serve the JLL through a separate registry.