Option to install numpy built with Apple's Accelerate BLAS implementation #253
Comments
Conda-forge can only really depend on things it can distribute itself (with some exceedingly rare exceptions that are both rock-stable and also difficult/dangerous to replace; e.g. glibc). Since the accelerate sources are not available, and I don't think the binaries are redistributable (or ABI-stable enough without further metadata), I'm doubtful whether this is feasible. @isuruf might have a more definite answer. |
That's not the issue. The issue is that Accelerate's lapack version is 3.2.1 which is ancient. Therefore Accelerate can only provide blas which means only numpy can support it. scipy and many other packages cannot. |
That's interesting. I guess that reflects Apple's lack of support for Fortran. But yes, now that you mention it, the SciPy docs also say the same. |
Isuru found a way around this: conda-forge/blas-feedstock#82 In 2-3 hours, you should be able to try installing I'll start testing this in #252 once it becomes available as well... |
Seems to be broken at the moment. I created a fresh environment with Python 3.9 and
|
@ollie-bell would suggest sharing the full list of packages from the environment in case there is a clue in there. Also would include |
|
On osx-x86, the test suite in #252 runs through without failures. However, osx-arm64 (as for @ollie-bell) isn't being tested (only cross-compiled) because we don't have the hardware for it in CI. |
Does this issue show if you use |
@jakirkham the issue only shows with |
Should be fixed now |
@ollie-bell could you report back on the performance please? Do you still see any performance improvement? |
@ngam yes indeed. When I run the numpy linalg benchmarks I typically see 2-4x speed up with the new accelerate installation. I assume those benchmarks are robust and representative of real use! |
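For readers who want a quick sanity check of their own, here is a minimal timing sketch. It is not the numpy linalg benchmark suite mentioned above; the helper names `best_time` and `svd_benchmark`, the matrix size, and the repeat count are all assumptions chosen for illustration. Run it once in an OpenBLAS environment and once in an Accelerate environment and compare the numbers.

```python
import time


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time (seconds) over `repeats` calls to fn."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def svd_benchmark(n=512, repeats=5):
    """Time numpy.linalg.svd on a random n x n matrix; None if NumPy is missing."""
    try:
        import numpy as np
    except ImportError:
        return None
    rng = np.random.default_rng(0)  # fixed seed so both runs see the same matrix
    a = rng.standard_normal((n, n))
    return best_time(lambda: np.linalg.svd(a), repeats)


if __name__ == "__main__":
    elapsed = svd_benchmark()
    if elapsed is None:
        print("NumPy is not installed in this environment")
    else:
        print(f"best SVD time: {elapsed:.4f} s")
```

Taking the minimum over several runs reduces noise from other processes, but a single matrix size says little about real workloads, so treat the result as a rough indication only.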
Hi! I landed on this issue after reading this post (see also a similar SE post). There, a simple SVD benchmark demonstrates more than 4x speedup when I'm hoping for a cleaner
? Do you regard this approach as stable in the long term? And will running Finally, what about related libraries such as |
Currently, OpenBLAS is the default for this type of switching, unless the maintainers decide otherwise to make Accelerate (vecLib) the default for OSX: https://conda-forge.org/docs/maintainer/knowledge_base.html#blas
I believe SciPy dropped support for Accelerate (vecLib) completely: https://github.com/scipy/scipy/wiki/Dropping-support-for-Accelerate Just note that enforcing |
I think this type of comment should push us to think of a global solution, where all maintainers are encouraged to link against netlib (cc @isuruf) instead of any other BLAS. And perhaps we could make it such that on osx-arm64 we default to Accelerate instead of OpenBLAS. But that is beyond me personally... |
Thanks @ngam for the great response!
This is particularly useful (if somewhat painful) to know. Does what you say about |
SciPy seems to be fine, I just tested it, because they don't hard-code to OpenBLAS. I have to admit, I am also still new to this and trying to better understand it. @isuruf, is it correct that in the case of scipy, even if they don't support Accelerate, when enforcing |
No, it will use Accelerate. That's why I said it's experimental and is not recommended. |
I don't know where you got this idea that this is not encouraged. In fact, almost all packages link against netlib. Only exceptions I know are pytorch and tensorflow. |
I wasn't aware; I saw a number of packages link explicitly against OpenBLAS (e.g. julia, but maybe that's because of an upstream decision?) |
Yes, julia is another exception because upstream requires ILP64 variants which are not available in other implementations. |
Let me see if we/I can run the test suite of scipy with accelerate. |
I'll save you the trouble. There are segfaults. |
Thanks @isuruf for your clarifications! Are there issues/PRs relevant to the SciPy front that I and other readers can follow? And, for the time being, is there a respectable solution in which SciPy et al are linked to OpenBLAS while NumPy is linked to accelerate? |
You can try building numpy from source. |
Indeed. And I already have, successfully. I guess I am curious to know about the PyData ecosystem's roadmap concerning these accelerations. The SciPy reasons for dropping support seem pretty damning, so I wonder if there are any reasons to be optimistic about benefitting from out-of-the-box accelerations using conda-forge in the future. |
SciPy specifically has faced some issues with M1 (at least their PyPI wheels: https://mail.python.org/archives/list/scipy-dev@python.org/thread/LLN2O4G2XI2MPILRW2XRRVCUK336WGKF/). There might be a more comprehensive solution soon... so yes, be optimistic! I would generally say, the whole M1 thing (I have two machines myself) is experimental and so patience is needed as developers refine stuff. If scientific computing performance is truly critical, I wouldn't expect it to be done on personal M1 machines anyway (more like HPC). I think OpenBLAS (at least through conda-forge) is definitely good enough for now; yes, we can get better performance, but we will have to wait a little longer :) |
@ngam SciPy's problems with Apple's Accelerate greatly pre-date M1: https://github.com/scipy/scipy/wiki/Dropping-support-for-Accelerate |
Yes, @dopplershift. Sorry if I made it sound like it was an M1-only issue. But the point is, on Intel Macs, one could use MKL BLAS which outperforms Accelerate BLAS. However, on M1 machines, Accelerate BLAS outperforms OpenBLAS. |
rgommers pointed that same exact link to me 😄 |
(I am not sure if SciPy actually supports MKL or not; I've been only focusing on the Accelerate issue on M1 Macs.) |
Should add it is probably not too surprising that Accelerate outperforms OpenBLAS on M1 given that OpenBLAS hasn't been tuned for that architecture (OpenMathLib/OpenBLAS#2814). That may change once that tuning work happens. |
SciPy does support MKL. And ATLAS, and BLIS. |
Yeah we do have support for BLIS. Though it doesn't appear to be migrated yet. Added here ( conda-forge/conda-forge-pinning-feedstock#2444 ). Seems like there was some work upstream for M1, but Idk to what extent that has been included in releases. |
Is there documentation on how to achieve this? According to https://gist.github.com/MarkDana/a9481b8134cf38a556cf23e1e815dafb it seems that support for Accelerate was dropped. |
Please read this doc |
I did, I set my channels:
- conda-forge
dependencies:
- python=3.11
- blas*=*accelerate
- libblas=*=*accelerate
- numpy
- scipy
- pandas
- scikit-learn
I ran the tests of https://gist.github.com/MarkDana/a9481b8134cf38a556cf23e1e815dafb on my M1 Max.
From https://numpy.org/doc/stable/user/building.html?highlight=blas#accelerated-blas-lapack-libraries. |
Have created a NumPy env with Accelerate locally. Here's what I see (using otool):
$ otool -L ~/miniforge/envs/np_accel/lib/python3.11/site-packages/numpy/linalg/_umath_linalg.cpython-311-darwin.so
$CONDA_PREFIX/lib/python3.11/site-packages/numpy/linalg/_umath_linalg.cpython-311-darwin.so:
    @rpath/liblapack.3.dylib (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libblas.3.dylib (compatibility version 0.0.0, current version 0.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.0.0)
$ otool -L ~/miniforge/envs/np_accel/lib/libblas.3.dylib
$CONDA_PREFIX/lib/libblas.3.dylib:
    @rpath/libvecLibFort-ng.dylib (compatibility version 0.0.0, current version 0.0.0)
    /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib (compatibility version 1.0.0, current version 1.0.0, reexport)
    @rpath/liblapack-netlib.3.9.0.dylib (compatibility version 0.0.0, current version 0.0.0, reexport)
    @rpath/liblapacke-netlib.3.9.0.dylib (compatibility version 0.0.0, current version 0.0.0, reexport)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.0.0) |
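If you want to automate the check above, the following is a rough sketch that scans the text printed by `otool -L` for a library inside Accelerate.framework. The function names `linked_libraries` and `uses_accelerate` are made up for this example, and the parsing assumes the output layout shown in this thread (one header line naming the file, then one library per line followed by a parenthesised version note). The sample string is an abbreviated copy of the libblas.3.dylib output above.

```python
def linked_libraries(otool_output):
    """Extract library paths from the text printed by `otool -L`."""
    libs = []
    # The first line names the inspected file, so skip it.
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if not line:
            continue
        # Each entry looks like "<path> (compatibility version ..., current version ...)".
        path = line.split(" (compatibility version", 1)[0]
        libs.append(path)
    return libs


def uses_accelerate(otool_output):
    """True if any linked library lives inside Accelerate.framework."""
    return any("Accelerate.framework" in lib for lib in linked_libraries(otool_output))


SAMPLE = """\
$CONDA_PREFIX/lib/libblas.3.dylib:
\t@rpath/libvecLibFort-ng.dylib (compatibility version 0.0.0, current version 0.0.0)
\t/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib (compatibility version 1.0.0, current version 1.0.0, reexport)
\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.0.0)
"""

print(uses_accelerate(SAMPLE))  # prints True for the sample above
```

Note that the check has to be run on `libblas.3.dylib` itself: NumPy's extension module only links `@rpath/libblas.3.dylib`, and it is that library which re-exports Accelerate's `libBLAS.dylib`.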
That said, the second point raised is NumPy is deprecating support for this option. So one can use Accelerate. However if one runs into issues, likely there won't be any support from NumPy. |
I would like to point out that the Apple developer docs state support for "LAPACK 3.9.1", but "[t]o use the new interfaces, define ACCELERATE_NEW_LAPACK before including the Accelerate or vecLib headers." |
Yes, we're aware - NumPy can use Accelerate already, and SciPy is highly likely to re-add support once macOS 13.3 is released. |
FWIW macOS 13.3 was released yesterday. |
@ngam |
@qdwang not yet, but maybe soon. Let's move the bit about PyTorch to the pytorch-cpu-feedstock (https://github.com/conda-forge/pytorch-cpu-feedstock) |
Is there any hope that we can use Accelerate on MacOS 14.0? |
NumPy already supports the new Accelerate on macOS >= 13.3, and should have wheels built against Accelerate for macOS >= 14.0. So it's possible - does require updating the conda-forge machinery that does runtime switching of BLAS/LAPACK though I think. |
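To make the version threshold concrete, here is a small sketch of the kind of check a switching mechanism might perform. It is purely illustrative and is not how conda-forge or NumPy implement anything: the function names are invented, and the only fact taken from this thread is the macOS >= 13.3 requirement for the new Accelerate interfaces.

```python
import platform


def parse_version(text):
    """Turn '13.3.1' into (13, 3, 1); non-numeric parts are dropped."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def supports_new_accelerate(macos_version, minimum=(13, 3)):
    """True if macos_version meets the threshold quoted in this thread (13.3)."""
    version = parse_version(macos_version)
    # An empty tuple means the version string could not be parsed.
    return bool(version) and version >= minimum


if __name__ == "__main__":
    current = platform.mac_ver()[0]  # empty string on non-macOS systems
    if current:
        print(current, supports_new_accelerate(current))
    else:
        print("not running on macOS")
```

Tuples compare element by element, so `(14, 0)` passes and `(12, 6, 3)` does not. A bare major version such as "13" is treated as below 13.3, which errs on the side of caution.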
It's a bit tricky; we need an entirely new blas flavour, or some smart switching based on the |
This issue has lots of different related issues jumbled. Please open a new issue if you feel like the current |
After updating to macOS 15, it seems I cannot use numpy with Accelerate (numpy installed via pip is using Accelerate). I might open this as a separate issue
And the checking with numpy
|
More of a feature request. Is there a plan to enable installation of numpy built against Apple's Accelerate BLAS implementation on osx-arm64? For example, something similar to
conda install -c conda-forge numpy "libblas=*=*accelerate"
(based on the instructions here). This can be achieved by building numpy from source and installing via pip (see these instructions), but it would be great to have a clean conda installation to achieve the same thing.