Support for x86/ARM CPUs (e.g., Xeon, M1) #194
Hi! There is no plan for M1 support at the moment. This will probably be trivial to add once CPU support is done -- but even that isn't a big priority, and it wouldn't address the GPU either. If Apple had an LLVM target for their GPU, adding support would be pretty easy, but as things stand the only way to program it is with Metal.
@ptillet A CPU version would be fine for the M1: there's an Apple Matrix coprocessor (AMX) inside the M1 chip (an interesting article), and vector/matrix computations are all dispatched automatically (maybe secretly) to AMX. In my experience, Apple's CPU cores are much faster than my old desktop GPU (GeForce RTX 2070 8GB) on machine learning training tasks with small datasets (less than 1GB, e.g., sparse data, graph data, small image datasets), and that is without any help from the GPU or the Neural Engine inside the M1.
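As a quick sanity check (my addition, not something from the thread): on Apple Silicon the AMX units are reached through Accelerate's BLAS, so if NumPy is linked against Accelerate, a plain matmul already exercises them. A minimal sketch, assuming a NumPy build that links Accelerate; the 2048x2048 size is an arbitrary choice:

```python
# Minimal sketch: check that NumPy is linked against Apple's Accelerate
# framework (whose BLAS is what dispatches matrix math to AMX on the M1),
# then time a float32 matmul.
import time
import numpy as np

np.show_config()  # look for "accelerate" / "vecLib" in the BLAS/LAPACK info

a = np.random.rand(2048, 2048).astype(np.float32)
b = np.random.rand(2048, 2048).astype(np.float32)

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start

# An n x n matmul performs roughly 2 * n**3 floating point operations.
gflops = 2 * 2048**3 / elapsed / 1e9
print(f"2048x2048 float32 matmul: {elapsed * 1e3:.1f} ms (~{gflops:.0f} GFLOP/s)")
```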
Out of curiosity, I built triton from source using the LLVM compilers installed in a conda-forge environment on a macOS/M1 laptop, but importing it fails:
Traceback (most recent call last):
File "/Users/ogrisel/code/triton/python/tutorials/01-vector-add.py", line 17, in <module>
import triton
File "/Users/ogrisel/code/triton/python/triton/__init__.py", line 10, in <module>
from .runtime import Config, autotune, heuristics, JITFunction, KernelInterface
File "/Users/ogrisel/code/triton/python/triton/runtime/__init__.py", line 1, in <module>
from .autotuner import Config, Heuristics, autotune, heuristics # noqa: F401
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ogrisel/code/triton/python/triton/runtime/autotuner.py", line 7, in <module>
from ..testing import do_bench
File "/Users/ogrisel/code/triton/python/triton/testing.py", line 9, in <module>
import triton._C.libtriton.triton as _triton
ImportError: dlopen(/Users/ogrisel/code/triton/python/triton/_C/libtriton.so, 0x0002): symbol not found in flat namespace '__ZN4llvm24DisableABIBreakingChecksE'
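For what it's worth (my reading, not something stated in the thread): `__ZN4llvm24DisableABIBreakingChecksE` demangles to `llvm::DisableABIBreakingChecks`, which together with its `Enable...` twin is LLVM's link-time guard against mixing objects built with different `LLVM_ENABLE_ABI_BREAKING_CHECKS` settings. The error therefore suggests that libtriton.so and the LLVM it loads were built with mismatched settings. A hypothetical diagnostic sketch (the LLVM dylib path is an assumption for a conda-forge setup):

```python
# Diagnostic sketch (hypothetical paths): compare which ABI-breaking-checks
# guard symbol libtriton.so needs against which one the LLVM dylib exports.
# If they differ, dlopen fails exactly as in the traceback above.
import subprocess

LIBTRITON = "/Users/ogrisel/code/triton/python/triton/_C/libtriton.so"
LIBLLVM = "/opt/conda/envs/triton/lib/libLLVM.dylib"  # hypothetical path
GUARD = "ABIBreakingChecks"

def guard_symbols(path, flags):
    out = subprocess.run(["nm", *flags, path], capture_output=True, text=True)
    return [line for line in out.stdout.splitlines() if GUARD in line]

# "-gu" lists undefined external symbols; "-gU" lists defined external ones
# (macOS nm flags). The two outputs must name the same guard symbol.
print("needed by libtriton:", guard_symbols(LIBTRITON, ["-gu"]))
print("exported by LLVM:   ", guard_symbols(LIBLLVM, ["-gU"]))
```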
Note that even a generic CPU / OpenMP backend (without architecture-specific optimizations) would be useful as a fallback mechanism for libraries that would like to use triton without introducing a hard dependency on a GPU runtime. It would also make it possible to use free CI services to run the kernels' test suite on CPU cheaply in individual pull requests, and to couple that with an on-demand, more expensive CI worker with a GPU. A minimal sketch of that fallback pattern is below.
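Something like the following (names and structure are hypothetical, purely to illustrate the fallback idea):

```python
# Hypothetical sketch of the fallback pattern: use the Triton-backed kernel
# when a GPU runtime is importable and a device is present, otherwise fall
# back to a plain NumPy implementation. _vector_add_triton stands in for a
# real Triton kernel launcher and is not defined here.
import numpy as np

try:
    import torch
    import triton  # noqa: F401
    HAS_GPU = torch.cuda.is_available()
except ImportError:
    HAS_GPU = False

def vector_add(x, y):
    """Add two vectors, preferring the GPU path when it is available."""
    if HAS_GPU:
        return _vector_add_triton(x, y)  # hypothetical Triton-backed path
    return np.asarray(x) + np.asarray(y)  # portable CPU fallback

# CI without a GPU exercises the NumPy branch; a GPU worker covers the rest.
print(vector_add([1.0, 2.0], [3.0, 4.0]))
```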
Any plans for M1/M1X? It's 2023 now.
No plan for CPU support, but you can use the […]
Hi there,
Are there any future plans for macOS support?