SGEMM is tangled and cumbersome to edit #86
Comments
https://github.com/pytorch/pytorch/blob/master/caffe2/utils/math_cpu.cc uses an Eigen fallback. Otherwise, it uses MKL (for GemmBatched) or some BLAS provider (for an explicit batch loop over Gemm). https://github.com/microsoft/onnxjs/blob/8143e052c05c4b08798889c08a88c760b0238eab/src/wasm-ops/gemm.cpp#L22-L79 should be the same as https://github.com/pytorch/pytorch/blob/3cbf308a1fcc13aa9411740ce0a56fc18eb5e302/caffe2/utils/math_cpu.cc#L80. The commented PyTorch code indicates that Eigen is to be used as the cross-platform fallback. Can we drop USE_ONNX_SGEMM and simply call it USE_EIGEN, to skip cloning more submodules than necessary?
There probably isn't. It's a C API and will conflict between OpenBLAS and MKL. I'm stupid. 🤦 We should still be able to flatten some of the nesting, though, and remove redundant layers in between.
I have managed to do something at https://github.com/jerinphilip/sgemm/blob/c1e0f921b06882e75bedce5381772191b5d81a46/src/gemm.h, with mixed feelings on whether the state of the source is better or worse. The good news is that the necessary foundation to measure and improve is there now. I have also managed to skip onnx-sgemm and tap directly into Eigen instead.
Benchmark results (tables not preserved): ThinkPad (AVX2, MKL); Oracle-Cloud ARM; Android (Termux).
There are several problems with the existing sgemm integration. The main problem I find is that an if-else/ifdef-else ladder determines the BLAS provider, where we verbosely spell out precedence and also check for existence.
This begins in `CMakeLists.txt`, then trickles down into the CPP source. sgemm routes through an sgemm which follows the BLAS-defined API, to MKL, to CBLAS, to ONNX SGEMM, under ifdefs and multiple wraps. There is also a `ProdBatched` giving way to a `ProdBatchedOld` as another level of indirection?
Up to two ifdefs / ifs are acceptable. With more than two, some form of switch/dispatch needs to appear to reduce headhurt. The suggestion to do f32 sgemm with ruy in #79 (review) also amounts to the addition of a fourth provider.
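To make the "switch/dispatch instead of a ladder" idea concrete, here is a minimal sketch, assuming C++11 or later. The macro names `USE_MKL`, `USE_CBLAS` and `USE_RUY`, the `Provider` enum, `kSelected` and `Backend` are all hypothetical names for illustration, not identifiers from the codebase. The plain-loop kernel only stands in for the cross-platform fallback so that the sketch compiles without any BLAS installed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical provider tags; the names are illustrative only.
enum class Provider { MKL, CBLAS, Ruy, Fallback };

// The one place where build flags are consulted. USE_MKL, USE_CBLAS and
// USE_RUY are assumed macro names that the build system would define.
#if defined(USE_MKL)
constexpr Provider kSelected = Provider::MKL;
#elif defined(USE_CBLAS)
constexpr Provider kSelected = Provider::CBLAS;
#elif defined(USE_RUY)
constexpr Provider kSelected = Provider::Ruy;
#else
constexpr Provider kSelected = Provider::Fallback;
#endif

// Primary template is declared but not defined: selecting a provider that
// has no specialization is a compile error rather than a silent fall-through.
template <Provider P>
struct Backend;

// Plain-loop stand-in for the fallback provider.
// Row-major, no transposes: C = alpha * A * B + beta * C.
template <>
struct Backend<Provider::Fallback> {
  static void sgemm(std::size_t M, std::size_t N, std::size_t K, float alpha,
                    const float* A, const float* B, float beta, float* C) {
    assert(A != nullptr && B != nullptr && C != nullptr);
    for (std::size_t i = 0; i < M; ++i) {
      for (std::size_t j = 0; j < N; ++j) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < K; ++k) {
          acc += A[i * K + k] * B[k * N + j];
        }
        C[i * N + j] = alpha * acc + beta * C[i * N + j];
      }
    }
  }
};

// Call sites use this single entry point and never see an ifdef.
inline void sgemm(std::size_t M, std::size_t N, std::size_t K, float alpha,
                  const float* A, const float* B, float beta, float* C) {
  Backend<kSelected>::sgemm(M, N, K, alpha, A, B, beta, C);
}
```

In a real build, each of the other providers would get its own `Backend` specialization in a header that is only included when that provider is available, so the preprocessor ladder appears once instead of at every call site.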
There are ODR-compatible ways for multiple sgemm implementations to exist, and what seems perhaps better in my opinion is a `provider::sgemm(...)`, with sgemm following a standard API. It is possible to allow multiple providers (MKL, OpenBLAS, Accelerate etc.) to exist in an ODR-compatible way, give them ranks, and have a mechanism to prioritize which one is used at runtime (maybe even decide at compile time).
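One way the ranked `provider::sgemm(...)` idea could look is sketched below. Everything in it is an assumption for illustration: the `sketch` namespace, `Entry`, `registerProvider`, `best`, and the two toy providers `naive` and `interchanged`, which stand in for real libraries such as MKL or OpenBLAS. Note that namespacing the wrappers does not by itself resolve clashes between the C symbols of two BLAS libraries, which is the conflict mentioned earlier in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Signature shared by every provider (row-major, no transposes, to keep the
// sketch short): C = alpha * A * B + beta * C.
using SgemmFn = void (*)(std::size_t M, std::size_t N, std::size_t K,
                         float alpha, const float* A, const float* B,
                         float beta, float* C);

// Each provider lives in its own namespace, so the wrappers can coexist in
// one binary without clashing names.
namespace naive {
inline void sgemm(std::size_t M, std::size_t N, std::size_t K, float alpha,
                  const float* A, const float* B, float beta, float* C) {
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
      C[i * N + j] = alpha * acc + beta * C[i * N + j];
    }
}
}  // namespace naive

namespace interchanged {
// Same result, i-k-j loop order; a toy stand-in for a faster library.
inline void sgemm(std::size_t M, std::size_t N, std::size_t K, float alpha,
                  const float* A, const float* B, float beta, float* C) {
  for (std::size_t i = 0; i < M * N; ++i) C[i] *= beta;
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t k = 0; k < K; ++k) {
      const float a = alpha * A[i * K + k];
      for (std::size_t j = 0; j < N; ++j) C[i * N + j] += a * B[k * N + j];
    }
}
}  // namespace interchanged

struct Entry {
  std::string name;
  int rank;  // higher rank wins
  SgemmFn fn;
};

inline std::vector<Entry>& registry() {
  static std::vector<Entry> entries;
  return entries;
}

inline void registerProvider(std::string name, int rank, SgemmFn fn) {
  registry().push_back(Entry{std::move(name), rank, fn});
}

// Picks the highest-ranked provider registered so far.
inline const Entry& best() {
  assert(!registry().empty());
  return *std::max_element(
      registry().begin(), registry().end(),
      [](const Entry& a, const Entry& b) { return a.rank < b.rank; });
}

// Single entry point for call sites.
inline void sgemm(std::size_t M, std::size_t N, std::size_t K, float alpha,
                  const float* A, const float* B, float beta, float* C) {
  best().fn(M, N, K, alpha, A, B, beta, C);
}

}  // namespace sketch
```

A caller would register whatever providers were compiled in, for example `sketch::registerProvider("naive", 1, sketch::naive::sgemm);`, and then call `sketch::sgemm(...)`. The same ranking could in principle be resolved at compile time if the set of providers is fixed by the build.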