SGEMM is tangled and cumbersome to edit #86

Open
jerinphilip opened this issue Mar 28, 2022 · 3 comments
There are several problems with the existing sgemm integration. The main one is that an if-else/ifdef-else ladder determines the BLAS provider, verbosely encoding both precedence and existence checks.

This begins in CMakeLists.txt and trickles down into the C++ source. sgemm routes through another sgemm following the BLAS API, which dispatches to MKL, CBLAS or ONNX SGEMM under ifdefs and multiple wraps. There is also a ProdBatched giving way to a ProdBatchedOld as yet another level of indirection.

Up to two ifdefs/ifs are acceptable. Beyond two, some form of switch/dispatch needs to appear to reduce the headache. The suggestion in #79 (review) to do f32 sgemm with ruy would also add a fourth provider.

There are ODR-compatible ways for multiple sgemm implementations to exist, and what seems better to me is a provider::sgemm(...) where each sgemm follows a standard API. It is possible to let multiple providers (MKL, OpenBLAS, Accelerate etc.) coexist in an ODR-compatible way, give them ranks, and add a mechanism to prioritize one at runtime (or even decide at compile time).
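
A minimal sketch of the shape I have in mind (all names here are hypothetical, not the current API): each provider wraps its library behind the same signature inside its own namespace, so definitions never collide, and the compile-time precedence ladder lives in exactly one place instead of at every call site.

```cpp
#include <cstddef>

namespace provider {

// Standard argument set, mirroring the usual cblas_sgemm parameters.
struct GemmArgs {
  bool transA, transB;
  std::size_t M, N, K;
  float alpha;
  const float *A; std::size_t lda;
  const float *B; std::size_t ldb;
  float beta;
  float *C; std::size_t ldc;
};

#ifdef USE_MKL
namespace mkl { void sgemm(const GemmArgs &args); }  // would wrap MKL's cblas_sgemm
#endif
#ifdef USE_RUY
namespace ruy { void sgemm(const GemmArgs &args); }  // would wrap ruy::Mul
#endif

// Always-available fallback: a naive reference loop standing in for an
// Eigen-backed implementation (row-major, no transpose support, for brevity).
namespace eigen {
inline void sgemm(const GemmArgs &args) {
  for (std::size_t i = 0; i < args.M; ++i)
    for (std::size_t j = 0; j < args.N; ++j) {
      float sum = 0.0f;
      for (std::size_t k = 0; k < args.K; ++k)
        sum += args.A[i * args.lda + k] * args.B[k * args.ldb + j];
      args.C[i * args.ldc + j] = args.alpha * sum + args.beta * args.C[i * args.ldc + j];
    }
}
}  // namespace eigen

// Single dispatch point; provider precedence is decided here, once.
inline void sgemm(const GemmArgs &args) {
#if defined(USE_MKL)
  mkl::sgemm(args);
#elif defined(USE_RUY)
  ruy::sgemm(args);
#else
  eigen::sgemm(args);
#endif
}

}  // namespace provider
```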


jerinphilip commented Mar 29, 2022

https://github.com/pytorch/pytorch/blob/master/caffe2/utils/math_cpu.cc uses an Eigen fallback. Otherwise, it takes MKL (for GemmBatched) or some BLAS provider (for an explicit batch loop over Gemm).

https://github.com/microsoft/onnxjs/blob/8143e052c05c4b08798889c08a88c760b0238eab/src/wasm-ops/gemm.cpp#L22-L79 should be the same as https://github.com/pytorch/pytorch/blob/3cbf308a1fcc13aa9411740ce0a56fc18eb5e302/caffe2/utils/math_cpu.cc#L80. The comments in the PyTorch code indicate Eigen is used as the cross-platform fallback. We could drop USE_ONNX_SGEMM and simply call it USE_EIGEN, to avoid cloning more submodules than necessary.

> There are ODR-compatible ways for multiple sgemm implementations to exist, and what seems better to me is a provider::sgemm(...) where each sgemm follows a standard API.

There probably isn't. It's a C API, so the symbols will conflict between OpenBLAS and MKL. I'm stupid. 🤦 We should still be able to flatten some of the nesting and remove the redundant layers in between, though.

jerinphilip commented

I have managed to do something at https://github.com/jerinphilip/sgemm/blob/c1e0f921b06882e75bedce5381772191b5d81a46/src/gemm.h, with mixed feelings on whether the state of the source is better or worse. The good news is that the foundation needed to measure and improve is now in place. I have also managed to skip onnx-sgemm and tap into Eigen directly instead.
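
For the --provider flag in the runs below, a sketch of how the string could resolve to a backend at runtime, reusing the hypothetical provider::sgemm wrappers from the sketch in the issue description; only providers that were compiled in appear in the registry.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

using SgemmFn = std::function<void(const provider::GemmArgs &)>;

inline SgemmFn resolveProvider(const std::string &name) {
  // Registered entries depend on compile-time availability; lookup and
  // prioritization happen at runtime.
  static const std::map<std::string, SgemmFn> registry = {
#ifdef USE_MKL
      {"mkl", provider::mkl::sgemm},
#endif
#ifdef USE_RUY
      {"ruy", provider::ruy::sgemm},
#endif
      {"eigen", provider::eigen::sgemm},  // always-available fallback
  };
  auto entry = registry.find(name);
  if (entry == registry.end())
    throw std::runtime_error("unknown or unavailable provider: " + name);
  return entry->second;
}
```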

ThinkPad (AVX2, MKL):

```
+ ./benchmark --batchSize 1000 --provider eigen

real    1m29.110s
user    1m10.299s
sys     0m18.329s

+ ./benchmark --batchSize 1000 --provider ruy

real    1m7.825s
user    0m52.234s
sys     0m15.150s

+ ./benchmark --batchSize 1000 --provider mkl

real    0m43.771s
user    0m31.833s
sys     0m11.817s
```

Oracle-Cloud ARM:

```
+ ./benchmark --batchSize 100 --provider blas

real    0m9.388s
user    0m17.695s
sys     0m19.640s

+ ./benchmark --batchSize 100 --provider ruy

real    0m11.816s
user    0m19.854s
sys     0m24.024s

+ ./benchmark --batchSize 100 --provider eigen

real    3m7.278s
user    3m14.561s
sys     0m24.816s
```

jerinphilip commented

Android (Termux)

```
$ bash ../scripts/run.sh
+ ./benchmark --batchSize 1000 --provider eigen

real    5m27.436s
user    1m35.707s
sys     0m20.999s

+ ./benchmark --batchSize 1000 --provider ruy

real    5m12.531s
user    1m25.261s
sys     0m30.092s
```
