sycl : Reenabled mmvq path for the SYCL Nvidia Backend #8372
Conversation
@OuadiElfarouki @joeatodd @AidanBeltonS tagging for the discussion w.r.t. the backend. @airMeng @NeoZhangJianyu: What do you think of this? Is the change OK for the multi-device implementation? I considered the alternative of re-adding a macro to "enable mmvq", but a dynamic check seems simpler as we already have plenty of macros.
ggml/src/ggml-sycl.cpp
@@ -3658,6 +3658,10 @@ static void ggml_sycl_mul_mat(ggml_backend_sycl_context & ctx, const ggml_tensor
    use_mul_mat_q = use_mul_mat_q && (src1->ne[1] <= MMQ_MAX_BATCH_SIZE);
#endif // SYCL_USE_XMX

+    // mmvq path is faster in the Nvidia backend but slower on the Intel backend
// mmvq path is faster in the Nvidia backend but slower on the Intel backend
Suggest removing this comment:
1. Such a detailed comment is not needed for two lines of code. It's common sense to optimize by adding a branch for the hardware type.
2. We should avoid mentioning the brand of a commercial company in code. Suggest using CUDA and SYCL for them if needed.
I disagree with 1. I agree that it makes sense to optimize for hardware type, but the reasoning and the change in the dispatch are not necessarily obvious to anyone who hasn't been actively developing the backend.
I agree with 2.
OK! I see the update.
Intel and Nvidia backends have different "good" paths in the mulmat dispatch. Recent changes in the SYCL backend for mulmat improved performance for the Level Zero backend, but negatively affected the CUDA backend by disabling the `mmvq` path (which forces `dmmv`). This patch changes the preferred path for `sycl::backend::ext_oneapi_cuda`, without affecting the changes made to the Intel backend, where re-enabling `mmvq` showed a significant performance drop.

Summary of performance changes:
Hardware: Nvidia A100
llama-bench params: tg 128, ngl 81.
Benchmarking
Rows can be seen in pairs (PR above, master below) to compare performance.