[WIP, Kernel] (3/N) Machete W4A8 #8046
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
Hi @LucasWilkinson, I ran the W4A8 benchmark in the PR; the GEMM perf from Machete is 15-20% slower than the Marlin kernels. Is that expected? Thanks.
@cli99 Thanks for your interest in the kernels! I'll preface by saying this is a work-in-progress PR, so I still need to do some performance tuning (the updated heuristic in #7701 will likely help this PR once it's merged). We do expect Marlin to outperform this current PR for an M dim (batch_size * seq_len) <= 64, but you should see speedups for M > 64 and larger speedups at M >= 128 for most shapes. What were the shapes that you tested?
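(For context, the crossover described above is easy to probe with a CUDA-event timing sweep over M. The minimal self-contained sketch below uses `torch.matmul` as a stand-in for the quantized kernels; the real comparison substitutes calls to the Machete and Marlin custom ops, as `benchmarks/kernels/benchmark_machete.py` in this PR does.)

```python
import torch

def time_kernel(fn, warmup: int = 10, iters: int = 100) -> float:
    """Mean latency of a CUDA kernel launch, in microseconds."""
    for _ in range(warmup):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) * 1e3 / iters  # ms -> us

# Fixed weight shape (K, N); sweep the M dim (batch_size * seq_len) to
# see how latency scales around the M <= 64 / M > 64 crossover.
# torch.matmul stands in here for the quantized GEMM kernels.
N, K = 4096, 4096
w = torch.randn(K, N, dtype=torch.float16, device="cuda")
for m in (16, 32, 64, 128, 256, 512):
    a = torch.randn(m, K, dtype=torch.float16, device="cuda")
    us = time_kernel(lambda: a @ w)
    print(f"M={m:4d}  latency={us:8.1f} us")
```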
Force-pushed from 358a67a to 76a5c3d
Force-pushed from 261b1c2 to 02fb2c0
Hi @LucasWilkinson, thank you for your amazing work! I wanted to install machete-w4a8 for testing, and the installation was successful. However, when I ran benchmark_machete.py, I encountered the following issue:

File "~/vllm-0.6.3-machete-w4a8/./benchmarks/kernels/benchmark_machete.py", line 255, in <lambda>
    return lambda: ops.machete_mm(
File "~/vllm-0.6.3-machete-w4a8/vllm/_custom_ops.py", line 43, in wrapper
    raise NotImplementedError(msg % (fn.__name__, e)) from e
NotImplementedError: Error in calling custom op machete_mm: Could not run '_C::machete_mm' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. '_C::machete_mm' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastXPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

Environment configuration:

Could you please provide some suggestions?
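(For context: a dispatch error like this typically means `_C::machete_mm` was never registered for the CUDA backend, i.e. the Machete kernels were skipped at build time. A minimal sanity check, assuming Machete's Hopper/SM90 requirement, might look like the sketch below.)

```python
import torch

# The Machete kernels target Hopper-class GPUs (compute capability 9.0);
# on older architectures they are skipped at build time, so the custom op
# is never registered for CUDA. This check assumes that SM90 requirement.
major, minor = torch.cuda.get_device_capability()
if (major, minor) < (9, 0):
    print(f"GPU reports SM{major}{minor}; Machete needs SM90 (e.g. H100), "
          "so '_C::machete_mm' will be unavailable on this device.")
else:
    print("GPU is SM90+; if the op is still missing, rebuild vLLM from "
          "source so the Machete kernels are compiled in.")
```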
Superseded by: #9855