[BUG] Set Device when kernel be applied into Multiple GPUs. #155

LeiWang1999 · 2024-08-28T14:37:14Z

This pull request includes several changes to improve device compatibility and streamline the weight transformation process in the bitblas module. The most important changes involve modifying various functions to ensure they correctly handle the device context for GPU operations.

Device Compatibility Improvements:

bitblas/module/__init__.py: Updated the repack_from_gptq method to accept a device parameter, allowing the weight transformation to be performed on the specified device.
bitblas/ops/general_matmul/__init__.py: Modified the transform_weight method to use the device of the input weight tensor instead of defaulting to CUDA.
bitblas/ops/general_matmul/__init__.py: Adjusted the forward method to set and use the correct CUDA stream based on the device of the input tensor A.

Code Simplification:

bitblas/gpu/matmul_mma_dequantize.py: Removed obsolete condition for zeros_mode and simplified the condition for with_scaling.

LeiWang1999 · 2024-08-28T14:54:12Z

Performance:

1x8192x8192

Lib + SetDevice

kernel only: 0.011571199999999999
Profile mixed-precision matrix multiplication
num_repeats = 28149
op: 0.021342752822495984

Lib only

kernel only: 0.0113664
Profile mixed-precision matrix multiplication
num_repeats = 42799
op: 0.011026431735468434

Torch Func

kernel only: 0.0113664
Profile mixed-precision matrix multiplication
num_repeats = 19784
op: 0.03557460924126193
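
For a quick comparison, the relative cost implied by the op times above can be worked out directly. This is only a small sketch: the dictionary keys and variable names are illustrative, and the times are in whatever unit the profiler reported, which is not stated here.

```python
# Per-call "op" times copied from the three profiles above
# (unit as reported by the profiler; not stated in this thread).
op_time = {
    "Lib + SetDevice": 0.021342752822495984,
    "Lib only": 0.011026431735468434,
    "Torch Func": 0.03557460924126193,
}

# Use the plain library call as the baseline.
baseline = op_time["Lib only"]

# Slowdown factor of each variant relative to the baseline.
ratios = {name: t / baseline for name, t in op_time.items()}

for name, t in op_time.items():
    extra = t - baseline  # absolute overhead per call over "Lib only"
    print(f"{name}: {t:.6f} ({ratios[name]:.2f}x Lib only, +{extra:.6f})")
```

The "kernel only" times are nearly identical across the three runs, so the gap comes from per-call overhead around the kernel rather than from the kernel itself.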

LeiWang1999 · 2024-08-28T14:54:42Z

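It's a bit costly: with SetDevice inside the op, the per-call op time nearly doubles (0.0213 vs. 0.0110 for Lib only).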

LeiWang1999 · 2024-08-28T15:13:12Z

We suggest performing the setDevice operation outside the kernel call: in multi-GPU environments the caller has to switch the device context for every library involved, not just BitBLAS.
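
One way the caller might do this with PyTorch is sketched below. It is not part of BitBLAS: `device_scope` and `call_on_input_device` are hypothetical helper names, and the sketch assumes the inputs are PyTorch tensors (it relies on `Tensor.is_cuda`, `Tensor.device`, and the `torch.cuda.device` context manager).

```python
import contextlib


def device_scope(tensor):
    """Return a context manager that makes `tensor`'s GPU the current device.

    For anything that is not a CUDA tensor this is a no-op context.
    (Hypothetical helper, not a BitBLAS API.)
    """
    if getattr(tensor, "is_cuda", False):
        # torch is already loaded if `tensor` is a CUDA tensor, so this
        # local import is cheap; it keeps the CPU path free of CUDA setup.
        import torch
        return torch.cuda.device(tensor.device)
    return contextlib.nullcontext()


def call_on_input_device(op, a, *args, **kwargs):
    """Run op(a, ...) with a's device selected by the caller, not by op."""
    with device_scope(a):
        return op(a, *args, **kwargs)
```

Because the device switch lives in the caller, a single `with device_scope(A):` block can wrap several calls on the same GPU, so the switch is paid once rather than once per kernel launch.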

LeiWang1999 added 4 commits August 28, 2024 14:06

2d4d44d  Merge branch 'main' of https://github.com/microsoft/BitBLAS into main
390ad18  remove debug print
dcf3a2e  Refactor Matmul class for improved readability and maintainability
42b4213  Refactor Matmul class for improved readability and maintainability
d3674ec  revert set device
02176b2  lint fix

LeiWang1999 merged commit 393c53e into microsoft:main Aug 28, 2024
6 checks passed
