Support `int_scaled_mm` on CPU #121
Conversation
@Xia-Weiwen, @cpuhrsch, can you also add
Hi @jgong5 @yanbing-j Could you please comment about FP8? Thanks.
At present, we are preparing to add CPU support of
Description

`int_scaled_mm` is currently supported on CUDA only. This PR adds support for CPU. The op is implemented via `torch._int_mm`, whose CPU version was recently added to PyTorch by pytorch/pytorch#121792. With this patch, SmoothQuant can use `int_scaled_mm` on CPU with Inductor.

Example code:
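A minimal sketch of what `int_scaled_mm` computes, expressed directly with `torch._int_mm` as described above. The helper name `int_scaled_mm_ref` and the `(M, 1)` shape of `scales` are illustrative assumptions, not the exact torchao signature:

```python
import torch

def int_scaled_mm_ref(a_int8: torch.Tensor, b_int8: torch.Tensor,
                      scales: torch.Tensor) -> torch.Tensor:
    # torch._int_mm performs an int8 x int8 -> int32 matmul;
    # the per-row scales are then applied in floating point.
    out_int32 = torch._int_mm(a_int8, b_int8)
    return out_int32.to(scales.dtype) * scales

# Example inputs: a is (M, K) int8, b is (K, N) int8, scales is (M, 1) fp32.
M, K, N = 32, 64, 32
a = torch.randint(-128, 127, (M, K), dtype=torch.int8)
b = torch.randint(-128, 127, (K, N), dtype=torch.int8)
scales = torch.rand(M, 1, dtype=torch.float32)
print(int_scaled_mm_ref(a, b, scales).shape)  # torch.Size([32, 32])
```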
Run with `TORCHAO_AUTOTUNER_ENABLE=1` and the following is found in the generated code:

Test plan
python test/kernel/test_autotuner.py -k test_int_scaled_mm
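As a sketch (assuming the `TORCHAO_AUTOTUNER_ENABLE` environment variable applies to the test process as well), the autotuned path can be exercised together with the new test:

```bash
# Enable the torchao autotuner and run the int_scaled_mm test from the test plan.
TORCHAO_AUTOTUNER_ENABLE=1 python test/kernel/test_autotuner.py -k test_int_scaled_mm
```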