Plans for 8da4w quantization #883

sanchitintel · 2024-09-12T23:05:24Z

Hi,

From #430, it seems that 8da4w is primarily for ExecuTorch and is set to be deprecated. Are there any plans to enable it for CUDA and CPU as well, such that int4 weights are converted to int8 just before computation?

Thanks!

cc @jerryzh168

Xia-Weiwen · 2024-09-13T01:02:50Z

Hi @jerryzh168, I saw your pointer here:
https://github.com/pytorch/ao/tree/main/torchao/quantization#to-be-deprecated-a8w8-dynamic-quantization
However, we need 8da4w for CPU and XPU, and we don't want it deprecated. Do you have any concerns about that? Thanks.

CC. @jgong5 @leslie-fang-intel

jerryzh168 · 2024-09-13T02:26:32Z

This was used for ExecuTorch before, but it seems that people are adding kernels here: #880. We are open to adding kernels for this.

Xia-Weiwen · 2024-09-13T02:28:12Z

> this is used for executorch before, but it seems that we have people adding kernels here: #880. we are open to adding kernels for this

Got it. Thanks for the clarification!

jerryzh168 · 2024-09-13T02:28:19Z

@Xia-Weiwen that link is talking about the quantizer API, since we are updating to the quantize_ API. We'll be using something like https://github.com/pytorch/ao/tree/main/torchao/quantization#a8w8-dynamic-quantization but with int8_dynamic_activation_int4_weight (ao/torchao/quantization/quant_api.py, line 83 in 8236a87) as the second argument, i.e.:

quantize_(model, int8_dynamic_activation_int4_weight())
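
For readers new to the scheme, the arithmetic behind "8da4w" (int8 dynamic activations, int4 weights) can be sketched in plain Python. This is only an illustration of the idea raised in the original question, where int4 weights are widened to int8 right before an integer dot product. It is not torchao's implementation. The function names, the symmetric quantization, and the per-group weight scales are all assumptions made for this example.

```python
# Illustrative numerics for "8da4w"; NOT torchao's kernels.
# All names below are hypothetical, and symmetric quantization with
# per-group weight scales is a simplifying assumption.

def quantize_symmetric(values, qmin, qmax):
    """Quantize a list of floats to integers in [qmin, qmax]; return (ints, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [min(qmax, max(qmin, round(v / scale))) for v in values]
    return q, scale


def quantize_weight_int4(row, group_size):
    """Quantize one weight row to the int4 range [-8, 7], one scale per group."""
    groups = []
    for start in range(0, len(row), group_size):
        q, scale = quantize_symmetric(row[start:start + group_size], -8, 7)
        groups.append((q, scale))
    return groups


def linear_8da4w(x, quantized_rows):
    """Compute y = x @ W^T with x quantized to int8 at call time (dynamic)."""
    xq, x_scale = quantize_symmetric(x, -128, 127)
    out = []
    for groups in quantized_rows:
        acc = 0.0
        offset = 0
        for wq, w_scale in groups:
            # int4 values already fit in int8, so the "conversion" is a
            # widening step; the dot product itself is integer arithmetic.
            xs = xq[offset:offset + len(wq)]
            int_dot = sum(a * b for a, b in zip(xs, wq))
            acc += int_dot * x_scale * w_scale
            offset += len(wq)
        out.append(acc)
    return out


weights = [[0.5, -0.25, 0.75, 1.0], [-1.0, 0.125, 0.25, -0.5]]
x = [1.0, 2.0, -1.0, 0.5]

quantized_rows = [quantize_weight_int4(row, group_size=2) for row in weights]
y_quant = linear_8da4w(x, quantized_rows)
y_ref = [sum(a * b for a, b in zip(x, row)) for row in weights]
```

Comparing `y_quant` with `y_ref` shows the quantization error for this toy input. A real kernel would keep the weights packed as int4 in memory and do the widening inside the matmul.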

Xia-Weiwen · 2024-09-13T02:45:53Z

Thanks for the pointers!

sanchitintel · 2024-09-13T21:04:19Z

Thanks for clarifying, @jerryzh168!

sanchitintel closed this as completed Sep 13, 2024

yanbing-j pushed a commit to yanbing-j/ao that referenced this issue Dec 9, 2024

[Dist][Inference] Enable distributed checkpoint loading for large mod…

f808b70

…el (pytorch#883) * [Dist][Inference] Explore checkpoint loading
