
Lower quant/dequant torch op to StableHLO #5763

Merged

lsy323 merged 23 commits into master on Nov 28, 2023
Conversation

lsy323 (Collaborator) commented Nov 2, 2023

The following torch ops can be lowered to StableHLO with this diff (a brief usage sketch follows the list):

  • quantize_per_tensor
  • quantize_per_channel
  • dequantize_per_tensor
  • dequantize_per_channel
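
For reference, these are the decomposed qdq ops that PT2E emits. Below is a minimal sketch of a per-tensor quantize/dequantize round trip, assuming the torch.ops.quantized_decomposed namespace and illustrative int8 qparams:

import torch

x = torch.randn(4, 8)
scale, zero_point = 0.05, 0
# Quantize to int8 with the given scale/zero_point and clamp range,
# then dequantize back to float.
q = torch.ops.quantized_decomposed.quantize_per_tensor(
    x, scale, zero_point, -128, 127, torch.int8)
dq = torch.ops.quantized_decomposed.dequantize_per_tensor(
    q, scale, zero_point, -128, 127, torch.int8)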

User Experience

  • The GraphModule generated by PT2E quantization can be exported to StableHLO, or to tf.saved_model, using the existing export APIs without any additional change to model code or export scripts. STABLEHLO_BYTECODE_FROM_PRETTYPRINT needs to be set to 1 to work around a StableHLO bytecode serialization issue. A short sketch of this flow follows.
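
A hedged sketch of that export path, assuming a PT2E-converted GraphModule quantized_gm with example inputs args; save_torch_module_as_tf_saved_model is the existing API in torch_xla.tf_saved_model_integration, and the output directory is illustrative:

import os
os.environ['STABLEHLO_BYTECODE_FROM_PRETTYPRINT'] = '1'  # serialization workaround

from torch_xla.tf_saved_model_integration import save_torch_module_as_tf_saved_model

# quantized_gm: GraphModule from convert_pt2e; args: example input tuple.
save_torch_module_as_tf_saved_model(quantized_gm, args, '/tmp/quantized_saved_model')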

Current workflow

  1. Register the xla qdq ops under the 'XLA' dispatch key, so that the qdq ops are dispatched to the XLA implementation during LTC tracing (a minimal dispatch sketch follows this list).
  2. During lowering, qdq ops are lowered to a custom call to stablehlo.uniform_quantize/dequantize in HLO. The qparams are stored in the custom call's config string, which can be deserialized directly into an MLIR DictAttr.
  3. The HLO->StableHLO converter converts the custom call to stablehlo.uniform_quantize/dequantize.
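
To make step 1 concrete, here is a minimal sketch of routing an op to the XLA backend via the public torch.library API; the library name, schema, and naive reference math are hypothetical stand-ins, not the PR's implementation (which emits the custom call described in step 2):

import torch

demo_lib = torch.library.Library("demo_q", "DEF")
demo_lib.define("quantize_tensor(Tensor x, float scale, int zero_point) -> Tensor")

def xla_quantize_tensor(x, scale, zero_point):
    # Naive int8 affine quantization stand-in, traced by LTC on XLA tensors.
    return torch.clamp(torch.round(x / scale) + zero_point, -128, 127).to(torch.int8)

# Registering under the 'XLA' dispatch key routes calls on XLA tensors here.
demo_lib.impl("quantize_tensor", xla_quantize_tensor, "XLA")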

Changes

  1. Allow save_torch_module_as_tf_saved_model to take a GraphModule as well, since PT2E outputs a GraphModule.
  2. Added 2 patches. One adds support to the HLO->StableHLO converter for stablehlo.uniform_quantize/dequantize conversion, originally authored by @sdasgup3. The other works around the StableHLO bytecode serialization issue mentioned above. Neither will be needed once a quantized dtype representation is added to HLO.
  3. Added new xla quantize_tensor/dequantize_tensor ops for qdq op lowering. The xla quantize/dequantize ops lower to custom calls to stablehlo.uniform_quantize/dequantize.
  4. Added a test script covering export of per-tensor/per-channel qdq ops and a PT2E-quantized resnet18 model (the PT2E flow is sketched after this list).
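
For context on item 4, here is a sketch of the PT2E quantization flow such a test would exercise, assuming the XNNPACKQuantizer from torch.ao.quantization with a symmetric config (both illustrative choices, not necessarily the PR's exact test setup):

import torch
import torchvision
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer, get_symmetric_quantization_config)

args = (torch.randn(1, 3, 224, 224),)
m = torchvision.models.resnet18().eval()
m = capture_pre_autograd_graph(m, args)

quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
m = prepare_pt2e(m, quantizer)
m(*args)                        # one calibration pass with sample inputs
quantized_gm = convert_pt2e(m)  # GraphModule containing the qdq ops above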

Future Work

  1. When qdtype is added to HLO, the lowering logic will need to be updated, and it will be more concise than the current implementation.

cc @sdasgup3 @GleasonK @paulinesho

@lsy323 lsy323 requested review from miladm, qihqi and JackCaoG November 2, 2023 16:59
@lsy323 lsy323 force-pushed the lsiyuan/quant-dequant-dispatch branch from e70be80 to c329b63 Compare November 18, 2023 00:39
@lsy323 lsy323 requested a review from JackCaoG November 27, 2023 18:19
import torch
import torchvision
from torch._export import capture_pre_autograd_graph

# Step 1: export resnet18
args = (torch.randn(1, 3, 224, 224),)
m = torchvision.models.resnet18().eval()
m = capture_pre_autograd_graph(m, args)
Collaborator

is there a reason we use this instead of torch.export?

Collaborator

ok I saw the export below, but I'm still confused about what this function does to the module.

lsy323 (Collaborator, Author) Nov 28, 2023

Here the graph is captured for PT2E to process further. PT2E doesn't work with a graph captured via torch.export (just tried locally); it needs the graph to be captured this way.

The export further down is for PyTorch -> StableHLO exporting; our API only works on an ExportedProgram (sketched below).
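
For concreteness, a hedged sketch of that later export step, assuming torch_xla's exported_program_to_stablehlo helper and the quantized_gm and args from the PT2E flow above:

import torch
from torch_xla.stablehlo import exported_program_to_stablehlo

ep = torch.export.export(quantized_gm, args)  # the API accepts only an ExportedProgram
shlo = exported_program_to_stablehlo(ep)
# The emitted text contains stablehlo.uniform_quantize/uniform_dequantize ops.
print(shlo.get_stablehlo_text())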

lsy323 (Collaborator, Author) commented Nov 28, 2023

Update:

  • Addressed review comments
  • Enhanced the test script to check the qparams of the qdq StableHLO ops and the number of qdq ops (an illustrative check follows)
  • Added more assertions to the torch_xla qdq ops: the scale and zero_point shapes, the zero_point dtype matching the integer dtype of the quantized type, and the scale values all being positive
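
An illustrative version of such a check (assumed, not the PR's exact test code), counting qdq ops in the StableHLO text from the earlier export sketch:

stablehlo_text = shlo.get_stablehlo_text()
# One quantize/dequantize pair is expected for a single per-tensor qdq pattern.
assert stablehlo_text.count('stablehlo.uniform_quantize') == 1
assert stablehlo_text.count('stablehlo.uniform_dequantize') == 1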

@lsy323 lsy323 requested a review from JackCaoG November 28, 2023 07:50
@lsy323 lsy323 added the stablehlo StableHLO related work label Nov 28, 2023
@lsy323 lsy323 merged commit a3b0c6e into master Nov 28, 2023
18 checks passed
ManfeiBai pushed a commit to ManfeiBai/PyTorchXLA that referenced this pull request Dec 1, 2023

(de)quantize_per_tensor/channel ops from PT2E quantization workflow are lowered to stablehlo uniform_dequantize/quantize.
---------

Co-authored-by: Siyuan Liu <lsiyuan@google.com>
chunnienc pushed a commit to chunnienc/xla that referenced this pull request Dec 14, 2023
golechwierowicz pushed a commit that referenced this pull request Jan 12, 2024
@lsy323 lsy323 deleted the lsiyuan/quant-dequant-dispatch branch March 4, 2024 19:12
bhavya01 pushed a commit that referenced this pull request Apr 22, 2024