
Why should we use apply_custom_rules_to_quantizer? #8

Closed
dahaiyidi opened this issue Nov 1, 2023 · 1 comment
dahaiyidi commented Nov 1, 2023

In quantize.py I found the following function, which is also used in qat.py. Why do we need to find the quantizer pairs, and why do we set:
major = bottleneck.cv1.conv._input_quantizer
bottleneck.addop._input0_quantizer = major
bottleneck.addop._input1_quantizer = major
```python
def apply_custom_rules_to_quantizer(model : torch.nn.Module, export_onnx : Callable):
    # apply rules to graph
    export_onnx(model, "quantization-custom-rules-temp.onnx")
    pairs = find_quantizer_pairs("quantization-custom-rules-temp.onnx")
    print(pairs)
    for major, sub in pairs:
        print(f"Rules: {sub} match to {major}")
        get_attr_with_path(model, sub)._input_quantizer = get_attr_with_path(model, major)._input_quantizer  # why use the same input_quantizer??
    os.remove("quantization-custom-rules-temp.onnx")

    for name, bottleneck in model.named_modules():
        if bottleneck.__class__.__name__ == "Bottleneck":
            if bottleneck.add:
                print(f"Rules: {name}.add match to {name}.cv1")
                major = bottleneck.cv1.conv._input_quantizer
                bottleneck.addop._input0_quantizer = major
                bottleneck.addop._input1_quantizer = major
```

Thanks.

liuanqi-libra7 (Collaborator) commented:
If we use https://github.com/NVIDIA-AI-IOT/cuDLA-samples/tree/main/export#option1, the generated model can also run on the GPU. However, if the Q&DQ nodes of these tensors are inconsistent, the QAT model ends up with a lot of useless int8->fp16 and fp16->int8 data conversions, which slows down model inference.
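To illustrate the point, here is a minimal sketch, assuming the repo's QAT code is built on NVIDIA's pytorch_quantization (TensorQuantizer). The `QuantAdd` module below is a simplified stand-in for the repo's add wrapper, not its exact implementation:

```python
import torch
from pytorch_quantization import nn as quant_nn
from pytorch_quantization.tensor_quant import QuantDescriptor


class QuantAdd(torch.nn.Module):
    """Elementwise add with a fake-quant (Q/DQ) node on each input."""
    def __init__(self):
        super().__init__()
        desc = QuantDescriptor(num_bits=8, calib_method="histogram")
        # Two independent quantizers: after calibration each can learn a
        # different scale (amax) for its input tensor.
        self._input0_quantizer = quant_nn.TensorQuantizer(desc)
        self._input1_quantizer = quant_nn.TensorQuantizer(desc)

    def forward(self, x, y):
        return self._input0_quantizer(x) + self._input1_quantizer(y)


# If the add-input quantizers and the matching conv-input quantizer calibrate
# to different scales, the exported ONNX graph carries mismatched Q/DQ pairs
# around the Add, and TensorRT has to insert int8->fp16 / fp16->int8 reformat
# layers instead of keeping the residual path in int8. Pointing all of them at
# one shared TensorQuantizer object forces a single scale, which is what the
# snippet in quantize.py does:
#
#   major = bottleneck.cv1.conv._input_quantizer
#   bottleneck.addop._input0_quantizer = major
#   bottleneck.addop._input1_quantizer = major
```

In short, sharing one quantizer object means one set of Q/DQ scales on both add inputs, so TensorRT (or cuDLA) can fuse the surrounding layers and keep the whole residual connection in int8.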
