In quantize.py I found the function `apply_custom_rules_to_quantizer`, which is used in qat.py. Why do we need to find the quantizer pairs, and why do we set the following?

```python
major = bottleneck.cv1.conv._input_quantizer
bottleneck.addop._input0_quantizer = major
bottleneck.addop._input1_quantizer = major
```

Here is the function:

```python
def apply_custom_rules_to_quantizer(model: torch.nn.Module, export_onnx: Callable):
    # apply rules to graph
    export_onnx(model, "quantization-custom-rules-temp.onnx")
    pairs = find_quantizer_pairs("quantization-custom-rules-temp.onnx")
    print(pairs)
    for major, sub in pairs:
        print(f"Rules: {sub} match to {major}")
        # why use the same input_quantizer??
        get_attr_with_path(model, sub)._input_quantizer = get_attr_with_path(model, major)._input_quantizer
    os.remove("quantization-custom-rules-temp.onnx")

    for name, bottleneck in model.named_modules():
        if bottleneck.__class__.__name__ == "Bottleneck":
            if bottleneck.add:
                print(f"Rules: {name}.add match to {name}.cv1")
                major = bottleneck.cv1.conv._input_quantizer
                bottleneck.addop._input0_quantizer = major
                bottleneck.addop._input1_quantizer = major
```
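For context, here is a condensed sketch of the Bottleneck block these rules target (a simplified YOLOv5-style definition, not the exact code from this repo; the real block wraps the convolutions with BN and an activation):

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Simplified YOLOv5-style residual bottleneck (illustrative only)."""
    def __init__(self, c1: int, c2: int, shortcut: bool = True):
        super().__init__()
        self.cv1 = nn.Conv2d(c1, c2, kernel_size=1, stride=1)
        self.cv2 = nn.Conv2d(c2, c2, kernel_size=3, stride=1, padding=1)
        self.add = shortcut and c1 == c2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Note: the same tensor x feeds both cv1 and the residual add.
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
```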
Thanks.
If we use https://github.com/NVIDIA-AI-IOT/cuDLA-samples/tree/main/export#option1, the generated model can also run on the GPU. However, if the Q/DQ nodes of these tensors are inconsistent, the QAT model ends up with many useless int8->fp16 and fp16->int8 data conversions, which slow down inference.
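To make this concrete, here is a minimal, self-contained sketch (plain PyTorch, not the repo's code) of why the two inputs of a residual Add should share one quantization scale:

```python
# Illustrative sketch: the scale values below are made up.
import torch

def fake_quant(x: torch.Tensor, scale: float) -> torch.Tensor:
    """int8 fake-quantization: round to [-128, 127] steps, then dequantize."""
    q = torch.clamp(torch.round(x / scale), -128, 127)
    return q * scale

x = torch.randn(4)
y = torch.randn(4)

# Different scales on the two Add inputs: the int8 values represent
# different step sizes and are not directly addable, so the runtime must
# dequantize to fp16, add, and requantize -> extra reformat kernels.
bad = fake_quant(x, 0.1) + fake_quant(y, 0.025)

# Shared scale (what apply_custom_rules_to_quantizer enforces by assigning
# the same _input_quantizer object to both inputs): the int8 values share
# one step size, so the Add can run in int8 end to end.
good = fake_quant(x, 0.1) + fake_quant(y, 0.1)
```

Assigning the same quantizer object to `_input0_quantizer` and `_input1_quantizer`, and tying it to `cv1`'s input quantizer (the skip tensor feeding the Add is the same tensor feeding `cv1`), guarantees that the exported Q/DQ pairs carry identical scales, so TensorRT/DLA can keep the Add in int8 without inserting reformat nodes.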
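If you want to verify the effect after export, a small check like this (the file name is hypothetical, and it only handles the common case where the Q/DQ scales are stored as graph initializers) prints whether the two DequantizeLinear inputs of each Add carry the same scale:

```python
import onnx
from onnx import numpy_helper

model = onnx.load("qat-model.onnx")  # hypothetical file name
graph = model.graph
inits = {t.name: numpy_helper.to_array(t) for t in graph.initializer}
producer = {out: node for node in graph.node for out in node.output}

for node in graph.node:
    if node.op_type != "Add":
        continue
    scales = []
    for inp in node.input:
        dq = producer.get(inp)
        # DequantizeLinear inputs are (x, x_scale, x_zero_point); only
        # scales that are initializers are checked here.
        if dq is not None and dq.op_type == "DequantizeLinear" and dq.input[1] in inits:
            scales.append(float(inits[dq.input[1]]))
    if len(scales) == 2:
        status = "shared" if abs(scales[0] - scales[1]) < 1e-9 else "MISMATCH"
        print(f"{node.name}: {scales} -> {status}")
```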