unnecessary float() variables cause quantization to fail. #281

BmanClark · 2024-03-22T15:32:01Z

I'm quantizing the MI-GAN network that I have previously converted to tflite successfully with your help (thank-you!)

I'm basing my conversion off your https://github.com/alibaba/TinyNeuralNetwork/blob/main/examples/quantization/post.py script, and I've managed to work my way through some difficulties (especially convincing pytorch not to quietly convert floats to doubles, which then stop the conversion - a force double to float before quantizing switch might be a nice feature).

My current issue though seems an unnecessary one. I'm hitting:
"Creation of quantized tensor requires quantized dtype like torch.quint8" which appears to come from https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/quantized/TensorFactories.cpp line 115 or 128 or similar.

This is because in (Traceback):
converter.convert()
File "[longpath]/tinynn/converter/base.py", line 476, in convert
self.init_jit_graph()
File "[longpath]/tinynn/converter/base.py", line 228, in init_jit_graph
script = torch.jit.trace(self.model, self.dummy_input)
it reads a python script version of the model created earlier by quantizer.quantize() (generator_q.py.txt

.txt added to allow upload), which contains:

        float_0_f = fake_quant_1.float()                     # line that causes failure
        fake_quant_1 = None

[earlier: self.fake_quant_1 = torch.quantization.QuantStub() ]
float_0_f is then never used, so could just not exist, rather than cause a failure. There are dozens of these, and none of the float values are used.

Is there a way to stop these values being created? I can't immediately see where they come from to know if I can adjust the model or similar, but as they're unused, is there a way to have them automatically pruned?

A side question - is there a way to convert expected inputs from float to int as well? I have image input to the network that I have to convert from 0-256 to -1.0-1.0, so if there was a way to convert to sticking with integer, that would also be useful.

The text was updated successfully, but these errors were encountered:

zk1998 · 2024-03-22T15:32:34Z

您好，我已经收到您的邮件，我会尽快查看并在第一时间回复您。

peterjc123 · 2024-03-24T03:41:07Z

Is there a way to stop these values being created? I can't immediately see where they come from to know if I can adjust the model or similar, but as they're unused, is there a way to have them automatically pruned?

Would you please try quantizer = PostQuantizer(..., config={'extra_tracer_opts': {"eliminate_dead_graph": True}})?

A side question - is there a way to convert expected inputs from float to int as well? I have image input to the network that I have to convert from 0-256 to -1.0-1.0, so if there was a way to convert to sticking with integer, that would also be useful.

converter = TFLiteConverter(..., fuse_quant_dequant=True) might work for your case.

BmanClark · 2024-03-25T09:53:48Z

Thank you for the tips! That's got me a bit further, although now I'm getting:
RuntimeError: createStatus == pytorch_qnnp_status_success INTERNAL ASSERT FAILED at "../aten/src/ATen/native/quantized/cpu/BinaryOps.cpp":203, please report a bug to PyTorch. failed to create QNNPACK Add operator
So I'll go and do that and report the bug to PyTorch, after I've made sure I'm on latest PyTorch.
The eliminate_dead_graph seems like a very useful option people are likely to want, and might want to be in the example or more prominent in some way?

And I look forward to trying the fuse_quant_dequant=True. Do I make the dummy input int for that final conversion step as well presumably? Does it cause any issues with PTQ calibration as the expected ranges will be different?

peterjc123 · 2024-03-25T11:12:01Z

Thank you for the tips! That's got me a bit further, although now I'm getting:
RuntimeError: createStatus == pytorch_qnnp_status_success INTERNAL ASSERT FAILED at "../aten/src/ATen/native/quantized/cpu/BinaryOps.cpp":203, please report a bug to PyTorch. failed to create QNNPACK Add operator

This may happen if

you forget to load the weights
the values you try to add are very different e.g. 1e-8 + 1.0 I don't think it should be quantized in that case.

The eliminate_dead_graph seems like a very useful option people are likely to want, and might want to be in the example or more prominent in some way?

Sure, I can add this.

And I look forward to trying the fuse_quant_dequant=True. Do I make the dummy input int for that final conversion step as well presumably? Does it cause any issues with PTQ calibration as the expected ranges will be different?

Nope, the input for the conversion step won't affect the PTQ calibration, because the quantized ranges are frozen when you call quantizer.convert().

BmanClark · 2024-03-25T11:44:31Z

This may happen if

you forget to load the weights

the values you try to add are very different e.g. 1e-8 + 1.0 I don't think it should be quantized in that case.

I'm definitely loading the weights, so I guess I'll be looking for if there's something like 2. And then if it's important or if it's something that should be 0, but is hitting float rounding errors or similar.
Thanks for the information!

BmanClark · 2024-03-25T18:03:32Z

I think it must be case 2, as I eventually found in the output:
Error in QNNPACK: failed to create add operator with 8.319039e-06 A-to-output scale ratio: scale ratio must be in [2**-14, 2**8) range
I'm guessing you think that this means the network isn't a good candidate for quantization? Surely if you're adding 2 things on very different scales the smaller is going to be ignored in float32 or int8? But maybe it makes quantization have impossible calculations I guess.

PyTorch don't make it easy to report bugs (not sure I can make a minimal reproducer), and from what you say it might not be a bug anyway? I've tried to ask a question on PyTorch forums, but that's awaiting moderation...

Thanks for your help (and if you've got any other ideas!), but it looks like this one might just be a quantization fail?

peterjc123 · 2024-03-26T02:42:06Z

@BmanClark I guess you may print out the input/output scales of the add operations to find out something weird. You can do that on a model after quantizer.convert.

peterjc123 added the question Further information is requested label Mar 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

unnecessary float() variables cause quantization to fail. #281

unnecessary float() variables cause quantization to fail. #281

BmanClark commented Mar 22, 2024

zk1998 commented Mar 22, 2024 via email

peterjc123 commented Mar 24, 2024

BmanClark commented Mar 25, 2024

peterjc123 commented Mar 25, 2024 •

edited

Loading

BmanClark commented Mar 25, 2024 •

edited

Loading

BmanClark commented Mar 25, 2024

peterjc123 commented Mar 26, 2024

unnecessary float() variables cause quantization to fail. #281

unnecessary float() variables cause quantization to fail. #281

Comments

BmanClark commented Mar 22, 2024

zk1998 commented Mar 22, 2024 via email

peterjc123 commented Mar 24, 2024

BmanClark commented Mar 25, 2024

peterjc123 commented Mar 25, 2024 • edited Loading

BmanClark commented Mar 25, 2024 • edited Loading

BmanClark commented Mar 25, 2024

peterjc123 commented Mar 26, 2024

peterjc123 commented Mar 25, 2024 •

edited

Loading

BmanClark commented Mar 25, 2024 •

edited

Loading