quantized models #14

Open
marsupialtail opened this issue Feb 24, 2021 · 1 comment

Comments

@marsupialtail

Does this support quantized models by any chance?

@priyanksonis

@marsupialtail Yes, I tried it. You can quantize the ONNX models like this and then use the quantized models for inference:

from onnxruntime.quantization import quantize_dynamic, QuantType

# Quantize the encoder (weights stored as signed 8-bit integers)
model_input = "temp1/t5-own--encoder.onnx"
model_output = "temp1_compressed_onnxruntime/t5-own--encoder.onnx"

quantize_dynamic(model_input, model_output, weight_type=QuantType.QInt8)

# Quantize the decoder (with LM head) the same way
model_input = "temp1/t5-own--decoder-with-lm-head.onnx"
model_output = "temp1_compressed_onnxruntime/t5-own--decoder-with-lm-head.onnx"

quantize_dynamic(model_input, model_output, weight_type=QuantType.QInt8)
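
The two calls above can be wrapped in one helper. The following is only a sketch: `quantize_models` and its `quantize_fn` parameter are hypothetical names, not part of onnxruntime or of this project. It creates the output directory first, so the run does not depend on whether the quantizer creates it, and it reports file sizes before and after so the compression can be checked.

```python
from pathlib import Path


def quantize_models(src_dir, dst_dir, names, quantize_fn=None):
    """Quantize each ONNX file listed in `names` from src_dir into dst_dir.

    Returns {name: (original_bytes, quantized_bytes)}.
    `quantize_fn(src_path, dst_path)` is a hypothetical hook; by default it
    calls onnxruntime's quantize_dynamic with QInt8 weights, as in the
    snippet above.
    """
    if quantize_fn is None:
        # Imported lazily so the helper can be loaded without onnxruntime.
        from onnxruntime.quantization import quantize_dynamic, QuantType

        def quantize_fn(src, dst):
            quantize_dynamic(src, dst, weight_type=QuantType.QInt8)

    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    # Create the output directory up front rather than relying on the quantizer.
    dst_dir.mkdir(parents=True, exist_ok=True)

    sizes = {}
    for name in names:
        src, dst = src_dir / name, dst_dir / name
        quantize_fn(str(src), str(dst))
        sizes[name] = (src.stat().st_size, dst.stat().st_size)
    return sizes
```

With the paths from this comment, the call would be `quantize_models("temp1", "temp1_compressed_onnxruntime", ["t5-own--encoder.onnx", "t5-own--decoder-with-lm-head.onnx"])`.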
