INC ONNX Runtime 3.x API design #1532
mengniwang95 asked this question in General
INC ONNX Runtime 3.x API Design
Target
Main principles
autotune is the exposed user interface API, which requires a set of configurations.
_quantize is an internal API.
GPTQConfig and autotune will use a set of configurations.
Repo Architecture
autotune are imported here.
calculate_scale_zp.
Previous Design
StaticQuant & SmoothQuant
Weight-only Quantization
New Design
StaticQuant & SmoothQuant
Configuration
Each argument to a config accepts either a single value or a list of values. If the parameters can be combined into different configurations, the returned object is a list of configurations used for autotuning.
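As a rough illustration of the expansion described above, here is a minimal Python sketch. It is not INC's implementation: the class name SketchQuantConfig, its fields (weight_bits, per_channel, alpha), and the expand method are all hypothetical, chosen only to show how list-valued arguments could be combined into a set of concrete configurations for autotuning.

```python
from dataclasses import dataclass, fields
from itertools import product
from typing import List, Union


@dataclass
class SketchQuantConfig:
    """Hypothetical config class; names are illustrative, not INC's API."""

    weight_bits: Union[int, List[int]] = 8
    per_channel: Union[bool, List[bool]] = True
    alpha: Union[float, List[float]] = 0.5

    def expand(self) -> List["SketchQuantConfig"]:
        """Return one concrete config per combination of list-valued fields."""
        names = [f.name for f in fields(self)]
        # Wrap scalar values in one-element lists so every field is iterable.
        choices = [
            value if isinstance(value, list) else [value]
            for value in (getattr(self, name) for name in names)
        ]
        # The Cartesian product enumerates every parameter combination.
        return [
            SketchQuantConfig(**dict(zip(names, combo)))
            for combo in product(*choices)
        ]


# Scalar arguments give a single configuration.
single = SketchQuantConfig(weight_bits=4).expand()

# List arguments give a configuration set (2 x 1 x 3 combinations here).
config_set = SketchQuantConfig(weight_bits=[4, 8], alpha=[0.3, 0.5, 0.7]).expand()

print(len(single), len(config_set))  # 1 6
```

In this sketch a tuning loop would simply iterate over config_set and evaluate each candidate. Whether the real design uses a full Cartesian product or a restricted combination rule is not stated in the text above.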
Quantize Interface
Weight-only Quantization
Configuration
Quantize Interface