Replies: 1 comment
-
A model with 2 million parameters is generally not considered large. Running it as a single ONNX module usually simplifies the inference pipeline, especially for smaller models where the overhead of splitting can outweigh the benefits. ONNX Runtime is designed to optimize inference performance, and its graph-level optimizations work best when the whole model is visible as one graph. A single ONNX module can also handle batch processing more efficiently, since it can optimize memory access patterns and computational resources without needing to synchronize multiple submodules.

With multiple ONNX submodules, if one module becomes a bottleneck (because it is more computationally intensive or poorly optimized), it slows down the entire inference pipeline. Separately, a computationally demanding model can monopolize CPU resources, reducing performance for other processes running on the same core and slowing overall system performance during inference.
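To make the trade-off concrete, here is a minimal sketch (pure NumPy standing in for ONNX Runtime, with assumed layer shapes) of a roughly 2M-parameter model run as one fused module versus two chained submodules. The split version has to materialize and hand off the intermediate tensor at the module boundary, which is exactly the per-call overhead described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ~2M-parameter MLP: 1000 -> 1000 -> 1000 (two 1000x1000 weight
# matrices, ~2M parameters total). Shapes are illustrative assumptions,
# not taken from the question.
W1 = rng.standard_normal((1000, 1000)).astype(np.float32)
W2 = rng.standard_normal((1000, 1000)).astype(np.float32)
x = rng.standard_normal((32, 1000)).astype(np.float32)  # batch of 32

def single_module(x):
    # One "module": both layers run back to back inside one graph,
    # so the runtime is free to fuse and plan memory across them.
    return np.maximum(x @ W1, 0) @ W2

def submodule_a(x):
    # First "submodule": first layer plus ReLU.
    return np.maximum(x @ W1, 0)

def submodule_b(h):
    # Second "submodule": final linear layer.
    return h @ W2

def split_modules(x):
    # Two "modules": the intermediate tensor h is materialized and
    # handed across the boundary, mimicking the per-call overhead of
    # chaining separate ONNX sessions.
    h = submodule_a(x)
    return submodule_b(h)

# Both pipelines compute the same function.
assert np.allclose(single_module(x), split_modules(x))
```

On the "monopolizing a core" concern: in ONNX Runtime itself you would normally address that by limiting threading through `SessionOptions` (for example setting `intra_op_num_threads` to 1) rather than by splitting the model.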
-
Say I have a model of ~2M parameters. What are the major pros/cons, in terms of inference speed on a single-core CPU, of running this model as one single ONNX module versus breaking it into smaller submodules?
Thanks! Looking forward to your insights.