Multi-Adaptor Support for Edge Devices #7755
zhipenghan started this conversation in Ideas
Idea proposal:
Support multiple fine-tuned adaptors simultaneously on memory-constrained edge devices, so that users can draw on the diverse capabilities of Small Language Models (SLMs) while keeping resource usage low.
Problem Statement:
On edge devices, general-purpose SLMs often struggle with specific tasks. Engineers typically fine-tune models to close this gap, but hosting a separate fine-tuned SLM for each downstream task is costly.
Proposal:
Allow users to host a single base model together with a series of adaptors that augment the base model's capabilities. This approach would unlock the potential for customizing SLM capabilities per task without hosting a separate model for each one.
I looked into existing work and found that ONNX Runtime has a similar design, but its performance is not good enough. Related issues:
[Performance] A way to share weights between sessions · Issue #15301 · microsoft/onnxruntime (github.com)
[Performance] Share weights between sessions to accelerate inference · Issue #20172 · microsoft/onnxruntime (github.com)