LoRAX: Dynamic loading and optimized inference of LoRA adapter models. #505
Labels
AI-Chatbots
Topics related to advanced chatbot platforms integrating multiple AI models
Algorithms
Sorting, learning, or classifying: all algorithms go here.
finetuning
Tools for fine-tuning LLMs, e.g. SFT or RLHF
llm-applications
Topics related to practical applications of Large Language Models in various fields
llm-inference-engines
Software to run inference on large language models
llm-serving-optimisations
Tips, tricks, and tools to speed up inference of large language models
MachineLearning
ML Models, Training and Inference
PEFT
Parameter-efficient fine-tuning of LLMs, e.g. LoRA (Low-Rank Adaptation)
LoRAX Docs
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
📖 What is LoRAX?
LoRAX (LoRA eXchange) is a framework that allows users to serve thousands of fine-tuned models on a single GPU, dramatically reducing the cost of serving without compromising on throughput or latency.
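As a concrete illustration of that serving model, here is a minimal sketch of querying a running LoRAX server with the `lorax-client` Python package, where each request selects its fine-tuned adapter via `adapter_id`. The endpoint address, prompt, and adapter name below are assumptions for the example, not values from this issue.

```python
# Minimal sketch: per-request adapter selection against a LoRAX server.
# Assumes a LoRAX server is already running locally (e.g. via the Docker
# image) and that the client is installed: pip install lorax-client
from lorax import Client

# Hypothetical local endpoint; replace with your deployment's address.
client = Client("http://127.0.0.1:8080")

prompt = "[INST] What is the capital of France? [/INST]"

# Without adapter_id, the request is served by the base model.
base = client.generate(prompt, max_new_tokens=64)
print(base.generated_text)

# With adapter_id, LoRAX loads the named LoRA adapter on demand and
# applies it for this request only; the adapter ID here is illustrative.
adapted = client.generate(
    prompt,
    adapter_id="predibase/conllpp",  # hypothetical adapter on the HF Hub
    max_new_tokens=64,
)
print(adapted.generated_text)
```

Because adapters are fetched and swapped per request rather than baked into the deployment, many fine-tunes can share the one GPU-resident base model.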
🌳 Features
URL: https://predibase.github.io/lorax/?h=cpu#features
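Among the features described at the link above is dynamic adapter loading: consecutive requests to the same deployment can name different LoRA adapters, which the server loads just-in-time and batches together. The sketch below shows this over the plain HTTP `/generate` route; the port, adapter IDs, and prompts are assumptions for illustration.

```python
# Sketch of dynamic adapter exchange over LoRAX's HTTP API.
# Three requests hit the same base model deployment, two of them routed
# through different (hypothetical) LoRA adapters.
import requests

LORAX_URL = "http://127.0.0.1:8080/generate"  # assumed local deployment

def generate(prompt: str, adapter_id: str | None = None) -> str:
    """Send one generation request, optionally through a LoRA adapter."""
    parameters = {"max_new_tokens": 64}
    if adapter_id is not None:
        parameters["adapter_id"] = adapter_id
    resp = requests.post(LORAX_URL, json={"inputs": prompt, "parameters": parameters})
    resp.raise_for_status()
    return resp.json()["generated_text"]

# Consecutive requests can target different fine-tunes of one base model.
print(generate("Summarize: LoRAX serves many adapters on one GPU."))
print(generate("Extract entities: Paris is in France.", adapter_id="org/ner-adapter"))
print(generate("Translate to German: Good morning.", adapter_id="org/mt-adapter"))
```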
Suggested labels
{ "label-name": "LoRA Framework", "description": "A powerful framework for serving fine-tuned models on a single GPU efficiently.", "repo": "llm-inference-engines", "confidence": 98.7 }