vLLM is an open-source inference and serving engine designed to optimize the performance of large language models (LLMs). It achieves high throughput and memory efficiency through optimizations such as PagedAttention and dynamic (continuous) batching, which lead to efficient resource utilization.
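As a quick orientation, the sketch below shows a minimal offline-inference run with vLLM's Python API (`LLM` and `SamplingParams`); the model name and sampling settings are placeholders and should be replaced with the model and parameters used in your benchmark configuration.

```python
# Minimal vLLM offline-inference sketch (model name and sampling values are placeholders).
from vllm import LLM, SamplingParams

prompts = [
    "Explain PagedAttention in one sentence.",
    "What does continuous batching improve?",
]

# Sampling parameters for generation.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# Load the model; vLLM manages KV-cache memory via PagedAttention.
llm = LLM(model="meta-llama/Llama-2-7b-hf")

# Generate completions; requests are batched dynamically by the engine.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```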
Platform-specific instructions and scripts used for LLM-Inference-Bench.