[Feature] vLLM enablement for 22 GenAI examples #1436

Open
joshuayao opened this issue Jan 21, 2025 · 0 comments

@joshuayao (Collaborator)

Priority: P1-Stopper
OS type: Ubuntu
Hardware type: Gaudi3
Running nodes: Single Node

Description

Feature Objective:
Set vLLM as the default serving framework on Gaudi for all remaining GenAI examples, taking advantage of its optimized inference performance to improve throughput and reduce latency.

Feature Details:

- Replace TGI with vLLM as the default serving backend for inference on Gaudi devices.
- Update serving configurations to align with vLLM's architecture for inference.
- Perform performance benchmarking to validate vLLM's advantages in time to first token (TTFT), time per output token (TPOT), and scalability on Gaudi hardware; a measurement sketch follows this list.
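
As a rough illustration of the benchmarking step, the sketch below measures TTFT and TPOT against a vLLM OpenAI-compatible streaming endpoint. The endpoint URL, model name, and prompt are placeholder assumptions, not values from this issue; point them at the deployed Gaudi service before use.

```python
# Hypothetical TTFT/TPOT probe for a vLLM OpenAI-compatible endpoint.
# ENDPOINT and MODEL are placeholders; substitute the real service values.
import json
import time

import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # assumed vLLM serve address
MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"      # placeholder model id

payload = {
    "model": MODEL,
    "prompt": "Explain paged attention in one paragraph.",
    "max_tokens": 128,
    "stream": True,  # streaming exposes per-token arrival times
}

start = time.perf_counter()
token_times = []  # wall-clock arrival time of each non-empty token chunk

with requests.post(ENDPOINT, json=payload, stream=True, timeout=120) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        # vLLM streams Server-Sent Events: lines of the form "data: {...}".
        if not line or not line.startswith(b"data: "):
            continue
        chunk = line[len(b"data: "):]
        if chunk == b"[DONE]":
            break
        data = json.loads(chunk)
        if data["choices"][0].get("text"):
            token_times.append(time.perf_counter())

if not token_times:
    raise RuntimeError("no tokens received from the endpoint")

ttft = token_times[0] - start
# TPOT: mean gap between consecutive tokens after the first one.
tpot = (token_times[-1] - token_times[0]) / max(len(token_times) - 1, 1)
print(f"TTFT: {ttft * 1000:.1f} ms, TPOT: {tpot * 1000:.1f} ms/token")
```

A real comparison would drive this with concurrent requests and a range of prompt and output lengths against both the TGI and vLLM deployments of each example.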

Expected Outcome:
Adopting vLLM as the default serving framework is expected to improve the user experience by significantly lowering latency while exceeding current TGI throughput on Gaudi.

joshuayao added the feature label on Jan 21, 2025
joshuayao added this to the v1.3 milestone on Jan 21, 2025
joshuayao added this to OPEA on Jan 21, 2025