[Feature] vLLM enablement for 22 GenAI examples #1436

Open
joshuayao opened this issue Jan 21, 2025 · 0 comments

@joshuayao (Collaborator)

Priority: P1-Stopper
OS type: Ubuntu
Hardware type: Gaudi3
Running nodes: Single Node

Description

Feature Objective:
Set vLLM as the default serving framework on Gaudi for all remaining GenAI examples, taking advantage of its optimized inference performance to improve throughput and reduce latency.

Feature Details:

- Replace TGI with vLLM as the default serving backend for inference on Gaudi devices.
- Update serving configurations to align with vLLM's architecture for inference.
- Perform performance benchmarking to validate vLLM's advantages in time to first token (TTFT), time per output token (TPOT), and scalability on Gaudi hardware; a measurement sketch follows this list.
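
As a rough illustration of the benchmarking step, the sketch below measures TTFT and TPOT against a vLLM OpenAI-compatible streaming endpoint. The endpoint URL, model name, and prompt are placeholder assumptions, not values from this issue; point them at the deployed Gaudi service before use.

```python
# Hypothetical TTFT/TPOT probe for a vLLM OpenAI-compatible endpoint.
# ENDPOINT and MODEL are placeholders; substitute the real service values.
import json
import time

import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # assumed vLLM serve address
MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"      # placeholder model id

payload = {
    "model": MODEL,
    "prompt": "Explain paged attention in one paragraph.",
    "max_tokens": 128,
    "stream": True,  # streaming exposes per-token arrival times
}

start = time.perf_counter()
token_times = []  # wall-clock arrival time of each non-empty token chunk

with requests.post(ENDPOINT, json=payload, stream=True, timeout=120) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        # vLLM streams Server-Sent Events: lines of the form "data: {...}".
        if not line or not line.startswith(b"data: "):
            continue
        chunk = line[len(b"data: "):]
        if chunk == b"[DONE]":
            break
        data = json.loads(chunk)
        if data["choices"][0].get("text"):
            token_times.append(time.perf_counter())

if not token_times:
    raise RuntimeError("no tokens received from the endpoint")

ttft = token_times[0] - start
# TPOT: mean gap between consecutive tokens after the first one.
tpot = (token_times[-1] - token_times[0]) / max(len(token_times) - 1, 1)
print(f"TTFT: {ttft * 1000:.1f} ms, TPOT: {tpot * 1000:.1f} ms/token")
```

A real comparison would drive this with concurrent requests and a range of prompt and output lengths against both the TGI and vLLM deployments of each example.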

Expected Outcome:
Adopting vLLM as the default serving framework is expected to improve the user experience by significantly lowering latency while exceeding current TGI throughput on Gaudi.

joshuayao added the feature label on Jan 21, 2025
joshuayao added this to the v1.3 milestone on Jan 21, 2025
joshuayao added this to OPEA on Jan 21, 2025