Skip to content

Commit

Permalink
Update README.md for Multiplatforms (#707)
Browse files Browse the repository at this point in the history
* Update README.md for Multiplatforms

Update README.md for Gaudi & Multiplatforms

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update comps/llms/text-generation/vllm/ray/README.md

Co-authored-by: Malini Bhandaru <malini.bhandaru@intel.com>

* Update comps/llms/text-generation/vllm/ray/README.md

Co-authored-by: Malini Bhandaru <malini.bhandaru@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update comps/llms/text-generation/vllm/ray/README.md

Co-authored-by: Malini Bhandaru <malini.bhandaru@intel.com>

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Malini Bhandaru <malini.bhandaru@intel.com>
  • Loading branch information
3 people authored Sep 19, 2024
1 parent 3a31295 commit ef90fbb
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions comps/llms/text-generation/vllm/ray/README.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
# VLLM-Ray Endpoint Service

[Ray](https://docs.ray.io/en/latest/serve/index.html) is an LLM serving solution that makes it easy to deploy and manage a variety of open source LLMs, built on [Ray Serve](https://docs.ray.io/en/latest/serve/index.html), has native support for autoscaling and multi-node deployments, which is easy to use for LLM inference serving on Intel Gaudi2 accelerators. The Intel Gaudi2 accelerator supports both training and inference for deep learning models in particular for LLMs. Please visit [Habana AI products](<(https://habana.ai/products)>) for more details.
[Ray](https://docs.ray.io/en/latest/serve/index.html) is an LLM serving solution that makes it easy to deploy and manage a variety of open source LLMs. Built on [Ray Serve](https://docs.ray.io/en/latest/serve/index.html) it has native support for autoscaling and multi-node deployments, and is easy to use for LLM inference serving across multiple platforms.

[vLLM](https://github.com/vllm-project/vllm) is a fast and easy-to-use library for LLM inference and serving, it delivers state-of-the-art serving throughput with a set of advanced features such as PagedAttention, Continuous batching and etc.. Besides GPUs, vLLM already supported [Intel CPUs](https://www.intel.com/content/www/us/en/products/overview.html) and [Gaudi accelerators](https://habana.ai/products).
[vLLM](https://github.com/vllm-project/vllm) is a fast and easy-to-use library for LLM inference and serving, it delivers state-of-the-art serving throughput with a set of advanced features such as PagedAttention and Continuous Batching among others. Besides GPUs, vLLM supports [Intel CPUs](https://www.intel.com/content/www/us/en/products/overview.html) and [Intel Gaudi accelerators](https://habana.ai/products).

This guide provides an example on how to launch vLLM with Ray serve endpoint on Gaudi accelerators.
This guide provides an example on how to launch vLLM with Ray serve endpoint on [Intel Gaudi2 Accelerator](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html).

## Set up environment

Expand Down

0 comments on commit ef90fbb

Please sign in to comment.